From gstein@lyra.org Wed Mar 1 00:12:29 2000 From: gstein@lyra.org (Greg Stein) Date: Tue, 29 Feb 2000 16:12:29 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Wed, 1 Mar 2000, Mark Hammond wrote: > > Why don't we simply move forward with the assumption that PythonWin and > > Scintilla will be updated? > > Done :-) hehe... > However, I think dropping it now _is_ a little heavy handed. I decided to > do a wider search and found a few in, eg, Sam Rushings calldll based ODBC > package. > > Personally, I would much prefer a warning now, and drop it later. _Then_ we > can say we have made enough noise about it. It would only be 2 years ago > that I became aware that this "feature" of append was not a feature at all - > up until then I used it purposely, and habits are sometimes hard to change > :-) What's the difference between a warning and an error? If you're running a program and it suddenly spits out a warning about a misuse of list.append, I'd certainly see that as "the program did something unexpected; that is an error." But this is all moot. Guido has already said that we would be amenable to a warning/error infrastructure which list.append could use. His description used some awkward sentences, so I'm not sure (without spending some brain cycles to parse the email) exactly what his desired defaults and behavior are. But hey... the possibility is there, and is just waiting for somebody to code it. IMO, Guido has left an out for people that are upset with the current hard-line approach. One of those people just needs to spend a bit of time coming up with a patch :-) And yes, Guido is also the Benevolent Dictator and can certainly have his mind changed, so people can definitely continue pestering him to back away from the hard-line approach... 
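The construct being debated, sketched in present-day Python (where the multi-argument form has since become a hard error rather than a warning):

```python
# Sketch of the list.append() change under discussion. In modern
# Python the old multi-argument form raises TypeError outright.
lst = []
lst.append((1, 2))        # the documented spelling: append one tuple
assert lst == [(1, 2)]

try:
    lst.append(1, 2)      # the deprecated multi-arg form being removed
except TypeError:
    pass                  # current interpreters reject it
```

On pre-1.6 interpreters the second call silently appended `(1, 2)` as a tuple, which is exactly why existing code depended on it.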
Cheers, -g -- Greg Stein, http://www.lyra.org/ From ping@lfw.org Wed Mar 1 00:20:07 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 29 Feb 2000 18:20:07 -0600 (CST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Greg Stein wrote: > > What's the difference between a warning and an error? If you're running a > program and it suddenly spits out a warning about a misuse of list.append, > I'd certainly see that as "the program did something unexpected; that is > an error." A big, big difference. Perhaps to one of us, it's the minor inconvenience of reading the error message and inserting a couple of parentheses in the appropriate file -- but to the end user, it's the difference between the program working (albeit noisily) and *not* working. When the program throws an exception and stops, it is safe to say most users will declare it broken and give up. We can't assume that they're going to be able to figure out what to edit (or be brave enough to try) just by reading the error message... or even what interpreter flag to give, if errors (rather than warnings) are the default behaviour. -- ?!ng From klm@digicool.com Wed Mar 1 00:37:09 2000 From: klm@digicool.com (Ken Manheimer) Date: Tue, 29 Feb 2000 19:37:09 -0500 (EST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Wed, 1 Mar 2000, Mark Hammond wrote: > > Why don't we simply move forward with the assumption that PythonWin and > > Scintilla will be updated? > > Done :-) > > However, I think dropping it now _is_ a little heavy handed. I decided to > do a wider search and found a few in, eg, Sam Rushings calldll based ODBC > package. > > Personally, I would much prefer a warning now, and drop it later. _Then_ we > can say we have made enough noise about it. 
It would only be 2 years ago > that I became aware that this "feature" of append was not a feature at all - > up until then I used it purposely, and habits are sometimes hard to change > :-) I agree with Mark. Why the sudden rush?? It seems to me to be unfair to make such a change - one that will break people's code - without advance warning, which typically is handled by a deprecation period. There *are* going to be people who won't be informed of the change in the short span of less than a single release. Just because it won't cause you pain isn't a good reason to disregard the pain of those that will suffer, particularly when you can do something relatively low-cost to avoid it. Ken klm@digicool.com From gstein@lyra.org Wed Mar 1 00:57:56 2000 From: gstein@lyra.org (Greg Stein) Date: Tue, 29 Feb 2000 16:57:56 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Ken Manheimer wrote: >... > I agree with Mark. Why the sudden rush?? It seems to me to be unfair to > make such a change - one that will break people's code - without advance > warning, which typically is handled by a deprecation period. There *are* > going to be people who won't be informed of the change in the short span > of less than a single release. Just because it won't cause you pain isn't > a good reason to disregard the pain of those that will suffer, > particularly when you can do something relatively low-cost to avoid it. Sudden rush?!? Mark said he knew about it for a couple years. Same here. It was a long while ago that .append()'s semantics were specified to "no longer" accept multiple arguments. I see in the HISTORY file that changes were made to Python 1.4 (October, 1996) to avoid calling append() with multiple arguments. So, that is over three years that append() has had multiple-args deprecated. There was probably discussion even before that, but I can't seem to find something to quote. Seems like plenty of time -- far from rushed. 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From klm@digicool.com Wed Mar 1 01:02:02 2000 From: klm@digicool.com (Ken Manheimer) Date: Tue, 29 Feb 2000 20:02:02 -0500 (EST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Greg Stein wrote: > On Tue, 29 Feb 2000, Ken Manheimer wrote: > >... > > I agree with Mark. Why the sudden rush?? It seems to me to be unfair to > > make such a change - one that will break people's code - without advance > > warning, which typically is handled by a deprecation period. There *are* > > going to be people who won't be informed of the change in the short span > > of less than a single release. Just because it won't cause you pain isn't > > a good reason to disregard the pain of those that will suffer, > > particularly when you can do something relatively low-cost to avoid it. > > Sudden rush?!? > > Mark said he knew about it for a couple years. Same here. It was a long > while ago that .append()'s semantics were specified to "no longer" accept > multiple arguments. > > I see in the HISTORY file that changes were made to Python 1.4 (October, > 1996) to avoid calling append() with multiple arguments. > > So, that is over three years that append() has had multiple-args > deprecated. There was probably discussion even before that, but I can't > seem to find something to quote. Seems like plenty of time -- far from > rushed. None the less, for those practicing it, the incorrectness of it will be fresh news. I would be less sympathetic with them if there was recent warning, eg, the schedule for changing it in the next release was part of the current release. But if you tell somebody you're going to change something, and then don't for a few years, you probably need to renew the warning before you make the change. Don't you think so? Why not? 
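The warn-now, break-later migration being argued for here can be sketched with the `warnings` module that later shipped with Python; the shim below is purely hypothetical, not anything that existed in the 1.6 tree:

```python
import warnings

def append_compat(lst, *args):
    # Hypothetical compatibility shim: warn on the multi-arg form but
    # preserve the old behavior (append the arguments as a tuple).
    if len(args) == 1:
        lst.append(args[0])
    else:
        warnings.warn("list.append() with more than one argument is "
                      "deprecated; pass a tuple instead",
                      DeprecationWarning, stacklevel=2)
        lst.append(args)

lst = []
append_compat(lst, (1, 2))        # normal single-argument call: silent
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    append_compat(lst, 3, 4)      # old form: warns, but still works
assert lst == [(1, 2), (3, 4)]
assert issubclass(caught[0].category, DeprecationWarning)
```

A deprecation period of this shape is what Ken and Mark are asking for: the old spelling keeps working for a release while announcing its own removal.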
Ken klm@digicool.com From paul@prescod.net Wed Mar 1 02:56:33 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 29 Feb 2000 18:56:33 -0800 Subject: [Python-Dev] breaking list.append() References: Message-ID: <38BC86E1.53F69776@prescod.net> Software configuration management is HARD. Every sudden backwards incompatible change (warranted or not) makes it harder. Multi-arg append is not hurting anyone as much as a sudden change to it would. It would be better to leave append() alone and publicize its near-term removal rather than cause random, part-time supported modules to stop working because their programmers may be too busy to update them right now. So no, I'm not stepping up to do it. But I'm also saying that the better "lazy" option is to put something in a prominent place in the documentation and otherwise leave it alone. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made possible the modern world." - from "Advent of the Algorithm" David Berlinski http://www.opengroup.com/mabooks/015/0151003386.shtml From guido@python.org Wed Mar 1 04:11:02 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 29 Feb 2000 23:11:02 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: Your message of "Tue, 29 Feb 2000 18:56:33 PST." <38BC86E1.53F69776@prescod.net> References: <38BC86E1.53F69776@prescod.net> Message-ID: <200003010411.XAA12988@eric.cnri.reston.va.us> > Software configuration management is HARD. Every sudden backwards > incompatible change (warranted or not) makes it harder. Multi-arg append > is not hurting anyone as much as a sudden change to it would. It would > be better to leave append() alone and publicize its near-term removal > rather than cause random, part-time supported modules to stop working > because their programmers may be too busy to update them right now. 
I'm tired of this rhetoric. It's not like I'm changing existing Python installations retroactively. I'm planning to release a new version of Python which no longer supports certain long-obsolete and undocumented behavior. If you maintain a non-core Python module, you should test it against the new release and fix anything that comes up. This is why we have an alpha and beta test cycle and even before that the CVS version. If you are a Python user who depends on a 3rd party module, you need to find out whether the new version is compatible with the 3rd party code you are using, or whether there's a newer version available that solves the incompatibility. There are people who still run Python 1.4 (really!) because they haven't upgraded. I don't have a problem with that -- they don't get much support, but it's their choice, and they may not need the new features introduced since then. I expect that lots of people won't upgrade their Python 1.5.2 to 1.6 right away -- they'll wait until the other modules/packages they need are compatible with 1.6. Multi-arg append probably won't be the only reason why e.g. Digital Creations may need to release an update to Zope for Python 1.6. Zope comes with its own version of Python anyway, so they have control over when they make the switch. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one@email.msn.com Wed Mar 1 05:04:35 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 00:04:35 -0500 Subject: [Python-Dev] Size of int across machines (was RE: Blowfish in Python?) In-Reply-To: Message-ID: <000201bf833b$a3b01bc0$412d153f@tim> [Markus Stenberg] > ... > speed was horrendous. > > I think the main reason was the fact that I had to use _long ints_ for > calculations, as the normal ints are signed, and apparently the bitwise > operators do not work as advertised when bit32 is set (=number is > negative). 
[Tim, takes "bitwise operators" to mean & | ^ ~, and expresses surprise] [Markus, takes umbrage, and expresses umbrage ] > Hmm.. As far as I'm concerned, shifts for example do screw up. Do you mean "for example" as in "there are so many let's just pick one at random", or as in "this is the only one I've stumbled into" <0.9 wink>? > i.e. > > 0xffffffff >> 30 > > [64bit Python: 3] > [32bit Python: -1] > > As far as I'm concerned, that should _not_ happen. Or maybe it's just me. I could not have guessed that your complaint was about 64-bit Python from your "when bit32 is set (=number is negative)" description . The behavior shown in a Python compiled under a C in which sizeof(long)==4 matches the Reference Manual (see the "Integer and long integer literals" and "shifting operations" sections). So that can't be considered broken (you may not *like* it, but it's functioning as designed & as documented). The behavior under a sizeof(long)==8 C seems more of an ill-documented (and debatable to me too) feature. The possibility is mentioned in the "The standard type hierarchy" section (under Numbers -> Integers -> Plain integers) but really not fleshed out, and the "Integer and long integer literals" section plainly contradicts it. Python's going to have to clean up its act here -- 64-bit machines are getting more common. There's a move afoot to erase the distinction between Python ints and longs (in the sense of auto-converting from one to the other under the covers, as needed). In that world, your example would work like the "64bit Python" one. There are certainly compatibility issues, though, in that int left shifts are end-off now, and on a 32-bit machine any int for which i & 0x80000000 is true "is negative" (and so sign-extends on a right shift; note that Python guarantees sign-extending right shifts *regardless* of what the platform C does (C doesn't define what happens here -- Python does)). 
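Tim's two cases can be reproduced in present-day Python, which has the unified arbitrary-precision ints he alludes to (so the "64bit Python" result is now the only result); the `as_int32` helper below is illustrative only, emulating a sizeof(long)==4 build:

```python
# Today's unified ints give the "64bit Python" answer directly:
assert 0xffffffff >> 30 == 3

def as_int32(x):
    # Reinterpret the low 32 bits of x as a signed 32-bit quantity,
    # mimicking a build where 0xffffffff was the int -1.
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x

assert as_int32(0xffffffff) == -1
# Python guarantees sign-extending right shifts, so the old 32-bit
# result follows:
assert as_int32(0xffffffff) >> 30 == -1
```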
[description of pain getting a fast C-like "mod 2**32 int +" to work too] Python really wasn't designed for high-performance bit-fiddling, so you're (as you've discovered ) swimming upstream with every stroke. Given that you can't write a C module here, there's nothing better than to do the ^ & | ~ parts with ints, and fake the rest slowly & painfully. Note that you can at least determine the size of a Python int via inspecting sys.maxint. sympathetically-unhelpfully y'rs - tim From guido@python.org Wed Mar 1 05:44:10 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 00:44:10 -0500 Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python In-Reply-To: Your message of "Tue, 29 Feb 2000 15:34:21 MST." <20000229153421.A16502@acs.ucalgary.ca> References: <20000229153421.A16502@acs.ucalgary.ca> Message-ID: <200003010544.AAA13155@eric.cnri.reston.va.us> [I don't like to cross-post to patches and python-dev, but I think this belongs in patches because it's a followup to Neil's post there and also in -dev because of its longer-term importance.] Thanks for the new patches, Neil! We had a visitor here at CNRI today, Eric Tiedemann , who had a look at your patches before. Eric knows his way around the Scheme, Lisp and GC literature, and presented a variant on your approach which takes the bite out of the recursive passes. Eric had commented earlier on Neil's previous code, and I had used the morning to make myself familiar with Neil's code. This was relatively easy because Neil's code is very clear. Today, Eric proposed to do away with Neil's hash table altogether -- as long as we're wasting memory, we might as well add 3 fields to each container object rather than allocating the same amount in a separate hash table. Eric expects that this will run faster, although this obviously needs to be tried. Container types are: dict, list, tuple, class, instance; plus potentially user-defined container types such as kjbuckets. 
I have a feeling that function objects should also be considered container types, because of the cycle involving globals. Eric's algorithm, then, consists of the following parts. Each container object has three new fields: gc_next, gc_prev, and gc_refs. (Eric calls the gc_refs "refcount-zero".) We color objects white (initial), gray (root), black (scanned root). (The terms are explained later; we believe we don't actually need bits in the objects to store the color; see later.) All container objects are chained together in a doubly-linked list -- this is the same as Neil's code except Neil does it only for dicts. (Eric postulates that you need a list header.) When GC is activated, all objects are colored white; we make a pass over the entire list and set gc_refs equal to the refcount for each object. Next, we make another pass over the list to collect the internal references. Internal references are (just like in Neil's version) references from other container types. In Neil's version, this was recursive; in Eric's version, we don't need recursion, since the list already contains all containers. So we simply visit the containers in the list in turn, and for each one we go over all the objects it references and subtract one from *its* gc_refs field. (Eric left out the little detail that we need to be able to distinguish between container and non-container objects amongst those references; this can be a flag bit in the type field.) Now, similar to Neil's version, all objects for which gc_refs == 0 have only internal references, and are potential garbage; all objects for which gc_refs > 0 are "roots". These have references to them from other places, e.g. from globals or stack frames in the Python virtual machine. We now start a second list, to which we will move all roots. The way to do this is to go over the first list again and to move each object that has gc_refs > 0 to the second list. 
Objects placed on the second list in this phase are considered colored gray (roots). Of course, some roots will reference some non-roots, which keeps those non-roots alive. We now make a pass over the second list, where for each object on the second list, we look at every object it references. If a referenced object is a container and is still in the first list (colored white) we *append* it to the second list (colored gray). Because we append, objects thus added to the second list will eventually be considered by this same pass; when we stop finding objects that are still white, we stop appending to the second list, and we will eventually terminate this pass. Conceptually, objects on the second list that have been scanned in this pass are colored black (scanned root); but there is no need to actually make the distinction. (How do we know whether an object pointed to is white (in the first list) or gray or black (in the second)? We could use an extra bitfield, but that's a waste of space. Better: we could set gc_refs to a magic value (e.g. 0xffffffff) when we move the object to the second list. During the meeting, I proposed to set the back pointer to NULL; that might work too but I think the gc_refs field is more elegant. We could even just test for a non-zero gc_refs field; the roots moved to the second list initially all have a non-zero gc_refs field already, and for the objects with a zero gc_refs field we could indeed set it to something arbitrary.) Once we reach the end of the second list, all objects still left in the first list are garbage. We can destroy them in a way similar to how Neil does this in his code. Neil calls PyDict_Clear on the dictionaries, and ignores the rest. Under Neil's assumption that all cycles (that he detects) involve dictionaries, that is sufficient. In our case, we may need a type-specific "clear" function for containers in the type object. We discussed more things, but not as thoroughly. 
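The passes described above condense into a short toy model (illustrative Python with invented names, not the C implementation under discussion):

```python
# Toy model of Eric's algorithm: containers carry their true refcount
# and the list of container objects they reference.
class Container:
    def __init__(self, refcount):
        self.refcount = refcount   # total refs, internal + external
        self.refs = []             # container objects this one holds
        self.gc_refs = 0

def collect(objects):
    # Pass 1: set gc_refs equal to the refcount for each container.
    for o in objects:
        o.gc_refs = o.refcount
    # Pass 2: subtract one for every internal (container-held) reference.
    for o in objects:
        for child in o.refs:
            child.gc_refs -= 1
    # gc_refs > 0 means external references exist: these are roots (gray).
    gray = [o for o in objects if o.gc_refs > 0]
    white = {o for o in objects if o.gc_refs == 0}
    # Propagate: appending to 'gray' during the scan means newly moved
    # objects are themselves scanned before the pass terminates.
    for o in gray:
        for child in o.refs:
            if child in white:
                white.remove(child)
                gray.append(child)
    return white    # whatever stayed white is unreachable cycle garbage

# A two-object cycle with only internal references, plus one real root:
a, b = Container(1), Container(1)
a.refs, b.refs = [b], [a]
root = Container(1)    # referenced from "outside", holds nothing
garbage = collect([a, b, root])
assert garbage == {a, b}
```

The append-while-scanning loop is the key point: it replaces the recursion in Neil's version, because the list itself serves as the work queue.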
Eric & Eric stressed the importance of making excellent statistics available about the rate of garbage collection -- probably as data structures that Python code can read rather than debugging print statements. Eric T also sketched an incremental version of the algorithm, usable for real-time applications. This involved keeping the gc_refs field ("external" reference counts) up-to-date at all times, which would require two different versions of the INCREF/DECREF macros: one for adding/deleting a reference from a container, and another for adding/deleting a root reference. Also, a 4th color (red) was added, to distinguish between scanned roots and scanned non-roots. We decided not to work this out in more detail because the overhead cost appeared to be much higher than for the previous algorithm; instead, we recommend that for real-time requirements the whole GC be disabled (there should be run-time controls for this, not just compile-time). We also briefly discussed possibilities for generational schemes. The general opinion was that we should first implement and test the algorithm as sketched above, and then changes or extensions could be made. I was pleasantly surprised to find Neil's code in my inbox when we came out of the meeting; I think it would be worthwhile to compare and contrast the two approaches. (Hm, maybe there's a paper in it?) The rest of the afternoon was spent discussing continuations, coroutines and generators, and the fundamental reason why continuations are so hard (the C stack getting in the way everywhere). But that's a topic for another mail, maybe. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one@email.msn.com Wed Mar 1 05:57:49 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 00:57:49 -0500 Subject: need .append patch (was RE: [Python-Dev] Re: Python-checkins digest, Vol 1 #370 - 8 msgs) In-Reply-To: <200002291302.IAA04581@eric.cnri.reston.va.us> Message-ID: <000601bf8343$13575040$412d153f@tim> [Tim, runs checkappend.py over the entire CVS tree, comes up with surprisingly many remaining problems, and surprisingly few false hits] [Guido fixes mailerdaemon.py, and argues for nuking Demo\tkinter\www\ (the whole directory) Demo\sgi\video\VcrIndex.py (unclear whether the dir or just the file) Demo\sgi\gl\glstdwin\glstdwin.py (stdwin-related) Demo\ibrowse\ibrowse.py (stdwin-related) > All these are stdwin-related. Stdwin will also go out of service per > 1.6. ] Then the sooner someone nukes them from the CVS tree, the sooner my automated hourly checkappend complaint generator will stop pestering Python-Dev about them . > (Conclusion: most multi-arg append() calls are *very* old, But part of that is because we went thru this exercise a couple years ago too, and you repaired all the ones in the less obscure parts of the distribution then. > or contributed by others. Sigh. I must've given bad examples long > ago...) Na, I doubt that. Most people will not read a language defn, at least not until "something doesn't work". If the compiler accepts a thing, they simply *assume* it's correct. It's pretty easy (at least for me!) to make this particular mistake as a careless typo, so I assume that's the "source origin" for many of these too. As soon as you *notice* you've done it, and that nothing bad happened, the natural tendencies are to (a) believe it's OK, and (b) save 4 keystrokes (incl. the SHIFTs) over & over again in the glorious indefinite future . 
Reminds me of a c.l.py thread a while back, wherein someone did stuff like None, x, y, None = function_returning_a_4_tuple to mean that they didn't care what the 1st & 4th values were. It happened to work, so they did it more & more. Eventually a function containing this mistake needed to reference None after that line, and "suddenly for no reason at all Python stopped working". To the extent that you're serious about CP4E, you're begging for more of this, not less . newbies-even-keep-on-doing-things-that-*don't*-work!-ly y'rs - tim From tim_one@email.msn.com Wed Mar 1 06:50:44 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 01:50:44 -0500 Subject: [Python-Dev] Unicode mapping tables In-Reply-To: <38BBD1A2.CD29AADD@lemburg.com> Message-ID: <000701bf834a$77acdfe0$412d153f@tim> [M.-A. Lemburg] > ... > Currently, mapping tables map characters to Unicode characters > and vice-versa. Now the .translate method will use a different > kind of table: mapping integer ordinals to integer ordinals. You mean that if I want to map u"a" to u"A", I have to set up some sort of dict mapping ord(u"a") to ord(u"A")? I simply couldn't follow this. > Question: What is more efficient: having lots of integers > in a dictionary or lots of characters ? My bet is "lots of integers", to reduce both space use and comparison time. > ... > Something else that changed is the way .capitalize() works. The > Unicode version uses the Unicode algorithm for it (see TechRep. 13 > on the www.unicode.org site). #13 is "Unicode Newline Guidelines". I assume you meant #21 ("Case Mappings"). > Here's the new doc string: > > S.capitalize() -> unicode > > Return a capitalized version of S, i.e. words start with title case > characters, all remaining cased characters have lower case. > > Note that *all* characters are touched, not just the first one. > The change was needed to get it in sync with the .iscapitalized() > method which is based on the Unicode algorithm too. 
> > Should this change be propagated to the string implementation ? Unicode makes distinctions among "upper case", "lower case" and "title case", and you're trying to get away with a single "capitalize" function. Java has separate toLowerCase, toUpperCase and toTitleCase methods, and that's the way to do it. Whatever you do, leave .capitalize alone for 8-bit strings -- there's no reason to break code that currently works. "capitalize" seems a terrible choice of name for a titlecase method anyway, because of its baggage connotations from 8-bit strings. Since this stuff is complicated, I say it would be much better to use the same names for these things as the Unicode and Java folk do: there's excellent documentation elsewhere for all this stuff, and it's Bad to make users mentally translate unique Python terminology to make sense of the official docs. So my vote is: leave capitalize the hell alone . Do not implement capitalize for Unicode strings. Introduce a new titlecase method for Unicode strings. Add a new titlecase method to 8-bit strings too. Unicode strings should also have methods to get at uppercase and lowercase (as Unicode defines those). From tim_one@email.msn.com Wed Mar 1 07:36:03 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 02:36:03 -0500 Subject: [Python-Dev] Re: Python / Haskell (fwd) In-Reply-To: Message-ID: <000801bf8350$cc4ec580$412d153f@tim> [Greg Wilson, quoting Philip Wadler] > Well, what I most want is typing. But you already know that. So invite him to contribute to the Types-SIG <0.5 wink>. > Next after typing? Full lexical scoping for closures. 
I want to write: > > fun x: fun y: x+y > > Not: > > fun x: fun y, x=x: x+y > > Lexically scoped closures would be a big help for the embedding technique > I described [GVW: in a posting to the Software Carpentry discussion list, > archived at > > http://software-carpentry.codesourcery.com/lists/sc-discuss/msg00068.html > > which discussed how to build a flexible 'make' alternative in Python]. So long as we're not deathly concerned over saving a few lines of easy boilerplate code, Python already supports this approach wonderfully well -- but via using classes with __call__ methods instead of lexical closures. I can't make time to debate this now, but suffice it to say dozens on c.l.py would be delighted to . Philip is understandably attached to the "functional way of spelling things", but Python's way is at least as usable for this (and many-- including me --would say more so). > Next after closures? Disjoint sums. E.g., > > fun area(shape) : > switch shape: > case Circle(r): > return pi*r*r > case Rectangle(h,w): > return h*w > > (I'm making up a Python-like syntax.) This is an alternative to the OO > approach. With the OO approach, it is hard to add area, unless you modify > the Circle and Rectangle class definitions. Python allows adding new methods to classes dynamically "from the outside" -- the original definitions don't need to be touched (although it's certainly preferable to add new methods directly!). Take this complaint to the extreme, and I expect you end up reinventing multimethods (suppose you need to add an intersection(shape1, shape2) method: N**2 nesting of "disjoint sums" starts to appear ludicrous ). 
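The spellings being compared, Wadler's default-argument workaround, the class-with-__call__ form, and the lexically scoped version that later Python versions support directly, side by side (function names are illustrative):

```python
# The workaround Wadler objects to: capture x via a default argument.
def make_adder_old(x):
    return lambda y, x=x: x + y

# Modern Python (2.2+) has full lexical scoping, so this just works:
def make_adder(x):
    def add(y):
        return x + y       # x is captured from the enclosing scope
    return add

# The class-with-__call__ spelling favored in the reply above:
class Adder:
    def __init__(self, x):
        self.x = x
    def __call__(self, y):
        return self.x + y

assert make_adder_old(3)(4) == 7
assert make_adder(3)(4) == 7
assert Adder(3)(4) == 7
```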
In any case, the Types-SIG already seems to have decided that some form of "typecase" stmt will be needed; see the archives for that; I expect the use above would be considered abuse, though; Python has no "switch" stmt of any kind today, and the use above can already be spelled via if isinstance(shape, Circle): etc elif isinstance(shape, Rectangle): etc else: raise TypeError(etc) From gstein@lyra.org Wed Mar 1 07:51:29 2000 From: gstein@lyra.org (Greg Stein) Date: Tue, 29 Feb 2000 23:51:29 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Ken Manheimer wrote: >... > None the less, for those practicing it, the incorrectness of it will be > fresh news. I would be less sympathetic with them if there was recent > warning, eg, the schedule for changing it in the next release was part of > the current release. But if you tell somebody you're going to change > something, and then don't for a few years, you probably need to renew the > warning before you make the change. Don't you think so? Why not? I agree. Note that Guido posted a note to c.l.py on Monday. I believe that meets your notification criteria. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Wed Mar 1 08:10:28 2000 From: gstein@lyra.org (Greg Stein) Date: Wed, 1 Mar 2000 00:10:28 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <200003010411.XAA12988@eric.cnri.reston.va.us> Message-ID: On Tue, 29 Feb 2000, Guido van Rossum wrote: > I'm tired of this rhetoric. It's not like I'm changing existing > Python installations retroactively. I'm planning to release a new > version of Python which no longer supports certain long-obsolete and > undocumented behavior. If you maintain a non-core Python module, you > should test it against the new release and fix anything that comes up. > This is why we have an alpha and beta test cycle and even before that > the CVS version. 
If you are a Python user who depends on a 3rd party > module, you need to find out whether the new version is compatible > with the 3rd party code you are using, or whether there's a newer > version available that solves the incompatibility. > > There are people who still run Python 1.4 (really!) because they > haven't upgraded. I don't have a problem with that -- they don't get > much support, but it's their choice, and they may not need the new > features introduced since then. I expect that lots of people won't > upgrade their Python 1.5.2 to 1.6 right away -- they'll wait until the > other modules/packages they need are compatible with 1.6. Multi-arg > append probably won't be the only reason why e.g. Digital Creations > may need to release an update to Zope for Python 1.6. Zope comes with > its own version of Python anyway, so they have control over when they > make the switch. I wholeheartedly support his approach. Just ask Mark Hammond :-) how many times I've said "let's change the code to make it Right; people aren't required to upgrade [and break their code]." Of course, his counter is that people need to upgrade to fix other, unrelated problems. So I relax and try again later :-). But I still maintain that they can independently grab the specific fixes and leave the other changes we make. Maybe it is grey, but I think this change is quite fine. Especially given Tim's tool. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one@email.msn.com Wed Mar 1 08:22:06 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 03:22:06 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: <000b01bf8357$3af08d60$412d153f@tim> [Greg Stein] > ... > Maybe it is grey, but I think this change is quite fine. Especially given > Tim's tool. What the heck does Tim's one-eyed trouser snake have to do with this? 
I know *it* likes to think it's the measure of all things, but, frankly, my tool barely affects the world at all a mere two feet beyond its base . tim-and-his-tool-think-the-change-is-a-mixed-thing-but-on-balance- the-best-thing-ly y'rs - tim From Fredrik Lundh Message-ID: <00fb01bf8359$c8196a20$34aab5d4@hagrid> Greg Stein wrote: > Note that Guido posted a note to c.l.py on Monday. I believe that meets > your notification criteria. ahem. do you seriously believe that everyone in the Python universe reads comp.lang.python? afaik, most Python programmers don't. ... so as far as I'm concerned, this was officially deprecated with Guido's post. afaik, no official python documentation has explicitly mentioned this (and the fact that it doesn't explicitly allow it doesn't really matter, since the docs don't explicitly allow the x[a, b, c] syntax either. both work in 1.5.2). has anyone checked the recent crop of Python books, btw? the eff-bot guide uses old syntax in two examples out of 320. how about the others? ... sigh. running checkappend over a 50k LOC application, I just realized that it doesn't catch a very common append pydiom. how fun. even though 99% of all append calls are "legal", this "minor" change will break every single application and library we have :-( oh, wait. xmlrpclib isn't affected. always something! 
Ken said that he wanted notification along certain guidelines. I said that I believed Guido's post did just that. Period. Personally, I think it is fine. I also think that a CHANGES file that arrives with 1.6 that points out the incompatibility is also fine. >... > sigh. running checkappend over a 50k LOC application, I > just realized that it doesn't catch a very common append > pydiom. And which is that? Care to help out? Maybe just a little bit? Or do you just want to talk about how bad this change is? :-( Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Wed Mar 1 09:01:52 2000 From: gstein@lyra.org (Greg Stein) Date: Wed, 1 Mar 2000 01:01:52 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <000b01bf8357$3af08d60$412d153f@tim> Message-ID: On Wed, 1 Mar 2000, Tim Peters wrote: > [Greg Stein] > > ... > > Maybe it is grey, but I think this change is quite fine. Especially given > > Tim's tool. > > What the heck does Tim's one-eyed trouser snake have to do with this? I > know *it* likes to think it's the measure of all things, but, frankly, my > tool barely affects the world at all a mere two feet beyond its base. > > tim-and-his-tool-think-the-change-is-a-mixed-thing-but-on-balance- > the-best-thing-ly y'rs - tim Heh. Now how is one supposed to respond to *that* ??! All right. Fine. +3 cool points go to Tim. :-) -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Wed Mar 1 09:03:32 2000 From: gstein@lyra.org (Greg Stein) Date: Wed, 1 Mar 2000 01:03:32 -0800 (PST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src Makefile.in,1.82,1.83 In-Reply-To: <14523.56638.286603.340358@weyr.cnri.reston.va.us> Message-ID: On Tue, 29 Feb 2000, Fred L. Drake, Jr. wrote: > Guido van Rossum writes: > > You can already extract this from the updated documentation on the > > website (which has a list of obsolete modules). > > > > But you're right, it would be good to be open about this. I'll think > > about it.
> > Note that the updated documentation isn't yet "published"; there are > no links to it and it hasn't been checked as much as I need it to be > before announcing it. Isn't the documentation better than what has been released? In other words, if you release now, how could you make things worse? If something does turn up during a check, you can always release again... Cheers, -g -- Greg Stein, http://www.lyra.org/ From Fredrik Lundh" Message-ID: <011001bf835e$600d1da0$34aab5d4@hagrid> Greg Stein wrote: > On Wed, 1 Mar 2000, Fredrik Lundh wrote: > > Greg Stein wrote: > > > Note that Guido posted a note to c.l.py on Monday. I believe that meets > > > your notification criteria. > > > > ahem. do you seriously believe that everyone in the > > Python universe reads comp.lang.python? > > > > afaik, most Python programmers don't. > > Now you're simply taking my comments out of context. Not a proper thing to > do. Ken said that he wanted notification along certain guidelines. I said > that I believed Guido's post did just that. Period. my point was that most Python programmers won't see that notification. when these people download 1.6 final and find that all their apps just broke, they probably won't be happy with a pointer to dejanews. > And which is that? Care to help out? Maybe just a little bit? this rather common pydiom: append = list.append for x in something: append(...) it's used a lot where performance matters. > Or do you just want to talk about how bad this change is? :-( yes, I think it's bad. I've been using Python since 1.2, and no other change has had the same consequences (wrt. time/money required to fix it) call me a crappy programmer if you want, but I'm sure there are others out there who are nearly as bad. and lots of them won't be aware of this change until someone upgrades the python interpreter on their server. From mal@lemburg.com Wed Mar 1 08:38:52 2000 From: mal@lemburg.com (M.-A.
Lemburg) Date: Wed, 01 Mar 2000 09:38:52 +0100 Subject: [Python-Dev] Unicode mapping tables References: <000701bf834a$77acdfe0$412d153f@tim> Message-ID: <38BCD71C.3592E6A@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > ... > > Currently, mapping tables map characters to Unicode characters > > and vice-versa. Now the .translate method will use a different > > kind of table: mapping integer ordinals to integer ordinals. > > You mean that if I want to map u"a" to u"A", I have to set up some sort of > dict mapping ord(u"a") to ord(u"A")? I simply couldn't follow this. I meant: 'a': u'A' vs. ord('a'): ord(u'A') The latter wins ;-) Reasoning for the first was that it allows character sequences to be handled by the same mapping algorithm. I decided to leave those techniques to some future implementation, since mapping integers has the nice side-effect of also allowing sequences to be used as mapping tables... resulting in some speedup at the cost of memory consumption. BTW, there are now three different ways to do char translations: 1. char -> unicode (char mapping codec's decode) 2. unicode -> char (char mapping codec's encode) 3. unicode -> unicode (unicode's .translate() method) > > Question: What is more efficient: having lots of integers > > in a dictionary or lots of characters ? > > My bet is "lots of integers", to reduce both space use and comparison time. Right. That's what I found too... it's "lots of integers" now :-) > > ... > > Something else that changed is the way .capitalize() works. The > > Unicode version uses the Unicode algorithm for it (see TechRep. 13 > > on the www.unicode.org site). > > #13 is "Unicode Newline Guidelines". I assume you meant #21 ("Case > Mappings"). Dang. You're right. Here's the URL in case someone wants to join in: http://www.unicode.org/unicode/reports/tr21/tr21-2.html > > Here's the new doc string: > > > > S.capitalize() -> unicode > > > > Return a capitalized version of S, i.e.
words start with title case > > characters, all remaining cased characters have lower case. > > > > Note that *all* characters are touched, not just the first one. > > The change was needed to get it in sync with the .iscapitalized() > > method which is based on the Unicode algorithm too. > > > > Should this change be propagated to the string implementation ? > > Unicode makes distinctions among "upper case", "lower case" and "title > case", and you're trying to get away with a single "capitalize" function. > Java has separate toLowerCase, toUpperCase and toTitleCase methods, and > that's the way to do it. The Unicode implementation has the corresponding: .upper(), .lower() and .capitalize() They work just like .toUpperCase, .toLowerCase, .toTitleCase resp. (well at least they should ;). > Whatever you do, leave .capitalize alone for 8-bit > strings -- there's no reason to break code that currently works. > "capitalize" seems a terrible choice of name for a titlecase method anyway, > because of its baggage connotations from 8-bit strings. Since this stuff is > complicated, I say it would be much better to use the same names for these > things as the Unicode and Java folk do: there's excellent documentation > elsewhere for all this stuff, and it's Bad to make users mentally translate > unique Python terminology to make sense of the official docs. Hmm, that's an argument but it breaks the current method naming scheme of all lowercase letters. Perhaps I should simply provide a new method for .toTitleCase(), e.g. .title(), and leave the previous definition of .capitalize() intact... > So my vote is: leave capitalize the hell alone. Do not implement > capitalize for Unicode strings. Introduce a new titlecase method for > Unicode strings. Add a new titlecase method to 8-bit strings too. Unicode > strings should also have methods to get at uppercase and lowercase (as > Unicode defines those).
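The three-way distinction Tim is arguing for is how things eventually landed: today's Python strings have .upper(), .lower(), .title() and .capitalize() side by side. A small illustration of the behavior under discussion, using modern spelling (str is Unicode throughout; the U+01F3 "dz" digraph is one of the characters where titlecase and uppercase genuinely differ):

```python
# The case operations Tim contrasts, as they exist in today's Python.
s = "monty PYTHON"

# .capitalize(): first character up, everything else forced lower.
assert s.capitalize() == "Monty python"

# .title(): first character of each word titlecased, the rest lowered.
assert s.title() == "Monty Python"

# .upper()/.lower(): whole-string case mappings.
assert s.upper() == "MONTY PYTHON"
assert s.lower() == "monty python"

# A character where titlecase differs from uppercase: U+01F3 "dz".
# Its uppercase is U+01F1 "DZ", but its titlecase is U+01F2 "Dz".
dz = "\u01f3"
assert dz.upper() == "\u01f1"
assert dz.title() == "\u01f2"
```

This is exactly why a single "capitalize" could not carry all three meanings: for plain ASCII the mappings coincide, but the Unicode case tables keep them distinct.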
...looks like you're more or less on the same wave length here ;-) Here's what I'll do: * implement .capitalize() in the traditional way for Unicode objects (simply convert the first char to uppercase) * implement u.title() to mean the same as Java's toTitleCase() * don't implement s.title(): the reasoning here is that it would confuse the user when she gets different return values for the same string (titlecase chars usually live in higher Unicode code ranges not reachable in Latin-1) Thanks for the feedback, -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim_one@email.msn.com Wed Mar 1 10:06:58 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 05:06:58 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <00fb01bf8359$c8196a20$34aab5d4@hagrid> Message-ID: <000e01bf8365$e1e0b9c0$412d153f@tim> [/F] > ... > so as far as I'm concerned, this was officially deprecated > with Guido's post. afaik, no official python documentation > has explicitly mentioned this (and the fact that it doesn't > explicitly allow it doesn't really matter, since the docs don't > explicitly allow the x[a, b, c] syntax either. both work in > 1.5.2). The "Subscriptions" section of the Reference Manual explicitly allows for dict[a, b, c] and explicitly does not allow for sequence[a, b, c] The "Mapping Types" section of the Library Ref does not explicitly allow for it, though, and if you read it as implicitly allowing for it (based on the Reference Manual's clarification of "key" syntax), you would also have to read the Library Ref as allowing for dict.has_key(a, b, c) Which 1.5.2 does allow, but which Guido very recently patched to treat as a syntax error. > ... > sigh. running checkappend over a 50k LOC application, I > just realized that it doesn't catch a very common append
[And, later, after prodding by GregS] > this rather common pydiom: > > append = list.append > for x in something: > append(...) This limitation was pointed out in checkappend's module docstring. Doesn't make it any easier for you to swallow, but I needed to point out that you didn't *have* to stumble into this the hard way. > how fun. even though 99% of all append calls are "legal", > this "minor" change will break every single application and > library we have :-( > > oh, wait. xmlrpclib isn't affected. always something! What would you like to do, then? The code will be at least as broken a year from now, and probably more so -- unless you fix it. So this sounds like an indirect argument for never changing Python's behavior here. Frankly, I expect you could fix the 50K LOC in less time than it took me to write this naggy response <0.50K wink>. embrace-change-ly y'rs - tim From tim_one@email.msn.com Wed Mar 1 10:31:12 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 05:31:12 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <000e01bf8365$e1e0b9c0$412d153f@tim> Message-ID: <001001bf8369$453e9fc0$412d153f@tim> [Tim, needing sleep] > dict.has_key(a, b, c) > > Which 1.5.2 does allow, but which Guido very recently patched to > treat as a syntax error. No, a runtime error. haskeynanny.py, anyone? not-me-ly y'rs - tim From fredrik@pythonware.com Wed Mar 1 11:14:18 2000 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 1 Mar 2000 12:14:18 +0100 Subject: [Python-Dev] breaking list.append() References: <000e01bf8365$e1e0b9c0$412d153f@tim> Message-ID: <002101bf836f$4a012220$f29b12c2@secret.pythonware.com> Tim Peters wrote: > The "Subscriptions" section of the Reference Manual explicitly allows for > > dict[a, b, c] > > and explicitly does not allow for > > sequence[a, b, c] I'd thought we'd agreed that nobody reads the reference manual ;-) > What would you like to do, then? more time to fix it, perhaps?
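The bound-method pydiom /F quotes defeats any purely textual checker: once the method is stored under another name, a scan for multi-argument ".append(" calls can no longer see the call site. A deliberately naive, checkappend-style scanner (a hypothetical sketch, not Tim's actual tool) makes the blind spot concrete:

```python
import re

# Hypothetical minimal checker: flag source lines that call .append
# with a comma at the top level of the argument list (the old
# multi-arg form that 1.6 turns into an error).
SUSPECT = re.compile(r"\.append\s*\(\s*[^(),]+\s*,")

direct = "history.append(pos, line)"            # old multi-arg form
fixed = "history.append((pos, line))"           # the 1.6-safe spelling
aliased = "append = history.append\nappend(pos, line)"  # the missed pydiom

assert SUSPECT.search(direct) is not None   # caught: comma inside .append(...)
assert SUSPECT.search(fixed) is None        # passes: single tuple argument
assert SUSPECT.search(aliased) is None      # missed: the call has no ".append"
```

Catching the aliased form would require following assignments (i.e. data-flow analysis, or just running the code), which is exactly why a source-level nanny can only ever be a heuristic here.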
it's surely a minor code change, but fixing it can be harder than you think (just witness Gerrit's bogus patches) after all, python might be free, but more and more people are investing lots of money in using it [1]. > The code will be at least as broken a year > from now, and probably more so -- unless you fix it. sure. we've already started. but it's a lot of work, and it's quite likely that it will take a while until we can be 100% confident that all the changes are properly done. (not all software has a 100% complete test suite that simply says "yes, this works" or "no, it doesn't") 1) fwiw, some poor soul over here posted a short note to the pythonworks mailing list, mentioning that we've now fixed the price. a major flamewar erupted, and my mailbox is now full of mail from unknowns telling me that I must be a complete moron that doesn't understand that Python is just a toy system, which everyone uses just because they cannot afford anything better... From tim_one@email.msn.com Wed Mar 1 11:26:21 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 06:26:21 -0500 Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python In-Reply-To: <200003010544.AAA13155@eric.cnri.reston.va.us> Message-ID: <001101bf8370$f881dfa0$412d153f@tim> Very briefly: [Guido] > ... > Today, Eric proposed to do away with Neil's hash table altogether -- > as long as we're wasting memory, we might as well add 3 fields to each > container object rather than allocating the same amount in a separate > hash table. Eric expects that this will run faster, although this > obviously needs to be tried. No, it doesn't: it will run faster. > Container types are: dict, list, tuple, class, instance; plus > potentially user-defined container types such as kjbuckets. I > have a feeling that function objects should also be considered > container types, because of the cycle involving globals.
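The "cycle involving globals" that makes function objects container types is the one every module-level function creates: the function references its module's namespace dict, and that dict references the function back by name. A small sketch using today's gc module (which grew out of exactly this kind of scheme; the names below are modern Python, not the 1.6-era API):

```python
import gc

# The classic uncollectable-by-refcounting case: a container that
# participates in a cycle.
node = []
node.append(node)          # the list references itself

# A function is a container too: it holds its globals dict, and the
# dict holds the function, closing a cycle through the namespace.
ns = {}
exec(compile("def g():\n    return g\n", "<example>", "exec"), ns)
assert ns["g"].__globals__ is ns   # function -> namespace dict
assert ns["g"]() is ns["g"]        # namespace dict -> function, by name

# Dropping our reference leaves the list cycle unreachable; the cycle
# collector (absent in 1.5.2, standard since 2.0) reclaims it.
del node
unreachable = gc.collect()
assert unreachable >= 1
```

Pure reference counting can never free either cycle on its own, which is the whole motivation for tracking container types separately.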
Note that the list-migrating steps you sketch later are basically the same as (but hairier than) the ones JimF and I worked out for M&S-on-RC a few years ago, right down to using appending to effect a breadth-first traversal without requiring recursion -- except M&S doesn't have to bother accounting for sources of refcounts. Since *this* scheme does more work per item per scan, to be as fast in the end it has to touch less stuff than M&S. But the more kinds of types you track, the more stuff this scheme will have to chase. The tradeoffs are complicated & unclear, so I'll just raise an uncomfortable meta-point: you balked at M&S the last time around because of the apparent need for two link fields + a bit or two per object of a "chaseable type". If that's no longer perceived as being a showstopper, M&S should be reconsidered too. I happen to be a fan of both approaches. The worst part of M&S-on-RC (== the one I never had a good answer for) is that a non-cooperating extension type E can't be chased, hence objects reachable only from objects of type E never get marked, so are vulnerable to bogus collection. In the Neil/Toby scheme, objects of type E merely act as sources of "external" references, so the scheme fails safe (in the sense of never doing a bogus collection due to non-cooperating types). Hmm ... if both approaches converge on keeping a list of all chaseable objects, and being careful of uncooperating types, maybe the only real difference in the end is whether the root set is given explicitly (as in traditional M&S) or inferred indirectly (but where "root set" has a different meaning in the scheme you sketched). > ... > In our case, we may need a type-specific "clear" function for containers > in the type object. I think definitely, yes. full-speed-sideways-ly y'rs - tim From mal@lemburg.com Wed Mar 1 10:40:36 2000 From: mal@lemburg.com (M.-A.
Lemburg) Date: Wed, 01 Mar 2000 11:40:36 +0100 Subject: [Python-Dev] breaking list.append() References: <011001bf835e$600d1da0$34aab5d4@hagrid> Message-ID: <38BCF3A4.1CCADFCE@lemburg.com> Fredrik Lundh wrote: > > Greg Stein wrote: > > On Wed, 1 Mar 2000, Fredrik Lundh wrote: > > > Greg Stein wrote: > > > > Note that Guido posted a note to c.l.py on Monday. I believe that meets > > > > your notification criteria. > > > > > > ahem. do you seriously believe that everyone in the > > > Python universe reads comp.lang.python? > > > > > > afaik, most Python programmers don't. > > > > Now you're simply taking my comments out of context. Not a proper thing to > > do. Ken said that he wanted notification along certain guidelines. I said > > that I believed Guido's post did just that. Period. > > my point was that most Python programmers won't > see that notification. when these people download > 1.6 final and find that all their apps just broke, they > probably won't be happy with a pointer to dejanews. Ditto. Anyone remember the str(2L) == '2' change, BTW? That one will cost lots of money in case someone implemented an eShop using the common str(2L)[:-1] idiom... There will need to be a big warning sign somewhere that people see *before* finding the download link. (IMHO, anyways.) > > And which is that? Care to help out? Maybe just a little bit? > > this rather common pydiom: > > append = list.append > for x in something: > append(...) > > it's used a lot where performance matters. Same here. checkappend.py doesn't find these (a great tool BTW, thanks Tim; I noticed that it leaks memory badly though). > > Or do you just want to talk about how bad this change is? :-( > > yes, I think it's bad. I've been using Python since 1.2, > and no other change has had the same consequences > (wrt. time/money required to fix it) > > call me a crappy programmer if you want, but I'm sure > there are others out there who are nearly as bad.
and > lots of them won't be aware of this change until someone > upgrades the python interpreter on their server. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@python.org Wed Mar 1 12:07:42 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 07:07:42 -0500 Subject: need .append patch (was RE: [Python-Dev] Re: Python-checkins digest, Vol 1 #370 - 8 msgs) In-Reply-To: Your message of "Wed, 01 Mar 2000 00:57:49 EST." <000601bf8343$13575040$412d153f@tim> References: <000601bf8343$13575040$412d153f@tim> Message-ID: <200003011207.HAA13342@eric.cnri.reston.va.us> > To the extent that you're serious about CP4E, you're begging for more of > this, not less. Which is exactly why I am breaking multi-arg append now -- this is my last chance. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Mar 1 12:27:10 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 07:27:10 -0500 Subject: [Python-Dev] Unicode mapping tables In-Reply-To: Your message of "Wed, 01 Mar 2000 09:38:52 +0100." <38BCD71C.3592E6A@lemburg.com> References: <000701bf834a$77acdfe0$412d153f@tim> <38BCD71C.3592E6A@lemburg.com> Message-ID: <200003011227.HAA13396@eric.cnri.reston.va.us> > Here's what I'll do: > > * implement .capitalize() in the traditional way for Unicode > objects (simply convert the first char to uppercase) > * implement u.title() to mean the same as Java's toTitleCase() > * don't implement s.title(): the reasoning here is that it would > confuse the user when she gets different return values for > the same string (titlecase chars usually live in higher Unicode > code ranges not reachable in Latin-1) Huh?
For ASCII at least, titlecase seems to map to ASCII; in your current implementation, only two Latin-1 characters (u'\265' and u'\377', I have no easy way to show them in Latin-1) map outside the Latin-1 range. Anyway, I would suggest adding a title() call to 8-bit strings as well; then we can do away with string.capwords(), which does something similar but different, mostly by accident. --Guido van Rossum (home page: http://www.python.org/~guido/) From jack@oratrix.nl Wed Mar 1 12:34:42 2000 From: jack@oratrix.nl (Jack Jansen) Date: Wed, 01 Mar 2000 13:34:42 +0100 Subject: [Python-Dev] Re: A warning switch? In-Reply-To: Message by Guido van Rossum, Mon, 28 Feb 2000 12:35:12 -0500 , <200002281735.MAA27771@eric.cnri.reston.va.us> Message-ID: <20000301123442.7DEF8371868@snelboot.oratrix.nl> > > What about adding a command-line switch for enabling warnings, as has > > been suggested long ago? The .append() change could then print a > > warning in 1.6alphas (and betas?), but still run, and be turned into > > an error later. > > That's better. I propose that the warnings are normally on, and that > there are flags to turn them off or turn them into errors. Can we then please have an interface to the "give warning" call (instead of a simple fprintf)? On the mac (and possibly also in PythonWin) it's probably better to pop up a dialog (possibly with a "don't show again" button) than do a printf which may get lost. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido@python.org Wed Mar 1 12:55:42 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 07:55:42 -0500 Subject: [Python-Dev] Re: A warning switch? In-Reply-To: Your message of "Wed, 01 Mar 2000 13:34:42 +0100."
<20000301123442.7DEF8371868@snelboot.oratrix.nl> References: <20000301123442.7DEF8371868@snelboot.oratrix.nl> Message-ID: <200003011255.HAA13489@eric.cnri.reston.va.us> > Can we then please have an interface to the "give warning" call (instead > of a simple fprintf)? On the mac (and possibly also in > PythonWin) it's probably better to pop up a dialog (possibly with a > "don't show again" button) than do a printf which may get lost. Sure. All you have to do is code it (or get someone else to code it). <0.9 wink> --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Wed Mar 1 13:32:02 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 01 Mar 2000 14:32:02 +0100 Subject: [Python-Dev] Unicode mapping tables References: <000701bf834a$77acdfe0$412d153f@tim> <38BCD71C.3592E6A@lemburg.com> <200003011227.HAA13396@eric.cnri.reston.va.us> Message-ID: <38BD1BD2.792E9B73@lemburg.com> Guido van Rossum wrote: > > > Here's what I'll do: > > > > * implement .capitalize() in the traditional way for Unicode > > objects (simply convert the first char to uppercase) > > * implement u.title() to mean the same as Java's toTitleCase() > > * don't implement s.title(): the reasoning here is that it would > > confuse the user when she gets different return values for > > the same string (titlecase chars usually live in higher Unicode > > code ranges not reachable in Latin-1) > > Huh? For ASCII at least, titlecase seems to map to ASCII; in your > current implementation, only two Latin-1 characters (u'\265' and > u'\377', I have no easy way to show them in Latin-1) map outside the > Latin-1 range. You're right, sorry for the confusion. I was thinking of other encodings like e.g. cp437 which have corresponding characters in the higher Unicode ranges. > Anyway, I would suggest to add a title() call to 8-bit strings as > well; then we can do away with string.capwords(), which does something > similar but different, mostly by accident.
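Guido's "similar but different, mostly by accident" is easy to make concrete: the two calls disagree as soon as the input contains a non-letter inside a word or irregular whitespace. A quick illustration with today's str.title() and string.capwords() (both of which did end up in the language, much as proposed here):

```python
import string

s = "monty PYTHON's   flying circus"

# str.title() starts a new "word" after *any* non-letter, so the
# apostrophe triggers titlecasing of the following 's' -- and the
# run of spaces is preserved as-is.
assert s.title() == "Monty Python'S   Flying Circus"

# string.capwords() splits on whitespace, capitalizes each chunk, and
# rejoins with single spaces -- so it also normalizes the spacing and
# leaves the apostrophe-s alone.
assert string.capwords(s) == "Monty Python's Flying Circus"
```

Neither behavior is "the" right one; they simply draw word boundaries differently, which is why the two functions coexist to this day.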
Ok, I'll do it this way then: s.title() will use C's toupper() and tolower() for case mapping and u.title() the Unicode routines. This will be in sync with the rest of the 8-bit string world (which is locale aware on many platforms AFAIK), even though it might not return the same string as the corresponding u.title() call. u.capwords() will be disabled in the Unicode implementation... it wasn't even implemented for the string implementation, so there's no breakage ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From akuchlin@mems-exchange.org Wed Mar 1 14:59:07 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Wed, 1 Mar 2000 09:59:07 -0500 (EST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <011001bf835e$600d1da0$34aab5d4@hagrid> References: <011001bf835e$600d1da0$34aab5d4@hagrid> Message-ID: <14525.12347.120543.804804@amarok.cnri.reston.va.us> Fredrik Lundh writes: >yes, I think it's bad. I've been using Python since 1.2, >and no other change has had the same consequences >(wrt. time/money required to fix it) There are more things in 1.6 that might require fixing existing code: str(2L) returning '2', the int/long changes, the Unicode changes, and if it gets added, garbage collection -- and bugs caused by those changes might not be catchable by a nanny. IMHO it's too early to point at the .append() change as breaking too much existing code; there may be changes that break a lot more. I'd wait and see what happens once the 1.6 alphas become available; if c.l.p is filled with shrieks and groans, GvR might decide to back the offending change out. (Or he might not...) -- A.M. Kuchling http://starship.python.net/crew/amk/ I have no skills with machines. I fear them, and because I cannot help attributing human qualities to them, I suspect that they hate me and will kill me if they can.
-- Robertson Davies, "Reading" From klm@digicool.com Wed Mar 1 15:37:49 2000 From: klm@digicool.com (Ken Manheimer) Date: Wed, 1 Mar 2000 10:37:49 -0500 (EST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Greg Stein wrote: > On Tue, 29 Feb 2000, Ken Manheimer wrote: > >... > > None the less, for those practicing it, the incorrectness of it will be > > fresh news. I would be less sympathetic with them if there was recent > > warning, eg, the schedule for changing it in the next release was part of > > the current release. But if you tell somebody you're going to change > > something, and then don't for a few years, you probably need to renew the > > warning before you make the change. Don't you think so? Why not? > > I agree. > > Note that Guido posted a note to c.l.py on Monday. I believe that meets > your notification criteria. Actually, by "part of the current release", I meant having the deprecation/impending-deletion warning in the release notes for the release before the one where the deletion happens - saying it's being deprecated now, will be deleted next time around. Ken klm@digicool.com I mean, you tell one guy it's blue. He tells his guy it's brown, and it lands on the page sorta purple. Wavy Gravy/Hugh Romney From Vladimir.Marangozov@inrialpes.fr Wed Mar 1 17:07:07 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Wed, 1 Mar 2000 18:07:07 +0100 (CET) Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python In-Reply-To: <200003010544.AAA13155@eric.cnri.reston.va.us> from "Guido van Rossum" at Mar 01, 2000 12:44:10 AM Message-ID: <200003011707.SAA01310@python.inrialpes.fr> Guido van Rossum wrote: > > Thanks for the new patches, Neil! Thanks from me too! I notice, however, that hash_resize() still uses a malloc call instead of PyMem_NEW.
Neil, please correct this in your version immediately ;-) > > We had a visitor here at CNRI today, Eric Tiedemann > , who had a look at your patches before. Eric > knows his way around the Scheme, Lisp and GC literature, and presented > a variant on your approach which takes the bite out of the recursive > passes. Avoiding the recursion is valuable, as long as we're optimizing the implementation of one particular scheme. It doesn't bother me that Neil's scheme is recursive, because I still perceive his code as a proof of concept. You're presenting here another scheme based on refcount arithmetic, generalized for all container types. The linked list implementation of this generalized scheme is not directly related to the logic. I have some suspicions on the logic, so you'll probably want to elaborate a bit more on it, and convince me that this scheme would actually work. > Today, Eric proposed to do away with Neil's hash table altogether -- > as long as we're wasting memory, we might as well add 3 fields to each > container object rather than allocating the same amount in a separate > hash table. I cannot agree so easily with this statement, but you should have expected this from me :-) If we're about to optimize storage, I have good reasons to believe that we don't need 3 additional slots per container (but 1 for gc_refs, yes). We could certainly envision allocating the containers within memory pools of 4K (just as it is done in pymalloc, and close to what we have for ints & floats). These pools would be labeled as "container's memory", they would obviously be under our control, and we'd have additional slots per pool, not per object. As long as we isolate the containers from the rest, we can enumerate them easily by walking through the pools.
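Vladimir's pool idea can be modeled abstractly: if every container is handed out from a pool the allocator owns, the collector can enumerate containers by walking the pools instead of threading link fields through every object. A toy model (hypothetical names, nothing like the eventual pymalloc code) of that enumeration pass:

```python
# Toy model of pool-based container enumeration: bookkeeping lives
# per pool, not per object, so no gc_next/gc_prev fields are needed.
POOL_CAPACITY = 4  # stand-in for "as many containers as fit in 4K"

class Allocator:
    def __init__(self):
        self.pools = [[]]              # each pool tracks its own objects

    def alloc_container(self, obj):
        if len(self.pools[-1]) >= POOL_CAPACITY:
            self.pools.append([])      # open a fresh pool when one fills
        self.pools[-1].append(obj)
        return obj

    def containers(self):
        # The collector's enumeration pass: walk pools, not a per-object
        # linked list.
        for pool in self.pools:
            yield from pool

alloc = Allocator()
for i in range(10):
    alloc.alloc_container([i])

assert len(list(alloc.containers())) == 10
assert len(alloc.pools) == 3           # 4 + 4 + 2 containers
```

The trade-off sketched here is the one under discussion: one word of per-pool overhead amortized over many objects, versus three extra fields in every single container.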
But I'm willing to defer this question for now, as it involves the object allocators (the builtin allocators + PyObject_NEW for extension types E -- user objects of type E would be automatically taken into account for GC if there's a flag in the type struct which identifies them as containers). > Eric expects that this will run faster, although this obviously needs > to be tried. Definitely, although I trust Eric & Tim :-) > > Container types are: dict, list, tuple, class, instance; plus > potentially user-defined container types such as kjbuckets. I have a > feeling that function objects should also be considered container > types, because of the cycle involving globals. + other extension container types. And I insist. Don't forget that we're planning to merge types and classes... > > Eric's algorithm, then, consists of the following parts. > > Each container object has three new fields: gc_next, gc_prev, and > gc_refs. (Eric calls the gc_refs "refcount-zero".) > > We color objects white (initial), gray (root), black (scanned root). > (The terms are explained later; we believe we don't actually need bits > in the objects to store the color; see later.) > > All container objects are chained together in a doubly-linked list -- > this is the same as Neil's code except Neil does it only for dicts. > (Eric postulates that you need a list header.) > > When GC is activated, all objects are colored white; we make a pass > over the entire list and set gc_refs equal to the refcount for each > object. Step 1: for all containers, c->gc_refs = c->ob_refcnt > > Next, we make another pass over the list to collect the internal > references. Internal references are (just like in Neil's version) > references from other container types. In Neil's version, this was > recursive; in Eric's version, we don't need recursion, since the list > already contains all containers. 
So we simply visit the containers in > the list in turn, and for each one we go over all the objects it > references and subtract one from *its* gc_refs field. (Eric left out > the little detail that we need to be able to distinguish between > container and non-container objects amongst those references; this can > be a flag bit in the type field.) Step 2: c->gc_refs = c->gc_refs - Nb_referenced_containers_from_c I guess that you realize that after this step, gc_refs can be zero or negative. I'm not sure that you collect "internal" references here (references from other container types). A list referencing 20 containers, being itself referenced by one container + one static variable + two times from the runtime stack, has an initial refcount == 4, so we'll end up with gc_refs == -16. A tuple referencing 1 list, referenced once by the stack, will end up with gc_refs == 0. Neil's scheme doesn't seem to have this "property". > > Now, similar to Neil's version, all objects for which gc_refs == 0 > have only internal references, and are potential garbage; all objects > for which gc_refs > 0 are "roots". These have references to them from > other places, e.g. from globals or stack frames in the Python virtual > machine. > Agreed, some roots have gc_refs > 0. I'm not sure that all of them have it, though... Do they? > We now start a second list, to which we will move all roots. The way > to do this is to go over the first list again and to move each object > that has gc_refs > 0 to the second list. Objects placed on the second > list in this phase are considered colored gray (roots). > Step 3: Roots with gc_refs > 0 go to the 2nd list. All c->gc_refs <= 0 stay in the 1st list. > Of course, some roots will reference some non-roots, which keeps those > non-roots alive. We now make a pass over the second list, where for > each object on the second list, we look at every object it references.
> If a referenced object is a container and is still in the first list > (colored white) we *append* it to the second list (colored gray). > Because we append, objects thus added to the second list will > eventually be considered by this same pass; when we stop finding > objects that are still white, we stop appending to the second list, > and we will eventually terminate this pass. Conceptually, objects on > the second list that have been scanned in this pass are colored black > (scanned root); but there is no need to actually make the > distinction. >

Step 4: Closure on reachable containers, which are all moved to the 2nd list. (Assuming that the objects are checked only via their type, without involving gc_refs)

> (How do we know whether an object pointed to is white (in the first > list) or gray or black (in the second)?

Good question? :-)

> We could use an extra bitfield, but that's a waste of space. > Better: we could set gc_refs to a magic value (e.g. 0xffffffff) when > we move the object to the second list.

I doubt that this would work, for the reasons mentioned above.

> During the meeting, I proposed to set the back pointer to NULL; that > might work too but I think the gc_refs field is more elegant. We could > even just test for a non-zero gc_refs field; the roots moved to the > second list initially all have a non-zero gc_refs field already, and > for the objects with a zero gc_refs field we could indeed set it to > something arbitrary.)

Not sure that "arbitrary" is a good choice if the differentiation is based solely on gc_refs.

> > Once we reach the end of the second list, all objects still left in > the first list are garbage. We can destroy them in a similar way to the > way Neil does this in his code. Neil calls PyDict_Clear on the > dictionaries, and ignores the rest. Under Neil's assumption that all > cycles (that he detects) involve dictionaries, that is sufficient.
In > our case, we may need a type-specific "clear" function for containers > in the type object.

Couldn't this be done in the object's dealloc function?

Note that both Neil's and this scheme assume that garbage _detection_ and garbage _collection_ is an atomic operation. I must say that I don't mind having some living garbage if it doesn't hurt my work. IOW, the criterion used for triggering the detection phase _may_ eventually differ from the one used for the collection phase. But this is where we reach the incremental approaches, implying different reasoning as a whole. My point is that the introduction of a "clear" function depends on the adopted scheme, whose logic depends on pertinent statistics on memory consumption of the cyclic garbage. To make it simple, we first need stats on memory consumption, then we can discuss objectively how to implement some particular GC scheme. I second Eric on the need for excellent statistics.

> > The general opinion was that we should first implement and test the > algorithm as sketched above, and then changes or extensions could be > made.

I'd like to see it discussed first in conjunction with (1) the possibility of having a proprietary malloc, (2) the envisioned type/class unification. Perhaps I'm getting too deep, but once something gets in, it's difficult to take it out, even when a better solution is found subsequently. Although I'm enthusiastic about this work on GC, I'm not in a position to evaluate the true benefits of the proposed schemes, as I still don't have a basis for evaluating how much garbage my program generates and whether it hurts the interpreter compared to its overall memory consumption.

> > I was pleasantly surprised to find Neil's code in my inbox when we > came out of the meeting; I think it would be worthwhile to compare and > contrast the two approaches. (Hm, maybe there's a paper in it?)

I'm all for it!
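Pulling the quoted steps together: the four passes can be modelled in a few lines of toy Python. The names here are purely illustrative (the real thing would be C code walking the gc_next/gc_prev chain), and the decrement in step 2 follows Jeremy's clarification later in this thread: each container decrements its *referents'* gc_refs, so the count never goes negative.

```python
class Container:
    # Stand-in for a container object; ob_refcnt models the true refcount.
    def __init__(self, ob_refcnt, children=()):
        self.ob_refcnt = ob_refcnt
        self.children = list(children)
        self.gc_refs = 0

def collect(containers):
    # Step 1: copy each refcount into gc_refs.
    for c in containers:
        c.gc_refs = c.ob_refcnt
    # Step 2: subtract internal references -- each container decrements
    # the gc_refs of every container it *references*.
    for c in containers:
        for child in c.children:
            if isinstance(child, Container):
                child.gc_refs -= 1
    # Step 3: containers still referenced from outside are roots (gray).
    reachable = [c for c in containers if c.gc_refs > 0]
    white = [c for c in containers if c.gc_refs <= 0]
    # Step 4: closure -- whatever a reachable container references is
    # reachable too; appending while scanning terminates once no white
    # object is reachable any more.
    for c in reachable:
        for child in c.children:
            if child in white:
                white.remove(child)
                reachable.append(child)
    return white    # self-contained cycles: the collectable garbage
```

After collect(), what remains in the returned list is exactly the objects reachable only from within the container set, i.e. the cyclic garbage.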
-- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

From jeremy@cnri.reston.va.us Wed Mar 1 17:53:13 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Wed, 1 Mar 2000 12:53:13 -0500 (EST) Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python In-Reply-To: <200003011707.SAA01310@python.inrialpes.fr> References: <200003010544.AAA13155@eric.cnri.reston.va.us> <200003011707.SAA01310@python.inrialpes.fr> Message-ID: <14525.22793.963077.707198@goon.cnri.reston.va.us>

>>>>> "VM" == Vladimir Marangozov writes: [">>" == Guido explaining Eric Tiedemann's GC design]

>> Next, we make another pass over the list to collect the internal >> references. Internal references are (just like in Neil's >> version) references from other container types. In Neil's >> version, this was recursive; in Eric's version, we don't need >> recursion, since the list already contains all containers. So we >> simply visit the containers in the list in turn, and for each one >> we go over all the objects it references and subtract one from >> *its* gc_refs field. (Eric left out the little detail that we >> need to be able to distinguish between container and >> non-container objects amongst those references; this can be a >> flag bit in the type field.)

VM> Step 2: c->gc_refs = c->gc_refs - VM> Nb_referenced_containers_from_c

VM> I guess that you realize that after this step, gc_refs can be VM> zero or negative.

I think Guido's explanation is slightly ambiguous. When he says, "subtract one from *its* gc_refs field" he means subtract one from the _contained_ object's gc_refs field.

VM> I'm not sure that you collect "internal" references here VM> (references from other container types). A list referencing 20 VM> containers, being itself referenced by one container + one VM> static variable + two times from the runtime stack, has an VM> initial refcount == 4, so we'll end up with gc_refs == -16.
The strategy is not that the container's gc_refs is decremented once for each object it contains. Rather, the container decrements each contained object's gc_refs by one. So you should never end up with gc_refs < 0.

>> During the meeting, I proposed to set the back pointer to NULL; >> that might work too but I think the gc_refs field is more >> elegant. We could even just test for a non-zero gc_refs field; >> the roots moved to the second list initially all have a non-zero >> gc_refs field already, and for the objects with a zero gc_refs >> field we could indeed set it to something arbitrary.)

I believe we discussed this further and concluded that setting the back pointer to NULL would not work. If we make the second list doubly-linked (like the first one), it is trivial to end GC by swapping the first and second lists. If we've zapped the pointers to NULL, then we have to go back and re-set them all.

Jeremy

From mal@lemburg.com Wed Mar 1 18:44:58 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 01 Mar 2000 19:44:58 +0100 Subject: [Python-Dev] Unicode Snapshot 2000-03-01 Message-ID: <38BD652A.EA2EB0A3@lemburg.com>

There is a new Unicode implementation snapshot available at the secret URL. It contains quite a few small changes to the internal APIs, doc strings for all methods and some new methods (e.g. .title()) on the Unicode and the string objects. The code page mappings are now integer->integer, which should make them more performant. Some of the C codec APIs have changed, so you may need to adapt code that already uses these (Fredrik ?!). Still missing is a MSVC project file... haven't gotten around yet to building one. The code does compile on WinXX though, as Finn Bock told me in private mail. Please try out the new stuff... Most interesting should be the code in Lib/codecs.py as it provides a very high level interface to all those builtin codecs. BTW: I would like to implement a .readline() method using only the .read() method as basis.
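One answer to the buffering question (which continues just below): if .read() is the only primitive available, a line can be assembled one character at a time, so no buffer survives between calls, at the price of many tiny reads. A hedged sketch, not the implementation that eventually landed; the set of line-break characters here is deliberately simplistic (Unicode's full set is larger, and treating '\r\n' as a single break would already require one character of lookahead, i.e. a little buffering):

```python
def readline_via_read(stream, linebreaks=('\n', '\r')):
    # Build a line by reading one character at a time; nothing is kept
    # between calls, at the cost of many small .read() calls.
    chars = []
    while True:
        ch = stream.read(1)
        if not ch:
            break            # EOF
        chars.append(ch)
        if ch in linebreaks:
            break
    return ''.join(chars)
```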
Does anyone have a good idea on how this could be done without buffering ? (Unicode has a slightly larger choice of line break chars than C; the .splitlines() method will deal with these)

Gotta run...

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

From Fredrik Lundh <011001bf835e$600d1da0$34aab5d4@hagrid> <14525.12347.120543.804804@amarok.cnri.reston.va.us> Message-ID: <034a01bf83b3$e97c8620$34aab5d4@hagrid>

Andrew M. Kuchling wrote: > There are more things in 1.6 that might require fixing existing code: > str(2L) returning '2', the int/long changes, the Unicode changes, and > if it gets added, garbage collection -- and bugs caused by those > changes might not be catchable by a nanny.

hey, you make it sound like "1.6" should really be "2.0" ;-)

From nascheme@enme.ucalgary.ca Wed Mar 1 19:29:02 2000 From: nascheme@enme.ucalgary.ca (nascheme@enme.ucalgary.ca) Date: Wed, 1 Mar 2000 12:29:02 -0700 Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python In-Reply-To: <200003011707.SAA01310@python.inrialpes.fr>; from marangoz@python.inrialpes.fr on Wed, Mar 01, 2000 at 06:07:07PM +0100 References: <200003010544.AAA13155@eric.cnri.reston.va.us> <200003011707.SAA01310@python.inrialpes.fr> Message-ID: <20000301122902.B7773@acs.ucalgary.ca>

On Wed, Mar 01, 2000 at 06:07:07PM +0100, Vladimir Marangozov wrote: > Guido van Rossum wrote: > > Once we reach the end of the second list, all objects still left in > > the first list are garbage. We can destroy them in a similar way to the > > way Neil does this in his code. Neil calls PyDict_Clear on the > > dictionaries, and ignores the rest. Under Neil's assumption that all > > cycles (that he detects) involve dictionaries, that is sufficient. In > > our case, we may need a type-specific "clear" function for containers > > in the type object.
> > Couldn't this be done in the object's dealloc function?

No, I don't think so. The object still has references to it. You have to be careful about how you break cycles so that memory is not accessed after it is freed.

Neil

-- "If elected mayor, my first act will be to kill the whole lot of you, and burn your town to cinders!" -- Groundskeeper Willie

From gvwilson@nevex.com Wed Mar 1 20:19:30 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Wed, 1 Mar 2000 15:19:30 -0500 (EST) Subject: [Python-Dev] DDJ article on Python GC Message-ID:

Jon Erickson (editor-in-chief) of "Doctor Dobb's Journal" would like an article on what's involved in adding garbage collection to Python. Please email me if you're interested in tackling it...

Thanks, Greg

From fdrake@acm.org Wed Mar 1 20:37:49 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 1 Mar 2000 15:37:49 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src Makefile.in,1.82,1.83 In-Reply-To: References: <14523.56638.286603.340358@weyr.cnri.reston.va.us> Message-ID: <14525.32669.909212.716484@weyr.cnri.reston.va.us>

Greg Stein writes: > Isn't the documentation better than what has been released? In other > words, if you release now, how could you make things worse? If something > does turn up during a check, you can always release again...

Releasing is still somewhat tedious, and I don't want to ask people to do several substantial downloads & installs. So far, a major navigation bug has been found in the test version I posted (just now fixed online); *that's* why I don't like to release too hastily! I don't think waiting two more weeks is a problem.

-Fred

-- Fred L. Drake, Jr. Corporation for National Research Initiatives

From guido@python.org Wed Mar 1 22:53:26 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 17:53:26 -0500 Subject: [Python-Dev] DDJ article on Python GC In-Reply-To: Your message of "Wed, 01 Mar 2000 15:19:30 EST."
References: Message-ID: <200003012253.RAA16056@eric.cnri.reston.va.us>

> Jon Erickson (editor-in-chief) of "Doctor Dobb's Journal" would like an > article on what's involved in adding garbage collection to Python. Please > email me if you're interested in tackling it...

I might -- although I should get Neil, Eric and Tim as co-authors. I'm halfway through implementing the scheme that Eric showed yesterday. It's very elegant, but I don't have an idea of its performance impact yet.

Say hi to Jon -- we've met a few times. I liked his March editorial, having just read the same book and had the same feeling of "wow, an open source project in the 19th century!"

--Guido van Rossum (home page: http://www.python.org/~guido/)

From mhammond@skippinet.com.au Wed Mar 1 23:09:23 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Thu, 2 Mar 2000 10:09:23 +1100 Subject: [Python-Dev] Re: A warning switch? In-Reply-To: <200003011255.HAA13489@eric.cnri.reston.va.us> Message-ID:

> > Can we then please have an interface to the "give warning" call (in > > stead of a simple fprintf)? On the mac (and possibly also in > > PythonWin) it's probably better to pop up a dialog (possibly with a > > "don't show again" button) than do a printf which may get lost. > > Sure. All you have to do is code it (or get someone else to code it).

How about just having either a "sys.warning" function, or maybe even a sys.stdwarn stream? Then a simple C API to call this, and we are done :-) sys.stdwarn sounds OK - it just defaults to sys.stdout, so the Mac and Pythonwin etc should "just work" by sending the output wherever sys.stdout goes today...

Mark.

From tim_one@email.msn.com Thu Mar 2 05:08:39 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 2 Mar 2000 00:08:39 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <38BCF3A4.1CCADFCE@lemburg.com> Message-ID: <001001bf8405$5f9582c0$732d153f@tim>

[/F] > append = list.append > for x in something: > append(...)

[M.-A. Lemburg] > Same here.
checkappend.py doesn't find these

As detailed in a c.l.py posting, I have yet to find a single instance of this actually called with multiple arguments. Pointing out that it's *possible* isn't the same as demonstrating it's an actual problem. I'm quite willing to believe that it is, but haven't yet seen evidence of it. For whatever reason, people seem much (and, in my experience so far, infinitely) more prone to make the

list.append(1, 2, 3)

error than the

maybethisisanappend(1, 2, 3)

error.

> (a great tool BTW, thanks Tim; I noticed that it leaks memory badly > though).

Which Python? Which OS? How do you know? What were you running it over?

Using 1.5.2 under Win95, according to wintop, & over the whole CVS tree, the total (code + data) virtual memory allocated to it peaked at about 2Mb a few seconds into the run, and actually decreased as time went on. So, akin to the bound method multi-argument append problem, the "checkappend leak problem" is something I simply have no reason to believe. Check your claim again? checkappend.py itself obviously creates no cycles or holds on to any state across files, so if you're seeing a leak it must be a bug in some other part of the version of Python + std libraries you're using. Maybe a new 1.6 bug? Something you did while adding Unicode? Etc. Tell us what you were running. Has anyone else seen a leak?

From tim_one@email.msn.com Thu Mar 2 05:50:19 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 2 Mar 2000 00:50:19 -0500 Subject: [Python-Dev] str vs repr at prompt again (FW: String printing behavior?) Message-ID: <001401bf840b$3177ba60$732d153f@tim>

Another unsolicited testimonial that countless users are oppressed by auto-repr (as opposed to auto-str) at the interpreter prompt. Just trying to keep a once-hot topic from going stone cold forever.
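For context on the complaint forwarded below: at the time of this thread the prompt's call to repr() was hard-wired, but later Pythons (2.0 onwards) expose it as sys.displayhook, so the str-flavoured behaviour the poster asks for is a retrospective one-liner sketch:

```python
import sys

def str_displayhook(value):
    # Show str() rather than repr() for expression results at the prompt.
    # (The real default hook also stores the result in the builtin '_';
    # that detail is omitted here for brevity.)
    if value is not None:
        print(str(value))

sys.displayhook = str_displayhook
```

Assigning to sys.displayhook only affects the interactive prompt; scripts are untouched.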
-----Original Message----- From: python-list-admin@python.org [mailto:python-list-admin@python.org] On Behalf Of Ted Drain Sent: Wednesday, March 01, 2000 5:42 PM To: python-list@python.org Subject: String printing behavior?

Hi all, I've got a question about the string printing behavior. If I define a function as:

>>> def foo():
...     return "line1\nline2"
>>> foo()
'line1\012line2'
>>> print foo()
line1
line2
>>>

It seems to me that the default printing behavior for strings should match the behavior of the print routine. I realize that some people may want to see embedded control codes, but I would advocate a separate method for printing raw byte sequences. We are using the Python interactive prompt as a pseudo-Matlab-like user interface and the current printing behavior is very confusing to users. It also means that functions that return text (like help routines) must print the string rather than returning it. Returning the string is much more flexible because it allows the string to be captured easily and redirected. Any thoughts?

Ted

-- Ted Drain Jet Propulsion Laboratory Ted.Drain@jpl.nasa.gov -- http://www.python.org/mailman/listinfo/python-list

From mal@lemburg.com Thu Mar 2 07:42:33 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 02 Mar 2000 08:42:33 +0100 Subject: [Python-Dev] breaking list.append() References: <001001bf8405$5f9582c0$732d153f@tim> Message-ID: <38BE1B69.E0B88B41@lemburg.com>

Tim Peters wrote: > > [/F] > > append = list.append > > for x in something: > > append(...) > > [M.-A. Lemburg] > > Same here. checkappend.py doesn't find these > > As detailed in a c.l.py posting, I have yet to find a single instance of > this actually called with multiple arguments. Pointing out that it's > *possible* isn't the same as demonstrating it's an actual problem. I'm > quite willing to believe that it is, but haven't yet seen evidence of it.
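The bound-method idiom quoted above keeps working under the 1.6 rules; only the multi-argument call form changes. A before/after sketch:

```python
items = []
append = items.append       # the idiom from /F's snippet

# Pre-1.6, append(x, 1, 2) silently appended the tuple (x, 1, 2).
# That call now raises TypeError; the compatible spelling constructs
# the tuple explicitly:
for x in range(3):
    append((x, 1, 2))       # note the extra parentheses
```

The explicit tuple costs one allocation per call, which is exactly the saving MAL mentions giving up.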
Note that I did in fact code like this on purpose: it saves a tuple construction for every append, which can make a difference in tight loops... > For whatever reason, people seem much (and, in my experience so far, > infinitely ) more prone to make the > > list.append(1, 2, 3) > > error than the > > maybethisisanappend(1, 2, 3) > > error. Of course... still there are hidden instances of the problem which are yet to be revealed. For my own code the siutation is even worse, since I sometimes did: add = list.append for x in y: add(x,1,2) > > (a great tool BTW, thanks Tim; I noticed that it leaks memory badly > > though). > > Which Python? Which OS? How do you know? What were you running it over? That's Python 1.5 on Linux2. I let the script run over a large lib directory and my projects directory. In the projects directory the script consumed as much as 240MB of process size. > Using 1.5.2 under Win95, according to wintop, & over the whole CVS tree, the > total (code + data) virtual memory allocated to it peaked at about 2Mb a few > seconds into the run, and actually decreased as time went on. So, akin to > the bound method multi-argument append problem, the "checkappend leak > problem" is something I simply have no reason to believe . Check your > claim again? checkappend.py itself obviously creates no cycles or holds on > to any state across files, so if you're seeing a leak it must be a bug in > some other part of the version of Python + std libraries you're using. > Maybe a new 1.6 bug? Something you did while adding Unicode? Etc. Tell us > what you were running. I'll try the same thing again using Python1.5.2 and the CVS version. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Thu Mar 2 07:46:49 2000 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Thu, 02 Mar 2000 08:46:49 +0100 Subject: [Python-Dev] breaking list.append() References: <001001bf8405$5f9582c0$732d153f@tim> <38BE1B69.E0B88B41@lemburg.com> Message-ID: <38BE1C69.C8A9E6B0@lemburg.com>

"M.-A. Lemburg" wrote: > > > > (a great tool BTW, thanks Tim; I noticed that it leaks memory badly > > > though). > > > > Which Python? Which OS? How do you know? What were you running it over? > > That's Python 1.5 on Linux2. I let the script run over > a large lib directory and my projects directory. In the > projects directory the script consumed as much as 240MB > of process size. > > > Using 1.5.2 under Win95, according to wintop, & over the whole CVS tree, the > > total (code + data) virtual memory allocated to it peaked at about 2Mb a few > > seconds into the run, and actually decreased as time went on. So, akin to > > the bound method multi-argument append problem, the "checkappend leak > > problem" is something I simply have no reason to believe. Check your > > claim again? checkappend.py itself obviously creates no cycles or holds on > > to any state across files, so if you're seeing a leak it must be a bug in > > some other part of the version of Python + std libraries you're using. > > Maybe a new 1.6 bug? Something you did while adding Unicode? Etc. Tell us > > what you were running. > > I'll try the same thing again using Python 1.5.2 and the CVS version.

Using the Unicode patched CVS version there's no leak anymore. Couldn't find a 1.5.2 version on my machine... I'll build one later.

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

From guido@python.org Thu Mar 2 15:32:32 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 02 Mar 2000 10:32:32 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__?
Message-ID: <200003021532.KAA17088@eric.cnri.reston.va.us>

I was looking at the code that invokes __del__, with the intent to implement a feature from Java: in Java, a finalizer is only called once per object, even if calling it makes the object live longer. To implement this, we need a flag in each instance that means "__del__ was called". I opened the creation code for instances, looking for the right place to set the flag. I then realized that it might be smart, now that we have this flag anyway, to set it to "true" during initialization. There are a number of exits from the initialization where the object is created but not fully initialized, where the new object is DECREF'ed and NULL is returned. When such an exit is taken, __del__ is called on an incompletely initialized object! Example:

>>> class C:
...     def __del__(self):
...         print "deleting", self
...
>>> x = C(1)
!--> deleting <__main__.C instance at 1686d8>
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: this constructor takes no arguments
>>>

Now I have a choice to make. If the class has an __init__, should I clear the flag only after __init__ succeeds? This means that if __init__ raises an exception, __del__ is never called. This is an incompatibility. It's possible that someone has written code that relies on __del__ being called even when __init__ fails halfway, and then their code would break. But it is just as likely that calling __del__ on a partially uninitialized object is a bad mistake, and I am doing all these cases a favor by not calling __del__ when __init__ failed!

Any opinions? If nobody speaks up, I'll make the change.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Thu Mar 2 16:44:00 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) (Barry A. Warsaw) Date: Thu, 2 Mar 2000 11:44:00 -0500 (EST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__?
References: <200003021532.KAA17088@eric.cnri.reston.va.us> Message-ID: <14526.39504.36065.657527@anthem.cnri.reston.va.us>

>>>>> "GvR" == Guido van Rossum writes:

GvR> Now I have a choice to make. If the class has an __init__, GvR> should I clear the flag only after __init__ succeeds? This GvR> means that if __init__ raises an exception, __del__ is never GvR> called. This is an incompatibility. It's possible that GvR> someone has written code that relies on __del__ being called GvR> even when __init__ fails halfway, and then their code would GvR> break.

It reminds me of the separation between object allocation and initialization in ObjC.

GvR> But it is just as likely that calling __del__ on a partially GvR> uninitialized object is a bad mistake, and I am doing all GvR> these cases a favor by not calling __del__ when __init__ GvR> failed!

GvR> Any opinions? If nobody speaks up, I'll make the change.

I think you should set the flag right before you call __init__(), i.e. after (nearly all) the C level initialization has occurred. Here's why: your "favor" can easily be accomplished by Python constructs in the __init__():

class MyBogo:
    def __init__(self):
        self.get_delified = 0
        do_sumtin_exceptional()
        self.get_delified = 1

    def __del__(self):
        if self.get_delified:
            ah_sweet_release()

-Barry

From gstein@lyra.org Thu Mar 2 17:14:35 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 2 Mar 2000 09:14:35 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: <200003021532.KAA17088@eric.cnri.reston.va.us> Message-ID:

On Thu, 2 Mar 2000, Guido van Rossum wrote: >... > But it is just as likely that calling __del__ on a partially > uninitialized object is a bad mistake, and I am doing all these cases > a favor by not calling __del__ when __init__ failed! > > Any opinions? If nobody speaks up, I'll make the change.

+1 on calling __del__ IFF __init__ completes successfully.
Cheers, -g

-- Greg Stein, http://www.lyra.org/

From jeremy@cnri.reston.va.us Thu Mar 2 17:15:14 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Thu, 2 Mar 2000 12:15:14 -0500 (EST) Subject: [Python-Dev] str vs repr at prompt again (FW: String printing behavior?) In-Reply-To: <001401bf840b$3177ba60$732d153f@tim> References: <001401bf840b$3177ba60$732d153f@tim> Message-ID: <14526.41378.374653.497993@goon.cnri.reston.va.us>

>>>>> "TP" == Tim Peters writes:

TP> Another unsolicited testimonial that countless users are TP> oppressed by auto-repr (as opposed to auto-str) at the TP> interpreter prompt. Just trying to keep a once-hot topic from TP> going stone cold forever.

[Signature from the included message:] >> -- Ted Drain Jet Propulsion Laboratory Ted.Drain@jpl.nasa.gov --

This guy is probably a rocket scientist. We want the language to be useful for everybody, not just rocket scientists.

Jeremy

From guido@python.org Thu Mar 2 22:45:37 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 02 Mar 2000 17:45:37 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: Your message of "Thu, 02 Mar 2000 11:44:00 EST." <14526.39504.36065.657527@anthem.cnri.reston.va.us> References: <200003021532.KAA17088@eric.cnri.reston.va.us> <14526.39504.36065.657527@anthem.cnri.reston.va.us> Message-ID: <200003022245.RAA20265@eric.cnri.reston.va.us>

> >>>>> "GvR" == Guido van Rossum writes: > > GvR> Now I have a choice to make. If the class has an __init__, > GvR> should I clear the flag only after __init__ succeeds? This > GvR> means that if __init__ raises an exception, __del__ is never > GvR> called. This is an incompatibility. It's possible that > GvR> someone has written code that relies on __del__ being called > GvR> even when __init__ fails halfway, and then their code would > GvR> break.

[Barry] > It reminds me of the separation between object allocation and > initialization in ObjC.

Is that good or bad?
> GvR> But it is just as likely that calling __del__ on a partially > GvR> uninitialized object is a bad mistake, and I am doing all > GvR> these cases a favor by not calling __del__ when __init__ > GvR> failed! > > GvR> Any opinions? If nobody speaks up, I'll make the change. > > I think you should set the flag right before you call __init__(), > i.e. after (nearly all) the C level initialization has occurred. > Here's why: your "favor" can easily be accomplished by Python > constructs in the __init__(): > > class MyBogo: > def __init__(self): > self.get_delified = 0 > do_sumtin_exceptional() > self.get_delified = 1 > > def __del__(self): > if self.get_delified: > ah_sweet_release()

But the other behavior (call __del__ even when __init__ fails) can also easily be accomplished in Python:

class C:
    def __init__(self):
        try:
            ...stuff that may fail...
        except:
            self.__del__()
            raise

    def __del__(self):
        ...cleanup...

I believe that in almost all cases the programmer would be happier if __del__ wasn't called when their __init__ fails. This makes it easier to write a __del__ that can assume that all the object's fields have been properly initialized. In my code, typically when __init__ fails, this is a symptom of a really bad bug (e.g. I just renamed one of __init__'s arguments and forgot to fix all references), and I don't care much about cleanup behavior.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From bwarsaw@cnri.reston.va.us Thu Mar 2 22:52:31 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Thu, 2 Mar 2000 17:52:31 -0500 (EST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__?
References: <200003021532.KAA17088@eric.cnri.reston.va.us> <14526.39504.36065.657527@anthem.cnri.reston.va.us> <200003022245.RAA20265@eric.cnri.reston.va.us> Message-ID: <14526.61615.362973.624022@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> But the other behavior (call __del__ even when __init__ GvR> fails) can also easily be accomplished in Python: It's a fair cop. GvR> I believe that in almost all cases the programmer would be GvR> happier if __del__ wasn't called when their __init__ fails. GvR> This makes it easier to write a __del__ that can assume that GvR> all the object's fields have been properly initialized. That's probably fine; I don't have strong feelings either way. -Barry P.S. Interesting what X-Oblique-Strategy was randomly inserted in this message (but I'm not sure which approach is more "explicit" :). -Barry From tim_one@email.msn.com Fri Mar 3 05:38:59 2000 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 3 Mar 2000 00:38:59 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: <200003021532.KAA17088@eric.cnri.reston.va.us> Message-ID: <000001bf84d2$c711e2e0$092d153f@tim> [Guido] > I was looking at the code that invokes __del__, with the intent to > implement a feature from Java: in Java, a finalizer is only called > once per object, even if calling it makes the object live longer. Why? That is, in what way is this an improvement over current behavior? Note that Java is a bit subtle: a finalizer is only called once by magic; explicit calls "don't count". The Java rules add up to quite a confusing mish-mash. Python's rules are *currently* clearer. I deal with possible exceptions in Python constructors the same way I do in C++ and Java: if there's a destructor, don't put anything in __init__ that may raise an uncaught exception. Anything dangerous is moved into a separate .reset() (or .clear() or ...) method. This works well in practice. 
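Tim's convention can be sketched as follows (the class and its names are invented for illustration, not taken from any real library): __init__ only binds fields that cannot fail, and everything that can raise lives in reset(), so __del__ never sees a half-made object:

```python
class Connection:
    """Illustrative only: risky work lives in reset(), not __init__."""

    def __init__(self):
        self.handle = None          # binding a default cannot fail...

    def reset(self, target):
        self.close()
        if not target:
            raise ValueError("no target")   # ...failures happen here instead
        self.handle = "open:" + target      # hypothetical "dangerous" step

    def close(self):
        self.handle = None

    def __del__(self):
        self.close()                # safe: every field is always initialized
```

With this split, an exception leaves a valid (if empty) object behind, and the destructor needs no did-construction-finish bookkeeping.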
> To implement this, we need a flag in each instance that means "__del__ > was called".

At least.

> I opened the creation code for instances, looking for the right place > to set the flag. I then realized that it might be smart, now that we > have this flag anyway, to set it to "true" during initialization. There > are a number of exits from the initialization where the object is created > but not fully initialized, where the new object is DECREF'ed and NULL is > returned. When such an exit is taken, __del__ is called on an > incompletely initialized object!

I agree *that* isn't good. Taken on its own, though, it argues for adding an "instance construction completed" flag that __del__ later checks, as if its body were:

if self.__instance_construction_completed:
    body

That is, the problem you've identified here could be addressed directly.

> Now I have a choice to make. If the class has an __init__, should I > clear the flag only after __init__ succeeds? This means that if > __init__ raises an exception, __del__ is never called. This is an > incompatibility. It's possible that someone has written code that > relies on __del__ being called even when __init__ fails halfway, and > then their code would break. > > But it is just as likely that calling __del__ on a partially > uninitialized object is a bad mistake, and I am doing all these cases > a favor by not calling __del__ when __init__ failed! > > Any opinions? If nobody speaks up, I'll make the change.

I'd be in favor of fixing the actual problem; I don't understand the point of the rest of it, especially as it has the potential to break existing code and I don't see a compensating advantage (surely not compatibility w/ JPython -- JPython doesn't invoke __del__ methods at all by magic, right? or is that changing, and that's what's driving this?).

too-much-magic-is-dizzying-ly y'rs - tim

From bwarsaw@cnri.reston.va.us Fri Mar 3 05:50:16 2000 From: bwarsaw@cnri.reston.va.us (Barry A.
Warsaw) Date: Fri, 3 Mar 2000 00:50:16 -0500 (EST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__? References: <200003021532.KAA17088@eric.cnri.reston.va.us> <000001bf84d2$c711e2e0$092d153f@tim> Message-ID: <14527.21144.9421.958311@anthem.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: TP> (surely not compatibility w/ JPython -- JPython doesn't invoke TP> __del__ methods at all by magic, right? or is that changing, TP> and that's what's driving this?). No, JPython doesn't invoke __del__ methods by magic, and I don't have any plans to change that. -Barry From ping@lfw.org Fri Mar 3 09:00:21 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 3 Mar 2000 01:00:21 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: Message-ID: On Thu, 2 Mar 2000, Greg Stein wrote: > On Thu, 2 Mar 2000, Guido van Rossum wrote: > >... > > But it is just as likely that calling __del__ on a partially > > uninitialized object is a bad mistake, and I am doing all these cases > > a favor by not calling __del__ when __init__ failed! > > > > Any opinions? If nobody speaks up, I'll make the change. > > +1 on calling __del__ IFF __init__ completes successfully. That would be my vote as well. What convinced me of this is the following: If it's up to the implementation of __del__ to deal with a problem that happened during initialization, you only know about the problem with very coarse granularity. It's a pain (or even impossible) to then rediscover the information you need to recover adequately. If on the other hand you deal with the problem in __init__, then you have much better control over what is happening, because you can position try/except blocks precisely where you need them to deal with specific potential problems. Each block can take care of its case appropriately, and re-raise if necessary. 
In general, it seems to me that what you want to do when __init__ runs afoul is going to be different from what you want to do to take care of object cleanup in __del__. So it doesn't belong there -- it belongs in an except: clause in __init__. Even though it's an incompatibility, i really think this is the right behaviour. -- ?!ng "To be human is to continually change. Your desire to remain as you are is what ultimately limits you." -- The Puppet Master, Ghost in the Shell From guido@python.org Fri Mar 3 16:13:16 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 03 Mar 2000 11:13:16 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: Your message of "Fri, 03 Mar 2000 00:38:59 EST." <000001bf84d2$c711e2e0$092d153f@tim> References: <000001bf84d2$c711e2e0$092d153f@tim> Message-ID: <200003031613.LAA21571@eric.cnri.reston.va.us> > [Guido] > > I was looking at the code that invokes __del__, with the intent to > > implement a feature from Java: in Java, a finalizer is only called > > once per object, even if calling it makes the object live longer. [Tim] > Why? That is, in what way is this an improvement over current behavior? > > Note that Java is a bit subtle: a finalizer is only called once by magic; > explicit calls "don't count". Of course. Same in my proposal. But I wouldn't call it "by magic" -- just "on behalf of the garbage collector". > The Java rules add up to quite a confusing mish-mash. Python's rules are > *currently* clearer. I don't find the Java rules confusing. It seems quite useful that the GC promises to call the finalizer at most once -- this can simplify the finalizer logic. (Otherwise it may have to ask itself, "did I clean this already?" and leave notes for itself.) Explicit finalizer calls are always a mistake and thus "don't count" -- the response to that should in general be "don't do that" (unless you have particularly stupid callers -- or very fearful lawyers :-). 
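The "notes for itself" Guido mentions look like this in practice: without an at-most-once guarantee, cleanup must be made idempotent by hand. A sketch with an invented `Holder` class:

```python
released = []

class Holder:
    def __init__(self):
        self._cleaned = False

    def cleanup(self):
        if self._cleaned:      # "did I clean this already?"
            return
        self._cleaned = True
        released.append("resource")

    def __del__(self):
        self.cleanup()         # finalizer and explicit calls share one path

h = Holder()
h.cleanup()                    # explicit early cleanup...
del h                          # ...and the finalizer does not release twice
```

With a call-at-most-once guarantee from the GC, the `_cleaned` bookkeeping would be unnecessary for the magic call, which is exactly the simplification being claimed.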
> I deal with possible exceptions in Python constructors the same way I do in
> C++ and Java: if there's a destructor, don't put anything in __init__ that
> may raise an uncaught exception. Anything dangerous is moved into a
> separate .reset() (or .clear() or ...) method. This works well in practice.

Sure, but the rule "if __init__ fails, __del__ won't be called" means that we
don't have to program our __init__ or __del__ quite so defensively. Most
people who design a __del__ probably assume that __init__ has run to
completion. The typical scenario (which has happened to me! And I
*implemented* the damn thing!) is this: __init__ opens a file and assigns it
to an instance variable; __del__ closes the file. This is tested a few times
and it works great. Now in production the file somehow unexpectedly fails to
be openable. Sure, the programmer should've expected that, but she didn't.
Now, at best, the failed __del__ creates an additional confusing error
message on top of the traceback generated by IOError. At worst, the failed
__del__ could wreck the original traceback.

Note that I'm not proposing to change the C level behavior; when a Py_New()
function is halfway through its initialization and decides to bail out, it
does a DECREF(self) and you bet that at this point the _dealloc() function
gets called (via self->ob_type->tp_dealloc). Occasionally I need to
initialize certain fields to NULL so that the dealloc() function doesn't try
to free memory that wasn't allocated. Often it's as simple as using XDECREF
instead of DECREF in the dealloc() function (XDECREF is safe when the
argument is NULL; DECREF dumps core, saving a load-and-test if you are sure
its arg is a valid object).

> > To implement this, we need a flag in each instance that means "__del__
> > was called".
>
> At least.
>
> > I opened the creation code for instances, looking for the right place
> > to set the flag.
I then realized that it might be smart, now that we > > have this flag anyway, to set it to "true" during initialization. There > > are a number of exits from the initialization where the object is created > > but not fully initialized, where the new object is DECREF'ed and NULL is > > returned. When such an exit is taken, __del__ is called on an > > incompletely initialized object! > > I agree *that* isn't good. Taken on its own, though, it argues for adding > an "instance construction completed" flag that __del__ later checks, as if > its body were: > > if self.__instance_construction_completed: > body > > That is, the problem you've identified here could be addressed directly. Sure -- but I would argue that when __del__ returns, __instance_construction_completed should be reset to false, because the destruction (conceptually, at least) cancels out the construction! > > Now I have a choice to make. If the class has an __init__, should I > > clear the flag only after __init__ succeeds? This means that if > > __init__ raises an exception, __del__ is never called. This is an > > incompatibility. It's possible that someone has written code that > > relies on __del__ being called even when __init__ fails halfway, and > > then their code would break. > > > > But it is just as likely that calling __del__ on a partially > > uninitialized object is a bad mistake, and I am doing all these cases > > a favor by not calling __del__ when __init__ failed! > > > > Any opinions? If nobody speaks up, I'll make the change. > > I'd be in favor of fixing the actual problem; I don't understand the point > to the rest of it, especially as it has the potential to break existing code > and I don't see a compensating advantage (surely not compatibility w/ > JPython -- JPython doesn't invoke __del__ methods at all by magic, right? > or is that changing, and that's what's driving this?). JPython's a red herring here. 
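The flag quoted above can be spelled out in pure Python. Note that in today's CPython a __del__ does run even when __init__ raises (the object already exists once __new__ returns), so the guard has a visible effect. The class and attribute names are illustrative:

```python
import gc

events = []

class Resource:
    def __init__(self, fail=False):
        self._construction_completed = False
        if fail:
            raise RuntimeError("init failed halfway")
        self._construction_completed = True

    def __del__(self):
        if getattr(self, "_construction_completed", False):
            events.append("cleaned up")
        else:
            events.append("skipped half-built object")

r = Resource()
del r                       # completed object: normal cleanup

try:
    Resource(fail=True)     # half-built object: the guard suppresses cleanup
except RuntimeError:
    pass
gc.collect()                # make sure the traceback's frames are released
```

The proposal under discussion amounts to having the interpreter maintain this flag itself, so every __del__ gets the guard for free.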
I think that the proposed change probably *fixes* much more code that is
subtly wrong than it breaks code that is relying on __del__ being called
after a partial __init__. All the rules relating to __del__ are confusing
(e.g. what __del__ can expect to survive in its globals). Also note Ping's
observation:

| If it's up to the implementation of __del__ to deal with a problem
| that happened during initialization, you only know about the problem
| with very coarse granularity. It's a pain (or even impossible) to
| then rediscover the information you need to recover adequately.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim_one@email.msn.com Fri Mar 3 16:49:52 2000
From: tim_one@email.msn.com (Tim Peters)
Date: Fri, 3 Mar 2000 11:49:52 -0500
Subject: [Python-Dev] Design question: call __del__ only after successful __init__?
In-Reply-To: <200003031613.LAA21571@eric.cnri.reston.va.us>
Message-ID: <000501bf8530$7f8c78a0$b0a0143f@tim>

[Tim]
>> Note that Java is a bit subtle: a finalizer is only called
>> once by magic; explicit calls "don't count".

[Guido]
> Of course. Same in my proposal.

OK -- that wasn't clear.

> But I wouldn't call it "by magic" -- just "on behalf of the garbage
> collector".

Yup, magically called.

>> The Java rules add up to quite a confusing mish-mash. Python's
>> rules are *currently* clearer.

> I don't find the Java rules confusing.

"add up" == "taken as a whole"; include the Java spec's complex state
machine for cleanup semantics, and the later complications added by three
(four?) distinct flavors of weak reference, and I doubt 1 Java programmer
in 1,000 actually understands the rules. This is why I'm wary of moving in
the Java *direction* here. Note that Java programmers in past c.l.py
threads have generally claimed Java's finalizers are so confusing &
unpredictable they don't use them at all! Which, in the end, is probably a
good idea in Python too <0.5 wink>.
> It seems quite useful that the GC promises to call the finalizer at
> most once -- this can simplify the finalizer logic.

Granting that explicit calls are "use at your own risk", the only
user-visible effect of "called only once" is in the presence of
resurrection. Now in my Python experience, on the few occasions I've
resurrected an object in __del__, *of course* I expected __del__ to get
called again if the object is about to die again! Typical:

    def __del__(self):
        if oops_i_still_need_to_stay_alive:
            resurrect(self)
        else:
            # really going away
            release(self.critical_resource)

Call __del__ only once, and code like this is busted bigtime. OTOH, had I
written __del__ logic that relied on being called only once, switching the
implementation to call it more than once would break *that* bigtime.
Neither behavior is an obvious all-cases win to me, or even a plausibly
most-cases win. But Python already took a stand on this & so I think you
need a *good* reason to change semantics now.

> ...
> Sure, but the rule "if __init__ fails, __del__ won't be called" means
> that we don't have to program our __init__ or __del__ quite so
> defensively. Most people who design a __del__ probably assume that
> __init__ has run to completion. ...

This is (or can easily be made) a separate issue, & I agreed the first time
this seems worth fixing (although if nobody has griped about it in a decade
of use, it's hard to call it a major bug).

> ...
> Sure -- but I would argue that when __del__ returns,
> __instance_construction_completed should be reset to false, because
> the destruction (conceptually, at least) cancels out the construction!

In the __del__ above (which is typical of the cases of resurrection I've
seen), there is no such implication. Perhaps this is philosophical abuse of
Python's intent, but if so it relied only on trusting its advertised
semantics.
> I think that the proposed change probably *fixes* much morecode that > is subtly wrong than it breaks code that is relying on __del__ being > called after a partial __init__. Yes, again, I have no argument against refusing to call __del__ unless __init__ succeeded. Going beyond that to a new "called at most once" rule is indeed going beyond that, *will* break reasonable old code, and holds no particular attraction that I can see (it trades making one kind of resurrection scenario easier at the cost of making other kinds harder). If there needs to be incompatible change here, curiously enough I'd be more in favor of making resurrection illegal period (which could *really* simplify gc's headaches). > All the rules relating to __del__ are confusing (e.g. what __del__ can > expect to survive in its globals). Problems unique to final shutdown don't seem relevant here. > Also note Ping's observation: ... I can't agree with that yet another time without being quadruply redundant . From guido@python.org Fri Mar 3 16:50:08 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 03 Mar 2000 11:50:08 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Your message of "Wed, 01 Mar 2000 00:44:10 EST." <200003010544.AAA13155@eric.cnri.reston.va.us> References: <20000229153421.A16502@acs.ucalgary.ca> <200003010544.AAA13155@eric.cnri.reston.va.us> Message-ID: <200003031650.LAA21647@eric.cnri.reston.va.us> We now have two implementations of Eric Tiedemann's idea: Neil and I both implemented it. It's too soon to post the patch sets (both are pretty rough) but I've got another design question. Once we've identified a bunch of objects that are only referring to each other (i.e., one or more cycles) we have to dispose of them. The question is, how? We can't just call free on each of the objects; some may not be allocated with malloc, and some may contain pointers to other malloc'ed memory that also needs to be freed. 
So we have to get their destructors involved. But how? Calling
ob->ob_type->tp_dealloc(ob) for an object whose reference count is nonzero
is unsafe -- this will destroy the object while there are still references
to it! Those references are all coming from other objects that are part of
the same cycle; those objects will also be deallocated and they will
reference the deallocated objects (if only to DECREF them).

Neil uses the same solution that I use when finalizing the Python
interpreter -- find the dictionaries and call PyDict_Clear() on them. (In
his unpublished patch, he also clears the lists using
PyList_SetSlice(list, 0, list->ob_size, NULL). He's also generalized it so
that *every* object can define a tp_clear function in its type object.) As
long as every cycle contains at least one dictionary or list object, this
will break cycles reliably and get rid of all the garbage. (If you wonder
why: clearing the dict DECREFs the next object(s) in the cycle; if the last
dict referencing a particular object is cleared, the last DECREF will
deallocate that object, which will in turn DECREF the objects it
references, and so forth. Since none of the objects in the cycle has
incoming references from outside the cycle, we can prove that this will
delete all objects as long as there's a dict or list in each cycle.)

However, there's a snag. It's the same snag as what finalizing the Python
interpreter runs into -- it has to do with __del__ methods and the
undefined order in which the dictionaries are cleared. For example, it's
quite possible that the first dictionary we clear is the __dict__ of an
instance, so this zaps all its instance variables. Suppose this breaks the
cycle, so then the instance itself gets DECREFed to zero. Its deallocator
will be called. If it's got a __del__, this __del__ will be called -- but
all the instance variables have already been zapped, so it will fail
miserably!
It's also possible that the __dict__ of a class involved in a cycle gets cleared first, in which case the __del__ no longer "exists", and again the cleanup is skipped. So the question is: What to *do*? My solution is to make an extra pass over all the garbage objects *before* we clear dicts and lists, and for those that are instances and have __del__ methods, call their __del__ ("by magic", as Tim calls it in another post). The code in instance_dealloc() already does the right thing here: it calls __del__, then discovers that the reference count is > 0 ("I'm not dead yet" :-), and returns without freeing the object. (This is also why I want to introduce a flag ensuring that __del__ gets called by instance_dealloc at most once: later when the instance gets DECREFed to 0, instance_dealloc is called again and will correctly free the object; but we don't want __del__ called again.) [Note for Neil: somehow I forgot to add this logic to the code; in_del_called isn't used! The change is obvious though.] This still leaves a problem for the user: if two class instances reference each other and both have a __del__, we can't predict whose __del__ is called first when they are called as part of cycle collection. The solution is to write each __del__ so that it doesn't depend on the other __del__. Someone (Tim?) in the past suggested a different solution (probably found in another language): for objects that are collected as part of a cycle, the destructor isn't called at all. The memory is freed (since it's no longer reachable), but the destructor is not called -- it is as if the object lives on forever. This is theoretically superior, but not practical: when I have an object that creates a temp file, I want to be able to reliably delete the temp file in my destructor, even when I'm part of a cycle! 
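Years later, PEP 442 settled this design question for CPython: __del__ is now called on objects in cyclic garbage, but the relative order of the two finalizers below is still unspecified, which is exactly why Guido advises writing each __del__ so it does not depend on the other. A minimal demonstration (class and attribute names invented):

```python
import gc

order = []

class Node:
    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        # must not rely on self.partner's __del__ having run (or not)
        order.append(self.name)

a, b = Node("a"), Node("b")
a.partner, b.partner = b, a     # a reference cycle with two finalizers
del a, b                        # refcounting alone cannot reclaim these
gc.collect()                    # the cycle collector runs both __del__s
```

Both finalizers run, so a temp file opened by either object can be reliably cleaned up, but nothing about their mutual order is promised.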
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jack@oratrix.nl Fri Mar 3 16:57:54 2000
From: jack@oratrix.nl (Jack Jansen)
Date: Fri, 03 Mar 2000 17:57:54 +0100
Subject: [Python-Dev] Design question: call __del__ for cyclical garbage?
In-Reply-To: Message by Guido van Rossum , Fri, 03 Mar 2000 11:50:08 -0500 , <200003031650.LAA21647@eric.cnri.reston.va.us>
Message-ID: <20000303165755.490EA371868@snelboot.oratrix.nl>

The __init__ rule for calling __del__ has me confused. Is this per-class or
per-object? I.e. what will happen in the following case:

    class Purse:
        def __init__(self):
            self.balance = WithdrawCashFromBank(1000)

        def __del__(self):
            PutCashBackOnBank(self.balance)
            self.balance = 0

    class LossyPurse(Purse):
        def __init__(self):
            Purse.__init__(self)
            raise 'kaboo! kaboo!'

If the new scheme means that the __del__ method of Purse isn't called I
think I don't like it. In the current scheme I can always program
defensively:

    def __del__(self):
        try:
            b = self.balance
            self.balance = 0
        except AttributeError:
            pass
        else:
            PutCashBackOnBank(b)

but in a new scheme with a per-object "__del__ must be called" flag I
can't...

-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From guido@python.org Fri Mar 3 17:05:00 2000
From: guido@python.org (Guido van Rossum)
Date: Fri, 03 Mar 2000 12:05:00 -0500
Subject: [Python-Dev] Design question: call __del__ only after successful __init__?
In-Reply-To: Your message of "Fri, 03 Mar 2000 11:49:52 EST." <000501bf8530$7f8c78a0$b0a0143f@tim>
References: <000501bf8530$7f8c78a0$b0a0143f@tim>
Message-ID: <200003031705.MAA21700@eric.cnri.reston.va.us>

OK, so we're down to this one point: if __del__ resurrects the object,
should __del__ be called again later? Additionally, should resurrection be
made illegal?
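Resurrection itself is easy to demonstrate with a free-list-style __del__. This hypothetical sketch still works in today's CPython, which kept resurrection legal but adopted the at-most-once rule (PEP 442): if the pooled object later dies for good, __del__ is not run a second time.

```python
class PooledConnection:
    """Instead of dying, instances park themselves on a class-level pool."""
    _pool = []

    def __del__(self):
        if len(self._pool) < 2:
            self._pool.append(self)   # refcount rises again: resurrected

c = PooledConnection()
del c                                 # __del__ runs... and the object survives
```

This is the "smart object recycling" scenario raised later in the thread, reduced to its smallest form.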
I can easily see how __del__ could *accidentally* resurrect the object as
part of its normal cleanup -- e.g. you make a call to some other routine
that helps with the cleanup, passing self as an argument, and this other
routine keeps a helpful cache of the last argument for some reason. I don't
see how we could forbid this type of resurrection. (What are you going to
do? You can't raise an exception from instance_dealloc, since it is called
from DECREF. You can't track down the reference and replace it with a None
easily.) In this example, the helper routine will eventually delete the
object from its cache, at which point it is truly deleted. It would be
harmful, not helpful, if __del__ was called again at this point.

Now, it is true that the current docs for __del__ imply that resurrection
is possible. The intention of that note was to warn __del__ writers that in
the case of accidental resurrection __del__ might be called again. The
intention certainly wasn't to allow or encourage intentional resurrection.

Would there really be someone out there who uses *intentional*
resurrection? I severely doubt it. I've never heard of this.

[Jack just finds a snag]
> The __init__ rule for calling __del__ has me confused. Is this per-class
> or per-object? I.e. what will happen in the following case:
>
>     class Purse:
>         def __init__(self):
>             self.balance = WithdrawCashFromBank(1000)
>
>         def __del__(self):
>             PutCashBackOnBank(self.balance)
>             self.balance = 0
>
>     class LossyPurse(Purse):
>         def __init__(self):
>             Purse.__init__(self)
>             raise 'kaboo! kaboo!'
>
> If the new scheme means that the __del__ method of Purse isn't called I
> think I don't like it. In the current scheme I can always program
> defensively:
>
>     def __del__(self):
>         try:
>             b = self.balance
>             self.balance = 0
>         except AttributeError:
>             pass
>         else:
>             PutCashBackOnBank(b)
>
> but in a new scheme with a per-object "__del__ must be called" flag I
> can't...

Yes, that's a problem.
But there are other ways for the subclass to break the base class's
invariant (e.g. it could override __del__ without calling the base class'
__del__). So I think it's a red herring. In Python 3000, typechecked
classes may declare invariants that are enforced by the inheritance
mechanism; then we may need to keep track of which base class constructors
succeeded and only call corresponding destructors.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From mal@lemburg.com Fri Mar 3 18:17:11 2000
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 03 Mar 2000 19:17:11 +0100
Subject: [Python-Dev] Design question: call __del__ only after successful __init__?
References: <000501bf8530$7f8c78a0$b0a0143f@tim> <200003031705.MAA21700@eric.cnri.reston.va.us>
Message-ID: <38C001A7.6CF8F365@lemburg.com>

Guido van Rossum wrote:
> OK, so we're down to this one point: if __del__ resurrects the object,
> should __del__ be called again later? Additionally, should
> resurrection be made illegal?

Yes and no :-)

One example comes to mind: implementations of weak references, which manage
weak object references themselves (as soon as __del__ is called the weak
reference implementation takes over the object). Another example is that of
free-list-like implementations which reduce object creation times by
implementing smart object recycling, e.g. objects could keep allocated
dictionaries alive or connections to databases open, etc.

As for the second point: Calling __del__ again is certainly needed to keep
application logic sane... after all, __del__ should be called whenever the
refcount reaches 0 -- and that can happen more than once in the object's
lifetime if reanimation occurs.

> I can easily see how __del__ could *accidentally* resurrect the object
> as part of its normal cleanup -- e.g. you make a call to some other
> routine that helps with the cleanup, passing self as an argument, and
> this other routine keeps a helpful cache of the last argument for some
> reason.
I don't see how we could forbid this type of resurrection. > (What are you going to do? You can't raise an exception from > instance_dealloc, since it is called from DECREF. You can't track > down the reference and replace it with a None easily.) > In this example, the helper routine will eventually delete the object > from its cache, at which point it is truly deleted. It would be > harmful, not helpful, if __del__ was called again at this point. I'd say this is an application logic error -- nothing that the mechanism itself can help with automagically. OTOH, turning multi calls to __del__ off, would make certain techniques impossible. > Now, it is true that the current docs for __del__ imply that > resurrection is possible. The intention of that note was to warn > __del__ writers that in the case of accidental resurrection __del__ > might be called again. The intention certainly wasn't to allow or > encourage intentional resurrection. I don't think that docs are the right argument here ;-) It is simply the reference counting logic that plays its role: __del__ is called when refcount reaches 0, which usually means that the object is about to be garbage collected... unless the object is rereferenced by some other object and thus gets reanimated. > Would there really be someone out there who uses *intentional* > resurrection? I severely doubt it. I've never heard of this. BTW, I can't see what the original question has to do with this discussion ... calling __del__ only after successful __init__ is ok, IMHO, but what does this have to do with the way __del__ itself is implemented ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Fri Mar 3 18:30:36 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 03 Mar 2000 19:30:36 +0100 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? 
References: <20000229153421.A16502@acs.ucalgary.ca> <200003010544.AAA13155@eric.cnri.reston.va.us> <200003031650.LAA21647@eric.cnri.reston.va.us> Message-ID: <38C004CC.1FE0A501@lemburg.com> [Guido about ways to cleanup cyclic garbage] FYI, I'm using a special protocol for disposing of cyclic garbage: the __cleanup__ protocol. The purpose of this call is probably similar to Neil's tp_clear: it is intended to let objects break possible cycles in their own storage scope, e.g. instances can delete instance variables which they know can cause cyclic garbage. The idea is simple: give all power to the objects rather than try to solve everything with one magical master plan. The mxProxy package has details on the protocol. The __cleanup__ method is called by the Proxy when the Proxy is about to be deleted. If all references to an object go through the Proxy, the __cleanup__ method call can easily break cycles to have the refcount reach zero in which case __del__ is called. Since the object knows about this scheme it can take precautions to make sure that __del__ still works after __cleanup__ was called. Anyway, just a thought... there are probably many ways to do all this. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tismer@tismer.com Fri Mar 3 18:51:55 2000 From: tismer@tismer.com (Christian Tismer) Date: Fri, 03 Mar 2000 19:51:55 +0100 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? References: <000501bf8530$7f8c78a0$b0a0143f@tim> <200003031705.MAA21700@eric.cnri.reston.va.us> Message-ID: <38C009CB.72BD49CA@tismer.com> Guido van Rossum wrote: > > OK, so we're down to this one point: if __del__ resurrects the object, > should __del__ be called again later? Additionally, should > resurrection be made illegal? [much stuff] Just a random note: What if we had a __del__ with zombie behavior? 
Assume an instance that is about to be destructed. Then __del__ is called
via normal method lookup. What we want is to let this happen only once.
Here's the zombie: after method lookup, place a dummy __del__ into the
to-be-deleted instance dict, and we are sure that this does no harm. Kinda
"yes, it's there, but a broken link". The zombie always works by doing
nothing. Makes some sense?

ciao - chris

-- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net
14163 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home

From gstein@lyra.org Fri Mar 3 23:09:48 2000
From: gstein@lyra.org (Greg Stein)
Date: Fri, 3 Mar 2000 15:09:48 -0800 (PST)
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17
In-Reply-To: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us>
Message-ID:

You may as well remove the entire "vi" concept from ConfigParser. Since
"vi" can be *only* a '=' or ':', you aren't truly checking anything in the
"if" statement. Further, "vi" is used nowhere else, so that variable and
the corresponding regex group can be nuked altogether. I'm not sure why the
";" comment form was initially restricted to just one option format in the
first place.
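The behavior under discussion (';' comments stripped after both '=' and ':' separators, but only when preceded by whitespace) can be checked against today's configparser, where inline comments must be opted into explicitly. Section and key names below are invented:

```python
import configparser

cp = configparser.ConfigParser(inline_comment_prefixes=(';',))
cp.read_string("""\
[server]
host = example.org  ; stripped after '='
port: 8080          ; and after ':' too
""")
# Both values come back without the trailing comments.
```

A ';' with no whitespace before it (as in a value like `a;b`) is still kept, matching the "only if it follows a spacing character" rule in the patched code.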
Cheers, -g

On Fri, 3 Mar 2000, Jeremy Hylton wrote:
> Update of /projects/cvsroot/python/dist/src/Lib
> In directory bitdiddle:/home/jhylton/python/src/Lib
>
> Modified Files:
> ConfigParser.py
> Log Message:
> allow comments beginning with ; in key: value as well as key = value
>
> Index: ConfigParser.py
> ===================================================================
> RCS file: /projects/cvsroot/python/dist/src/Lib/ConfigParser.py,v
> retrieving revision 1.16
> retrieving revision 1.17
> diff -C2 -r1.16 -r1.17
> *** ConfigParser.py 2000/02/28 23:23:55 1.16
> --- ConfigParser.py 2000/03/03 20:43:57 1.17
> ***************
> *** 359,363 ****
> optname, vi, optval = mo.group('option', 'vi', 'value')
> optname = string.lower(optname)
> ! if vi == '=' and ';' in optval:
> # ';' is a comment delimiter only if it follows
> # a spacing character
> --- 359,363 ----
> optname, vi, optval = mo.group('option', 'vi', 'value')
> optname = string.lower(optname)
> ! if vi in ('=', ':') and ';' in optval:
> # ';' is a comment delimiter only if it follows
> # a spacing character
>
> _______________________________________________
> Python-checkins mailing list
> Python-checkins@python.org
> http://www.python.org/mailman/listinfo/python-checkins

-- Greg Stein, http://www.lyra.org/

From jeremy@cnri.reston.va.us Fri Mar 3 23:15:32 2000
From: jeremy@cnri.reston.va.us (Jeremy Hylton)
Date: Fri, 3 Mar 2000 18:15:32 -0500 (EST)
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17
In-Reply-To: References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us>
Message-ID: <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us>

Thanks for catching that. I didn't look at the context. I'm going to wait,
though, until I talk to Fred to mess with the code any more.

General question for python-dev readers: What are your experiences with
ConfigParser?
I just used it to build a simple config parser for IDLE and found it hard
to use for several reasons. The biggest problem was that the file format is
undocumented. I also found it clumsy to have to specify section and option
arguments. I ended up writing a proxy that specializes on section so that
get takes only an option argument.

It sounds like ConfigParser code and docs could use a general cleanup. Are
there any other issues to take care of as part of that cleanup?

Jeremy

From gstein@lyra.org Fri Mar 3 23:35:09 2000
From: gstein@lyra.org (Greg Stein)
Date: Fri, 3 Mar 2000 15:35:09 -0800 (PST)
Subject: [Python-Dev] ConfigParser stuff (was: CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17)
In-Reply-To: <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us>
Message-ID:

On Fri, 3 Mar 2000, Jeremy Hylton wrote:
> Thanks for catching that. I didn't look at the context. I'm going to
> wait, though, until I talk to Fred to mess with the code any more.

Not a problem. I'm glad that diffs are now posted to -checkins. :-)

> General question for python-dev readers: What are your experiences
> with ConfigParser?

Love it!

> I just used it to build a simple config parser for
> IDLE and found it hard to use for several reasons. The biggest
> problem was that the file format is undocumented.

In my most complex use of ConfigParser, I had to override SECTCRE to allow
periods in the section name. Of course, that was quite interesting since
the variable is __SECTRE in 1.5.2 (i.e. I had to compensate for the
munging). I also changed OPTCRE to allow a few more characters ("@" in
particular, which even the update doesn't do). Not a problem nowadays since
those are public.

My subclass also defines a set() method and a delsection() method. These
are used because I write the resulting changes back out to a file. It might
be nice to have a method which writes out a config file (with an
"AUTOGENERATED BY ConfigParser.py -- DO NOT EDIT BY HAND"; or maybe
"... BY ...").
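These wishes were eventually granted: the modern configparser has set(), remove_section(), and a write() method, so the subclassing described here is no longer necessary. A quick sketch (section and key names invented):

```python
import configparser
import io

cp = configparser.ConfigParser()
cp.add_section("paths")
cp.set("paths", "home", "/tmp/idle")

buf = io.StringIO()
buf.write("# AUTOGENERATED -- DO NOT EDIT BY HAND\n")
cp.write(buf)                     # round-trippable INI text

cp.remove_section("paths")        # deletion is built in as well
```

The output of write() can be read back by read_string(), giving the edit-and-save workflow described above without any overriding.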
> I also found it > clumsy to have to specify section and option arguments. I found these were critical in my application. I also take advantage of the sections in my "edna" application for logical organization. > I ended up > writing a proxy that specializes on section so that get takes only an > option argument. > > It sounds like ConfigParser code and docs could use a general cleanup. > Are there any other issues to take care of as part of that cleanup? A set() method and a writefile() type of method would be nice. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one@email.msn.com Sat Mar 4 01:38:43 2000 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 3 Mar 2000 20:38:43 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <200003031650.LAA21647@eric.cnri.reston.va.us> Message-ID: <000001bf857a$60b45ac0$c6a0143f@tim> [Guido] > ... > Someone (Tim?) in the past suggested a different solution (probably > found in another language): for objects that are collected as part of > a cycle, the destructor isn't called at all. The memory is freed > (since it's no longer reachable), but the destructor is not called -- > it is as if the object lives on forever. Stroustrup has written in favor of this for C++. It's exactly the kind of overly slick "good argument" he would never accept from anyone else <0.1 wink>. > This is theoretically superior, but not practical: when I have an > object that creates a temp file, I want to be able to reliably delete > the temp file in my destructor, even when I'm part of a cycle! A member of the C++ committee assured me Stroustrup is overwhelmingly opposed on this. I don't even agree it's theoretically superior: it relies on the fiction that gc "may never occur", and that's just silly in practice. You're moving down the Java path. I can't possibly do a better job of explaining the Java rules than the Java Language Spec. does for itself. 
So pick that up and study section 12.6 (Finalization of Class Instances). The end result makes little sense to users, but is sufficient to guarantee that Java itself never blows up. Note, though, that there is NO good answer to finalizers in cycles! The implementation cannot be made smart enough to both avoid trouble and "do the right thing" from the programmer's POV, because the latter is unknowable. Somebody has to lose, one way or another. Rather than risk doing a wrong thing, the BDW collector lets cycles with finalizers leak. But it also has optional hacks to support exceptions for use with C++ (which sometimes creates self-cycles) and Java. See http://reality.sgi.com/boehm_mti/finalization.html for Boehm's best concentrated thoughts on the subject. The only principled approach I know of comes out of the Scheme world. Scheme has no finalizers, of course. But it does have gc, and the concept of "guardians" was invented to address all gc finalization problems in one stroke. It's extremely Scheme-like in providing a perfectly general mechanism with no policy whatsoever. You (the Scheme programmer) can create guardian objects, and "register" other objects with a guardian. At any time, you can ask a guardian whether some object registered with it is "ready to die" (i.e., the only thing keeping it alive is its registration with the guardian). If so, you can ask it to give you one. Everything else is up to you: if you want to run a finalizer, your problem. If there are cycles, also your problem. Even if there are simple non-cyclic dependencies, your problem. Etc. So those are the extremes: BDW avoids blame by refusing to do anything. Java avoids blame by exposing an impossibly baroque implementation-driven finalization model. Scheme avoids blame by refusing to do anything "by magic", but helps you to shoot yourself with the weapon of your choice. The bad news is that I don't know of a scheme *not* at an extreme! 
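[Python has no guardians, but the "ready to die" test can be approximated as a toy in CPython using reference counts. Everything below -- the class, its API -- is hypothetical and CPython-specific, not a real library:]

```python
import sys

class Guardian:
    """Toy, CPython-only sketch of a Scheme-style guardian."""

    def __init__(self):
        self._registered = []

    def register(self, obj):
        self._registered.append(obj)

    def pop_ready(self):
        """Hand back objects kept alive only by this guardian."""
        ready, alive = [], []
        for obj in self._registered:
            # references at this point: the list slot, the loop variable,
            # and getrefcount's own argument -- so a count of exactly 3
            # means "only the guardian keeps this object alive"
            if sys.getrefcount(obj) == 3:
                ready.append(obj)
            else:
                alive.append(obj)
        self._registered = alive
        return ready

class Resource:
    pass

guardian = Guardian()
r = Resource()
guardian.register(r)
first = guardian.pop_ready()    # r is still externally referenced: not ready
del r
second = guardian.pop_ready()   # now only the guardian held it
```

[True to the Scheme semantics Tim describes, a cycle registered here simply never becomes "ready" -- it leaks, and that's your problem.]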
It's extremely un-Pythonic to let things leak (despite that it has let things leak for a decade ), but also extremely un-Pythonic to make some wild-ass guess. So here's what I'd consider doing: explicit is better than implicit, and in the face of ambiguity refuse the temptation to guess. If a trash cycle contains a finalizer (my, but that has to be rare. in practice, in well-designed code!), don't guess, but make it available to the user. A gc.guardian() call could expose such beasts, or perhaps a callback could be registered, invoked when gc finds one of these things. Anyone crazy enough to create cyclic trash with finalizers then has to take responsibility for breaking the cycle themself. This puts the burden on the person creating the problem, and they can solve it in the way most appropriate to *their* specific needs. IOW, the only people who lose under this scheme are the ones begging to lose, and their "loss" consists of taking responsibility. when-a-problem-is-impossible-to-solve-favor-sanity-ly y'rs - tim From gstein@lyra.org Sat Mar 4 02:59:26 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 3 Mar 2000 18:59:26 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <000001bf857a$60b45ac0$c6a0143f@tim> Message-ID: On Fri, 3 Mar 2000, Tim Peters wrote: >... > Note, though, that there is NO good answer to finalizers in cycles! The "Note" ?? Not just a note, but I'd say an axiom :-) By definition, you have two objects referring to each other in some way. How can you *definitely* know how to break the link between them? Do you call A's finalizer or B's first? If they're instances, do you just whack their __dict__ and hope for the best? >... > So here's what I'd consider doing: explicit is better than implicit, and in > the face of ambiguity refuse the temptation to guess. If a trash cycle > contains a finalizer (my, but that has to be rare. 
in practice, in > well-designed code!), don't guess, but make it available to the user. A > gc.guardian() call could expose such beasts, or perhaps a callback could be > registered, invoked when gc finds one of these things. Anyone crazy enough > to create cyclic trash with finalizers then has to take responsibility for > breaking the cycle themself. This puts the burden on the person creating > the problem, and they can solve it in the way most appropriate to *their* > specific needs. IOW, the only people who lose under this scheme are the > ones begging to lose, and their "loss" consists of taking responsibility. I'm not sure if Tim is saying the same thing, but I'll write down a concrete idea for cleaning garbage cycles. First, a couple observations: * Some objects can always be reliably "cleaned": lists, dicts, tuples. They just drop their contents, with no invocations against any of them. Note that an instance without a __del__ has no opinion on how it is cleaned. (this is related to Tim's point about whether a cycle has a finalizer) * The other objects may need to *use* their referenced objects in some way to clean out cycles. Since the second set of objects (possibly) need more care during their cleanup, we must concentrate on how to solve their problem. Back up a step: to determine where an object falls, let's define a tp_clean type slot. It returns an integer and takes one parameter: an operation integer. Py_TPCLEAN_CARE_CHECK /* check whether care is needed */ Py_TPCLEAN_CARE_EXEC /* perform the careful cleaning */ Py_TPCLEAN_EXEC /* perform a non-careful cleaning */ Given a set of objects that require special cleaning mechanisms, there is no way to tell where to start first. So... just pick the first one. Call its tp_clean type slot with CARE_EXEC. For instances, this maps to __clean__. If the instance does not have a __clean__, then tp_clean returns FALSE meaning that it could not clean this object. The algorithm moves on to the next object in the set. 
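[The careful pass described above can be mocked up in illustrative Python. tp_clean, __clean__, and GCImpossibleError are proposals from this thread, not real Python APIs; this sketch only models the bookkeeping:]

```python
class GCImpossibleError(Exception):
    """Stand-in for the proposed exception; not a real Python exception."""

def careful_pass(care_set):
    """One sweep over the 'care needed' objects, per the proposal above."""
    cleaned, still_dirty = [], []
    for obj in care_set:
        clean = getattr(obj, "__clean__", None)   # tp_clean(CARE_EXEC) analogue
        if clean is not None:
            clean()                 # object shuts itself down gracefully
            cleaned.append(obj)     # now just awaits its refcount hitting zero
        else:
            still_dirty.append(obj)  # tp_clean returned FALSE
    return cleaned, still_dirty

class Conn:
    def __init__(self):
        self.open = True
    def __clean__(self):            # hypothetical protocol from this thread
        self.open = False           # e.g. self.close()
    def __del__(self):
        assert not self.open        # __del__ must not blow up after cleaning

class Stubborn:
    def __del__(self):
        pass                        # has a finalizer but no __clean__

a, b = Conn(), Stubborn()
cleaned, dirty = careful_pass([a, b])
```

[In the full proposal, a non-empty `dirty` set at the end of the pass is what raises GCImpossibleError.]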
If tp_clean returns TRUE, then the object has been "cleaned" and is moved to the "no special care needed" list of objects, awaiting its reference count to hit zero. Note that objects in the "care" and "no care" lists may disappear during the careful-cleaning process. If the careful-cleaning algorithm hits the end of the careful set of objects and the set is non-empty, then throw an exception: GCImpossibleError. The objects in this set each said they could not be cleaned carefully AND they were not dealloc'd during other objects' cleaning. [ it could be possible to define a *dynamic* CARE_EXEC that will succeed if you call it during a second pass; I'm not sure this is a Good Thing to allow, however. ] This also implies that a developer should almost *always* consider writing a __clean__ method whenever they write a __del__ method. That method MAY be called when cycles need to be broken; the object should delete any non-essential variables in such a way that integrity is retained (e.g. it fails gracefully when methods are called and __del__ won't raise an error). For example, __clean__ could call a self.close() to shut down its operation. Whatever... you get the idea. At the end of the iteration of the "care" set, then you may have objects remaining in the "no care" set. By definition, these objects don't care about their internal references to other objects (they don't need them during deallocation). We iterate over this set, calling tp_clean(EXEC). For lists, dicts, and tuples, the tp_clean(EXEC) call simply clears out the references to other objects (but does not dealloc the object!). Again: objects in the "no care" set will go away during this process. By the end of the iteration over the "no care" set, it should be empty. [ note: the iterations over these sets should probably INCREF/DECREF across the calls; otherwise, the object could be dealloc'd during the tp_clean call. 
] [ if the set is NOT empty, then tp_clean(EXEC) did not remove all possible references to other objects; not sure what this means. is it an error? maybe you just force a tp_dealloc on the remaining objects. ] Note that the tp_clean mechanism could probably be used during the Python finalization, where Python does a bunch of special-casing to clean up modules. Specifically: a module does not care about its contents during its deallocation, so it is a "no care" object; it responds to tp_clean(EXEC) by clearing its dictionary. Class objects are similar: they can clear their dict (which contains a module reference which usually causes a loop) during tp_clean(EXEC). Module cleanup is easy once objects with CARE_CHECK have been handled -- all that funny logic in there is to deal with "care" objects. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one@email.msn.com Sat Mar 4 03:26:54 2000 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 3 Mar 2000 22:26:54 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message-ID: <000401bf8589$7d1364e0$c6a0143f@tim> [Tim] > Note, though, that there is NO good answer to finalizers in cycles! The [Greg Stein] > "Note" ?? Not just a note, but I'd say an axiom :-) An axiom is accepted without proof: we have plenty of proof that there's no thoroughly good answer (i.e., every language that has ever addressed this issue -- along with every language that ever will ). > By definition, you have two objects referring to each other in some way. > How can you *definitely* know how to break the link between them? Do you > call A's finalizer or B's first? If they're instances, do you just whack > their __dict__ and hope for the best? Exactly. The *programmer* may know the right thing to do, but the Python implementation can't possibly know. Facing both facts squarely constrains the possibilities to the only ones that are all of understandable, predictable and useful. 
Cycles with finalizers must be a Magic-Free Zone else you lose at least one of those three: even Guido's kung fu isn't strong enough to outguess this. [a nice implementation sketch, of what seems an overly elaborate scheme, if you believe cycles with finalizers are rare in intelligently designed code) ] Provided Guido stays interested in this, he'll make his own fun. I'm just inviting him to move in a sane direction <0.9 wink>. One caution: > ... > If the careful-cleaning algorithm hits the end of the careful set of > objects and the set is non-empty, then throw an exception: > GCImpossibleError. Since gc "can happen at any time", this is very severe (c.f. Guido's objection to making resurrection illegal). Hand a trash cycle back to the programmer instead, via callback or request or whatever, and it's all explicit without more cruft in the implementation. It's alive again when they get it back, and they can do anything they want with it (including resurrecting it, or dropping it again, or breaking cycles -- anything). I'd focus on the cycles themselves, not on the types of objects involved. I'm not pretending to address the "order of finalization at shutdown" question, though (although I'd agree they're deeply related: how do you follow a topological sort when there *isn't* one? well, you don't, because you can't). realistically y'rs - tim From gstein@lyra.org Sat Mar 4 08:43:45 2000 From: gstein@lyra.org (Greg Stein) Date: Sat, 4 Mar 2000 00:43:45 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <000401bf8589$7d1364e0$c6a0143f@tim> Message-ID: On Fri, 3 Mar 2000, Tim Peters wrote: >... > [a nice implementation sketch, of what seems an overly elaborate scheme, > if you believe cycles with finalizers are rare in intelligently designed > code) > ] Nah. Quite simple to code up, but a bit longer to explain in English :-) The hardest part is finding the cycles, but Guido already posted a long explanation about that. 
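[Historical note: Tim's "hand the trash cycle back to the programmer" approach is roughly what CPython later shipped -- the gc module finds the cycles, and before PEP 442 (Python 3.4) it parked uncollectable cycles containing __del__ methods in gc.garbage for the programmer to break. The cycle-finding half is observable today:]

```python
import gc

class Node:
    def __init__(self):
        self.other = None

# build a two-object cycle, then drop all external references
a, b = Node(), Node()
a.other, b.other = b, a
del a, b

# the collector finds the unreachable cycle; collect() returns the
# number of unreachable objects it found (the nodes plus their dicts)
found = gc.collect()
```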
Once that spits out the doubly-linked list of objects, then you're set. 1) scan the list calling tp_clean(CARE_CHECK), shoving "care needed" objects to a second list 2) scan the care-needed list calling tp_clean(CARE_EXEC). if TRUE is returned, then the object was cleaned and moves to the "no care" list. 3) assert len(care-needed list) == 0 4) scan the no-care list calling tp_clean(EXEC) 5) (questionable) assert len(no-care list) == 0 The background makes it longer. The short description of the algorithm is easy. Step (1) could probably be merged right into one of the scans in the GC algorithm (e.g. during the placement into the "these are cyclical garbage" list) > Provided Guido stays interested in this, he'll make his own fun. I'm just > inviting him to move in a sane direction <0.9 wink>. hehe... Agreed. > One caution: > > > ... > > If the careful-cleaning algorithm hits the end of the careful set of > > objects and the set is non-empty, then throw an exception: > > GCImpossibleError. > > Since gc "can happen at any time", this is very severe (c.f. Guido's > objection to making resurrection illegal). GCImpossibleError would simply be a subclass of MemoryError. Makes sense to me, and definitely allows for its "spontaneity." > Hand a trash cycle back to the > programmer instead, via callback or request or whatever, and it's all > explicit without more cruft in the implementation. It's alive again when > they get it back, and they can do anything they want with it (including > resurrecting it, or dropping it again, or breaking cycles -- anything). I'd > focus on the cycles themselves, not on the types of objects involved. I'm > not pretending to address the "order of finalization at shutdown" question, > though (although I'd agree they're deeply related: how do you follow a > topological sort when there *isn't* one? well, you don't, because you > can't). I disagree. I don't think a Python-level function is going to have a very good idea of what to do. 
IMO, this kind of semantics belong down in the interpreter with a specific, documented algorithm. Throwing it out to Python won't help -- that function will still have to use a "standard pattern" for getting the cyclical objects to toss themselves. I think that standard pattern should be a language definition. Without a standard pattern, then you're saying the application will know what to do, but that is kind of weird -- what happens when an unexpected cycle arrives? Cheers, -g -- Greg Stein, http://www.lyra.org/ From Moshe Zadka Sat Mar 4 09:50:19 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 4 Mar 2000 11:50:19 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> Message-ID: On Fri, 3 Mar 2000, Jeremy Hylton wrote: > It sounds like ConfigParser code and docs could use a general cleanup. > Are there any other issues to take care of as part of that cleanup? One thing that bothered me once: I want to be able to have something like: [section] tag = 1 tag = 2 And be able to retrieve ("section", "tag") -> ["1", "2"]. Can be awfully useful for things that make sense several times. Perhaps there should be two functions, one that reads a single-tag and one that reads a multi-tag? File format: I'm sure I'm going to get yelled at, but why don't we make it XML? Hard to edit, yadda, yadda, but you can easily write a special purpose widget to edit XConfig (that's what we'll call the DTD) files. hopefull-yet-not-naive-ly y'rs, Z. -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html From gstein@lyra.org Sat Mar 4 10:05:15 2000 From: gstein@lyra.org (Greg Stein) Date: Sat, 4 Mar 2000 02:05:15 -0800 (PST) Subject: [Python-Dev] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Moshe Zadka wrote: > On Fri, 3 Mar 2000, Jeremy Hylton wrote: > > It sounds like ConfigParser code and docs could use a general cleanup. > > Are there any other issues to take care of as part of that cleanup? > > One thing that bothered me once: > > I want to be able to have something like: > > [section] > tag = 1 > tag = 2 > > And be able to retrieve ("section", "tag") -> ["1", "2"]. > Can be awfully useful for things that make sense several time. > Perhaps there should be two functions, one that reads a single-tag and > one that reads a multi-tag? Structured values would be nice. Several times, I've needed to decompose the right hand side into lists. > File format: I'm sure I'm going to get yelled at, but why don't we > make it XML? Hard to edit, yadda, yadda, but you can easily write a > special purpose widget to edit XConfig (that's what we'll call the DTD) > files. Write a whole new module. ConfigParser is for files that look like the above. There isn't a reason to NOT use XML, but it shouldn't go into ConfigParser. I find the above style much easier for *humans*, than an XML file, to specify options. XML is good for computers; not so good for humans. Cheers, -g -- Greg Stein, http://www.lyra.org/ From Moshe Zadka Sat Mar 4 10:46:40 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 4 Mar 2000 12:46:40 +0200 (IST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <000001bf857a$60b45ac0$c6a0143f@tim> Message-ID: [Tim Peters] > ...If a trash cycle > contains a finalizer (my, but that has to be rare. in practice, in > well-designed code!), This shows something Tim himself has often said -- he never programmed a GUI. 
It's very hard to build a GUI (especially with Tkinter) which is cycle-less, but the classes implementing the GUI often have __del__'s to break system-allocated resources. So, it's not as rare as we would like to believe, which is the reason I haven't given this answer. which-is-not-the-same-thing-as-disagreeing-with-it-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From Moshe Zadka Sat Mar 4 11:16:19 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 4 Mar 2000 13:16:19 +0200 (IST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Greg Stein wrote: > I disagree. I don't think a Python-level function is going to have a very > good idea of what to do Much better than the Python interpreter... > Throwing it out to Python won't help > what happens when an unexpected cycle arrives? Don't delete it. It's as simple as that, since it's a bug. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From Moshe Zadka Sat Mar 4 11:29:33 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 4 Mar 2000 13:29:33 +0200 (IST) Subject: [Python-Dev] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Greg Stein wrote: > Write a whole new module. ConfigParser is for files that look like the > above. Gotcha. One problem: two configuration modules might cause the classic "which should I use?" confusion. > > I find the above style much easier for *humans*, than an XML file, to > specify options. XML is good for computers; not so good for humans. > Of course: what human could delimit his text with <tag> and </tag>? oh-no-another-c.l.py-bot-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From gstein@lyra.org Sat Mar 4 11:38:46 2000 From: gstein@lyra.org (Greg Stein) Date: Sat, 4 Mar 2000 03:38:46 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? 
In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Moshe Zadka wrote: > On Sat, 4 Mar 2000, Greg Stein wrote: > > I disagree. I don't think a Python-level function is going to have a very > > good idea of what to do > > > Much better then the Python interpreter... If your function receives two instances (A and B), what are you going to do? How can you know what their policy is for cleaning up in the face of a cycle? I maintain that you would call the equivalent of my proposed __clean__. There isn't much else you'd be able to do, unless you had a completely closed system, you expected cycles between specific types of objects, and you knew a way to clean them up. Even then, you would still be calling something like __clean__ to let the objects do whatever they needed. I'm suggesting that __clean__ should be formalized (as part of tp_clean). Throwing the handling "up to Python" isn't going to do much for you. Seriously... I'm all for coding more stuff in Python rather than C, but this just doesn't feel right. Getting the objects GC'd is a language feature, and a specific pattern/method/recommendation is best formulated as an interpreter mechanism. > > > Throwing it out to Python won't help > > > what happens when an unexpected cycle arrives? > > Don't delete it. > It's as simple as that, since it's a bug. The point behind this stuff is to get rid of it, rather than let it linger on. If the objects have finalizers (which is how we get to this step!), then it typically means there is a resource they must release. Getting the object cleaned and dealloc'd becomes quite important. Cheers, -g p.s. did you send in a patch for the instance_contains() thing yet? 
-- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Sat Mar 4 11:43:12 2000 From: gstein@lyra.org (Greg Stein) Date: Sat, 4 Mar 2000 03:43:12 -0800 (PST) Subject: [Python-Dev] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Moshe Zadka wrote: > On Sat, 4 Mar 2000, Greg Stein wrote: > > Write a whole new module. ConfigParser is for files that look like the > > above. > > Gotcha. > > One problem: two configuration modules might cause the classic "which > should I use?" confusion. Nah. They wouldn't *both* be called ConfigParser. And besides, I see the XML format more as a persistence mechanism rather than a configuration mechanism. I'd call the module something like "XMLPersist". > > > > I find the above style much easier for *humans*, than an XML file, to > > specify options. XML is good for computers; not so good for humans. > > > > Of course: what human could delimit his text with <tag> and </tag>? Feh. As a communication mechanism, dropping in that <tag> stuff... it's easy. But<tag>I would<tag>not<tag>want ... bleck. I wouldn't want to use XML for configuration stuff. It just gets ugly. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gvwilson@nevex.com Sat Mar 4 16:46:24 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Sat, 4 Mar 2000 11:46:24 -0500 (EST) Subject: [Python-Dev] HTMLgen-style interface to SQL? Message-ID: [short form] I'm looking for an object-oriented toolkit that will do for SQL what Perl's CGI.pm module, or Python's HTMLgen, does for HTML. Pointers, examples, or expressions of interest would be welcome. [long form] Lincoln Stein's CGI.pm module for Perl allows me to build HTML in an object-oriented way, instead of getting caught in the Turing tarpit of string substitution and printf. DOM does the same (in a variety of languages) for XML. Right now, if I want to interact with an SQL database from Perl or Python, I have to embed SQL strings in my programs. 
I would like to have a DOM-like ability to build and manipulate queries as objects, then call a method that translates the query structure into SQL to send to the database. Alternatively, if there is an XML DTD for SQL (how's that for a chain of TLAs?), and some tool to convert the XML/SQL to pure SQL, so that I could build my query using DOM, that would be cool too. RSVP, Greg Wilson gvwilson@nevex.com From Moshe Zadka Sat Mar 4 18:02:54 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 4 Mar 2000 20:02:54 +0200 (IST) Subject: [Python-Dev] Re: [Patches] selfnanny.py: checking for "self" in every method In-Reply-To: <200003041724.MAA05053@eric.cnri.reston.va.us> Message-ID: On Sat, 4 Mar 2000, Guido van Rossum wrote: > Before we all start writing nannies and checkers, how about a standard > API design first? I thoroughly agree -- we should have a standard API. I tried to write selfnanny so it could be callable from any API possible (e.g., it can take either a file, a string, an ast or a tuple representation) > I will want to call various nannies from a "Check" > command that I plan to add to IDLE. Very cool: what I imagine is a sort of modular PyLint. > I already did this with tabnanny, > and found that it's barely possible -- it's really written to run like > a script. Mine definitely isn't: it's designed to run both like a script and like a module. One outstanding bug: no docos. To be supplied upon request <0.5 wink>. I just wanted to float it out and see if people think that this particular nanny is worth while. > Since parsing is expensive, we probably want to share the parse tree. Yes. Probably as an AST, and transform to tuples/lists inside the checkers. > Ideas? Here's a strawman API: There's a package called Nanny Every module in that package should have a function called check_ast. 
Its argument is an AST object, and its output should be a list of three-tuples: (line-number, error-message, None) or (line-number, error-message, (column-begin, column-end)) (each tuple can be a different form). Problems? (I'm CCing to python-dev. Please follow up to that discussion to python-dev only, as I don't believe it belongs in patches) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From gvwilson@nevex.com Sat Mar 4 18:26:20 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Sat, 4 Mar 2000 13:26:20 -0500 (EST) Subject: [Python-Dev] Re: selfnanny.py / nanny architecture In-Reply-To: Message-ID: > > Guido van Rossum wrote: > > Before we all start writing nannies and checkers, how about a standard > > API design first? > Moshe Zadka wrote: > Here's a strawman API: > There's a package called Nanny > Every module in that package should have a function called check_ast. > Its argument is an AST object, and its output should be a list > of three-tuples: (line-number, error-message, None) or > (line-number, error-message, (column-begin, column-end)) (each tuple can > be a different form). Greg Wilson wrote: The SUIF (Stanford University Intermediate Format) group has been working on an extensible compiler framework for about ten years now. The framework is based on an extensible AST spec; anyone can plug in a new analysis or optimization algorithm by writing one or more modules that read and write decorated ASTs. (See http://suif.stanford.edu for more information.) Based on their experience, I'd suggest that every nanny take an AST as an argument, and add complaints in place as decorations to the nodes. A terminal nanny could then collect these and display them to the user. I think this architecture will make it simpler to write meta-nannies. 
I'd further suggest that the AST be something that can be manipulated through DOM, since (a) it's designed for tree-crunching, (b) it's already documented reasonably well, (c) it'll save us re-inventing a wheel, and (d) generating human-readable output in a variety of customizable formats ought to be simple (well, simpler than the alternatives). Greg From jeremy@cnri.reston.va.us Sun Mar 5 02:10:28 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Sat, 4 Mar 2000 21:10:28 -0500 (EST) Subject: [Python-Dev] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: References: Message-ID: <14529.49684.219826.466310@bitdiddle.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: MZ> On Sat, 4 Mar 2000, Greg Stein wrote: >> Write a whole new module. ConfigParser is for files that look >> like the above. MZ> Gotcha. MZ> One problem: two configurations modules might cause the classic MZ> "which should I use?" confusion. I don't think this is a hard decision to make. ConfigParser is good for simple config files that are going to be maintained by humans with a text editor. An XML-based configuration file is probably the right solution when humans aren't going to maintain the config files by hand. Perhaps XML will eventually be the right solution in both cases, but only if XML editors are widely available. >> I find the above style much easier for *humans*, than an >> XML file, to specify options. XML is good for computers; not so >> good for humans. MZ> Of course: what human could delimit his text with and MZ> ? Could? I'm sure there are more ways on Linux and Windows to mark up text than are dreamt of in your philosophy, Moshe . The question is what is easiest to read and understand? 
Jeremy From tim_one@email.msn.com Sun Mar 5 02:22:16 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 21:22:16 -0500 Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" in every method In-Reply-To: <200003041724.MAA05053@eric.cnri.reston.va.us> Message-ID: <000201bf8649$a17383e0$f42d153f@tim> [Guido van Rossum] > Before we all start writing nannies and checkers, how about a standard > API design first? I will want to call various nannies from a "Check" > command that I plan to add to IDLE. I already did this with tabnanny, > and found that it's barely possible -- it's really written to run like > a script. I like Moshe's suggestion fine, except with an abstract base class named Nanny with a virtual method named check_ast. Nannies should (of course) derive from that. > Since parsing is expensive, we probably want to share the parse tree. What parse tree? Python's parser module produces an AST not nearly "A enough" for reasonably productive nanny writing. GregS & BillT have improved on that, but it's not in the std distrib. Other "problems" include the lack of original source lines in the trees, and lack of column-number info. Note that by the time Python has produced a parse tree, all evidence of the very thing tabnanny is looking for has been removed. That's why she used the tokenize module to begin with. God knows tokenize is too funky to use too when life gets harder (check out checkappend.py's tokeneater state machine for a preliminary taste of that). So the *only* solution is to adopt Christian's Stackless so I can rewrite tokenize as a coroutine like God intended . Seriously, I don't know of anything that produces a reasonably usable (for nannies) parse tree now, except via modifying a Python grammar for use with John Aycock's SPARK; the latter also comes with very pleasant & powerful tree pattern-matching abilities. But it's probably too slow for everyday "just folks" use. 
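[Today the standard-library ast module, which postdates this thread, makes Moshe's strawman easy to prototype. A hedged sketch of Tim's base-class variant, with a selfnanny-style checker emitting Moshe's (line-number, error-message, columns) tuples; class names are made up:]

```python
import ast

class Nanny:
    """Abstract base class in the spirit of Tim's suggestion."""
    def check_ast(self, tree):
        raise NotImplementedError

class SelfNanny(Nanny):
    """Complain about methods whose first argument is not 'self'."""
    def check_ast(self, tree):
        complaints = []
        for node in ast.walk(tree):
            if isinstance(node, ast.ClassDef):
                for item in node.body:
                    # decorators like @staticmethod are ignored in this sketch
                    if isinstance(item, ast.FunctionDef):
                        args = item.args.args
                        if not args or args[0].arg != "self":
                            complaints.append(
                                (item.lineno,
                                 "first argument of %r is not 'self'" % item.name,
                                 None))
        return complaints

source = "class C:\n    def meth(x):\n        pass\n"
problems = SelfNanny().check_ast(ast.parse(source))
```

[Sharing the parse tree, as Guido asks, then amounts to calling ast.parse() once and handing the same tree to every nanny's check_ast().]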
Grabbing the GregS/BillT enhancement is probably the most practical thing we could build on right now (but tabnanny will have to remain a special case). unsure-about-the-state-of-simpleparse-on-mxtexttools-for-this-ly y'rs - tim From tim_one@email.msn.com Sun Mar 5 03:24:18 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 22:24:18 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <38BE1B69.E0B88B41@lemburg.com> Message-ID: <000301bf8652$4aadaf00$f42d153f@tim> Just noting that two instances of this were found in Zope. [/F] > append = list.append > for x in something: > append(...) [Tim] > As detailed in a c.l.py posting, I have yet to find a single instance of > this actually called with multiple arguments. Pointing out that it's > *possible* isn't the same as demonstrating it's an actual problem. I'm > quite willing to believe that it is, but haven't yet seen evidence of it. From fdrake@acm.org Sun Mar 5 03:55:27 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Sat, 4 Mar 2000 22:55:27 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> Message-ID: <14529.55983.263225.691427@weyr.cnri.reston.va.us> Jeremy Hylton writes: > Thanks for catching that. I didn't look at the context. I'm going to > wait, though, until I talk to Fred to mess with the code any more. I did it that way since the .ini format allows comments after values (the ';' comments after a '='; '#' comments are a ConfigParser thing), but there's no equivalent concept for RFC822 parsing, other than '(...)' in addresses. The code was trying to allow what was expected from the .ini crowd without breaking the "native" use of ConfigParser. > General question for python-dev readers: What are your experiences > with ConfigParser?
I just used it to build a simple config parser for > IDLE and found it hard to use for several reasons. The biggest > problem was that the file format is undocumented. I also found it > clumsy to have to specify section and option arguments. I ended up > writing a proxy that specializes on section so that get takes only an > option argument. > > It sounds like ConfigParser code and docs could use a general cleanup. > Are there any other issues to take care of as part of that cleanup? I agree that the API to ConfigParser sucks, and I think also that the use of it as a general solution is a big mistake. It's a messy bit of code that doesn't need to be, supports a really nasty mix of syntaxes, and can easily bite users who think they're getting something .ini-like (the magic names and interpolation is a bad idea!). While it suited the original application well enough, something with .ini syntax and interpolation from a subclass would have been *much* better. I think we should create a new module, inilib, that implements exactly .ini syntax in a base class that can be intelligently extended. ConfigParser should be deprecated. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From tim_one@email.msn.com Sun Mar 5 04:11:12 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 23:11:12 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: <200003031705.MAA21700@eric.cnri.reston.va.us> Message-ID: <000601bf8658$d81d34e0$f42d153f@tim> [Guido] > OK, so we're down to this one point: if __del__ resurrects the object, > should __del__ be called again later? Additionally, should > resurrection be made illegal? I give up on the latter, so it really is just one. > I can easily see how __del__ could *accidentally* resurrect the object > as part of its normal cleanup ... > In this example, the helper routine will eventually delete the object > from its cache, at which point it is truly deleted. 
It would be > harmful, not helpful, if __del__ was called again at this point. If this is something that happens easily, and current behavior is harmful, don't you think someone would have complained about it by now? That is, __del__ *is* "called again at this point" now, and has been for years & years. And if it happens easily, it *is* happening now, and in an unknown amount of existing code. (BTW, I doubt it happens at all -- people tend to write very simple __del__ methods, so far as I've ever seen) > Now, it is true that the current docs for __del__ imply that > resurrection is possible. "imply" is too weak. The Reference Manual's "3.3.1 Basic customization" flat-out says it's possible ("though not recommended"). The precise meaning of the word "may" in the following sentence is open to debate, though. > The intention of that note was to warn __del__ writers that in the case > of accidental resurrection Sorry, but I can't buy this: saying that *accidents* are "not recommended" is just too much of a stretch . > __del__ might be called again. That's a plausible reading of the following "may", but not the only one. I believe it's the one you intended, but it's not the meaning I took prior to this. > The intention certainly wasn't to allow or encourage intentional resurrection. Well, I think it plainly says it's supported ("though not recommended"). I used it intentionally at KSR, and even recommended it on c.l.py in the dim past (in one of those "dark & useless" threads ). > Would there really be someone out there who uses *intentional* > resurrection? I severely doubt it. I've never heard of this. Why would anyone tell you about something that *works*?! You rarely hear the good stuff, you know. I gave the typical pattern in the preceding msg. To flesh out the motivation more, you have some external resource that's very expensive to set up (in KSR's case, it was an IPC connection to a remote machine). 
Rights to use that resource are handed out in the form of an object. When a client is done using the resource, they *should* explicitly use the object's .release() method, but you can't rely on that. So the object's __del__ method looks like (for example):

    def __del__(self):
        # Code not shown to figure out whether to disconnect: the downside to
        # disconnecting is that it can cost a bundle to create a new connection.
        # If the whole app is shutting down, then of course we want to disconnect.
        # Or if a timestamp trace shows that we haven't been making good use of
        # all the open connections lately, we may want to disconnect too.
        if decided_to_disconnect:
            self.external_resource.disconnect()
        else:
            # keep the connection alive for reuse
            global_available_connection_objects.append(self)

This is simple & effective, and it relies on both intentional resurrection and __del__ getting called repeatedly. I don't claim there's no other way to write it, just that there's *been* no problem doing this for a millennium . Note that MAL spontaneously sketched similar examples, although I can't say whether he's actually done stuff like this. Going back up a level, in another msg you finally admitted that you want "__del__ called only once" for the same reason Java wants it: because gc has no idea what to do when faced with finalizers in a trash cycle, and settles for an unprincipled scheme whose primary virtue is that "it doesn't blow up" -- and "__del__ called only once" happens to be convenient for that scheme. But toss such cycles back to the user to deal with at the Python level, and all those problems go away (along with the artificial need to change __del__). The user can break the cycles in an order that makes sense to the app (or they can let 'em leak! up to them). >>> print gc.get_cycle.__doc__ Return a list of objects comprising a single garbage cycle; [] if none. At least one of the objects has a finalizer, so Python can't determine the intended order of destruction.
If you don't break the cycle, Python will neither run any finalizers for the contained objects nor reclaim their memory. If you do break the cycle, and dispose of the list, Python will follow its normal reference-counting rules for running finalizers and reclaiming memory. That this "won't blow up" either is just the least of its virtues . you-break-it-you-buy-it-ly y'rs - tim From tim_one@email.msn.com Sun Mar 5 04:56:54 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 23:56:54 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message-ID: <000001bf865f$3acb99a0$432d153f@tim> [Tim sez "toss insane cycles back on the user"] [Greg Stein] > I disagree. I don't think a Python-level function is going to have a very > good idea of what to do. You've already assumed that Python coders know exactly what to do, else they couldn't have coded the new __clean__ method your proposal relies on. I'm taking what strikes me as the best part of Scheme's Guardian idea: don't assume *anything* about what users "should" do to clean up their trash. Leave it up to them: their problem, their solution. I think finalizers in trash cycles should be so rare in well-written code that it's just not worth adding much of anything in the implementation to cater to it. > IMO, this kind of semantics belong down in the interpreter with a > specific, documented algorithm. Throwing it out to Python won't help > -- that function will still have to use a "standard pattern" for getting > the cyclical objects to toss themselves. They can use any pattern they want, and if the pattern doesn't *need* to be coded in C as part of the implementation, it shouldn't be. > I think that standard pattern should be a language definition. 
I distrust our ability to foresee everything users may need over the next 10 years: how can we know today that the first std pattern you dreamed up off the top of your head is the best approach to an unbounded number of problems we haven't yet seen a one of ? > Without a standard pattern, then you're saying the application will know > what to do, but that is kind of weird -- what happens when an unexpected > cycle arrives? With the hypothetical gc.get_cycle() function I mentioned before, they should inspect objects in the list they get back, and if they find they don't know what to do with them, they can still do anything they want. Examples include raising an exception, dialing my home pager at 3am to insist I come in to look at it, or simply let the list go away (at which point the objects in the list will again become a trash cycle containing a finalizer). If several distinct third-party modules get into this act, I *can* see where it could become a mess. That's why Scheme "guardians" is plural: a given module could register its "problem objects" in advance with a specific guardian of its own, and query only that guardian later for things ready to die. This probably can't be implemented in Python, though, without support for weak references (or lots of brittle assumptions about specific refcount values). agreeably-disagreeing-ly y'rs - tim From tim_one@email.msn.com Sun Mar 5 04:56:58 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 23:56:58 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message-ID: <000101bf865f$3cb0d460$432d153f@tim> [Tim] > ...If a trash cycle contains a finalizer (my, but that has to be rare. > in practice, in well-designed code!), [Moshe Zadka] > This shows something Tim himself has often said -- he never programmed a > GUI. 
It's very hard to build a GUI (especially with Tkinter) which is > cycle-less, but the classes implementing the GUI often have __del__'s > to break system-allocated resources. > > So, it's not as rare as we would like to believe, which is the reason > I haven't given this answer. I wrote Cyclops.py when trying to track down leaks in IDLE. The extraordinary thing we discovered is that "even real gc" would not have reclaimed the cycles. They were legitimately reachable, because, indeed, "everything points to everything else". Guido fixed almost all of them by explicitly calling new "close" methods. I believe IDLE has no __del__ methods at all now. Tkinter.py currently contains two. so-they-contained-__del__-but-weren't-trash-ly y'rs - tim From tim_one@email.msn.com Sun Mar 5 06:05:24 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 5 Mar 2000 01:05:24 -0500 Subject: [Python-Dev] Unicode mapping tables In-Reply-To: <38BCD71C.3592E6A@lemburg.com> Message-ID: <000601bf8668$cbbdd640$432d153f@tim> [M.-A. Lemburg] > ... > Here's what I'll do: > > * implement .capitalize() in the traditional way for Unicode > objects (simply convert the first char to uppercase) Given .title(), is .capitalize() of use for Unicode strings? Or is it just a temptation to do something senseless in the Unicode world? If it doesn't make sense, leave it out (this *seems* like compulsion to implement all current string methods in *some* way for Unicode, whether or not they make sense). From Moshe Zadka Sun Mar 5 06:16:22 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 5 Mar 2000 08:16:22 +0200 (IST) Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" in every method In-Reply-To: <000201bf8649$a17383e0$f42d153f@tim> Message-ID: On Sat, 4 Mar 2000, Tim Peters wrote: > I like Moshe's suggestion fine, except with an abstract base class named > Nanny with a virtual method named check_ast. Nannies should (of course) > derive from that. Why? 
The C++ you're programming damaged your common sense cycles? > > Since parsing is expensive, we probably want to share the parse tree. > > What parse tree? Python's parser module produces an AST not nearly "A > enough" for reasonably productive nanny writing. As a note, selfnanny uses the parser module AST. > GregS & BillT have > improved on that, but it's not in the std distrib. Other "problems" include > the lack of original source lines in the trees, The parser module has source lines. > and lack of column-number info. Yes, that sucks. > Note that by the time Python has produced a parse tree, all evidence of the > very thing tabnanny is looking for has been removed. That's why she used > the tokenize module to begin with. Well, it's one of the few nannies which would be in that position. > God knows tokenize is too funky to use too when life gets harder (check out > checkappend.py's tokeneater state machine for a preliminary taste of that). Why doesn't checkappend.py use the parser module? > Grabbing the GregS/BillT enhancement is probably the most > practical thing we could build on right now You got some pointers? > (but tabnanny will have to remain a special case). tim-will-always-be-a-special-case-in-our-hearts-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From tim_one@email.msn.com Sun Mar 5 07:01:12 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 5 Mar 2000 02:01:12 -0500 Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" inevery method In-Reply-To: Message-ID: <000901bf8670$97d8f320$432d153f@tim> [Tim] >> [make Nanny a base class] [Moshe Zadka] > Why? Because it's an obvious application for OO design. A common base class formalizes the interface and can provide useful utilities for subclasses. > The C++ you're programming damaged your common sense cycles? Yes, very, but that isn't relevant here . It's good Python sense too.
>> [parser module produces trees far too concrete for comfort] > As a note, selfnanny uses the parser module AST. Understood, but selfnanny has a relatively trivial task. Hassling with tuples nested dozens deep for even relatively simple stmts is both a PITA and a time sink. >> [parser doesn't give source lines] > The parser module has source lines. No, it does not (it only returns terminals, as isolated strings). The tokenize module does deliver original source lines in their entirety (as well as terminals, as isolated strings; and column numbers). >> and lack of column-number info. > Yes, that sucks. > ... > Why doesn't checkappend.py use the parser module? Because it wanted to display the actual source line containing an offending "append" (which, again, the parse module does not supply). Besides, it was a trivial variation on tabnanny.py, of which I have approximately 300 copies on my disk . >> Grabbing the GregS/BillT enhancement is probably the most >> practical thing we could build on right now > You got some pointers? Download python2c (http://www.mudlib.org/~rassilon/p2c/) and grab transformer.py from the zip file. The latter supplies a very useful post-processing pass over the parse module's output, squashing it *way* down. From Moshe Zadka Sun Mar 5 07:08:41 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 5 Mar 2000 09:08:41 +0200 (IST) Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" inevery method In-Reply-To: <000901bf8670$97d8f320$432d153f@tim> Message-ID: On Sun, 5 Mar 2000, Tim Peters wrote: > [Tim] > >> [make Nanny a base class] > > [Moshe Zadka] > > Why? > > Because it's an obvious application for OO design. A common base class > formalizes the interface and can provide useful utilities for subclasses. The interface is just one function. You're welcome to have a do-nothing nanny that people *can* derive from: I see no point in making them derive from a base class. > > As a note, selfnanny uses the parser module AST.
> > Understood, but selfnanny has a relatively trivial task. That it does, and it was painful. > >> [parser doesn't give source lines] > > > The parser module has source lines. > > No, it does not (it only returns terminals, as isolated strings). Sorry, misunderstanding: it seemed obvious to me you wanted line numbers. For lines, use the linecache module... > > You got some pointers? > > Download python2c (http://www.mudlib.org/~rassilon/p2c/) and grab > transformer.py from the zip file. I'll have a look. Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From Fredrik Lundh" from "Python for Lisp Programmers": http://www.norvig.com/python-lisp.html > Don't forget return. Writing def twice(x): x+x is tempting > and doesn't signal a warning or > exception, but you probably > meant to have a return in there. This is particularly irksome > because in a lambda you are prohibited from writing return, > but the semantics is to do the return. maybe adding an (optional but encouraged) "return" to lambda would be an improvement? lambda x: x + 10 vs. lambda x: return x + 10 or is this just more confusing... opinions? From guido@python.org Sun Mar 5 12:04:56 2000 From: guido@python.org (Guido van Rossum) Date: Sun, 05 Mar 2000 07:04:56 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: Your message of "Sat, 04 Mar 2000 22:55:27 EST." <14529.55983.263225.691427@weyr.cnri.reston.va.us> References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> <14529.55983.263225.691427@weyr.cnri.reston.va.us> Message-ID: <200003051204.HAA05367@eric.cnri.reston.va.us> [Fred] > I agree that the API to ConfigParser sucks, and I think also that > the use of it as a general solution is a big mistake.
It's a messy > bit of code that doesn't need to be, supports a really nasty mix of > syntaxes, and can easily bite users who think they're getting > something .ini-like (the magic names and interpolation is a bad > idea!). While it suited the original application well enough, > something with .ini syntax and interpolation from a subclass would > have been *much* better. > I think we should create a new module, inilib, that implements > exactly .ini syntax in a base class that can be intelligently > extended. ConfigParser should be deprecated. Amen. Some thoughts:

- You could put it all in ConfigParser.py but with new classnames. (Not sure though, since the ConfigParser class, which is really a kind of weird variant, will be assumed to be the main class because its name is that of the module.)

- Variants on the syntax could be given through some kind of option system rather than through subclassing -- they should be combinable independently. Some possible options (maybe I'm going overboard here) could be:

  - comment characters: ('#', ';', both, others?)
  - comments after variables allowed? on sections?
  - variable characters: (':', '=', both, others?)
  - quoting of values with "..." allowed?
  - backslashes in "..." allowed?
  - does backslash-newline mean a continuation?
  - case sensitivity for section names (default on)
  - case sensitivity for option names (default off)
  - variables allowed before first section name?
  - first section name? (default "main")
  - character set allowed in section names
  - character set allowed in variable names
  - %(...) substitution?

(Well maybe the whole substitution thing should really be done through a subclass -- it's too weird for normal use.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sun Mar 5 12:17:31 2000 From: guido@python.org (Guido van Rossum) Date: Sun, 05 Mar 2000 07:17:31 -0500 Subject: [Python-Dev] Unicode mapping tables In-Reply-To: Your message of "Sun, 05 Mar 2000 01:05:24 EST."
<000601bf8668$cbbdd640$432d153f@tim> References: <000601bf8668$cbbdd640$432d153f@tim> Message-ID: <200003051217.HAA05395@eric.cnri.reston.va.us> > [M.-A. Lemburg] > > ... > > Here's what I'll do: > > > > * implement .capitalize() in the traditional way for Unicode > > objects (simply convert the first char to uppercase) [Tim] > Given .title(), is .capitalize() of use for Unicode strings? Or is it just > a temptation to do something senseless in the Unicode world? If it doesn't > make sense, leave it out (this *seems* like compulsion to implement > all current string methods in *some* way for Unicode, whether or not they > make sense). The intention of this is to make code that does something using strings do exactly the same thing if those strings happen to be Unicode strings with the same values. The capitalize method returns self[0].upper() + self[1:] -- that may not make sense for e.g. Japanese, but it certainly does for Russian or Greek. It also does this in JPython. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sun Mar 5 12:24:41 2000 From: guido@python.org (Guido van Rossum) Date: Sun, 05 Mar 2000 07:24:41 -0500 Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" inevery method In-Reply-To: Your message of "Sun, 05 Mar 2000 02:01:12 EST." <000901bf8670$97d8f320$432d153f@tim> References: <000901bf8670$97d8f320$432d153f@tim> Message-ID: <200003051224.HAA05410@eric.cnri.reston.va.us> > >> [parser doesn't give source lines] > > > The parser module has source lines. > > No, it does not (it only returns terminals, as isolated strings). The > tokenize module does deliver original source lines in their entirety (as > well as terminals, as isolated strings; and column numbers). Moshe meant line numbers -- it has those. > > Why doesn't checkappend.py use the parser module? > > Because it wanted to display the actual source line containing an offending > "append" (which, again, the parse module does not supply).
Besides, it was > a trivial variation on tabnanny.py, of which I have approximately 300 copies > on my disk . Of course another argument for making things more OO. (The code used in tabnanny.py to process files and recursively directories from sys.argv is replicated a thousand times in various scripts of mine -- Tim took it from my now-defunct takpolice.py. This should be in the std library somehow...) > >> Grabbing the GregS/BillT enhancement is probably the most > >> practical thing we could build on right now > > > You got some pointers? > > Download python2c (http://www.mudlib.org/~rassilon/p2c/) and grab > transformer.py from the zip file. The latter supplies a very useful > post-processing pass over the parse module's output, squashing it *way* > down. Those of you who have seen the compiler-sig should know that Jeremy made an improvement which will find its way into p2c. It's currently on display in the Python CVS tree in the nondist branch: see http://www.python.org/pipermail/compiler-sig/2000-February/000011.html and the ensuing thread for more details. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sun Mar 5 13:46:13 2000 From: guido@python.org (Guido van Rossum) Date: Sun, 05 Mar 2000 08:46:13 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Your message of "Fri, 03 Mar 2000 22:26:54 EST." <000401bf8589$7d1364e0$c6a0143f@tim> References: <000401bf8589$7d1364e0$c6a0143f@tim> Message-ID: <200003051346.IAA05539@eric.cnri.reston.va.us> I'm beginning to believe that handing cycles with finalizers to the user is better than calling __del__ with a different meaning, and I tentatively withdraw my proposal to change the rules for when __del__ is called (even when __init__ fails; I haven't had any complaints about that either).
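For concreteness, the resurrection pattern at issue can be demonstrated in a few lines (a minimal sketch with an invented cache name; note that CPython eventually settled this very debate the other way in PEP 442, Python 3.4, which calls __del__ at most once even when the object is resurrected):

```python
graveyard = []   # stands in for a cache of reusable objects
calls = []       # counts __del__ invocations

class Resource:
    def __del__(self):
        calls.append(1)
        graveyard.append(self)   # resurrect: keep self alive for reuse

r = Resource()
del r                            # refcount hits zero, __del__ runs...
assert len(graveyard) == 1       # ...but the object survives

graveyard.clear()                # drop the last reference again; on
assert len(calls) == 1           # CPython >= 3.4, __del__ is not re-run
```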
There seem to be two competing suggestions for solutions: (1) call some kind of __cleanup__ (Marc-Andre) or tp_clean (Greg) method of the object; (2) Tim's proposal of an interface to ask the garbage collector for a trash cycle with a finalizer (or for an object with a finalizer in a trash cycle?). Somehow Tim's version looks less helpful to me, because it *seems* that whoever gets to handle the cycle (the main code of the program?) isn't necessarily responsible for creating it (some library you didn't even know was used under the covers of some other library you called). Of course, it's also possible that a trash cycle is created by code outside the responsibility of the finalizer. But still, I have a hard time understanding how Tim's version would be used. Greg or Marc-Andre's version I understand. What keeps nagging me though is what to do when there's a finalizer but no cleanup method. I guess the trash cycle remains alive. Is this acceptable? (I guess so, because we've given the programmer a way to resolve the trash: provide a cleanup method.) If we detect individual cycles (the current algorithm doesn't do that yet, though it seems easy enough to do another scan), could we special-case cycles with only one finalizer and no cleaner-upper? (I'm tempted to call the finalizer because it seems little harm can be done -- but then of course there's the problem of the finalizer being called again when the refcount really goes to zero. :-( ) > Exactly. The *programmer* may know the right thing to do, but the Python > implementation can't possibly know. Facing both facts squarely constrains > the possibilities to the only ones that are all of understandable, > predictable and useful. Cycles with finalizers must be a Magic-Free Zone > else you lose at least one of those three: even Guido's kung fu isn't > strong enough to outguess this.
> > [a nice implementation sketch, of what seems an overly elaborate scheme, > if you believe cycles with finalizers are rare in intelligently designed > code > ] > > Provided Guido stays interested in this, he'll make his own fun. I'm just > inviting him to move in a sane direction <0.9 wink>. My current tendency is to go with the basic __cleanup__ and nothing more, calling each instance's __cleanup__ before clobbering dictionaries and lists -- which should break all cycles safely. > One caution: > > > ... > > If the careful-cleaning algorithm hits the end of the careful set of > > objects and the set is non-empty, then throw an exception: > > GCImpossibleError. > > Since gc "can happen at any time", this is very severe (c.f. Guido's > objection to making resurrection illegal). Not quite. Cycle detection is presumably only called every once in a while on memory allocation, and memory *allocation* (as opposed to deallocation) is allowed to fail. Of course, this will probably run into various coding bugs where allocation failure isn't dealt with properly, because in practice this happens so rarely... > Hand a trash cycle back to the > programmer instead, via callback or request or whatever, and it's all > explicit without more cruft in the implementation. It's alive again when > they get it back, and they can do anything they want with it (including > resurrecting it, or dropping it again, or breaking cycles -- > anything). That was the idea with calling the finalizer too: it would be called between INCREF/DECREF, so the object would be considered alive for the duration of the finalizer call. Here's another way of looking at my error: for dicts and lists, I would call a special *clear* function; but for instances, I would call *dealloc*, however intending it to perform a *clear*.
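The effect of a clear-style cleanup can be sketched at the Python level (the cleanup method here is a hypothetical stand-in for the proposed __cleanup__; weakrefs are used only to observe that plain reference counting then reclaims the cycle):

```python
import weakref

class Node:
    def __init__(self):
        self.other = None
    def cleanup(self):
        # break the reference that closes the cycle,
        # without deallocating anything ourselves
        self.other = None

a, b = Node(), Node()
a.other, b.other = b, a            # a <=> b: a reference cycle
wa, wb = weakref.ref(a), weakref.ref(b)

a.cleanup()                        # one cleared edge is enough...
del a, b                           # ...for refcounting to reclaim both
assert wa() is None and wb() is None
```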
I wish we didn't have to special-case finalizers on class instances (since each dealloc function is potentially a combination of a finalizer and a deallocation routine), but the truth is that they *are* special -- __del__ has no responsibility for deallocating memory, only for deallocating external resources (such as temp files). And even if we introduced a tp_clean protocol that would clear dicts and lists and call __cleanup__ for instances, we'd still want to call it first for instances, because an instance depends on its __dict__ for its __cleanup__ to succeed (but the __dict__ doesn't depend on the instance for its cleanup). Greg's 3-phase tp_clean protocol seems indeed overly elaborate but I guess it deals with such dependencies in the most general fashion. > I'd focus on the cycles themselves, not on the types of objects > involved. I'm not pretending to address the "order of finalization > at shutdown" question, though (although I'd agree they're deeply > related: how do you follow a topological sort when there *isn't* > one? well, you don't, because you can't). In theory, you just delete the last root (a C global pointing to sys.modules) and you run the garbage collector. It might be more complicated in practice to track down all roots. Another practical consideration is that now there are cycles of the form function <=> module, which suggests that we should make function objects traceable. Also, modules can cross-reference, so module objects should be made traceable. I don't think that this will grow the sets of traced objects by too much (since the dicts involved are already traced, and a typical program has way fewer functions and modules than it has class instances). On the other hand, we may also have to trace (un)bound method objects, and these may be tricky because they are allocated and deallocated at high rates (once per typical method call). Back to the drawing board...
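The function <=> module cycle is easy to exhibit: a function holds a reference to its defining module's namespace, and that namespace holds the function (shown with the modern __globals__ attribute; in 1.5 it was spelled func_globals):

```python
import types

mod = types.ModuleType("demo")    # a scratch module to define f in
exec(compile("def f():\n    pass\n", "<demo>", "exec"), mod.__dict__)

f = mod.__dict__["f"]
assert f.__globals__ is mod.__dict__   # function -> module namespace
assert f.__globals__["f"] is f         # module namespace -> function
```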
--Guido van Rossum (home page: http://www.python.org/~guido/) From skip@mojam.com (Skip Montanaro) Sun Mar 5 16:42:30 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Sun, 5 Mar 2000 10:42:30 -0600 (CST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <200003051346.IAA05539@eric.cnri.reston.va.us> References: <000401bf8589$7d1364e0$c6a0143f@tim> <200003051346.IAA05539@eric.cnri.reston.va.us> Message-ID: <14530.36471.11654.666900@beluga.mojam.com> Guido> What keeps nagging me though is what to do when there's a Guido> finalizer but no cleanup method. I guess the trash cycle remains Guido> alive. Is this acceptable? (I guess so, because we've given the Guido> programmer a way to resolve the trash: provide a cleanup method.) That assumes the programmer even knows there's a cycle, right? I'd like to see this scheme help provide debugging assistance. If a cycle is discovered but the programmer hasn't declared a cleanup method for the object it wants to cleanup, a default cleanup method is called if it exists (e.g. sys.default_cleanup), which would serve mostly as an alert (print magic hex values to stderr, popup a Tk bomb dialog, raise the blue screen of death, ...) as opposed to actually breaking any cycles. Presumably the programmer would define sys.default_cleanup during development and leave it undefined during production. Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From paul@prescod.net Sat Mar 4 01:04:43 2000 From: paul@prescod.net (Paul Prescod) Date: Fri, 03 Mar 2000 17:04:43 -0800 Subject: [Python-Dev] breaking list.append() References: <38BC86E1.53F69776@prescod.net> <200003010411.XAA12988@eric.cnri.reston.va.us> Message-ID: <38C0612B.7C92F8C4@prescod.net> Guido van Rossum wrote: > > .. > Multi-arg > append probably won't be the only reason why e.g. Digital Creations > may need to release an update to Zope for Python 1.6. 
Zope comes with > its own version of Python anyway, so they have control over when they > make the switch. My concern is when I want to build an application with a module that only works with Python 1.5.2 and another one that only works with Python 1.6. If we can avoid that situation by making 1.6 compatible with 1.5.2, we should. By the time 1.7 comes around I will accept that everyone has had enough time to update their modules. Remember that many module authors are just part-time volunteers. They may only use Python every few months when they get a spare weekend! I really hope that Andrew is wrong when he predicts that there may be lots of different places where Python 1.6 breaks code! I'm in favor of being a total jerk when it comes to Py3K but Python has been pretty conservative thus far. Could someone remind in one sentence what the downside is for treating this as a warning condition as Java does with its deprecated features? Then the CP4E people don't get into bad habits and those same CP4E people trying to use older modules don't run into frustrating runtime errors. Do it for the CP4E people! (how's that for rhetoric) -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "We still do not know why mathematics is true and whether it is certain. But we know what we do not know in an immeasurably richer way than we did. And learning this has been a remarkable achievement, among the greatest and least known of the modern era." 
- from "Advent of the Algorithm" David Berlinski http://www.opengroup.com/mabooks/015/0151003386.shtml From jeremy@cnri.reston.va.us Sun Mar 5 17:46:14 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Sun, 5 Mar 2000 12:46:14 -0500 (EST) Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" in every method In-Reply-To: <000901bf8670$97d8f320$432d153f@tim> References: <000901bf8670$97d8f320$432d153f@tim> Message-ID: <14530.40294.593407.777859@bitdiddle.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: >>> Grabbing the GregS/BillT enhancement is probably the most >>> practical thing we could build on right now >> You got some pointers? TP> Download python2c (http://www.mudlib.org/~rassilon/p2c/) and TP> grab transformer.py from the zip file. The latter supplies a TP> very useful post-processing pass over the parse module's output, TP> squashing it *way* down. The compiler tools in python/nondist/src/Compiler include Bill & Greg's transformer code, a class-based AST (each node is a subclass of the generic node), and a visitor framework for walking the AST. The APIs and organization are in a bit of flux; Mark Hammond suggested some reorganization that I've not finished yet. I may finish it up this evening. The transformer module does a good job of including line numbers, but I've occasionally run into a node that didn't have a lineno attribute when I expected it would. I haven't taken the time to figure out if my expectation was unreasonable or if the transformer should be fixed. The compiler-sig might be a good place to discuss this further. A warning framework was one of my original goals for the SIG. I imagine we could convince Guido to move warnings + compiler tools into the standard library if they end up being useful. Jeremy From mal@lemburg.com Sun Mar 5 19:57:32 2000 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Sun, 05 Mar 2000 20:57:32 +0100 Subject: [Python-Dev] Unicode mapping tables References: <000601bf8668$cbbdd640$432d153f@tim> Message-ID: <38C2BC2C.FFEB72C3@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > ... > > Here's what I'll do: > > > > * implement .capitalize() in the traditional way for Unicode > > objects (simply convert the first char to uppercase) > > Given .title(), is .capitalize() of use for Unicode strings? Or is it just > a temptation to do something senseless in the Unicode world? If it doesn't > make sense, leave it out (this *seems* like compulsion to implement > all current string methods in *some* way for Unicode, whether or not they > make sense). .capitalize() only touches the first char of the string - not sure whether it makes sense in both worlds ;-) Anyhow, the difference is there but subtle: string.capitalize() will use C's toupper() which is locale dependent, while unicode.capitalize() uses Unicode's toTitleCase() for the first character. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Sun Mar 5 20:15:47 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Sun, 05 Mar 2000 21:15:47 +0100 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? References: <000601bf8658$d81d34e0$f42d153f@tim> Message-ID: <38C2C073.CD51688@lemburg.com> Tim Peters wrote: > > [Guido] > > Would there really be someone out there who uses *intentional* > > resurrection? I severely doubt it. I've never heard of this. > > Why would anyone tell you about something that *works*?! You rarely hear > the good stuff, you know. I gave the typical pattern in the preceding msg. > To flesh out the motivation more, you have some external resource that's > very expensive to set up (in KSR's case, it was an IPC connection to a > remote machine). 
Rights to use that resource are handed out in the form of > an object. When a client is done using the resource, they *should* > explicitly use the object's .release() method, but you can't rely on that. > So the object's __del__ method looks like (for example):
>
> def __del__(self):
>     # Code not shown to figure out whether to disconnect: the downside to
>     # disconnecting is that it can cost a bundle to create a new connection.
>     # If the whole app is shutting down, then of course we want to disconnect.
>     # Or if a timestamp trace shows that we haven't been making good use of
>     # all the open connections lately, we may want to disconnect too.
>     if decided_to_disconnect:
>         self.external_resource.disconnect()
>     else:
>         # keep the connection alive for reuse
>         global_available_connection_objects.append(self)
>
> This is simple & effective, and it relies on both intentional resurrection > and __del__ getting called repeatedly. I don't claim there's no other way > to write it, just that there's *been* no problem doing this for a millennium > . > > Note that MAL spontaneously sketched similar examples, although I can't say > whether he's actually done stuff like this. Not exactly this, but similar things in the weak reference implementation of mxProxy. The idea came from a different area: the C implementation of Python uses free lists a lot and these are basically implementations of the same idiom: save an allocated resource for reviving it at some later point. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From nascheme@enme.ucalgary.ca Mon Mar 6 00:27:54 2000 From: nascheme@enme.ucalgary.ca (nascheme@enme.ucalgary.ca) Date: Sun, 5 Mar 2000 17:27:54 -0700 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? 
In-Reply-To: <000001bf857a$60b45ac0$c6a0143f@tim>; from tim_one@email.msn.com on Fri, Mar 03, 2000 at 08:38:43PM -0500 References: <200003031650.LAA21647@eric.cnri.reston.va.us> <000001bf857a$60b45ac0$c6a0143f@tim> Message-ID: <20000305172754.A14998@acs.ucalgary.ca> On Fri, Mar 03, 2000 at 08:38:43PM -0500, Tim Peters wrote: > So here's what I'd consider doing: explicit is better than implicit, and in > the face of ambiguity refuse the temptation to guess. I like Marc's suggestion. Here is my proposal: Allow classes to have a new method, __cleanup__ or whatever you want to call it. When tp_clear is called for an instance, it checks for this method. If it exists, call it, otherwise delete the container objects from the instance's dictionary. When collecting cycles, call tp_clear for instances first. It's simple and allows the programmer to cleanly break cycles if they insist on creating them and using __del__ methods. Neil From tim_one@email.msn.com Mon Mar 6 07:13:21 2000 From: tim_one@email.msn.com (Tim Peters) Date: Mon, 6 Mar 2000 02:13:21 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <38C0612B.7C92F8C4@prescod.net> Message-ID: <000401bf873b$745f8320$ea2d153f@tim> [Paul Prescod] > ... > Could someone remind in one sentence what the downside is for treating > this as a warning condition as Java does with its deprecated features? Simply the lack of anything to build on: Python has no sort of runtime warning system now, and nobody has volunteered to create one. If you do , remember that stdout & stderr may go to the bit bucket in a GUI app. The bit about dropping the "L" suffix on longs seems unwarnable-about in any case (short of warning every time anyone uses long()). 
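[The runtime warning system Tim says nobody has built yet is exactly what Python later grew as the `warnings` module (PEP 230). As a latter-day sketch, here is how the deprecated multi-arg `list.append(a, b)` could have warned instead of failing -- using a hypothetical `append()` shim function, since the real change would have lived inside the list type itself.]

```python
import warnings

def append(lst, *args):
    """Hypothetical shim: old-style lst.append(a, b) triggers a
    DeprecationWarning and appends the tuple (a, b), instead of
    raising a hard TypeError."""
    if len(args) == 1:
        lst.append(args[0])
    else:
        warnings.warn(
            "list.append with %d arguments is deprecated; "
            "append a single tuple instead" % len(args),
            DeprecationWarning, stacklevel=2)
        lst.append(tuple(args))

items = []
append(items, 1)                          # normal one-argument case
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    append(items, 2, 3)                   # old multi-arg case: warns, still works
assert items == [1, (2, 3)]
assert issubclass(caught[0].category, DeprecationWarning)
```

Warnings go through a filterable channel rather than straight to stderr, which also answers Tim's point about GUI apps losing stdout/stderr.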
remember-that-you-asked-for-the-problems-not-for-solutions-ly y'rs - tim From tim_one@email.msn.com Mon Mar 6 07:33:49 2000 From: tim_one@email.msn.com (Tim Peters) Date: Mon, 6 Mar 2000 02:33:49 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: <38C2C073.CD51688@lemburg.com> Message-ID: <000701bf873e$5032eca0$ea2d153f@tim> [M.-A. Lemburg, on the resurrection/multiple-__del__ "idiom"] > ... > The idea came from a different area: the C implementation > of Python uses free lists a lot and these are basically > implementations of the same idiom: save an allocated > resource for reviving it at some later point. Excellent analogy! Thanks. Now that you phrased it in this clarifying way, I recall that very much the same point was raised in the papers that resulted in the creation of guardians in Scheme. I don't know that anyone is actually using Python __del__ this way today (I am not), but you reminded me why I thought it was natural at one time . generally-__del__-aversive-now-except-in-c++-where-destructors-are-guaranteed-to-be-called-when-you-expect-them-to-be-ly y'rs - tim From tim_one@email.msn.com Mon Mar 6 08:12:06 2000 From: tim_one@email.msn.com (Tim Peters) Date: Mon, 6 Mar 2000 03:12:06 -0500 Subject: [Python-Dev] return statements in lambda In-Reply-To: <006f01bf8686$391ced80$34aab5d4@hagrid> Message-ID: <000901bf8743$a9f61aa0$ea2d153f@tim> [/F] > maybe adding an (optional but encouraged) "return" > to lambda would be an improvement? > > lambda x: x + 10 > > vs. > > lambda x: return x + 10 > > or is this just more confusing... opinions? It was an odd complaint to begin with, since Lisp-heads aren't used to using "return" anyway. More of a symptom of taking a shallow syntactic approach to a new (to them) language. For non-Lisp heads, I think it's more confusing in the end, blurring the distinction between stmts and expressions ("the body of a lambda must be an expression" ... 
"ok, i lied, unless it's a 'return' stmt"). If Guido had it to do over again, I vote he rejects the original patch . Short of that, would have been better if the lambda arglist required parens, and if the body were required to be a single return stmt (that would sure end the "lambda x: print x" FAQ -- few would *expect* "return print x" to work!). hindsight-is-great-ly y'rs - tim From tim_one@email.msn.com Mon Mar 6 09:09:45 2000 From: tim_one@email.msn.com (Tim Peters) Date: Mon, 6 Mar 2000 04:09:45 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <200003051346.IAA05539@eric.cnri.reston.va.us> Message-ID: <000b01bf874b$b6fe9da0$ea2d153f@tim> [Guido] > I'm beginning to believe that handing cycles with finalizers to the > user is better than calling __del__ with a different meaning, You won't be sorry: Python has the chance to be the first language that's both useful and sane here! > and I tentatively withdraw my proposal to change the rules for when > __del__ is called (even when __init__ fails; I haven't had any complaints > about that either). Well, everyone liked the parenthetical half of that proposal, although Jack's example did point out a real surprise with it. > There seem to be two competing suggestions for solutions: (1) call > some kind of __cleanup__ (Marc-Andre) or tp_clean (Greg) method of the > object; (2) Tim's proposal of an interface to ask the garbage > collector for a trash cycle with a finalizer (or for an object with a > finalizer in a trash cycle?). Or a maximal strongly-connected component, or *something* -- unsure. > Somehow Tim's version looks less helpful to me, because it *seems* > that whoever gets to handle the cycle (the main code of the program?) > isn't necessarily responsible for creating it (some library you didn't > even know was used under the covers of some other library you called). Yes, to me too. 
This is the Scheme "guardian" idea in a crippled form (Scheme supports as many distinct guardians as the programmer cares to create), and even in its full-blown form it supplies "a perfectly general mechanism with no policy whatsoever". Greg convinced me (although I haven't admitted this yet ) that "no policy whatsoever" is un-Pythonic too. *Some* policy is helpful, so I won't be pushing the guardian idea any more (although see immediately below for an immediate backstep on that ). > ... > What keeps nagging me though is what to do when there's a finalizer > but no cleanup method. I guess the trash cycle remains alive. Is > this acceptable? (I guess so, because we've given the programmer a > way to resolve the trash: provide a cleanup method.) BDW considers it better to leak than to risk doing a wrong thing, and I agree wholeheartedly with that. GC is one place you want to have a "100% language". This is where something like a guardian can remain useful: while leaking is OK because you've given them an easy & principled alternative, leaking without giving them a clear way to *know* about it is not OK. If gc pushes the leaked stuff off to the side, the gc module should (say) supply an entry point that returns all the leaked stuff in a list. Then users can *know* they're leaking, know how badly they're leaking, and examine exactly the objects that are leaking. Then they've got the info they need to repair their program (or at least track down the 3rd-party module that's leaking). As with a guardian, they *could* also build a reclamation scheme on top of it, but that would no longer be the main (or even an encouraged) thrust. > If we detect individual cycles (the current algorithm doesn't do that > yet, though it seems easy enough to do another scan), could we > special-case cycles with only one finalizer and no cleaner-upper? 
> (I'm tempted to call the finalizer because it seems little harm can be > done -- but then of course there's the problem of the finalizer being > called again when the refcount really goes to zero. :-( ) "Better safe than sorry" is my immediate view on this -- you can't know that the finalizer won't resurrect the cycle, and "finalizer called iff refcount hits 0" is a wonderfully simple & predictable rule. That's worth a lot to preserve, unless & until it proves to be a disaster in practice. As to the details of cleanup, I haven't succeeded in making the time to understand all the proposals. But I've done my primary job here if I've harassed everyone into not repeating the same mistakes all previous languages have made <0.9 wink>. > ... > I wish we didn't have to special-case finalizers on class instances > (since each dealloc function is potentially a combination of a > finalizer and a deallocation routine), but the truth is that they > *are* special -- __del__ has no responsibility for deallocating > memory, only for deallocating external resources (such as temp files). And the problem is that __del__ can do anything whatsoever than can be expressed in Python, so there's not a chance in hell of outguessing it. > ... > Another practical consideration is that now there are cycles of the form > > <=> > > which suggests that we should make function objects traceable. Also, > modules can cross-reference, so module objects should be made > traceable. I don't think that this will grow the sets of traced > objects by too much (since the dicts involved are already traced, and > a typical program has way fewer functions and modules than it has > class instances). On the other hand, we may also have to trace > (un)bound method objects, and these may be tricky because they are > allocated and deallocated at high rates (once per typical method > call). 
This relates to what I was trying to get at with my response to your gc implementation sketch: mark-&-sweep needs to chase *everything*, so the set of chased types is maximal from the start. Adding chased types to the "indirectly infer what's unreachable via accounting for internal refcounts within the transitive closure" scheme can end up touching nearly as much as a full M-&-S pass per invocation. I don't know where the break-even point is, but the more stuff you chase in the latter scheme the less often you want to run it. About high rates, so long as a doubly-linked list allows efficient removal of stuff that dies via refcount exhaustion, you won't actually *chase* many bound method objects (i.e., they'll usually go away by themselves). Note in passing that bound method objects often showed up in cycles in IDLE, although you usually managed to break those in other ways. > Back to the drawing board... Good! That means you're making real progress . glad-someone-is-ly y'rs - tim From mal@lemburg.com Mon Mar 6 10:01:31 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 06 Mar 2000 11:01:31 +0100 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? References: <200003031650.LAA21647@eric.cnri.reston.va.us> <000001bf857a$60b45ac0$c6a0143f@tim> <20000305172754.A14998@acs.ucalgary.ca> Message-ID: <38C381FB.E222D6E4@lemburg.com> nascheme@enme.ucalgary.ca wrote: > > On Fri, Mar 03, 2000 at 08:38:43PM -0500, Tim Peters wrote: > > So here's what I'd consider doing: explicit is better than implicit, and in > > the face of ambiguity refuse the temptation to guess. > > I like Marc's suggestion. Here is my proposal: > > Allow classes to have a new method, __cleanup__ or whatever you > want to call it. When tp_clear is called for an instance, it > checks for this method. If it exists, call it, otherwise delete > the container objects from the instance's dictionary. When > collecting cycles, call tp_clear for instances first. 
> > Its simple and allows the programmer to cleanly break cycles if > they insist on creating them and using __del__ methods. Right :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Mon Mar 6 11:57:29 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 06 Mar 2000 12:57:29 +0100 Subject: [Python-Dev] Unicode character property methods Message-ID: <38C39D29.A29CE67F@lemburg.com> As you may have noticed, the Unicode objects provide new methods .islower(), .isupper() and .istitle(). Finn Bock mentioned that Java also provides .isdigit() and .isspace(). Question: should Unicode also provide these character property methods: .isdigit(), .isnumeric(), .isdecimal() and .isspace() ? Plus maybe .digit(), .numeric() and .decimal() for the corresponding decoding ? Similar APIs are already available through the unicodedata module, but could easily be moved to the Unicode object (they cause the builtin interpreter to grow a bit in size due to the new mapping tables). BTW, string.atoi et al. are currently not mapped to string methods... should they be ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@python.org Mon Mar 6 13:29:04 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 06 Mar 2000 08:29:04 -0500 Subject: [Python-Dev] Unicode character property methods In-Reply-To: Your message of "Mon, 06 Mar 2000 12:57:29 +0100." <38C39D29.A29CE67F@lemburg.com> References: <38C39D29.A29CE67F@lemburg.com> Message-ID: <200003061329.IAA09529@eric.cnri.reston.va.us> > As you may have noticed, the Unicode objects provide > new methods .islower(), .isupper() and .istitle(). Finn Bock > mentioned that Java also provides .isdigit() and .isspace(). 
> > Question: should Unicode also provide these character > property methods: .isdigit(), .isnumeric(), .isdecimal() > and .isspace() ? Plus maybe .digit(), .numeric() and > .decimal() for the corresponding decoding ? What would be the difference between isdigit, isnumeric, isdecimal? I'd say don't do more than Java. I don't understand what the "corresponding decoding" refers to. What would "3".decimal() return? > Similar APIs are already available through the unicodedata > module, but could easily be moved to the Unicode object > (they cause the builtin interpreter to grow a bit in size > due to the new mapping tables). > > BTW, string.atoi et al. are currently not mapped to > string methods... should they be ? They are mapped to int() c.s. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Mon Mar 6 15:09:55 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 6 Mar 2000 10:09:55 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <200003051204.HAA05367@eric.cnri.reston.va.us> References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> <14529.55983.263225.691427@weyr.cnri.reston.va.us> <200003051204.HAA05367@eric.cnri.reston.va.us> Message-ID: <14531.51779.650532.881626@weyr.cnri.reston.va.us> Guido van Rossum writes: > - You could put it all in ConfigParser.py but with new classnames. > (Not sure though, since the ConfigParser class, which is really a > kind of weird variant, will be assumed to be the main class because > its name is that of the module.) The ConfigParser class could be clearly marked as deprecated both in the source/docstring and in the documentation. But the class itself should not be used in any way. > - Variants on the syntax could be given through some kind of option > system rather than through subclassing -- they should be combinable > independently. 
Some possible options (maybe I'm going overboard here) > could be: Yes, you are going overboard. It should contain exactly what's right for .ini files, and that's it. There are really three aspects to the beast: reading, using, and writing. I think there should be a class which does the right thing for using the information in the file, and reading & writing can be handled through functions or helper classes. That separates the parsing issues from the use issues, and alternate syntaxes will be easy enough to implement by subclassing the helper or writing a new function. An "editable" version that allows loading & saving without throwing away comments, ordering, etc. would require a largely separate implementation of all three aspects (or at least the reader and writer). > (Well maybe the whole substitution thing should really be done through > a subclass -- it's too weird for normal use.) That and the ad hoc syntax are my biggest beefs with ConfigParser. But it can easily be added by a subclass as long as the method to override is clearly specified in the documentation (it should only require one!). -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake@acm.org Mon Mar 6 17:47:44 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 6 Mar 2000 12:47:44 -0500 (EST) Subject: [Python-Dev] PyBufferProcs Message-ID: <14531.61248.941076.803617@weyr.cnri.reston.va.us> While working on the documentation, I've noticed a naming inconsistency regarding PyBufferProcs; its peers are all named Py*Methods (PySequenceMethods, PyNumberMethods, etc.). I'd like to propose that a synonym, PyBufferMethods, be made for PyBufferProcs, and use that in the core implementations and the documentation. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From jeremy@cnri.reston.va.us Mon Mar 6 19:28:12 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Mon, 6 Mar 2000 14:28:12 -0500 (EST) Subject: [Python-Dev] example checkers based on compiler package Message-ID: <14532.1740.90292.440395@goon.cnri.reston.va.us> There was some discussion on python-dev over the weekend about generating warnings, and Moshe Zadka posted a selfnanny that warned about methods that didn't have self as the first argument. I think these kinds of warnings are useful, and I'd like to see a more general framework for them built around the Python abstract syntax originally from P2C. Ideally, they would be available as command line tools and integrated into GUIs like IDLE in some useful way. I've included a couple of quick examples I coded up last night based on the compiler package (recently re-factored) that is resident in python/nondist/src/Compiler. The analysis in the one that checks for name errors is a bit of a mess, but the overall structure seems right. I'm hoping to collect a few more examples of checkers and generalize from them to develop a framework for checking for errors and reporting them. 
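[The checkers attached below are built on the old compiler package. For comparison, the same self-argument check can be sketched against the `ast` module that modern Python ships instead; `check_self` is a hypothetical name, and this sketch ignores refinements like decorated methods or staticmethods.]

```python
import ast

def check_self(source, filename="<string>"):
    """Return warning strings for methods whose first argument isn't 'self'."""
    messages = []
    tree = ast.parse(source, filename)
    for node in ast.walk(tree):
        if not isinstance(node, ast.ClassDef):
            continue
        # only direct members of the class body count as methods here
        for item in node.body:
            if not isinstance(item, ast.FunctionDef):
                continue
            args = item.args.args
            if not args:
                messages.append("%s:%s %s.%s: no arguments"
                                % (filename, item.lineno, node.name, item.name))
            elif args[0].arg != "self":
                messages.append("%s:%s %s.%s: self slot is named %s"
                                % (filename, item.lineno, node.name,
                                   item.name, args[0].arg))
    return messages

for msg in check_self("class Foo:\n    def bar(this): pass\n"):
    print(msg)   # -> <string>:2 Foo.bar: self slot is named this
```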
Jeremy

------------ checkself.py ------------
"""Check for methods that do not have self as the first argument"""

from compiler import parseFile, walk, ast, misc

class Warning:
    def __init__(self, filename, klass, method, lineno, msg):
        self.filename = filename
        self.klass = klass
        self.method = method
        self.lineno = lineno
        self.msg = msg

    _template = "%(filename)s:%(lineno)s %(klass)s.%(method)s: %(msg)s"

    def __str__(self):
        return self._template % self.__dict__

class NoArgsWarning(Warning):
    super_init = Warning.__init__

    def __init__(self, filename, klass, method, lineno):
        self.super_init(filename, klass, method, lineno, "no arguments")

class NotSelfWarning(Warning):
    super_init = Warning.__init__

    def __init__(self, filename, klass, method, lineno, argname):
        self.super_init(filename, klass, method, lineno,
                        "self slot is named %s" % argname)

class CheckSelf:
    def __init__(self, filename):
        self.filename = filename
        self.warnings = []
        self.scope = misc.Stack()

    def inClass(self):
        if self.scope:
            return isinstance(self.scope.top(), ast.Class)
        return 0

    def visitClass(self, klass):
        self.scope.push(klass)
        self.visit(klass.code)
        self.scope.pop()
        return 1

    def visitFunction(self, func):
        if self.inClass():
            classname = self.scope.top().name
            if len(func.argnames) == 0:
                w = NoArgsWarning(self.filename, classname,
                                  func.name, func.lineno)
                self.warnings.append(w)
            elif func.argnames[0] != "self":
                w = NotSelfWarning(self.filename, classname,
                                   func.name, func.lineno,
                                   func.argnames[0])
                self.warnings.append(w)
        self.scope.push(func)
        self.visit(func.code)
        self.scope.pop()
        return 1

def check(filename):
    global p, check
    p = parseFile(filename)
    check = CheckSelf(filename)
    walk(p, check)
    for w in check.warnings:
        print w

if __name__ == "__main__":
    import sys
    # XXX need to do real arg processing
    check(sys.argv[1])

------------ badself.py ------------
def foo():
    return 12

class Foo:
    def __init__(): pass
    def foo(self, foo): pass
    def bar(this, that):
        def baz(this=that):
            return this
        return baz

def bar():
    class Quux:
        def __init__(self):
            self.sum = 1
        def quam(x, y):
            self.sum = self.sum + (x * y)
    return Quux()

------------ checknames.py ------------
"""Check for NameErrors"""

from compiler import parseFile, walk
from compiler.misc import Stack, Set
import __builtin__
from UserDict import UserDict

class Warning:
    def __init__(self, filename, funcname, lineno):
        self.filename = filename
        self.funcname = funcname
        self.lineno = lineno

    def __str__(self):
        return self._template % self.__dict__

class UndefinedLocal(Warning):
    super_init = Warning.__init__

    def __init__(self, filename, funcname, lineno, name):
        self.super_init(filename, funcname, lineno)
        self.name = name

    _template = "%(filename)s:%(lineno)s %(funcname)s undefined local %(name)s"

class NameError(UndefinedLocal):
    _template = "%(filename)s:%(lineno)s %(funcname)s undefined name %(name)s"

class NameSet(UserDict):
    """Track names and the line numbers where they are referenced"""
    def __init__(self):
        self.data = self.names = {}

    def add(self, name, lineno):
        l = self.names.get(name, [])
        l.append(lineno)
        self.names[name] = l

class CheckNames:
    def __init__(self, filename):
        self.filename = filename
        self.warnings = []
        self.scope = Stack()
        self.gUse = NameSet()
        self.gDef = NameSet()
        # _locals is the stack of local namespaces
        # locals is the top of the stack
        self._locals = Stack()
        self.lUse = None
        self.lDef = None
        self.lGlobals = None  # var declared global
        # holds scope,def,use,global triples for later analysis
        self.todo = []

    def enterNamespace(self, node):
##        print node.name
        self.scope.push(node)
        self.lUse = use = NameSet()
        self.lDef = _def = NameSet()
        self.lGlobals = gbl = NameSet()
        self._locals.push((use, _def, gbl))

    def exitNamespace(self):
##        print
        self.todo.append((self.scope.top(), self.lDef, self.lUse,
                          self.lGlobals))
        self.scope.pop()
        self._locals.pop()
        if self._locals:
            self.lUse, self.lDef, self.lGlobals = self._locals.top()
        else:
            self.lUse = self.lDef = self.lGlobals = None

    def warn(self, warning, funcname, lineno, *args):
        args = (self.filename, funcname, lineno) + args
        self.warnings.append(apply(warning, args))

    def defName(self, name, lineno, local=1):
##        print "defName(%s, %s, local=%s)" % (name, lineno, local)
        if self.lUse is None:
            self.gDef.add(name, lineno)
        elif local == 0:
            self.gDef.add(name, lineno)
            self.lGlobals.add(name, lineno)
        else:
            self.lDef.add(name, lineno)

    def useName(self, name, lineno, local=1):
##        print "useName(%s, %s, local=%s)" % (name, lineno, local)
        if self.lUse is None:
            self.gUse.add(name, lineno)
        elif local == 0:
            self.gUse.add(name, lineno)
            self.lUse.add(name, lineno)
        else:
            self.lUse.add(name, lineno)

    def check(self):
        for s, d, u, g in self.todo:
            self._check(s, d, u, g, self.gDef)
        # XXX then check the globals

    def _check(self, scope, _def, use, gbl, globals):
        # check for NameError
        # a name is defined iff it is in def.keys()
        # a name is global iff it is in gdefs.keys()
        gdefs = UserDict()
        gdefs.update(globals)
        gdefs.update(__builtin__.__dict__)
        defs = UserDict()
        defs.update(gdefs)
        defs.update(_def)
        errors = Set()
        for name in use.keys():
            if not defs.has_key(name):
                firstuse = use[name][0]
                self.warn(NameError, scope.name, firstuse, name)
                errors.add(name)
        # check for UndefinedLocalNameError
        # order == use & def sorted by lineno
        # elements are lineno, flag, name
        # flag = 0 if use, flag = 1 if def
        order = []
        for name, lines in use.items():
            if gdefs.has_key(name) and not _def.has_key(name):
                # this is a global ref, we can skip it
                continue
            for lineno in lines:
                order.append((lineno, 0, name))
        for name, lines in _def.items():
            for lineno in lines:
                order.append((lineno, 1, name))
        order.sort()
        # ready contains names that have been defined or warned about
        ready = Set()
        for lineno, flag, name in order:
            if flag == 0:  # use
                if not ready.has_elt(name) and not errors.has_elt(name):
                    self.warn(UndefinedLocal, scope.name, lineno, name)
                    ready.add(name)  # don't warn again
            else:
                ready.add(name)

    # below are visitor methods
    def visitFunction(self, node, noname=0):
        for expr in node.defaults:
            self.visit(expr)
        if not noname:
            self.defName(node.name, node.lineno)
        self.enterNamespace(node)
        for name in node.argnames:
            self.defName(name, node.lineno)
        self.visit(node.code)
        self.exitNamespace()
        return 1

    def visitLambda(self, node):
        return self.visitFunction(node, noname=1)

    def visitClass(self, node):
        for expr in node.bases:
            self.visit(expr)
        self.defName(node.name, node.lineno)
        self.enterNamespace(node)
        self.visit(node.code)
        self.exitNamespace()
        return 1

    def visitName(self, node):
        self.useName(node.name, node.lineno)

    def visitGlobal(self, node):
        for name in node.names:
            self.defName(name, node.lineno, local=0)

    def visitImport(self, node):
        for name in node.names:
            self.defName(name, node.lineno)

    visitFrom = visitImport

    def visitAssName(self, node):
        self.defName(node.name, node.lineno)

def check(filename):
    global p, checker
    p = parseFile(filename)
    checker = CheckNames(filename)
    walk(p, checker)
    checker.check()
    for w in checker.warnings:
        print w

if __name__ == "__main__":
    import sys
    # XXX need to do real arg processing
    check(sys.argv[1])

------------ badnames.py ------------
# XXX can we detect race conditions on accesses to global variables?
# probably can (conservatively) by noting variables _created_ by
# global decls in funcs

import string
import time

def foo(x):
    return x + y

def foo2(x):
    return x + z

a = 4

def foo3(x):
    a, b = x, a

def bar(x):
    z = x
    global z

def bar2(x):
    f = string.strip
    a = f(x)
    import string
    return string.lower(a)

def baz(x, y):
    return x + y + z

def outer(x):
    def inner(y):
        return x + y
    return inner

From gstein@lyra.org Mon Mar 6 21:09:33 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 6 Mar 2000 13:09:33 -0800 (PST) Subject: [Python-Dev] PyBufferProcs In-Reply-To: <14531.61248.941076.803617@weyr.cnri.reston.va.us> Message-ID: On Mon, 6 Mar 2000, Fred L. Drake, Jr. 
wrote: > While working on the documentation, I've noticed a naming > inconsistency regarding PyBufferProcs; its peers are all named > Py*Methods (PySequenceMethods, PyNumberMethods, etc.). > I'd like to propose that a synonym, PyBufferMethods, be made for > PyBufferProcs, and use that in the core implementations and the > documentation. +0 Although.. I might say that it should be renamed, and a synonym (#define or typedef?) be provided for the old name. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal@lemburg.com Mon Mar 6 22:04:14 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 06 Mar 2000 23:04:14 +0100 Subject: [Python-Dev] Unicode character property methods References: <38C39D29.A29CE67F@lemburg.com> <200003061329.IAA09529@eric.cnri.reston.va.us> Message-ID: <38C42B5E.42801755@lemburg.com> Guido van Rossum wrote: > > > As you may have noticed, the Unicode objects provide > > new methods .islower(), .isupper() and .istitle(). Finn Bock > > mentioned that Java also provides .isdigit() and .isspace(). > > > > Question: should Unicode also provide these character > > property methods: .isdigit(), .isnumeric(), .isdecimal() > > and .isspace() ? Plus maybe .digit(), .numeric() and > > .decimal() for the corresponding decoding ? > > What would be the difference between isdigit, isnumeric, isdecimal? > I'd say don't do more than Java. I don't understand what the > "corresponding decoding" refers to. What would "3".decimal() return? These originate in the Unicode database; see ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.html Here are the descriptions:

"""
6  Decimal digit value
   normative
   This is a numeric field. If the character has the decimal digit
   property, as specified in Chapter 4 of the Unicode Standard, the
   value of that digit is represented with an integer value in this field
7  Digit value
   normative
   This is a numeric field. If the character represents a digit, not
   necessarily a decimal digit, the value is here.
   This covers digits which do not form decimal radix forms, such as
   the compatibility superscript digits
8  Numeric value
   normative
   This is a numeric field. If the character has the numeric property,
   as specified in Chapter 4 of the Unicode Standard, the value of that
   character is represented with an integer or rational number in this
   field. This includes fractions as, e.g., "1/5" for U+2155 VULGAR
   FRACTION ONE FIFTH Also included are numerical values for
   compatibility characters such as circled numbers.
"""

u"3".decimal() would return 3. u"\u2155".

Some more examples from the unicodedata module (which makes all fields of the database available in Python):

>>> unicodedata.decimal(u"3")
3
>>> unicodedata.decimal(u"²")
2
>>> unicodedata.digit(u"²")
2
>>> unicodedata.numeric(u"²")
2.0
>>> unicodedata.numeric(u"\u2155")
0.2
>>> unicodedata.numeric(u'\u215b')
0.125

> > Similar APIs are already available through the unicodedata
> > module, but could easily be moved to the Unicode object
> > (they cause the builtin interpreter to grow a bit in size
> > due to the new mapping tables).
> >
> > BTW, string.atoi et al. are currently not mapped to
> > string methods... should they be ?
>
> They are mapped to int() c.s.

Hmm, I just noticed that int() et friends don't like Unicode... shouldn't they use the "t" parser marker instead of requiring a string or tp_int compatible type ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@python.org Mon Mar 6 23:12:33 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 06 Mar 2000 18:12:33 -0500 Subject: [Python-Dev] Unicode character property methods In-Reply-To: Your message of "Mon, 06 Mar 2000 23:04:14 +0100."
<38C42B5E.42801755@lemburg.com> References: <38C39D29.A29CE67F@lemburg.com> <200003061329.IAA09529@eric.cnri.reston.va.us> <38C42B5E.42801755@lemburg.com> Message-ID: <200003062312.SAA11697@eric.cnri.reston.va.us> [MAL] > > > As you may have noticed, the Unicode objects provide > > > new methods .islower(), .isupper() and .istitle(). Finn Bock > > > mentioned that Java also provides .isdigit() and .isspace(). > > > > > > Question: should Unicode also provide these character > > > property methods: .isdigit(), .isnumeric(), .isdecimal() > > > and .isspace() ? Plus maybe .digit(), .numeric() and > > > .decimal() for the corresponding decoding ? [Guido] > > What would be the difference between isdigit, isnumeric, isdecimal? > > I'd say don't do more than Java. I don't understand what the > > "corresponding decoding" refers to. What would "3".decimal() return? [MAL] > These originate in the Unicode database; see > > ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.html > > Here are the descriptions: > > """ > 6 > Decimal digit value > normative > This is a numeric field. If the > character has the decimal digit > property, as specified in Chapter > 4 of the Unicode Standard, the > value of that digit is represented > with an integer value in this field > 7 > Digit value > normative > This is a numeric field. If the > character represents a digit, not > necessarily a decimal digit, the > value is here. This covers digits > which do not form decimal radix > forms, such as the compatibility > superscript digits > 8 > Numeric value > normative > This is a numeric field. If the > character has the numeric > property, as specified in Chapter > 4 of the Unicode Standard, the > value of that character is > represented with an integer or > rational number in this field. This > includes fractions as, e.g., "1/5" for > U+2155 VULGAR FRACTION > ONE FIFTH Also included are > numerical values for compatibility > characters such as circled > numbers. 
>
> u"3".decimal() would return 3. u"\u2155".
>
> Some more examples from the unicodedata module (which makes
> all fields of the database available in Python):
>
> >>> unicodedata.decimal(u"3")
> 3
> >>> unicodedata.decimal(u"²")
> 2
> >>> unicodedata.digit(u"²")
> 2
> >>> unicodedata.numeric(u"²")
> 2.0
> >>> unicodedata.numeric(u"\u2155")
> 0.2
> >>> unicodedata.numeric(u'\u215b')
> 0.125

Hm, very Unicode centric. Probably best left out of the general string methods. Isspace() seems useful, and an isdigit() that is only true for ASCII '0' - '9' also makes sense. What about "123".isdigit()? What does Java say? Or do these only apply to single chars there? I think "123".isdigit() should be true if "abc".islower() is true.

> > > Similar APIs are already available through the unicodedata
> > > module, but could easily be moved to the Unicode object
> > > (they cause the builtin interpreter to grow a bit in size
> > > due to the new mapping tables).
> > >
> > > BTW, string.atoi et al. are currently not mapped to
> > > string methods... should they be ?
> >
> > They are mapped to int() c.s.
>
> Hmm, I just noticed that int() et friends don't like
> Unicode... shouldn't they use the "t" parser marker
> instead of requiring a string or tp_int compatible
> type ?

Good catch. Go ahead. --Guido van Rossum (home page: http://www.python.org/~guido/) From Moshe Zadka Tue Mar 7 05:25:43 2000 From: Moshe Zadka (Moshe Zadka) Date: Tue, 7 Mar 2000 07:25:43 +0200 (IST) Subject: [Python-Dev] Re: example checkers based on compiler package In-Reply-To: <14532.1740.90292.440395@goon.cnri.reston.va.us> Message-ID: On Mon, 6 Mar 2000, Jeremy Hylton wrote:

> I think these kinds of warnings are useful, and I'd like to see a more
> general framework for them built around Python abstract syntax originally
> from P2C. Ideally, they would be available as command line tools and
> integrated into GUIs like IDLE in some useful way.

Yes! Guido already suggested we have a standard API to them.
One thing I suggested was that the abstract API include not only the input (one form or another of an AST), but the output: so IDE's wouldn't have to parse strings, but get a warning class. Something like a: An output of a warning can be a subclass of GeneralWarning, and should implement the following methods:

1. line-no() -- returns an integer
2. columns() -- returns either a pair of integers, or None
3. message() -- returns a string containing a message
4. __str__() -- comes for free if inheriting GeneralWarning,
   and formats the warning message.

> I've included a couple of quick examples I coded up last night based
> on the compiler package (recently re-factored) that is resident in
> python/nondist/src/Compiler. The analysis on the one that checks for
> name errors is a bit of a mess, but the overall structure seems right.

One thing I had trouble with is that in my implementation of selfnanny, I used Python's stack for recursion while you used an explicit stack. It's probably because of the visitor pattern, which is just another argument for co-routines and generators.

> I'm hoping to collect a few more examples of checkers and generalize
> from them to develop a framework for checking for errors and reporting
> them.

Cool! Brainstorming: what kind of warnings would people find useful? In selfnanny, I wanted to include checking for assignment to self, and checking for "possible use before definition of local variables" sounds good. Another check could be a CP4E "checking that no two identifiers differ only by case". I might code up a few if I have the time... What I'd really want (but it sounds really hard) is a framework for partial ASTs: warning people as they write code. -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html From mwh21@cam.ac.uk Tue Mar 7 08:31:23 2000 From: mwh21@cam.ac.uk (Michael Hudson) Date: 07 Mar 2000 08:31:23 +0000 Subject: [Python-Dev] Re: [Compiler-sig] Re: example checkers based on compiler package In-Reply-To: Moshe Zadka's message of "Tue, 7 Mar 2000 07:25:43 +0200 (IST)" References: Message-ID: Moshe Zadka writes:

> On Mon, 6 Mar 2000, Jeremy Hylton wrote:
>
> > I think these kinds of warnings are useful, and I'd like to see a more
> > general framework for them built around Python abstract syntax originally
> > from P2C. Ideally, they would be available as command line tools and
> > integrated into GUIs like IDLE in some useful way.
>
> Yes! Guido already suggested we have a standard API to them. One thing
> I suggested was that the abstract API include not only the input (one form
> or another of an AST), but the output: so IDE's wouldn't have to parse
> strings, but get a warning class.

That would be seriously cool.

> Something like a:
>
> An output of a warning can be a subclass of GeneralWarning, and should
> implement the following methods:
>
> 1. line-no() -- returns an integer
> 2. columns() -- returns either a pair of integers, or None
> 3. message() -- returns a string containing a message
> 4. __str__() -- comes for free if inheriting GeneralWarning,
>    and formats the warning message.

Wouldn't it make sense to include function/class name here too? A checker is likely to know, and it would save reparsing to find it out. [little snip]

> > I'm hoping to collect a few more examples of checkers and generalize
> > from them to develop a framework for checking for errors and reporting
> > them.
>
> Cool!
> Brainstorming: what kind of warnings would people find useful? In
> selfnanny, I wanted to include checking for assignment to self, and
> checking for "possible use before definition of local variables" sounds
> good. Another check could be a CP4E "checking that no two identifiers
> differ only by case".
> I might code up a few if I have the time...

Is there stuff in the current Compiler code to do control flow analysis? You'd need that to check for use before definition in meaningful cases, and also if you ever want to do any optimisation...

> What I'd really want (but it sounds really hard) is a framework for
> partial ASTs: warning people as they write code.

I agree (on both points). Cheers, M. -- very few people approach me in real life and insist on proving they are drooling idiots. -- Erik Naggum, comp.lang.lisp From mal@lemburg.com Tue Mar 7 09:14:25 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 07 Mar 2000 10:14:25 +0100 Subject: [Python-Dev] Unicode character property methods References: <38C39D29.A29CE67F@lemburg.com> <200003061329.IAA09529@eric.cnri.reston.va.us> <38C42B5E.42801755@lemburg.com> <200003062312.SAA11697@eric.cnri.reston.va.us> Message-ID: <38C4C871.F47E17A3@lemburg.com> Guido van Rossum wrote:

> [MAL about adding .isdecimal(), .isdigit() and .isnumeric()]
> > Some more examples from the unicodedata module (which makes
> > all fields of the database available in Python):
> >
> > >>> unicodedata.decimal(u"3")
> > 3
> > >>> unicodedata.decimal(u"²")
> > 2
> > >>> unicodedata.digit(u"²")
> > 2
> > >>> unicodedata.numeric(u"²")
> > 2.0
> > >>> unicodedata.numeric(u"\u2155")
> > 0.2
> > >>> unicodedata.numeric(u'\u215b')
> > 0.125
>
> Hm, very Unicode centric. Probably best left out of the general
> string methods. Isspace() seems useful, and an isdigit() that is only
> true for ASCII '0' - '9' also makes sense.

Well, how about having all three on Unicode objects and only .isdigit() on string objects ?

> What about "123".isdigit()? What does Java say? Or do these only
> apply to single chars there? I think "123".isdigit() should be true
> if "abc".islower() is true.

In the current uPython implementation u"123".isdigit() is true; same for the other two methods.
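The three-way distinction being discussed maps directly onto the Unicode database fields quoted earlier in the thread. A quick sketch against the unicodedata module (modern Python 3 spelling; the values are the ones given in the examples above):

```python
import unicodedata

# Field 6 (decimal digit value), field 7 (digit value), field 8 (numeric value):
assert unicodedata.decimal("3") == 3
assert unicodedata.digit("\u00b2") == 2      # SUPERSCRIPT TWO
assert unicodedata.numeric("\u2155") == 0.2  # VULGAR FRACTION ONE FIFTH

# The corresponding predicates form a hierarchy: isdecimal => isdigit => isnumeric.
assert "123".isdecimal() and "123".isdigit() and "123".isnumeric()
assert "\u00b2".isdigit() and not "\u00b2".isdecimal()
assert "\u2155".isnumeric() and not "\u2155".isdigit()
```

This also answers Guido's question about multi-character strings: the predicates hold when every character in the (non-empty) string has the property, just like islower().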
> > > > Similar APIs are already available through the unicodedata
> > > > module, but could easily be moved to the Unicode object
> > > > (they cause the builtin interpreter to grow a bit in size
> > > > due to the new mapping tables).
> > > >
> > > > BTW, string.atoi et al. are currently not mapped to
> > > > string methods... should they be ?
> > >
> > > They are mapped to int() c.s.
> >
> > Hmm, I just noticed that int() et friends don't like
> > Unicode... shouldn't they use the "t" parser marker
> > instead of requiring a string or tp_int compatible
> > type ?
>
> Good catch. Go ahead.

Done. float(), int() and long() now accept charbuf compatible objects as argument. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Tue Mar 7 09:23:35 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 07 Mar 2000 10:23:35 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects Message-ID: <38C4CA97.5D0AA9D@lemburg.com> Before starting to code away, I would like to know which of the new Unicode methods should also be available on string objects. Here are the currently available methods:

Unicode objects     string objects
------------------------------------
capitalize          capitalize
center
count               count
encode
endswith            endswith
expandtabs
find                find
index               index
isdecimal
isdigit
islower
isnumeric
isspace
istitle
isupper
join                join
ljust
lower               lower
lstrip              lstrip
replace             replace
rfind               rfind
rindex              rindex
rjust
rstrip              rstrip
split               split
splitlines
startswith          startswith
strip               strip
swapcase            swapcase
title               title
translate           translate (*)
upper               upper
zfill

(*) The two have slightly different implementations, e.g. deletions are handled differently.
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik@pythonware.com Tue Mar 7 11:54:56 2000 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 7 Mar 2000 12:54:56 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects References: <38C4CA97.5D0AA9D@lemburg.com> Message-ID: <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com>

> Unicode objects     string objects
> expandtabs

yes. I'm pretty sure there's "expandtabs" code in the strop module. maybe barry missed it?

> center
> ljust
> rjust

probably. the implementation is trivial, and ljust/rjust are somewhat useful, so you might as well add them all (just cut and paste from the unicode class). what about rguido and lguido, btw?

> zfill

no. From guido@python.org Tue Mar 7 13:52:00 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 07 Mar 2000 08:52:00 -0500 Subject: [Python-Dev] finalization again Message-ID: <200003071352.IAA13571@eric.cnri.reston.va.us> Warning: long message. If you're not interested in reading all this, please skip to "Conclusion" at the end. At Tim's recommendation I had a look at what section 12.6 of the Java language spec says about finalizers. The stuff there is sure seductive for language designers... Have a look at the diagram at http://java.sun.com/docs/books/jls/html/12.doc.html#48746. In all its (seeming) complexity, it helped me understand some of the issues of finalization better. Rather than the complex 8-state state machine that it appears to be, think of it as a simple 3x3 table. The three rows represent the categories reachable, finalizer-reachable (abbreviated in the diagram as f-reachable), and unreachable.
These categories correspond directly to categories of objects that the Schemenauer-Tiedemann cycle-reclamation scheme deals with: after moving all the reachable objects to the second list (first the roots and then the objects reachable from the roots), the first list is left with the unreachable and finalizer-reachable objects. If we want to distinguish between unreachable and finalizer-reachable at this point, a straightforward application of the same algorithm will work well: Create a third list (this will contain the finalizer-reachable objects). Start by filling it with all the objects from the first list (which contains the potential garbage at this point) that have a finalizer. We can look for objects that have __del__ or __clean__ or for which tp_clean(CARE_EXEC)==true, it doesn't matter here.(*) Then walk through the third list, following each object's references, and move all referenced objects that are still in the first list to the third list. Now, we have: List 1: truly unreachable objects. These have no finalizers and can be discarded right away. List 2: truly reachable objects. (Roots and objects reachable from roots.) Leave them alone. List 3: finalizer-reachable objects. This contains objects that are unreachable but have a finalizer, and objects that are only reachable through those. We now have to decide on a policy for invoking finalizers. Java suggests the following: Remember the "roots" of the third list -- the nodes that were moved there directly from the first list because they have a finalizer. These objects are marked *finalizable* (a category corresponding to the second *column* of the Java diagram). The Java spec allows the Java garbage collector to call all of these finalizers in any order -- even simultaneously in separate threads. Java never allows an object to go back from the finalizable to the unfinalized state (there are no arrows pointing left in the diagram). 
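Backing up a step, the three-list split described above can be sketched on a toy object graph. This is purely illustrative (hypothetical names and a dict-based graph; a real collector works on C-level object headers, not dicts):

```python
def partition(graph, finalizers, roots):
    """Sketch of the three-list split described above.

    graph:      name -> list of referenced names
    finalizers: set of names whose objects have a finalizer
    roots:      names reachable from outside
    Returns (unreachable, reachable, finalizer_reachable).
    """
    def closure(seed, allowed):
        # Everything transitively reachable from seed, staying inside allowed.
        seen, stack = set(), [n for n in seed if n in allowed]
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            stack.extend(m for m in graph[n] if m in allowed)
        return seen

    everything = set(graph)
    reachable = closure(roots, everything)                       # list 2
    candidates = everything - reachable                          # old list 1
    f_reachable = closure(finalizers & candidates, candidates)   # list 3
    unreachable = candidates - f_reachable                       # truly dead
    return unreachable, reachable, f_reachable

# "x" and "y" form a dead cycle with a finalizer on "x"; "z" is plain trash.
g = {"root": ["a"], "a": [], "x": ["y"], "y": ["x"], "z": []}
dead, live, fin = partition(g, finalizers={"x"}, roots={"root"})
assert live == {"root", "a"} and fin == {"x", "y"} and dead == {"z"}
```

The roots of list 3 (here, "x") are exactly the objects that become *finalizable* in the Java terminology that follows.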
The first finalizer that is called could make its object reachable again (up arrow), thereby possibly making other finalizable objects reachable too. But this does not cancel their scheduled finalization! The conclusion is that Java can sometimes call finalization on unreachable objects -- but only if those objects have gone through a phase in their life where they were unreachable or at least finalizer-unreachable. I agree that this is the best that Java can do: if there are cycles containing multiple objects with finalizers, there is no way (short of asking the programmer(s)) to decide which object to finalize first. We could pick one at random, run its finalizer, and start garbage collection all over -- if the finalizer doesn't resurrect anything, this will give us the same set of unreachable objects, from which we could pick the next finalizable object, and so on. That looks very inefficient, might not terminate (the same object could repeatedly show up as the candidate for finalization), and it's still arbitrary: the programmer(s) still can't predict which finalizer in a cycle with multiple finalizers will be called first. Assuming the recommended characteristics of finalizers (brief and robust), it won't make much difference if we call all finalizers (of the now-finalizeable objects) "without looking back". Sure, some objects may find themselves in a perfectly reachable position with their finalizer called -- but they did go through a "near-death experience". I don't find this objectionable, and I don't see how Java could possibly do better for cycles with multiple finalizers. Now let's look again at the rule that an object's finalizer will be called at most once automatically by the garbage collector. The transitions between the colums of the Java diagram enforce this: the columns are labeled from left to right with unfinalized, finalizable, and finalized, and there are no transition arrows pointing left. 
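For the record, modern CPython (since PEP 442) does invoke finalizers in trash cycles, and the point made above still holds there: which of two finalizers in the same cycle runs first is the collector's arbitrary choice, not the programmer's. A minimal sketch of the situation under discussion:

```python
import gc

log = []

class A:
    def __del__(self):
        log.append("A")

class B:
    def __del__(self):
        log.append("B")

a, b = A(), B()
a.partner, b.partner = b, a   # trash cycle with two finalizers
del a, b
gc.collect()                  # both finalizers run; their relative order is unspecified

assert sorted(log) == ["A", "B"]
```

Note the assertion sorts the log: asserting a particular order would be relying on exactly the arbitrariness the text warns about.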
(In my description above, already finalized objects are considered not to have a finalizer.) I think this rule makes a lot of sense given Java's multi-threaded garbage collection: the invocation of finalizers could run concurrently with another garbage collection, and we don't want this to find some of the same finalizable objects and call their finalizers again! We could mark them with a "finalization in progress" flag only while their finalizer is running, but in a cycle with multiple finalizers it seems we should keep this flag set until *all* finalizers for objects in the cycle have run. But we don't actually know exactly what the cycles are: all we know is "these objects are involved in trash cycles". More detailed knowledge would require yet another sweep, plus a more hairy two-dimensional data structure (a list of separate cycles). And for what? as soon as we run finalizers from two separate cycles, those cycles could be merged again (e.g. the first finalizer could resurrect its cycle, and the second one could link to it). Now we have a pool of objects that are marked "finalization in progress" until all their finalizations terminate. For an incremental concurrent garbage collector, this seems a pain, since it may continue to find new finalizable objects and add them to the pile. Java takes the logical conclusion: the "finalization in progress" flag is never cleared -- and renamed to "finalized".

Conclusion
----------

Are the Java rules complex? Yes. Are there better rules possible? I'm not so sure, given the requirement of allowing concurrent incremental garbage collection algorithms that haven't even been invented yet. (Plus the implied requirement that finalizers in trash cycles should be invoked.) Are the Java rules difficult for the user? Only for users who think they can trick finalizers into doing things for them that they were not designed to do. I would think the following guidelines should do nicely for the rest of us:

1.
Avoid finalizers if you can; use them only to release *external* (e.g. OS) resources.

2. Write your finalizer as robust as you can, with as little use of other objects as you can.

3. You only get one chance. Use it.

Unlike Scheme guardians or the proposed __cleanup__ mechanism, you don't have to know whether your object is involved in a cycle -- your finalizer will still be called. I am reconsidering using the __del__ method as the finalizer. As a compromise to those who want their __del__ to run whenever the reference count reaches zero, the finalized flag can be cleared explicitly. I am considering the following implementation: after retrieving the __del__ method, but before calling it, self.__del__ is set to None (better, self.__dict__['__del__'] = None, to avoid confusing __setattr__ hooks). The object can remove self.__del__ to clear the finalized flag. I think I'll use the same mechanism to prevent __del__ from being called upon a failed initialization. Final note: the semantics "__del__ is called whenever the reference count reaches zero" cannot be defended in the light of a migration to different forms of garbage collection (e.g. JPython). There may not be a reference count. --Guido van Rossum (home page: http://www.python.org/~guido/) ____ (*) Footnote: there's one complication: to ask a Python class instance if it has a finalizer, we have to use PyObject_Getattr(obj, ...). If the object's class has a __getattr__ hook, this can invoke arbitrary Python code -- even if the answer to the question is "no"! This can make the object reachable again (in the Java diagram, arrows pointing up or up and right). We could either use instance_getattr1(), which avoids the __getattr__ hook, or mark all class instances as finalizable until proven innocent.
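Guido's "at most once" idea -- shadow the finalizer before calling it, and let the object explicitly re-arm itself -- can be sketched with an instance-level hook. This is purely illustrative: the names run_finalizer and __finalize__ are stand-ins for this sketch, not the mechanism CPython actually adopted:

```python
calls = []

class Resource:
    def __finalize__(self):
        calls.append("released")

def run_finalizer(obj):
    # Fetch the finalizer; if the instance already shadows it with None,
    # the object counts as finalized and nothing runs.
    fin = getattr(obj, "__finalize__", None)
    if fin is None:
        return False
    # Mark "finalized" *before* the call, via the instance dict
    # (cf. self.__dict__['__del__'] = None in the message above).
    obj.__dict__["__finalize__"] = None
    fin()
    return True

r = Resource()
assert run_finalizer(r) is True    # finalizer runs once
assert run_finalizer(r) is False   # already finalized: skipped
assert calls == ["released"]

# Re-arming, as the message suggests: the object clears the flag explicitly.
del r.__dict__["__finalize__"]
assert run_finalizer(r) is True
```

Setting the flag before the call, not after, is the detail that makes the scheme safe against a finalizer that resurrects its object mid-collection.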
From gward@cnri.reston.va.us Tue Mar 7 14:04:30 2000 From: gward@cnri.reston.va.us (Greg Ward) Date: Tue, 7 Mar 2000 09:04:30 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <200003051204.HAA05367@eric.cnri.reston.va.us>; from guido@python.org on Sun, Mar 05, 2000 at 07:04:56AM -0500 References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> <14529.55983.263225.691427@weyr.cnri.reston.va.us> <200003051204.HAA05367@eric.cnri.reston.va.us> Message-ID: <20000307090430.A16948@cnri.reston.va.us> On 05 March 2000, Guido van Rossum said:

> - Variants on the syntax could be given through some kind of option
>   system rather than through subclassing -- they should be combinable
>   independently. Some possible options (maybe I'm going overboard here)
>   could be:
>
> - comment characters: ('#', ';', both, others?)
> - comments after variables allowed? on sections?
> - variable characters: (':', '=', both, others?)
> - quoting of values with "..." allowed?
> - backslashes in "..." allowed?
> - does backslash-newline mean a continuation?
> - case sensitivity for section names (default on)
> - case sensitivity for option names (default off)
> - variables allowed before first section name?
> - first section name? (default "main")
> - character set allowed in section names
> - character set allowed in variable names
> - %(...) substitution?

I agree with Fred that this level of flexibility is probably overkill for a config file parser; you don't want every application author who uses the module to have to explain his particular variant of the syntax. However, if you're interested in a class that *does* provide some of the above flexibility, I have written such a beast. It's currently used to parse the Distutils MANIFEST.in file, and I've considered using it for the mythical Distutils config files. (And it also gets heavy use in my day job.)
It's really a class for reading a file in preparation for "text processing the Unix way", though: it doesn't say anything about syntax, it just worries about blank lines, comments, continuations, and a few other things. Here's the class docstring:

class TextFile:
    """Provides a file-like object that takes care of all the things you
       commonly want to do when processing a text file that has some
       line-by-line syntax: strip comments (as long as "#" is your
       comment character), skip blank lines, join adjacent lines by
       escaping the newline (ie. backslash at end of line), strip
       leading and/or trailing whitespace, and collapse internal
       whitespace.  All of these are optional and independently
       controllable.

       Provides a 'warn()' method so you can generate warning messages
       that report physical line number, even if the logical line in
       question spans multiple physical lines.  Also provides
       'unreadline()' for implementing line-at-a-time lookahead.

       Constructor is called as:

           TextFile (filename=None, file=None, **options)

       It bombs (RuntimeError) if both 'filename' and 'file' are None;
       'filename' should be a string, and 'file' a file object (or
       something that provides 'readline()' and 'close()' methods).  It
       is recommended that you supply at least 'filename', so that
       TextFile can include it in warning messages.  If 'file' is not
       supplied, TextFile creates its own using the 'open()' builtin.

       The options are all boolean, and affect the value returned by
       'readline()':

         strip_comments [default: true]
             strip from "#" to end-of-line, as well as any whitespace
             leading up to the "#" -- unless it is escaped by a backslash
         lstrip_ws [default: false]
             strip leading whitespace from each line before returning it
         rstrip_ws [default: true]
             strip trailing whitespace (including line terminator!) from
             each line before returning it
         skip_blanks [default: true]
             skip lines that are empty *after* stripping comments and
             whitespace.
             (If both lstrip_ws and rstrip_ws are true, then some lines
             may consist of solely whitespace: these will *not* be
             skipped, even if 'skip_blanks' is true.)
         join_lines [default: false]
             if a backslash is the last non-newline character on a line
             after stripping comments and whitespace, join the following
             line to it to form one "logical line"; if N consecutive
             lines end with a backslash, then N+1 physical lines will be
             joined to form one logical line.
         collapse_ws [default: false]
             after stripping comments and whitespace and joining physical
             lines into logical lines, all internal whitespace (strings
             of whitespace surrounded by non-whitespace characters, and
             not at the beginning or end of the logical line) will be
             collapsed to a single space.

       Note that since 'rstrip_ws' can strip the trailing newline, the
       semantics of 'readline()' must differ from those of the builtin
       file object's 'readline()' method!  In particular, 'readline()'
       returns None for end-of-file: an empty string might just be a
       blank line (or an all-whitespace line), if 'rstrip_ws' is true
       but 'skip_blanks' is not."""

Interested in having something like this in the core? Adding more options is possible, but the code is already on the hairy side to support all of these. And I'm not a big fan of the subtle difference in semantics with file objects, but honestly couldn't think of a better way at the time. If you're interested, you can download it from http://www.mems-exchange.org/exchange/software/python/text_file/ or just use the version in the Distutils CVS tree. Greg From mal@lemburg.com Tue Mar 7 14:38:09 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 07 Mar 2000 15:38:09 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects References: <38C4CA97.5D0AA9D@lemburg.com> <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> Message-ID: <38C51451.D38B21FE@lemburg.com> Fredrik Lundh wrote:
>
> > Unicode objects     string objects
> > expandtabs
>
> yes.
>
> I'm pretty sure there's "expandtabs" code in the
> strop module. maybe barry missed it?
>
> > center
> > ljust
> > rjust
>
> probably.
>
> the implementation is trivial, and ljust/rjust are
> somewhat useful, so you might as well add them
> all (just cut and paste from the unicode class).
>
> what about rguido and lguido, btw?

Ooops, forgot those, thanks :-)

> > zfill
>
> no.

Why not ? Since the string implementation had all of the above marked as TBD, I added all four. What about the other new methods (.isXXX() and .splitlines()) ? .isXXX() are mostly needed due to the extended character properties in Unicode. They would be new to the string object world. .splitlines() is Unicode aware and also treats CR/LF combinations across platforms:

S.splitlines([maxsplit]) -> list of strings

Return a list of the lines in S, breaking at line boundaries. If maxsplit is given, at most maxsplit are done. Line breaks are not included in the resulting list.

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@python.org Tue Mar 7 15:38:18 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 07 Mar 2000 10:38:18 -0500 Subject: [Python-Dev] Adding Unicode methods to string objects In-Reply-To: Your message of "Tue, 07 Mar 2000 15:38:09 +0100." <38C51451.D38B21FE@lemburg.com> References: <38C4CA97.5D0AA9D@lemburg.com> <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> <38C51451.D38B21FE@lemburg.com> Message-ID: <200003071538.KAA13977@eric.cnri.reston.va.us>

> > > zfill
> >
> > no.
>
> Why not ?

Zfill is (or ought to be) deprecated. It stems from times before we had things like "%08d" % x and no longer serves a useful purpose. I doubt anyone would miss it. (Of course, now /F will claim that PIL will break in 27 places because of this.
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one@email.msn.com Tue Mar 7 17:07:40 2000 From: tim_one@email.msn.com (Tim Peters) Date: Tue, 7 Mar 2000 12:07:40 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <200003071352.IAA13571@eric.cnri.reston.va.us> Message-ID: <000701bf8857$a56ed660$a72d153f@tim> [Guido] > ... > Conclusion > ---------- > > Are the Java rules complex? Yes. Are there better rules possible? I'm > not so sure, given the requirement of allowing concurrent incremental > garbage collection algorithms that haven't even been invented > yet. Guy Steele worked his ass off on Java's rules. He had as much real-world experience with implementing GC as anyone, via his long & deep Lisp implementation background (both SW & HW), and indeed invented several key techniques in high-performance GC. But he had no background in GC with user-defined finalizers -- and it shows! > (Plus the implied requirement that finalizers in trash cycles > should be invoked.) Are the Java rules difficult for the user? Only > for users who think they can trick finalizers into doing things for > them that they were not designed to do. This is so implementation-centric it's hard to know what to say <0.5 wink>. The Java rules weren't designed to do much of anything except guarantee that Java (1) would eventually reclaim all unreachable objects, and (2) wouldn't expose dangling pointers to user finalizers, or chase any itself. Whatever *useful* finalizer semantics may remain are those that just happened to survive. > ... > Unlike Scheme guardians or the proposed __cleanup__ mechanism, you > don't have to know whether your object is involved in a cycle -- your > finalizer will still be called. This is like saying a user doesn't have to know whether the new drug prescribed for them by their doctor has potentially fatal side effects -- they'll be forced to take it regardless . > ... 
> Final note: the semantics "__del__ is called whenever the reference > count reaches zero" cannot be defended in the light of a migration to > different forms of garbage collection (e.g. JPython). There may not > be a reference count. 1. I don't know why JPython doesn't execute __del__ methods at all now, but have to suspect that the Java rules imply an implementation so grossly inefficient in the presence of __del__ that Barry simply doesn't want to endure the speed complaints. The Java spec itself urges implementations to special-case the snot out of classes that don't override the default do-nothing finalizer, for "go fast" reasons too. 2. The "refcount reaches 0" rule in CPython is merely a wonderfully concrete way to get across the idea of "destruction occurs in an order consistent with a topological sort of the points-to graph". The latter is explicit in the BDW collector, which has no refcounts; the topsort concept is applicable and thoroughly natural in all languages; refcounts in CPython give an exploitable hint about *when* collection will occur, but add no purely semantic constraint beyond the topsort requirement (they neatly *imply* the topsort requirement). There is no topsort in the presence of cycles, so cycles create problems in all languages. The same "throw 'em back at the user" approach makes just as much sense from the topsort view as the RC view; it doesn't rely on RC at all. stop-the-insanity-ly y'rs - tim From guido@python.org Tue Mar 7 17:33:31 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 07 Mar 2000 12:33:31 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Tue, 07 Mar 2000 12:07:40 EST." <000701bf8857$a56ed660$a72d153f@tim> References: <000701bf8857$a56ed660$a72d153f@tim> Message-ID: <200003071733.MAA14926@eric.cnri.reston.va.us> [Tim tells Guido again that he finds the Java rules bad, slinging some mud at Guy Steele, but without explaining what the problem with them is, and then asks:] > 1.
I don't know why JPython doesn't execute __del__ methods at all now, but > have to suspect that the Java rules imply an implementation so grossly > inefficient in the presence of __del__ that Barry simply doesn't want to > endure the speed complaints. The Java spec itself urges implementations to > special-case the snot out of classes that don't override the default > do-nothing finalizer, for "go fast" reasons too. Something like that, yes, although it was Jim Hugunin. I have a feeling it has to do with the dynamic of __del__ -- this would imply that *all* Python class instances would appear to Java to have a finalizer -- just in most cases it would do a failing lookup of __del__ and bail out quickly. Maybe some source code or class analysis looking for a __del__ could fix this, at the cost of not allowing one to patch __del__ into an existing class after instances have already been created. I don't find that breach of dynamicism a big deal -- e.g. CPython keeps copies of __getattr__, __setattr__ and __delattr__ in the class for similar reasons. > 2. The "refcount reaches 0" rule in CPython is merely a wonderfully concrete > way to get across the idea of "destruction occurs in an order consistent > with a topological sort of the points-to graph". The latter is explicit in > the BDW collector, which has no refcounts; the topsort concept is applicable > and thoroughly natural in all languages; refcounts in CPython give an > exploitable hint about *when* collection will occur, but add no purely > semantic constraint beyond the topsort requirement (they neatly *imply* the > topsort requirement). There is no topsort in the presence of cycles, so > cycles create problems in all languages. The same "throw 'em back at the > user" approach makes just as much sense from the topsort view as the RC > view; it doesn't rely on RC at all. Indeed. I propose to throw it back at the user by calling __del__. 
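[Editorial aside: this is, with hindsight, roughly where CPython eventually landed. Since PEP 442 (Python 3.4), __del__ *is* called for objects in trash cycles, exactly once, by the cycle collector. A minimal sketch of that behavior under modern-CPython semantics:]

```python
import gc

log = []

class Node:
    # Each node holds a reference to its peer, forming a cycle.
    def __init__(self, name):
        self.name = name
        self.peer = None
    def __del__(self):
        log.append(self.name)

a, b = Node('a'), Node('b')
a.peer, b.peer = b, a    # reference cycle: refcounts never reach zero
del a, b                 # only the cycle itself keeps the objects alive
gc.collect()             # cycle collector runs both finalizers once each
print(sorted(log))       # -> ['a', 'b']
```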
The typical user defines __del__ because they want to close a file, say goodbye nicely on a socket connection, or delete a temp file. That sort of thing. This is what finalizers are *for*. As an author of this kind of finalizer, I don't see why I need to know whether I'm involved in a cycle or not. I want my finalizer called when my object goes away, and I don't want my object kept alive by unreachable cycles. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Tue Mar 7 17:39:15 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 07 Mar 2000 18:39:15 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects References: <38C4CA97.5D0AA9D@lemburg.com> <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> <38C51451.D38B21FE@lemburg.com> Message-ID: <38C53EC3.5292ECF@lemburg.com> I've ported most of the Unicode methods to strings now. Here's the new table:

Unicode objects     string objects
------------------------------------------------------------
capitalize          capitalize
center              center
count               count
encode
endswith            endswith
expandtabs          expandtabs
find                find
index               index
isdecimal
isdigit             isdigit
islower             islower
isnumeric
isspace             isspace
istitle             istitle
isupper             isupper
join                join
ljust               ljust
lower               lower
lstrip              lstrip
replace             replace
rfind               rfind
rindex              rindex
rjust               rjust
rstrip              rstrip
split               split
splitlines          splitlines
startswith          startswith
strip               strip
swapcase            swapcase
title               title
translate           translate
upper               upper
zfill               zfill

I don't think that .isdecimal() and .isnumeric() are needed for strings since most of the added mappings refer to Unicode char points. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Tue Mar 7 17:42:53 2000 From: mal@lemburg.com (M.-A.
Lemburg) Date: Tue, 07 Mar 2000 18:42:53 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects References: <38C4CA97.5D0AA9D@lemburg.com> <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> <38C51451.D38B21FE@lemburg.com> <200003071538.KAA13977@eric.cnri.reston.va.us> Message-ID: <38C53F9D.44C3A0F3@lemburg.com> Guido van Rossum wrote: > > > > > zfill > > > > > > no. > > > > Why not ? > > Zfill is (or ought to be) deprecated. It stems from times before we > had things like "%08d" % x and no longer serves a useful purpose. > I doubt anyone would miss it. > > (Of course, now /F will claim that PIL will break in 27 places because > of this. :-) Ok, I'll remove it from both implementations again... (there was some email overlap). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From bwarsaw@cnri.reston.va.us Tue Mar 7 19:24:39 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 7 Mar 2000 14:24:39 -0500 (EST) Subject: [Python-Dev] finalization again References: <200003071352.IAA13571@eric.cnri.reston.va.us> <000701bf8857$a56ed660$a72d153f@tim> Message-ID: <14533.22391.447739.901802@anthem.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: TP> 1. I don't know why JPython doesn't execute __del__ methods at TP> all now, but have to suspect that the Java rules imply an TP> implementation so grossly inefficient in the presence of TP> __del__ that Barry simply doesn't want to endure the speed TP> complaints. Actually, it was JimH that discovered this performance gotcha. The problem is that if you want to support __del__, you've got to take the finalize() hit for every instance (i.e. PyInstance object) and it's just not worth it. I just realized that it would be relatively trivial to add a subclass of PyInstance differing only in that it has a finalize() method which would invoke __del__().
Now when the class gets defined, the __del__() would be mined and cached and we'd look at that cache when creating an instance. If there's a function there, we create a PyFinalizableInstance, otherwise we create a PyInstance. The cache means you couldn't dynamically add a __del__ later, but I don't think that's a big deal. It wouldn't be hard to look up the __del__ every time, but that'd be a hit for every instance creation (as opposed to class creation), so again, it's probably not worth it. I just did a quick and dirty hack and it seems at first blush to work. I'm sure there's something I'm missing :). For those of you who don't care about JPython, you can skip the rest. Okay, first the Python script to exercise this, then the PyFinalizableInstance.java file, and then the diffs to PyClass.java. JPython-devers, is it worth adding this?

-------------------- snip snip -------------------- del.py
class B:
    def __del__(self):
        print 'In my __del__'

b = B()
del b

from java.lang import System
System.gc()

-------------------- snip snip -------------------- PyFinalizableInstance.java
// Copyright © Corporation for National Research Initiatives

// These are just like normal instances, except that their classes included
// a definition for __del__(), i.e. Python's finalizer.  These two instance
// types have to be separated due to Java performance issues.

package org.python.core;

public class PyFinalizableInstance extends PyInstance
{
    public PyFinalizableInstance(PyClass iclass) {
        super(iclass);
    }

    // __del__ method is invoked upon object finalization.
    protected void finalize() {
        __class__.__del__.__call__(this);
    }
}

-------------------- snip snip --------------------
Index: PyClass.java
===================================================================
RCS file: /projects/cvsroot/jpython/dist/org/python/core/PyClass.java,v
retrieving revision 2.8
diff -c -r2.8 PyClass.java
*** PyClass.java	1999/10/04 20:44:28	2.8
--- PyClass.java	2000/03/07 19:02:29
***************
*** 21,27 ****

      // Store these methods for performance optimization
      // These are only used by PyInstance
!     PyObject __getattr__, __setattr__, __delattr__, __tojava__;

      // Holds the classes for which this is a proxy
      // Only used when subclassing from a Java class
--- 21,27 ----

      // Store these methods for performance optimization
      // These are only used by PyInstance
!     PyObject __getattr__, __setattr__, __delattr__, __tojava__, __del__;

      // Holds the classes for which this is a proxy
      // Only used when subclassing from a Java class
***************
*** 111,116 ****
--- 111,117 ----
          __setattr__ = lookup("__setattr__", false);
          __delattr__ = lookup("__delattr__", false);
          __tojava__ = lookup("__tojava__", false);
+         __del__ = lookup("__del__", false);
      }

      protected void findModule(PyObject dict) {
***************
*** 182,188 ****
      }

      public PyObject __call__(PyObject[] args, String[] keywords) {
!         PyInstance inst = new PyInstance(this);
          inst.__init__(args, keywords);
          return inst;
      }
--- 183,194 ----
      }

      public PyObject __call__(PyObject[] args, String[] keywords) {
!         PyInstance inst;
!         if (__del__ == null)
!             inst = new PyInstance(this);
!         else
!             // the class defined an __del__ method
!             inst = new PyFinalizableInstance(this);
          inst.__init__(args, keywords);
          return inst;
      }

From bwarsaw@cnri.reston.va.us Tue Mar 7 19:35:44 2000 From: bwarsaw@cnri.reston.va.us (Barry A.
Warsaw) Date: Tue, 7 Mar 2000 14:35:44 -0500 (EST) Subject: [Python-Dev] finalization again References: <000701bf8857$a56ed660$a72d153f@tim> <200003071733.MAA14926@eric.cnri.reston.va.us> Message-ID: <14533.23056.517661.633574@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> Maybe some source code or class analysis looking for a GvR> __del__ could fix this, at the cost of not allowing one to GvR> patch __del__ into an existing class after instances have GvR> already been created. I don't find that breach of dynamicism GvR> a big deal -- e.g. CPython keeps copies of __getattr__, GvR> __setattr__ and __delattr__ in the class for similar reasons. For those of you who enter the "Being Guido van Rossum" door like I just did, please keep in mind that it dumps you out not on the NJ Turnpike, but in the little ditch back behind CNRI. Stop by and say hi after you brush yourself off. -Barry From Tim_Peters@Dragonsys.com Tue Mar 7 22:30:16 2000 From: Tim_Peters@Dragonsys.com (Tim_Peters@Dragonsys.com) Date: Tue, 7 Mar 2000 17:30:16 -0500 Subject: [Python-Dev] finalization again Message-ID: <8525689B.007AB2BA.00@notes-mta.dragonsys.com> [Guido] > Tim tells Guido again that he finds the Java rules bad, slinging some > mud at Guy Steele, but without explaining what the problem with them > is ... Slinging mud? Let's back off here. You've read the Java spec and were impressed. That's fine -- it is impressive . But go on from there and see where it leads in practice. That Java's GC model did a masterful job but includes a finalization model users dislike is really just conventional wisdom in the Java world. My sketch of Guy Steele's involvement was an attempt to explain why both halves of that are valid. I didn't think "explaining the problem" was necessary, as it's been covered in depth multiple times in c.l.py threads, by Java programmers as well as by me. 
Searching the web for articles about this turns up many; the first one I hit is typical: http://www.quoininc.com/quoininc/Design_Java0197.html eventually concludes Consequently we recommend that [Java] programmers support but do not rely on finalization. That is, place all finalization semantics in finalize() methods, but call those methods explicitly and in the order required. The points below provide more detail. That's par for the Java course: advice to write finalizers to survive being called multiple times, call them explicitly, and do all you can to ensure that the "by magic" call is a nop. The lack of ordering rules in the language forces people to "do it by hand" (as the Java spec acknowledges: "It is straightforward to implement a Java class that will cause a set of finalizer-like methods to be invoked in a specified order for a set of objects when all the objects become unreachable. Defining such a class is left as an exercise for the reader." But from what I've seen, that exercise is beyond the imagination of most Java programmers! The perceived need for ordering is not.). It's fine that you want to restrict finalizers to "simple" cases; it's not so fine if the language can't ensure that simple cases are the only ones the user can write, & can neither detect & complain at runtime about cases it didn't intend to support. The Java spec is unhelpful here too: Therefore, we recommend that the design of finalize methods be kept simple and that they be programmed defensively, so that they will work in all cases. Mom and apple pie, but what does it mean, exactly? The spec realizes that you're going to be tempted to try things that won't work, but can't really explain what those are in terms simpler than the full set of implementation consequences. As a result, users hate it -- but don't take my word for that! 
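[Editorial aside: the "exercise for the reader" Tim quotes — invoking a set of cleanup actions in a specified order once a group of objects is done — is, for what it's worth, directly supported in today's Python standard library. A sketch using contextlib.ExitStack, whose registered callbacks run in LIFO order:]

```python
from contextlib import ExitStack

order = []
with ExitStack() as stack:
    for name in ('outer', 'middle', 'inner'):
        # callbacks run in reverse registration order when the
        # with-block exits, giving a deterministic cleanup order
        stack.callback(order.append, name)

print(order)   # -> ['inner', 'middle', 'outer']
```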
If you look & don't find that Java's finalization rules are widely viewed as "a problem to be wormed around" by serious Java programmers, fine -- then you've got a much better search engine than mine . As for why I claim following topsort rules is very likely to work out better, they follow from the nature of the problem, and can be explained as such, independent of implementation details. See the Boehm reference for more about topsort. will-personally-use-python-regardless-ly y'rs - tim From guido@python.org Wed Mar 8 00:50:38 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 07 Mar 2000 19:50:38 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Tue, 07 Mar 2000 17:30:16 EST." <8525689B.007AB2BA.00@notes-mta.dragonsys.com> References: <8525689B.007AB2BA.00@notes-mta.dragonsys.com> Message-ID: <200003080050.TAA19264@eric.cnri.reston.va.us> > [Guido] > > Tim tells Guido again that he finds the Java rules bad, slinging some > > mud at Guy Steele, but without explaining what the problem with them > > is ... > > Slinging mud? Let's back off here. You've read the Java spec and were > impressed. That's fine -- it is impressive . But go on from > there and see where it leads in practice. That Java's GC model did a > masterful job but includes a finalization model users dislike is really > just conventional wisdom in the Java world. My sketch of Guy Steele's > involvement was an attempt to explain why both halves of that are valid. Granted. I can read Java code and sometimes I write some, but I'm not a Java programmer by any measure, and I wasn't aware that finalize() has a general bad rep. > I didn't think "explaining the problem" was necessary, as it's been > covered in depth multiple times in c.l.py threads, by Java programmers > as well as by me. 
Searching the web for articles about this turns up > many; the first one I hit is typical: > > http://www.quoininc.com/quoininc/Design_Java0197.html > > eventually concludes > > Consequently we recommend that [Java] programmers support but do > not rely on finalization. That is, place all finalization semantics > in finalize() methods, but call those methods explicitly and in the > order required. The points below provide more detail. > > That's par for the Java course: advice to write finalizers to survive > being called multiple times, call them explicitly, and do all you can > to ensure that the "by magic" call is a nop. It seems the authors make one big mistake: they recommend to call finalize() explicitly. This may be par for the Java course: the quality of the materials is often poor, and that has to be taken into account when certain features have gotten a bad rep. (These authors also go on at length about the problems of GC in a real-time situation -- attempts to use Java in situations for which it is inappropriate are also par for the course, inspired by all the hype.) Note that e.g. Bruce Eckel in "Thinking in Java" makes it clear that you should never call finalize() explicitly (except that you should always call super.finalize() in your finalize() method). (Bruce goes on at length explaining that there aren't a lot of things you should use finalize() for -- except to observe the garbage collector. :-) > The lack of ordering > rules in the language forces people to "do it by hand" (as the Java > spec acknowledges: "It is straightforward to implement a Java class > that will cause a set of finalizer-like methods to be invoked in a > specified order for a set of objects when all the objects become > unreachable. Defining such a class is left as an exercise for the > reader." But from what I've seen, that exercise is beyond the > imagination of most Java programmers! The perceived need for ordering > is not.).
True, but note that Python won't have the ordering problem, at least not as long as we stick to reference counting as the primary means of GC. The ordering problem in Python will only happen when there are cycles, and there you really can't blame the poor GC design! > It's fine that you want to restrict finalizers to "simple" cases; it's > not so fine if the language can't ensure that simple cases are the only > ones the user can write, & can neither detect & complain at runtime > about cases it didn't intend to support. The Java spec is unhelpful > here too: > > Therefore, we recommend that the design of finalize methods be kept > simple and that they be programmed defensively, so that they will > work in all cases. > > Mom and apple pie, but what does it mean, exactly? The spec realizes > that you're going to be tempted to try things that won't work, but > can't really explain what those are in terms simpler than the full set > of implementation consequences. As a result, users hate it -- but > don't take my word for that! If you look & don't find that Java's > finalization rules are widely viewed as "a problem to be wormed around" > by serious Java programmers, fine -- then you've got a much better > search engine than mine . Hm. Of course programmers hate finalizers. They hate GC as well. But they hate even more not to have it (witness the relentless complaints about Python's "lack of GC" -- and Java's GC is often touted as one of the reasons for its superiority over C++). I think this stuff is just hard! (Otherwise why would we be here having this argument?) > As for why I claim following topsort rules is very likely to work out > better, they follow from the nature of the problem, and can be > explained as such, independent of implementation details. See the > Boehm reference for more about topsort. Maybe we have a disconnect? We *are* using topsort -- for non-cyclical data structures. Reference counting ensure that. Nothing in my design changes that. 
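[Editorial aside: the topsort-via-refcounting point is easy to demonstrate. Under CPython's reference counting, an acyclic referrer is always finalized before the object it refers to — note this timing is CPython-specific:]

```python
order = []

class N:
    def __init__(self, name, child=None):
        self.name = name
        self.child = child        # an edge in the points-to graph
    def __del__(self):
        order.append(self.name)

leaf = N('leaf')
root = N('root', leaf)
del leaf    # still reachable through root.child: no finalizer runs yet
del root    # root's count hits zero first; then its ref to leaf drops
print(order)   # -> ['root', 'leaf'], a topological order of the graph
```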
The issue at hand is what to do with *cyclical* data structures, where topsort doesn't help. Boehm, on http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html, says: "Cycles involving one or more finalizable objects are never finalized." The question remains, what to do with trash cycles? I find having a separate __cleanup__ protocol cumbersome. I think that the "finalizer only called once by magic" rule is reasonable. I believe that the ordering problems will be much less than in Java, because we use topsort whenever we can. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one@email.msn.com Wed Mar 8 06:25:56 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 8 Mar 2000 01:25:56 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <200003080050.TAA19264@eric.cnri.reston.va.us> Message-ID: <001401bf88c7$29f2a320$452d153f@tim> [Guido] > Granted. I can read Java code and sometimes I write some, but I'm not > a Java programmer by any measure, and I wasn't aware that finalize() > has a general bad rep. It does, albeit often for bad reasons. 1. C++ programmers seeking to emulate techniques based on C++'s rigid specification of the order and timing of destruction of autos. 2. People pushing the limits (as in the URL I happened to post). 3. People trying to do anything . Java's finalization semantics are very weak, and s-l-o-w too (under most current implementations). Now I haven't used Java for real in about two years, and avoided finalizers completely when I did use it. I can't recall any essential use of __del__ I make in Python code, either. So what Python does here makes no personal difference to me. However, I frequently respond to complaints & questions on c.l.py, and don't want to get stuck trying to justify Java's uniquely baroque rules outside of comp.lang.java <0.9 wink>. 
>> [Tim, passes on the first relevant URL he finds: >> http://www.quoininc.com/quoininc/Design_Java0197.html] > It seems the authors make one big mistake: they recommend to call > finalize() explicitly. This may be par for the Java course: the > quality of the materials is often poor, and that has to be taken into > account when certain features have gotten a bad rep. Well, in the "The Java Programming Language", Gosling recommends to: a) Add a method called close(), that tolerates being called multiple times. b) Write a finalize() method whose body calls close(). People tended to do that at first, but used a bunch of names other than "close" too. I guess people eventually got weary of having two methods that did the same thing, so decided to just use the single name Java guaranteed would make sense. > (These authors also go on at length about the problems of GC in a real- > time situation -- attempts to use Java in sutations for which it is > inappropriate are also par for the course, inspired by all the hype.) I could have picked any number of other URLs, but don't regret picking this one: you can't judge a ship in smooth waters, and people will push *all* features beyond their original intents. Doing so exposes weaknesses. Besides, Sun won't come out & say Java is unsuitable for real-time, no matter how obvious it is . > Note that e.g. Bruce Eckel in "Thinking in Java" makes it clear that > you should never call finalize() explicitly (except that you should > always call super.fuinalize() in your finalize() method). You'll find lots of conflicting advice here, be it about Java or C++. Java may be unique, though, in the universality of the conclusion Bruce draws here: > (Bruce goes on at length explaining that there aren't a lot of things > you should use finalize() for -- except to observe the garbage collector. :-) Frankly, I think Java would be better off without finalizers. 
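[Editorial aside: Gosling's (a)/(b) recipe translates directly to Python. A sketch, with hypothetical names, of an explicit idempotent close() plus a finalizer reduced to a backstop:]

```python
class Resource:
    def __init__(self):
        self.closed = False

    def close(self):
        # (a) explicit close() that tolerates being called multiple times
        if self.closed:
            return
        self.closed = True
        # ... release the underlying resource here ...

    def __del__(self):
        # (b) the finalizer's body just calls close(); if the user
        # already closed explicitly, the "by magic" call is a no-op
        self.close()

r = Resource()
r.close()
r.close()   # harmless second call
```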
Python could do fine without __del__ too -- if you and I were the only users <0.6 wink>. [on Java's lack of ordering promises] > True, but note that Python won't have the ordering problem, at least > not as long as we stick to reference counting as the primary means of > GC. The ordering problem in Python will only happen when there are > cycles, and there you really can't blame the poor GC design! I cannot. Nor do I intend to. The cyclic ordering problem isn't GC's fault, it's the program's; but GC's *response* to it is entirely GC's responsibility. >> ... The Java spec is unhelpful here too: >> >> Therefore, we recommend that the design of finalize methods be kept >> simple and that they be programmed defensively, so that they will >> work in all cases. >> >> Mom and apple pie, but what does it mean, exactly? The spec realizes >> that you're going to be tempted to try things that won't work, but >> can't really explain what those are in terms simpler than the full set >> of implementation consequences. As a result, users hate it -- but >> don't take my word for that! If you look & don't find that Java's >> finalization rules are widely viewed as "a problem to be wormed around" >> by serious Java programmers, fine -- then you've got a much better >> search engine than mine . > Hm. Of course programmers hate finalizers. Oh no! C++ programmers *love* destructors! I mean it, they're absolutely gaga over them. I haven't detected signs that CPython programmers hate __del__ either, except at shutdown time. Regardless of language, they love them when they're predictable and work as expected, they hate them when they're unpredictable and confusing. C++ auto destructors are extremely predictable (e.g., after "{SomeClass a, b; ...}", b is destructed before a, and both destructions are guaranteed before leaving the block they're declared in, regardless of whether via return, exception, goto or falling off the end). 
CPython's __del__ is largely predictable (modulo shutdown, cycles, and sometimes exceptions). The unhappiness in the Java world comes from Java finalizers' unpredictability and consequent all-around uselessness in messy real life. > They hate GC as well. Yes, when it's unpredictable and confusing . > But they hate even more not to have it (witness the relentless > complaints about Python's "lack of GC" -- and Java's GC is often > touted as one of the reasons for its superiority over C++). Back when JimF & I were looking at gc, we may have talked each other into really believing that paying careful attention to RC issues leads to cleaner and more robust designs. In fact, I still believe that, and have never clamored for "real gc" in Python. Jim now may even be opposed to "real gc". But Jim and I and you all think a lot about the art of programming, and most users just don't have time or inclination for that -- the slowly changing nature of c.l.py is also clear evidence of this. I'm afraid this makes growing "real GC" a genuine necessity for Python's continued growth. It's not a *bad* thing in any case. Think of it as a marketing requirement <0.7 wink>. > I think this stuff is just hard! (Otherwise why would we be here > having this argument?) Honest to Guido, I think it's because you're sorely tempted to go down an un-Pythonic path here, and I'm fighting that. I said early on there are no thoroughly good answers (yes, it's hard), but that's nothing new for Python! We're having this argument solely because you're confusing Python with some other language . [a 2nd or 3rd plug for taking topsort seriously] > Maybe we have a disconnect? Not in the technical analysis, but in what conclusions to take from it. > We *are* using topsort -- for non-cyclical data structures. Reference > counting ensure that. Nothing in my design changes that. And it's great! 
Everyone understands the RC rules pretty quickly, lots of people like them a whole lot, and if it weren't for cyclic trash everything would be peachy. > The issue at hand is what to do with *cyclical* data structures, where > topsort doesn't help. Boehm, on > http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html, > says: "Cycles involving one or more finalizable objects are never > finalized." This is like some weird echo chamber, where the third time I shout something the first one comes back without any distortion at all . Yes, Boehm's first rule is "Do No Harm". It's a great rule. Python follows the same rule all over the place; e.g., when you see x = "4" + 2 you can't possibly know what was intended, so you refuse to guess: you would rather *kill* the program than make a blind guess! I see cycles with finalizers as much the same: it's plain wrong to guess when you can't possibly know what was intended. Because topsort is the only principled way to decide order of finalization, and they've *created* a situation where a topsort doesn't exist, what they're handing you is no less ambiguous than in trying to add a string to an int. This isn't the time to abandon topsort as inconvenient, it's the time to defend it as inviolate principle! The only thoroughly rational response is "you know, this doesn't make sense -- since I can't know what you want here, I refuse to pretend that I can". Since that's "the right" response everywhere else in Python, what the heck is so special about this case? It's like you decided Python *had* to allow adding strings to ints, and now we're going to argue about whether Perl, Awk or Tcl makes the best unprincipled guess . > The question remains, what to do with trash cycles? A trash cycle without a finalizer isn't a problem, right? In that case, topsort rules have no visible consequence so it doesn't matter in what order you merely reclaim the memory.
If it has an object with a finalizer, though, at the very worst you can let it leak, and make the collection of leaked objects available for inspection. Even that much is a *huge* "improvement" over what they have today: most cycles won't have a finalizer and so will get reclaimed, and for the rest they'll finally have a simple way to identify exactly where the problem is, and a simple criterion for predicting when it will happen. If that's not "good enough", then without abandoning principle the user needs to have some way to reduce such a cycle *to* a topsort case themself. > I find having a separate __cleanup__ protocol cumbersome. Same here, but if you're not comfortable leaking, and you agree Python is not in the business of guessing in inherently ambiguous situations, maybe that's what it takes! MAL and GregS both gravitated to this kind of thing at once, and that's at least suggestive; and MAL has actually been using his approach. It's explicit, and that's Pythonic on the face of it. > I think that the "finalizer only called once by magic" rule is reasonable. If it weren't for its specific use in emulating Java's scheme, would you still be in favor of that? It's a little suspicious that it never came up before. > I believe that the ordering problems will be much less than in Java, because > we use topsort whenever we can. No argument here, except that I believe there's never sufficient reason to abandon topsort ordering. Note that BDW's adamant refusal to yield on this hasn't stopped "why doesn't Python use BDW?" from becoming a FAQ. a-case-where-i-expect-adhering-to-principle-is-more-pragmatic-in-the-end-ly y'rs - tim From tim_one@email.msn.com Wed Mar 8 07:48:24 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 8 Mar 2000 02:48:24 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? Message-ID: <001801bf88d2$af0037c0$452d153f@tim> Mike has a darned good point here.
Anyone have a darned good answer ? -----Original Message----- From: python-list-admin@python.org [mailto:python-list-admin@python.org] On Behalf Of Mike Fletcher Sent: Tuesday, March 07, 2000 2:08 PM To: Python Listserv (E-mail) Subject: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? Just reading one more post (and a FAQ) on the win32 pipe breakage (sewage all over the hard-disk, traffic rerouted through the bit-bucket, you aren't getting to work anytime soon Mrs. Programmer) and wondering why we have a FAQ instead of having the win32pipe stuff rolled into the os module to fix it. Is there some incompatibility? Is there a licensing problem? Ideas? Mike __________________________________ Mike C. Fletcher Designer, VR Plumber http://members.home.com/mcfletch -- http://www.python.org/mailman/listinfo/python-list From mal@lemburg.com Wed Mar 8 08:36:57 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 09:36:57 +0100 Subject: [Python-Dev] finalization again References: <8525689B.007AB2BA.00@notes-mta.dragonsys.com> <200003080050.TAA19264@eric.cnri.reston.va.us> Message-ID: <38C61129.2F8C9E95@lemburg.com> > [Guido] > The question remains, what to do with trash cycles? I find having a > separate __cleanup__ protocol cumbersome. I think that the "finalizer > only called once by magic" rule is reasonable. I believe that the > ordering problems will be much less than in Java, because we use > topsort whenever we can. Note that the __cleanup__ protocol is intended to break cycles *before* calling the garbage collector. After those cycles are broken, ordering is not a problem anymore, and because __cleanup__ can do its task on a per-object basis, all magic is left in the hands of the programmer. The __cleanup__ protocol as I use it is designed to be called in situations where the system knows that all references into a cycle are about to be dropped (I typically use small cyclic object systems in my application, e.g.
ones that create and reference namespaces which include a reference to the hosting object itself). In my application that is done by using mxProxies at places where I know these cyclic object subsystems are being referenced. In Python the same could be done whenever the interpreter knows that a certain object is about to be deleted, e.g. during shutdown (important for embedding Python in other applications such as Apache) or some other major subsystem finalization, e.g. unload of a module or killing of a thread (yes, I know these are no-nos, but they could be useful, esp. the thread kill operation in multi-threaded servers). After __cleanup__ has done its thing, the finalizer can either choose to leave all remaining cycles in memory (and leak) or apply its own magic to complete the task. In any case, __del__ should be called when the refcount reaches 0. (I find it somewhat strange that people are arguing to keep external resources alive even though there is a chance of freeing them.) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Mar 8 08:46:14 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 09:46:14 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <001801bf88d2$af0037c0$452d153f@tim> Message-ID: <38C61356.E0598DBF@lemburg.com> Tim Peters wrote: > > Mike has a darned good point here. Anyone have a darned good answer ? > > -----Original Message----- > From: python-list-admin@python.org [mailto:python-list-admin@python.org] > On Behalf Of Mike Fletcher > Sent: Tuesday, March 07, 2000 2:08 PM > To: Python Listserv (E-mail) > Subject: Fixing os.popen on Win32 => is the win32pipe stuff going to be > adopted?
> > Just reading one more post (and a FAQ) on the win32 pipe breakage (sewage > all over the hard-disk, traffic rerouted through the bit-bucket, you aren't > getting to work anytime soon Mrs. Programmer) and wondering why we have a > FAQ instead of having the win32pipe stuff rolled into the os module to fix > it. Is there some incompatibility? Is there a licensing problem? > > Ideas? I'd suggest moving the popen from the C modules into os.py as a Python API and then applying all necessary magic to either use the win32pipe implementation (if available) or the native C one from the posix module in os.py. Unless, of course, the win32 stuff (or some of it) makes it into the core. I'm mostly interested in this for my platform.py module... BTW, is there any interest in moving it into the core ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@python.org Wed Mar 8 12:10:53 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 08 Mar 2000 07:10:53 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: Your message of "Wed, 08 Mar 2000 09:46:14 +0100." <38C61356.E0598DBF@lemburg.com> References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> Message-ID: <200003081210.HAA19931@eric.cnri.reston.va.us> > Tim Peters wrote: > > > > Mike has a darned good point here. Anyone have a darned good answer ? > > Subject: Fixing os.popen on Win32 => is the win32pipe stuff going to be > > adopted? > > > > Just reading one more post (and a FAQ) on the win32 pipe breakage (sewage > > all over the hard-disk, traffic rerouted through the bit-bucket, you aren't > > getting to work anytime soon Mrs. Programmer) and wondering why we have a > > FAQ instead of having the win32pipe stuff rolled into the os module to fix > > it. Is there some incompatibility?
Is there a licensing problem? MAL: > I'd suggest moving the popen from the C modules into os.py > as Python API and then applying all necessary magic to either > use the win32pipe implementation (if available) or the native > C one from the posix module in os.py. > > Unless, of course, the win32 stuff (or some of it) makes it into > the core. No concrete plans -- except that I think the registry access is supposed to go in. Haven't seen the code on patches@python.org yet though. > I'm mostly interested in this for my platform.py module... > BTW, is there any interest of moving it into the core ? "it" == platform.py? Little interest from me personally; I suppose it could go in Tools/scripts/... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Mar 8 14:06:53 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 08 Mar 2000 09:06:53 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Wed, 08 Mar 2000 01:25:56 EST." <001401bf88c7$29f2a320$452d153f@tim> References: <001401bf88c7$29f2a320$452d153f@tim> Message-ID: <200003081406.JAA20033@eric.cnri.reston.va.us> > A trash cycle without a finalizer isn't a problem, right? In that case, > topsort rules have no visible consequence so it doesn't matter in what order > you merely reclaim the memory. When we have a pile of garbage, we don't know whether it's all connected or whether it's lots of little cycles. So if we find [objects with -- I'm going to omit this] finalizers, we have to put those on a third list and put everything reachable from them on that list as well (the algorithm I described before). What's left on the first list then consists of finalizer-free garbage. We dispose of this garbage by clearing dicts and lists. Hopefully this makes the refcount of some of the finalizers go to zero -- those are finalized in the normal way. And now we have to deal with the inevitable: finalizers that are part of cycles.
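The partitioning step Guido describes (move every object reachable from a finalizer onto a separate list, leaving finalizer-free garbage behind) can be sketched in Python. The edge table passed in is a toy stand-in for the traversal only the real collector can do, and all names here are invented for illustration:

```python
def partition_garbage(garbage, edges):
    """Split `garbage` into (finalizer_free, finalizer_reachable).

    edges: dict mapping id(obj) -> list of objects it references
    (a stand-in for the collector's real pointer traversal).
    """
    # Objects with a __del__ are the "third list" seeds.
    roots = [o for o in garbage if hasattr(type(o), '__del__')]
    reachable = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in reachable:
            continue
        reachable.add(id(obj))
        stack.extend(edges.get(id(obj), []))
    finalizer_free = [o for o in garbage if id(o) not in reachable]
    finalizer_reachable = [o for o in garbage if id(o) in reachable]
    return finalizer_free, finalizer_reachable
```

Everything on the `finalizer_free` list can be safely cleared (dicts and lists emptied), exactly as in the algorithm above; the rest is the hard case the thread goes on to discuss.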
It makes sense to reduce the graph of objects to a graph of finalizers only. Example:

    A <=> b -> C <=> d

A and C have finalizers. C is part of a cycle (C-d) that contains no other finalizers, but C is also reachable from A. A is part of a cycle (A-b) that keeps it alive. The interesting thing here is that if we only look at the finalizers, there are no cycles! If we reduce the graph to only finalizers (setting aside for now the problem of how to do that -- we may need to allocate more memory to hold the reduced graph), we get:

    A -> C

We can now finalize A (even though its refcount is nonzero!). And that's really all we can do! A could break its own cycle, thereby disposing of itself and b. It could also break C's cycle, disposing of C and d. It could do nothing. Or it could resurrect A, thereby resurrecting all of A, b, C, and d. This leads to (there's that weird echo again :-) Boehm's solution: Call A's finalizer and leave the rest to the next time the garbage collection runs. Note that we're now calling finalizers on objects with a non-zero refcount. At some point (probably as a result of finalizing A) its refcount will go to zero. We should not finalize it again -- this would serve no purpose. Possible solution:

    INCREF(A);
    A->__del__();
    if (A->ob_refcnt == 1)
        A->__class__ = NULL; /* Make A finalizer-less */
    DECREF(A);

This avoids finalizing twice if the first finalization broke all cycles in which A is involved. But if it doesn't, A is still cyclical garbage with a finalizer! Even if it didn't resurrect itself. Instead of the code fragment above, we could mark A as "just finalized" and when it shows up at the head of the tree (of finalizers in cyclical trash) again on the next garbage collection, discard it without calling the finalizer again (because this clearly means that it didn't resurrect itself -- at least not for a very long time).
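The "don't finalize twice" idea amounts to the collector keeping a "just finalized" mark per object. A toy version of that bookkeeping (purely illustrative -- CPython's real collector works at the C level, and keying on id() is only safe while the objects stay alive):

```python
# Hypothetical collector-side state: objects whose finalizer already ran.
_already_finalized = set()

def maybe_finalize(obj):
    """Run obj's __del__ at most once; report whether it ran this time."""
    if id(obj) in _already_finalized:
        return False                    # the once-only rule kicks in
    _already_finalized.add(id(obj))
    finalizer = getattr(type(obj), '__del__', None)
    if finalizer is not None:
        finalizer(obj)                  # may break cycles or resurrect
    return True
```

The hard part, as the discussion below makes clear, is not this bookkeeping but deciding what the mark should mean when the object might have resurrected itself in between collections.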
I would be happier if we could still have a rule that says that a finalizer is called only once by magic -- even if we have two forms of magic: refcount zero or root of the tree. Tim: I don't know if you object to this rule as a matter of principle (for the sake of finalizers that resurrect the object) or if your objection is really against the unordered calling of finalizers legitimized by Java's rules. I hope the latter, since I think that this rule (__del__ called only once by magic) by itself is easy to understand and easy to deal with, and I believe it may be necessary to guarantee progress for the garbage collector. The problem is that the collector can't easily tell whether A has resurrected itself. Sure, if the refcount is 1 after the finalizer run, I know it didn't resurrect itself. But even if it's higher than before, that doesn't mean it's resurrected: it could have linked to itself. Without doing a full collection I can't tell the difference. If I wait until a full collection happens again naturally, and look at the "just finalized" flag, I can't tell the difference between the case whereby the object resurrected itself but died again before the next collection, and the case where it was dead already. So I don't know how many times it was expecting the "last rites" to be performed, and the object can't know whether to expect them again or not. This seems worse than the only-once rule to me. Even if someone once found a good use for resurrecting inside __del__, against all recommendations, I don't mind breaking their code, if it's for a good cause. The Java rules aren't a good cause. But top-sorted finalizer calls seem a worthy cause. So now we get to discuss what to do with multi-finalizer cycles, like:

    A <=> b <=> C

Here the reduced graph is:

    A <=> C

About this case you say: > If it has an object with a finalizer, though, at the very worst you can let > it leak, and make the collection of leaked objects available for > inspection.
Even that much is a *huge* "improvement" over what they have > today: most cycles won't have a finalizer and so will get reclaimed, and > for the rest they'll finally have a simple way to identify exactly where the > problem is, and a simple criterion for predicting when it will happen. If > that's not "good enough", then without abandoning principle the user needs > to have some way to reduce such a cycle *to* a topsort case themself. > > > I find having a separate __cleanup__ protocol cumbersome. > > Same here, but if you're not comfortable leaking, and you agree Python is > not in the business of guessing in inherently ambiguous situations, maybe > that's what it takes! MAL and GregS both gravitated to this kind of thing > at once, and that's at least suggestive; and MAL has actually been using his > approach. It's explicit, and that's Pythonic on the face of it. > > > I think that the "finalizer only called once by magic" rule is reasonable. > > If it weren't for its specific use in emulating Java's scheme, would you > still be in favor of that? It's a little suspicious that it never came up > before. Suspicious or not, it still comes up. I still like it. I still think that playing games with resurrection is evil. (Maybe my spiritual beliefs shine through here -- I'm a convinced atheist. :-) Anyway, once-only rule aside, we still need a protocol to deal with cyclical dependencies between finalizers. The __cleanup__ approach is one solution, but it also has a problem: we have a set of finalizers. Whose __cleanup__ do we call? Any? All? Suggestions? Note that I'd like some implementation freedom: I may not want to bother with the graph reduction algorithm at first (which seems very hairy) so I'd like to have the right to use the __cleanup__ API as soon as I see finalizers in cyclical trash.
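From the programmer's side, the __cleanup__ protocol under discussion would look roughly like this. (The protocol never entered the language as such; the method name follows MAL's proposal, and the explicit cleanup loop stands in for whatever the interpreter would do.)

```python
import weakref

class Resource:
    """Illustrative object that can end up in a reference cycle."""
    def __init__(self):
        self.peer = None

    def __cleanup__(self):
        # Break the cycle explicitly; plain refcounting finishes the job.
        self.peer = None

a, b = Resource(), Resource()
a.peer, b.peer = b, a            # a trash cycle in the making
probe = weakref.ref(a)

for obj in (a, b):               # the cleanup pass, in arbitrary order
    obj.__cleanup__()
del a, b                         # refcounts now reach zero normally

print(probe() is None)           # True in CPython: freed immediately
```

Because __cleanup__ only severs references, it can run in any order without the topsort ambiguity that plagues __del__ -- which is exactly why it sidesteps the ordering debate above.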
I don't mind disposing of finalizer-free cycles first, but once I have more than one finalizer left in the remaining cycles, I'd like the right not to reduce the graph for topsort reasons -- that algorithm seems hard. So we're back to the __cleanup__ design. Strawman proposal: for all finalizers in a trash cycle, call their __cleanup__ method, in arbitrary order. After all __cleanup__ calls are done, if the objects haven't all disposed of themselves, they are all garbage-collected without calling __del__. (This seems to require another garbage collection cycle -- so perhaps there should also be a once-only rule for __cleanup__?) Separate question: what if there is no __cleanup__? This should probably be reported: "You have cycles with finalizers, buddy! What do you want to do about them?" This same warning could be given when there is a __cleanup__ but it doesn't break all cycles. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Wed Mar 8 13:34:06 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 14:34:06 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> Message-ID: <38C656CE.B0ACFF35@lemburg.com> Guido van Rossum wrote: > > > Tim Peters wrote: > > > > > > Mike has a darned good point here. Anyone have a darned good answer ? > > > Subject: Fixing os.popen on Win32 => is the win32pipe stuff going to be > > > adopted? > > > > > > Just reading one more post (and a FAQ) on the win32 pipe breakage (sewage > > > all over the hard-disk, traffic rerouted through the bit-bucket, you aren't > > > getting to work anytime soon Mrs. Programmer) and wondering why we have a > > > FAQ instead of having the win32pipe stuff rolled into the os module to fix > > > it. Is there some incompatibility? Is there a licensing problem?
> > MAL: > > I'd suggest moving the popen from the C modules into os.py > > as Python API and then applying all necessary magic to either > > use the win32pipe implementation (if available) or the native > > C one from the posix module in os.py. > > > > Unless, of course, the win32 stuff (or some of it) makes it into > > the core. > > No concrete plans -- except that I think the registry access is > supposed to go in. Haven't seen the code on patches@python.org yet > though. Ok, what about the optional "use win32pipe if available" idea then ? > > I'm mostly interested in this for my platform.py module... > > BTW, is there any interest of moving it into the core ? > > "it" == platform.py? Right. > Little interest from me personally; I suppose it > could go in Tools/scripts/... Hmm, it wouldn't help much in there I guess... after all, it defines APIs which are to be queried by other scripts. The default action to print the platform information to stdout is just a useful addition. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@python.org Wed Mar 8 14:33:53 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 08 Mar 2000 09:33:53 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: Your message of "Wed, 08 Mar 2000 14:34:06 +0100." <38C656CE.B0ACFF35@lemburg.com> References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> <38C656CE.B0ACFF35@lemburg.com> Message-ID: <200003081433.JAA20177@eric.cnri.reston.va.us> > > MAL: > > > I'd suggest moving the popen from the C modules into os.py > > > as Python API and then applying all necessary magic to either > > > use the win32pipe implementation (if available) or the native > > > C one from the posix module in os.py. 
> > > > > > Unless, of course, the win32 stuff (or some of it) makes it into > > > the core. [Guido] > > No concrete plans -- except that I think the registry access is > > supposed to go in. Haven't seen the code on patches@python.org yet > > though. > > Ok, what about the optional "use win32pipe if available" idea then ? Sorry, I meant please send me the patch! --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Wed Mar 8 14:59:46 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 8 Mar 2000 09:59:46 -0500 (EST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <200003081210.HAA19931@eric.cnri.reston.va.us> References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> Message-ID: <14534.27362.139106.701784@weyr.cnri.reston.va.us> Guido van Rossum writes: > "it" == platform.py? Little interest from me personally; I suppose it > could go in Tools/scripts/... I think platform.py is pretty nifty, but I'm not entirely sure how it's expected to be used. Perhaps Marc-Andre could explain further the motivation behind the module? My biggest requirement is that it be accompanied by documentation. The coolness factor and shared use of hackerly knowledge would probably get *me* to put it in, but there are a lot of things about which I'll disagree with Guido just to hear his (well-considered) thoughts on the matter. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal@lemburg.com Wed Mar 8 17:37:43 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 18:37:43 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 ... code for thought. 
References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> <38C656CE.B0ACFF35@lemburg.com> <200003081433.JAA20177@eric.cnri.reston.va.us> Message-ID: <38C68FE7.63943C5C@lemburg.com> Guido van Rossum wrote: > > > > MAL: > > > > I'd suggest moving the popen from the C modules into os.py > > > > as Python API and then applying all necessary magic to either > > > > use the win32pipe implementation (if available) or the native > > > > C one from the posix module in os.py. > > > > > > > > Unless, of course, the win32 stuff (or some of it) makes it into > > > > the core. > [Guido] > > > No concrete plans -- except that I think the registry access is > > > supposed to go in. Haven't seen the code on patches@python.org yet > > > though. > > > > Ok, what about the optional "use win32pipe if available" idea then ? > > Sorry, I meant please send me the patch! Here's the popen() interface I use in platform.py. It should serve well as a basis for an os.popen patch... (don't have time to do it myself right now):

class _popen:

    """ Fairly portable (alternative) popen implementation.

        This is mostly needed in case os.popen() is not available, or
        doesn't work as advertised, e.g. in Win9X GUI programs like
        PythonWin or IDLE.

        XXX Writing to the pipe is currently not supported.

    """
    tmpfile = ''
    pipe = None
    bufsize = None
    mode = 'r'

    def __init__(self, cmd, mode='r', bufsize=None):
        if mode != 'r':
            raise ValueError, 'popen()-emulation only supports read mode'
        import tempfile
        self.tmpfile = tmpfile = tempfile.mktemp()
        os.system(cmd + ' > %s' % tmpfile)
        self.pipe = open(tmpfile, 'rb')
        self.bufsize = bufsize
        self.mode = mode

    def read(self):
        return self.pipe.read()

    def readlines(self):
        return self.pipe.readlines()

    def close(self, remove=os.unlink, error=os.error):
        if self.pipe:
            rc = self.pipe.close()
        else:
            rc = 255
        if self.tmpfile:
            try:
                remove(self.tmpfile)
            except error:
                pass
        return rc

    # Alias
    __del__ = close

def popen(cmd, mode='r', bufsize=None):

    """ Portable popen() interface.
    """
    # Find a working popen implementation preferring win32pipe.popen
    # over os.popen over _popen
    popen = None
    if os.environ.get('OS', '') == 'Windows_NT':
        # On NT win32pipe should work; on Win9x it hangs due to bugs
        # in the MS C lib (see MS KnowledgeBase article Q150956)
        try:
            import win32pipe
        except ImportError:
            pass
        else:
            popen = win32pipe.popen
    if popen is None:
        if hasattr(os, 'popen'):
            popen = os.popen
            # Check whether it works... it doesn't in GUI programs
            # on Windows platforms
            if sys.platform == 'win32':  # XXX Others too ?
                try:
                    popen('')
                except os.error:
                    popen = _popen
        else:
            popen = _popen
    if bufsize is None:
        return popen(cmd, mode)
    else:
        return popen(cmd, mode, bufsize)

if __name__ == '__main__':
    print """
I confirm that, to the best of my knowledge and belief, this
contribution is free of any claims of third parties under
copyright, patent or other rights or interests ("claims").
To the extent that I have any such claims, I hereby grant to CNRI a nonexclusive, irrevocable, royalty-free, worldwide license to reproduce, distribute, perform and/or display publicly, prepare derivative versions, and otherwise use this contribution as part of the Python software and its related documentation, or any derivative versions thereof, at no cost to CNRI or its licensed users, and to authorize others to do so. I acknowledge that CNRI may, at its sole discretion, decide whether or not to incorporate this contribution in the Python software and its related documentation. I further grant CNRI permission to use my name and other identifying information provided to CNRI by me for use in connection with the Python software and its related documentation. """ -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Mar 8 17:44:59 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 18:44:59 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> <14534.27362.139106.701784@weyr.cnri.reston.va.us> Message-ID: <38C6919B.EA3EE2E7@lemburg.com> "Fred L. Drake, Jr." wrote: > > Guido van Rossum writes: > > "it" == platform.py? Little interest from me personally; I suppose it > > could go in Tools/scripts/... > > I think platform.py is pretty nifty, but I'm not entirely sure how > it's expected to be used. Perhaps Marc-Andre could explain further > the motivation behind the module? It was first intended to provide a way to format a platform identifying file name for the mxCGIPython project and then quickly moved on to provide many different APIs to query platform specific information. 
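(platform.py did eventually land in the standard library, in Python 2.3; on a modern Python the query functions MAL describes can be exercised directly:)

```python
import platform

# A few of the query functions discussed in this thread, as they exist
# in the standard library today:
print(platform.system())              # e.g. 'Linux', 'Windows', 'Darwin'
print(platform.machine())             # e.g. 'x86_64'
print(platform.platform(terse=True))  # one-line platform summary
```

Each returns a string, with '' when the value cannot be determined, matching the conventions documented below.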
architecture(executable='/usr/local/bin/python', bits='', linkage='') :

    Queries the given executable (defaults to the Python interpreter
    binary) for various architecture information. Returns a tuple
    (bits,linkage) which contains information about the bit architecture
    and the linkage format used for the executable. Both values are
    returned as strings. Values that cannot be determined are returned
    as given by the parameter presets. If bits is given as '', the
    sizeof(long) is used as indicator for the supported pointer size.
    The function relies on the system's "file" command to do the actual
    work. This is available on most if not all Unix platforms. On some
    non-Unix platforms, and then only if the executable points to the
    Python interpreter, defaults from _default_architecture are used.

dist(distname='', version='', id='') :

    Tries to determine the name of the OS distribution. The function
    first looks for a distribution release file in /etc and then
    reverts to _dist_try_harder() in case no suitable files are found.
    Returns a tuple (distname,version,id) which defaults to the args
    given as parameters.

java_ver(release='', vendor='', vminfo=('', '', ''), osinfo=('', '', '')) :

    Version interface for JPython. Returns a tuple
    (release,vendor,vminfo,osinfo) with vminfo being a tuple
    (vm_name,vm_release,vm_vendor) and osinfo being a tuple
    (os_name,os_version,os_arch). Values which cannot be determined
    are set to the defaults given as parameters (which all default
    to '').

libc_ver(executable='/usr/local/bin/python', lib='', version='') :

    Tries to determine the libc version against which the file
    executable (defaults to the Python interpreter) is linked. Returns
    a tuple of strings (lib,version) which default to the given
    parameters in case the lookup fails. Note that the function has
    intimate knowledge of how different libc versions add symbols to
    the executable and is probably only usable for executables compiled
    using gcc. The file is read and scanned in chunks of chunksize
    bytes.
mac_ver(release='', versioninfo=('', '', ''), machine='') :

    Get MacOS version information and return it as tuple (release,
    versioninfo, machine) with versioninfo being a tuple (version,
    dev_stage, non_release_version). Entries which cannot be determined
    are set to ''. All tuple entries are strings. Thanks to Mark R.
    Levinson for mailing documentation links and code examples for this
    function. Documentation for the gestalt() API is available online
    at: http://www.rgaros.nl/gestalt/

machine() :

    Returns the machine type, e.g. 'i386'. An empty string is returned
    if the value cannot be determined.

node() :

    Returns the computer's network name (may not be fully qualified !)
    An empty string is returned if the value cannot be determined.

platform(aliased=0, terse=0) :

    Returns a single string identifying the underlying platform with
    as much useful information as possible (but no more :). The output
    is intended to be human readable rather than machine parseable. It
    may look different on different platforms and this is intended. If
    "aliased" is true, the function will use aliases for various
    platforms that report system names which differ from their common
    names, e.g. SunOS will be reported as Solaris. The system_alias()
    function is used to implement this. Setting terse to true causes
    the function to return only the absolute minimum information needed
    to identify the platform.

processor() :

    Returns the (true) processor name, e.g. 'amdk6'. An empty string is
    returned if the value cannot be determined. Note that many platforms
    do not provide this information or simply return the same value as
    for machine(), e.g. NetBSD does this.

release() :

    Returns the system's release, e.g. '2.2.0' or 'NT'. An empty string
    is returned if the value cannot be determined.

system() :

    Returns the system/OS name, e.g. 'Linux', 'Windows' or 'Java'. An
    empty string is returned if the value cannot be determined.
system_alias(system, release, version) :

    Returns (system,release,version) aliased to common marketing names
    used for some systems. It also does some reordering of the
    information in some cases where it would otherwise cause confusion.

uname() :

    Fairly portable uname interface. Returns a tuple of strings
    (system,node,release,version,machine,processor) identifying the
    underlying platform. Note that unlike the os.uname function this
    also returns possible processor information as an additional tuple
    entry. Entries which cannot be determined are set to ''.

version() :

    Returns the system's release version, e.g. '#3 on degas'. An empty
    string is returned if the value cannot be determined.

win32_ver(release='', version='', csd='', ptype='') :

    Get additional version information from the Windows Registry and
    return a tuple (version,csd,ptype) referring to version number,
    CSD level and OS type (multi/single processor). As a hint: ptype
    returns 'Uniprocessor Free' on single processor NT machines and
    'Multiprocessor Free' on multi processor machines. The 'Free'
    refers to the OS version being free of debugging code. It could
    also state 'Checked' which means the OS version uses debugging
    code, i.e. code that checks arguments, ranges, etc. (Thomas
    Heller). Note: this function only works if Mark Hammond's win32
    package is installed and obviously only runs on Win32 compatible
    platforms. XXX Is there any way to find out the processor type on
    WinXX ? XXX Is win32 available on Windows CE ? Adapted from code
    posted by Karl Putland to comp.lang.python.

> My biggest requirement is that it be accompanied by documentation. > The coolness factor and shared use of hackerly knowledge would > probably get *me* to put it in, but there are a lot of things about > which I'll disagree with Guido just to hear his (well-considered) > thoughts on the matter. ;) The module is doc-string documented (see above). This should serve well as a basis for the LaTeX docs.
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From DavidA@ActiveState.com Wed Mar 8 18:36:01 2000 From: DavidA@ActiveState.com (David Ascher) Date: Wed, 8 Mar 2000 10:36:01 -0800 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <200003081210.HAA19931@eric.cnri.reston.va.us> Message-ID: > "it" == platform.py? Little interest from me personally; I suppose it > could go in Tools/scripts/... FWIW, I think it belongs in the standard path. It allows one to do the equivalent of if os.platform == '...' but in a much more useful way. --david From mhammond@skippinet.com.au Wed Mar 8 21:36:12 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Thu, 9 Mar 2000 08:36:12 +1100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <200003081210.HAA19931@eric.cnri.reston.va.us> Message-ID: > No concrete plans -- except that I think the registry access is > supposed to go in. Haven't seen the code on patches@python.org yet > though. FYI, that is off with Trent who is supposed to be testing it on the Alpha. Re win32pipe - I responded to that post suggesting that we do with os.pipe and win32pipe what was done with os.path.abspath/win32api - optionally try to import the win32 specific module and use it. My only "concern" is that this then becomes more code for Guido to maintain in the core, even though Guido has expressed a desire to get out of the installers business. Assuming the longer term plan is for other people to put together installation packages, and that these people are free to redistribute win32api/win32pipe, Im wondering if it is worth bothering with? Mark. 
From trentm@ActiveState.com Wed Mar 8 14:42:06 2000 From: trentm@ActiveState.com (Trent Mick) Date: Wed, 8 Mar 2000 14:42:06 -0000 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C6919B.EA3EE2E7@lemburg.com> Message-ID: MAL: > architecture(executable='/usr/local/bin/python', bits='', > linkage='') : > > Values that cannot be determined are returned as given by the > parameter presets. If bits is given as '', the sizeof(long) is > used as indicator for the supported pointer size. Just a heads up, using sizeof(long) will not work on forthcoming WIN64 (LLP64 data model) to determine the supported pointer size. You would want to use the 'P' struct format specifier instead, I think (I am speaking in relative ignorance). However, the docs say that a PyInt is used to store 'P' specified value, which as a C long, will not hold a pointer on LLP64. Hmmmm. The keyword perhaps is "forthcoming". This is the code in question in platform.py: # Use the sizeof(long) as default number of bits if nothing # else is given as default. if not bits: import struct bits = str(struct.calcsize('l')*8) + 'bit' Guido: > > No concrete plans -- except that I think the registry access is > > supposed to go in. Haven't seen the code on patches@python.org yet > > though. > Mark Hammond: > FYI, that is off with Trent who is supposed to be testing it on the Alpha. My Alpha is in pieces right now! I will get to it soon. I will try it on Win64 as well, if I can. Trent Trent Mick trentm@activestate.com From guido@python.org Thu Mar 9 02:59:51 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 08 Mar 2000 21:59:51 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: Your message of "Thu, 09 Mar 2000 08:36:12 +1100." 
References: Message-ID: <200003090259.VAA20928@eric.cnri.reston.va.us> > My only "concern" is that this then becomes more code for Guido to maintain > in the core, even though Guido has expressed a desire to get out of the > installers business. Theoretically, it shouldn't need much maintenance. I'm more concerned that it will have different semantics than on Unix so that in practice you'd need to know about the platform anyway (apart from the fact that the installed commands are different, of course). > Assuming the longer term plan is for other people to put together > installation packages, and that these people are free to redistribute > win32api/win32pipe, Im wondering if it is worth bothering with? So that everybody could use os.popen() regardless of whether they're on Windows or Unix. --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond@skippinet.com.au Thu Mar 9 03:31:21 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Thu, 9 Mar 2000 14:31:21 +1100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <200003090259.VAA20928@eric.cnri.reston.va.us> Message-ID: [Me] > > Assuming the longer term plan is for other people to put together > > installation packages, and that these people are free to redistribute > > win32api/win32pipe, Im wondering if it is worth bothering with? [Guido] > So that everybody could use os.popen() regardless of whether they're > on Windows or Unix. Sure. But what I meant was "should win32pipe code move into the core, or should os.pipe() just auto-detect and redirect to win32pipe if installed?" 
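The auto-detect approach Mark describes (the same trick used for os.path.abspath and win32api) amounts to an optional import. A minimal sketch of the idiom — the assumption that win32pipe exposes a drop-in popen replacement is mine, not something settled in this thread:

```python
import os

# Optional-import idiom: prefer the platform-specific implementation
# when the win32pipe extension is installed, and fall back to the
# portable os.popen otherwise.  On non-Windows systems the import
# simply fails and the standard function is used.
try:
    import win32pipe
    popen = win32pipe.popen
except ImportError:
    popen = os.popen
```

The caller then uses `popen()` without caring which implementation was selected.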
I was suggesting that over the longer term, it may be reasonable to assume that win32pipe _will_ be installed, as everyone who releases installers for Python should include it :-) It could also be written in such a way that it prints a warning message when win32pipe doesn't exist, so in 99% of cases, it will answer the FAQ before they have had a chance to ask it :-) It also should be noted that the win32pipe support for popen on Windows 95/98 includes a small, dedicated .exe - this just adds to the maintenance burden. But it doesn't worry me at all what happens - I was just trying to save you work . Anyone is free to take win32pipe and move the relevant code into the core anytime they like, with my and Bill's blessing. It quite suits me that people have to download win32all to get this working, so I doubt I will get around to it any time soon :-) Mark. From tim_one@email.msn.com Thu Mar 9 03:52:58 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 8 Mar 2000 22:52:58 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: Message-ID: <000401bf897a$f5a7e620$0d2d153f@tim> I had another take on all this, which I'll now share since nobody seems inclined to fold in the Win32 popen: perhaps os.popen should not be supported at all under Windows! The current function is a mystery wrapped in an enigma -- sometimes it works, sometimes it doesn't, and I've never been able to outguess which one will obtain (there's more to it than just whether a console window is attached). If it's not reliable (it's not), and we can't document the conditions under which it can be used safely (I can't), Python shouldn't expose it. Failing that, the os.popen docs should caution it's "use at your own risk" under Windows, and that this is directly inherited from MS's popen implementation. 
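One workaround for the unreliable pipe machinery is to let the shell redirect the command's output to a temporary file and read that back. A minimal sketch — the helper name and the use of os.system here are illustrative, not anything proposed in the thread:

```python
import os
import tempfile

def backtick(cmd):
    # Run `cmd` via the shell with stdout redirected to a temp file,
    # then read the file back.  This sidesteps the flaky pipe code
    # entirely, at the cost of a file on disk.
    path = tempfile.mktemp()  # matches the 2000-era API; not secure
    try:
        status = os.system("%s > %s" % (cmd, path))
        f = open(path)
        try:
            output = f.read()
        finally:
            f.close()
        return status, output
    finally:
        if os.path.exists(path):
            os.remove(path)

status, out = backtick("echo hello")
```

The obvious downsides are that the child's output is not available incrementally and that stdin cannot be fed to it, but for the common "capture a command's output" case it behaves the same on every platform that has a shell.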
From tim_one@email.msn.com Thu Mar 9 09:40:26 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 04:40:26 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <200003081406.JAA20033@eric.cnri.reston.va.us> Message-ID: <000701bf89ab$80cb8e20$0d2d153f@tim> [Guido, with some implementation details and nice examples] Normally I'd eat this up -- today I'm gasping for air trying to stay afloat. I'll have to settle for sketching the high-level approach I've had in the back of my mind. I start with the pile of incestuous stuff Toby/Neil discovered have no external references. It consists of dead cycles, and perhaps also non-cycles reachable only from dead cycles. 1. The "points to" relation on this pile defines a graph G. 2. From any graph G, we can derive a related graph G' consisting of the maximal strongly connected components (SCCs) of G. Each (super)node of G' is an SCC of G, where (super)node A' of G' points to (super)node B' of G' iff there exists a node A in A' that points to (wrt G) some node B in B'. It's not obvious, but the SCCs can be found in linear time (via Tarjan's algorithm, which is simple but subtle; Cyclops.py uses a much dumber brute-force approach, which is nevertheless perfectly adequate in the absence of massively large cycles -- premature optimization is the root etc <0.5 wink>). 3. G' is necessarily a DAG. For if distinct A' and B' are both reachable from each other in G', then every pair of A in A' and B in B' are reachable from each other in G, contradicting that A' and B' are distinct maximal SCCs (that is, the union of A' and B' is also an SCC). 4. The point to all this: Every DAG can be topsorted. Start with the nodes of G' without predecessors. There must be at least one, because G' is a DAG. 5. For every node A' in G' without predecessors (wrt G'), it either does or does not contain an object with a potentially dangerous finalizer. If it does not, let's call it a safe node. 
If there are no safe nodes without predecessors, GC is stuck, and for good reason: every object in the whole pile is reachable from an object with a finalizer, which could change the topology in near-arbitrary ways. The unsafe nodes without predecessors (and again, by #4, there must be at least one) are the heart of the problem, and this scheme identifies them precisely. 6. Else there is a safe node A'. For each A in A', reclaim it, following the normal refcount rules (or in an implementation w/o RC, by following a topsort of "points to" in the original G). This *may* cause reclamation of an object X with a finalizer outside of A'. But doing so cannot cause resurrection of anything in A' (X is reachable from A' else cleaning up A' couldn't have affected X, and if anything in A' were also reachable from X, X would have been in A' to begin with (SCC!), contradicting that A' is safe). So the objects in A' can get reclaimed without difficulty. 7. The simplest thing to do now is just stop: rebuild it from scratch the next time the scheme is invoked. If it was *possible* to make progress without guessing, we did; and if it was impossible, we identified the precise SCC(s) that stopped us. Anything beyond that is optimization <0.6 wink>. Seems the most valuable optimization would be to keep track of whether an object with a finalizer gets reclaimed in step 6 (so long as that doesn't happen, the mutations that can occur to the structure of G' seem nicely behaved enough that it should be possible to loop back to step #5 without crushing pain). On to Guido's msg: [Guido] > When we have a pile of garbage, we don't know whether it's all > connected or whether it's lots of little cycles. So if we find > [objects with -- I'm going to omit this] finalizers, we have to put > those on a third list and put everything reachable from them on that > list as well (the algorithm I described before). SCC determination gives precise answers to all that. 
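For what it's worth, steps 1-5 of the scheme fit in a short sketch. This is my illustration, not the Cyclops code: a recursive Tarjan (so very deep graphs would need an explicit stack), with the "points to" relation given as a dict of successor lists:

```python
def strongly_connected_components(graph):
    # Tarjan's algorithm.  `graph` maps node -> list of successors.
    # SCCs are emitted in reverse topological order of the derived
    # DAG G', so predecessor-free supernodes come out last.
    index_of, lowlink, on_stack = {}, {}, {}
    stack, sccs, counter = [], [], [0]

    def visit(v):
        index_of[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack[v] = True
        for w in graph.get(v, []):
            if w not in index_of:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif on_stack.get(w):
                lowlink[v] = min(lowlink[v], index_of[w])
        if lowlink[v] == index_of[v]:   # v is the root of an SCC
            scc = []
            while True:
                w = stack.pop()
                on_stack[w] = False
                scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index_of:
            visit(v)
    return sccs

def predecessor_free(graph, sccs):
    # Supernodes of G' with no predecessors -- per step 5, collection
    # can only start with these, and only if one of them is "safe".
    comp = {}
    for i, scc in enumerate(sccs):
        for v in scc:
            comp[v] = i
    has_pred = set()
    for v in graph:
        for w in graph.get(v, []):
            if comp[v] != comp[w]:
                has_pred.add(comp[w])
    return [scc for i, scc in enumerate(sccs) if i not in has_pred]

# Guido's example below: A <=> b -> C <=> d (A and C have finalizers).
g = {'A': ['b'], 'b': ['A', 'C'], 'C': ['d'], 'd': ['C']}
sccs = strongly_connected_components(g)
roots = predecessor_free(g, sccs)
# The only predecessor-free SCC is {A, b}; it contains a finalizer
# (A), so there is no safe node and the scheme refuses to guess.
```

Running this on the example yields the two supernodes {A, b} and {C, d}, with {A, b} as the sole predecessor-free one — exactly the "unsafe node without predecessors" the scheme identifies as the heart of the problem.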
> What's left on the first list then consists of finalizer-free garbage. > We dispose of this garbage by clearing dicts and lists. Hopefully > this makes the refcount of some of the finalizers go to zero -- those > are finalized in the normal way. In Python it's even possible for a finalizer to *install* a __del__ method that didn't previously exist, into the class of one of the objects on your "first list". The scheme above is meant to be bulletproof in the face of abuses even I can't conceive of . More mundanely, clearing an item on your first list can cause a chain of events that runs a finalizer, which in turn can resurrect one of the objects on your first list (and so it should *not* get reclaimed). Without doing the SCC bit, I don't think you can out-think that (the reasoning above showed that the finalizer can't resurrect something in the *same* SCC as the object that started it all, but that argument cannot be extended to objects in other safe SCCs: they're vulnerable). > And now we have to deal with the inevitable: finalizers that are part > of cycles. It makes sense to reduce the graph of objects to a graph > of finalizers only. Example: > > A <=> b -> C <=> d > > A and C have finalizers. C is part of a cycle (C-d) that contains no > other finalizers, but C is also reachable from A. A is part of a > cycle (A-b) that keeps it alive. The interesting thing here is that > if we only look at the finalizers, there are no cycles! The scheme above derives G': A' -> C' where A' consists of the A<=>b cycle and C' the C<=>d cycle. That there are no cycles in G' isn't surprising, it's just the natural consequence of doing the natural analysis . The scheme above refuses to do anything here, because the only node in G' without a predecessor (namely A') isn't "safe". 
> If we reduce the graph to only finalizers (setting aside for now the > problem of how to do that -- we may need to allocate more memory to > hold the reduced graph), we get: > > A -> C You should really have self-loops on both A and C, right? (because A is reachable from itself via chasing pointers; ditto for C) > We can now finalize A (even though its refcount is nonzero!). And > that's really all we can do! A could break its own cycle, thereby > disposing of itself and b. It could also break C's cycle, disposing > of C and d. It could do nothing. Or it could resurrect A, thereby > resurrecting all of A, b, C, and d. > > This leads to (there's that weird echo again :-) Boehm's solution: > Call A's finalizer and leave the rest to the next time the garbage > collection runs. This time the echo came back distorted : [Boehm] Cycles involving one or more finalizable objects are never finalized. A<=>b is "a cycle involving one or more finalizable objects", so he won't touch it. The scheme at the top doesn't either. If you handed him your *derived* graph (but also without the self-loops), he would; me too. KISS! > Note that we're now calling finalizers on objects with a non-zero > refcount. I don't know why you want to do this. As the next several paragraphs confirm, it creates real headaches for the implementation, and I'm unclear on what it buys in return. Is "we'll do something by magic for cycles with no more than one finalizer" a major gain for the user over "we'll do something by magic for cycles with no finalizer"? 0, 1 and infinity *are* the only interesting numbers , but the difference between 0 and 1 *here* doesn't seem to me worth signing up for any pain at all. > At some point (probably as a result of finalizing A) its > refcount will go to zero. We should not finalize it again -- this > would serve no purpose. 
I don't believe BDW (or the scheme at the top) has this problem (simply because the only way to run finalizer in a cycle under them is for the user to break the cycle explicitly -- so if an object's finalizer gets run, the user caused it directly, and so can never claim surprise). > Possible solution: > > INCREF(A); > A->__del__(); > if (A->ob_refcnt == 1) > A->__class__ = NULL; /* Make a finalizer-less */ > DECREF(A); > > This avoids finalizing twice if the first finalization broke all > cycles in which A is involved. But if it doesn't, A is still cyclical > garbage with a finalizer! Even if it didn't resurrect itself. > > Instead of the code fragment above, we could mark A as "just > finalized" and when it shows up at the head of the tree (of finalizers > in cyclical trash) again on the next garbage collection, to discard it > without calling the finalizer again (because this clearly means that > it didn't resurrect itself -- at least not for a very long time). I don't think you need to do any of this -- unless you think you need to do the thing that created the need for this, which I didn't think you needed to do either . > I would be happier if we could still have a rule that says that a > finalizer is called only once by magic -- even if we have two forms of > magic: refcount zero or root of the tree. Tim: I don't know if you > object against this rule as a matter of principle (for the sake of > finalizers that resurrect the object) or if your objection is really > against the unordered calling of finalizers legitimized by Java's > rules. I hope the latter, since I think it that this rule (__del__ > called only once by magic) by itself is easy to understand and easy to > deal with, and I believe it may be necessary to guarantee progress for > the garbage collector. My objections to Java's rules have been repeated enough. I would have no objection to "__del__ called only once" if it weren't for that Python currently does something different. 
I don't know whether people rely on that now; if they do, it's a much more dangerous thing to change than adding a new keyword (the compiler gives automatic 100% coverage of the latter; but nothing mechanical can help people track down reliance-- whether deliberate or accidental --on the former). My best *guess* is that __del__ is used rarely; e.g., there are no more than 40 instances of it in the whole CVS tree, including demo directories; and they all look benign (at least three have bodies consisting of "pass"!). The most complicated one I found in my own code is: def __del__(self): self.break_cycles() def break_cycles(self): for rule in self.rules: if rule is not None: rule.cleanse() But none of this self-sampling is going to comfort some guy in France who has a megaline of code relying on it. Good *bet*, though . > [and another cogent explanation of why breaking the "leave cycles with > finalizers" alone injunction creates headaches] > ... > Even if someone once found a good use for resurrecting inside __del__, > against all recommendations, I don't mind breaking their code, if it's > for a good cause. The Java rules aren't a good cause. But top-sorted > finalizer calls seem a worthy cause. They do to me too, except that I say even a cycle involving but a single object (w/ finalizer) looping on itself is the user's problem. > So now we get to discuss what to do with multi-finalizer cycles, like: > > A <=> b <=> C > > Here the reduced graph is: > > A <=> C The SCC reduction is simply to A and, right, the scheme at the top punts. > [more the on once-only rule chopped] > ... > Anyway, once-only rule aside, we still need a protocol to deal with > cyclical dependencies between finalizers. The __cleanup__ approach is > one solution, but it also has a problem: we have a set of finalizers. > Whose __cleanup__ do we call? Any? All? Suggestions? 
This is why a variant of guardians were more appealing to me at first: I could ask a guardian for the entire SCC, so I get the *context* of the problem as well as the final microscopic symptom. I see Marc-Andre already declined to get sucked into the magical part of this . Greg should speak for his scheme, and I haven't made time to understand it fully; my best guess is to call x.__cleanup__ for every object in the SCC (but there's no clear way to decide which order to call them in, and unless they're more restricted than __del__ methods they can create all the same problems __del__ methods can!). > Note that I'd like some implementation freedom: I may not want to > bother with the graph reduction algorithm at first (which seems very > hairy) so I'd like to have the right to use the __cleanup__ API > as soon as I see finalizers in cyclical trash. I don't mind disposing > of finalizer-free cycles first, but once I have more than one > finalizer left in the remaining cycles, I'd like the right not to > reduce the graph for topsort reasons -- that algorithm seems hard. I hate to be realistic , but modern GC algorithms are among the hardest you'll ever see in any field; even the outer limits of what we've talked about here is baby stuff. Sun's Java group (the one in Chelmsford, MA, down the road from me) had a group of 4+ people (incl. the venerable Mr. Steele) working full-time for over a year on the last iteration of Java's GC. The simpler BDW is a megabyte of code spread over 100+ files. Etc -- state of the art GC can be crushingly hard. So I've got nothing against taking shortcuts at first -- there's actually no realistic alternative. I think we're overlooking the obvious one, though: if any finalizer appears in any trash cycle, tough luck. Python 3000 -- which may be a spelling of 1.7 , but doesn't *need* to be a spelling of 1.6. > So we're back to the __cleanup__ design. 
Strawman proposal: for all > finalizers in a trash cycle, call their __cleanup__ method, in > arbitrary order. After all __cleanup__ calls are done, if the objects > haven't all disposed of themselves, they are all garbage-collected > without calling __del__. (This seems to require another garbage > collection cycle -- so perhaps there should also be a once-only rule > for __cleanup__?) > > Separate question: what if there is no __cleanup__? This should > probably be reported: "You have cycles with finalizers, buddy! What > do you want to do about them?" This same warning could be given when > there is a __cleanup__ but it doesn't break all cycles. If I *ever* have a trash cycle with a finalizer in my code (> 0 -- "exactly 1" isn't special to me), I will consider it to be a bug. So I want a way to get it back from gc, so I can see what the heck it is, so I can fix my code (or harass whoever did it to me). __cleanup__ suffices for that, so the very act of calling it is all I'm really after ("Python invoked __cleanup__ == Tim has a bug"). But after I outgrow that , I'll certainly want the option to get another kind of complaint if __cleanup__ doesn't break the cycles, and after *that* I couldn't care less. I've given you many gracious invitations to say that you don't mind leaking in the face of a buggy program , but as you've declined so far, I take it that never hearing another gripe about leaking is a Primary Life Goal. So collection without calling __del__ is fine -- but so is collection with calling it! If we're going to (at least implicitly) approve of this stuff, it's probably better *to* call __del__, if for no other reason than to catch your case of some poor innocent object caught in a cycle not of its making that expects its __del__ to abort starting World War III if it becomes unreachable . whatever-we-don't-call-a-mistake-is-a-feature-ly y'rs - tim From fdrake@acm.org Thu Mar 9 14:25:35 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Thu, 9 Mar 2000 09:25:35 -0500 (EST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <000401bf897a$f5a7e620$0d2d153f@tim> References: <000401bf897a$f5a7e620$0d2d153f@tim> Message-ID: <14535.46175.991970.135642@weyr.cnri.reston.va.us> Tim Peters writes: > Failing that, the os.popen docs should caution it's "use at your own risk" > under Windows, and that this is directly inherited from MS's popen > implementation. Tim (& others), Would this additional text be sufficient for the os.popen() documentation? \strong{Note:} This function behaves unreliably under Windows due to the native implementation of \cfunction{popen()}. If someone cares to explain what's weird about it, that might be appropriate as well, but I've never used this under Windows. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal@lemburg.com Thu Mar 9 14:42:37 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 09 Mar 2000 15:42:37 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: Message-ID: <38C7B85D.E6090670@lemburg.com> Trent Mick wrote: > > MAL: > > architecture(executable='/usr/local/bin/python', bits='', > > linkage='') : > > > > Values that cannot be determined are returned as given by the > > parameter presets. If bits is given as '', the sizeof(long) is > > used as indicator for the supported pointer size. > > Just a heads up, using sizeof(long) will not work on forthcoming WIN64 > (LLP64 data model) to determine the supported pointer size. You would want > to use the 'P' struct format specifier instead, I think (I am speaking in > relative ignorance). However, the docs say that a PyInt is used to store 'P' > specified value, which as a C long, will not hold a pointer on LLP64. Hmmmm. > The keyword perhaps is "forthcoming". 
> > This is the code in question in platform.py: > > # Use the sizeof(long) as default number of bits if nothing > # else is given as default. > if not bits: > import struct > bits = str(struct.calcsize('l')*8) + 'bit' Python < 1.5.2 doesn't support 'P', but anyway, I'll change those lines according to your suggestion. Does struct.calcsize('P')*8 return 64 on 64bit-platforms as it should (probably ;) ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jim@interet.com Thu Mar 9 15:45:54 2000 From: jim@interet.com (James C. Ahlstrom) Date: Thu, 09 Mar 2000 10:45:54 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <000401bf897a$f5a7e620$0d2d153f@tim> Message-ID: <38C7C732.D9086C34@interet.com> Tim Peters wrote: > > I had another take on all this, which I'll now share since nobody > seems inclined to fold in the Win32 popen: perhaps os.popen should not be > supported at all under Windows! > > The current function is a mystery wrapped in an enigma -- sometimes it > works, sometimes it doesn't, and I've never been able to outguess which one > will obtain (there's more to it than just whether a console window is > attached). If it's not reliable (it's not), and we can't document the > conditions under which it can be used safely (I can't), Python shouldn't > expose it. OK, I admit I don't understand this either, but here goes... It looks like Python popen() uses the Windows _popen() function. The _popen() docs say that it creates a spawned copy of the command processor (shell) with the given string argument. It further states that it does NOT work in a Windows program and ONLY works when called from a Windows Console program. 
From this I assume that popen() works from python.exe (it is a Console app) if the command can be directly executed by the shell (like "dir"), or if the command starts a Console Windows application. It can't work when starting a regular Windows program because those don't have stdin or stdout. But Console apps do have stdin and stdout, and these are inherited by other Console programs in Unix fashion. Is this what doesn't work? If so, there is a bug in _popen(). Otherwise we are just expecting Unix behavior from Windows. Or perhaps we expect popen() to work from a Windows non-Console app, which _popen() is guaranteed not to do.

If there is something wrong with _popen() then the way to fix it is to avoid using it and create the pipes directly. For an example look in the docs under:

    Platform SDK
      Windows Base Services
        Executables
          Processes and Threads
            Using Processes and Threads
              Creating a Child Process with Redirected Input and Output

The sample code can be extracted and put into posixmodule.c. Note that this is what OS2 does. See the #ifdef.

> Failing that, the os.popen docs should caution it's "use at your own risk"
> under Windows, and that this is directly inherited from MS's popen
> implementation.

Of course, the strength of Python is portable code. popen() should be fixed the right way.

JimA

From tim_one@email.msn.com Thu Mar 9 17:14:17 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 12:14:17 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C7C732.D9086C34@interet.com> Message-ID: <000401bf89ea$e6e54180$79a0143f@tim>

[James C. Ahlstrom]
> OK, I admit I don't understand this either, but here goes...
>
> It looks like Python popen() uses the Windows _popen() function.
> The _popen() docs say ...

Screw the docs. Pretend you're a newbie and *try* it. 
Here:

    import os
    p = os.popen("dir")
    while 1:
        line = p.readline()
        if not line:
            break
        print line

Type that in by hand, or stick it in a file & run it from a cmdline python.exe (which is a Windows console program). Under Win95 the process freezes solid, and even trying to close the DOS box doesn't work. You have to bring up the task manager and kill it that way. I once traced this under the debugger -- it's hung inside an MS DLL. "dir" is not entirely arbitrary here: for *some* cmds it works fine, for others not. The set of which work appears to vary across Windows flavors. Sometimes you can worm around it by wrapping "a bad" cmd in a .bat file, and popen'ing the latter instead; but sometimes not. After hours of poke-&-hope (in the past), as I said, I've never been able to predict which cases will work.

> ...
> It further states that it does NOT work in a Windows program and ONLY
> works when called from a Windows Console program.

The latter is a necessary condition but not sufficient; don't know what *is* sufficient, and AFAIK nobody else does either.

> From this I assume that popen() works from python.exe (it is a Console
> app) if the command can be directly executed by the shell (like "dir"),

See above for a counterexample to both . I actually have much better luck with cmds command.com *doesn't* know anything about. So this appears to vary by shell too.

> ...
> If there is something wrong with _popen() then the way to fix it is
> to avoid using it and create the pipes directly.

libc pipes are as flaky as libc popen under Windows, Jim! MarkH has the only versions of these things that come close to working under Windows (he wraps the native Win32 spellings of these things; MS's libc entry points (which Python uses now) are much worse).

> ...
> Of course, the strength of Python is portable code. popen() should be
> fixed the right way.

pipes too, but users get baffled by popen much more often simply because they try popen much more often. 
there's-no-question-about-whether-it-works-right-it-doesn't-ly y'rs - tim From gstein@lyra.org Thu Mar 9 17:47:23 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 09:47:23 -0800 (PST) Subject: [Python-Dev] platform.py (was: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?) In-Reply-To: <38C7B85D.E6090670@lemburg.com> Message-ID: On Thu, 9 Mar 2000, M.-A. Lemburg wrote: >... > Python < 1.5.2 doesn't support 'P', but anyway, I'll change > those lines according to your suggestion. > > Does struct.calcsize('P')*8 return 64 on 64bit-platforms as > it should (probably ;) ? Yes. It returns sizeof(void *). Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal@lemburg.com Thu Mar 9 14:55:36 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 09 Mar 2000 15:55:36 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <000401bf897a$f5a7e620$0d2d153f@tim> <14535.46175.991970.135642@weyr.cnri.reston.va.us> Message-ID: <38C7BB68.9FAE3BE9@lemburg.com> "Fred L. Drake, Jr." wrote: > > Tim Peters writes: > > Failing that, the os.popen docs should caution it's "use at your own risk" > > under Windows, and that this is directly inherited from MS's popen > > implementation. > > Tim (& others), > Would this additional text be sufficient for the os.popen() > documentation? > > \strong{Note:} This function behaves unreliably under Windows > due to the native implementation of \cfunction{popen()}. > > If someone cares to explain what's weird about it, that might be > appropriate as well, but I've never used this under Windows. Ehm, hasn't anyone looked at the code I posted yesterday ? It goes a long way to deal with these inconsistencies... even though its not perfect (yet ;). 
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake@acm.org Thu Mar 9 18:52:40 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 9 Mar 2000 13:52:40 -0500 (EST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C7BB68.9FAE3BE9@lemburg.com> References: <000401bf897a$f5a7e620$0d2d153f@tim> <14535.46175.991970.135642@weyr.cnri.reston.va.us> <38C7BB68.9FAE3BE9@lemburg.com> Message-ID: <14535.62200.158087.102380@weyr.cnri.reston.va.us> M.-A. Lemburg writes: > Ehm, hasn't anyone looked at the code I posted yesterday ? > It goes a long way to deal with these inconsistencies... even > though its not perfect (yet ;). I probably sent that before I'd read everything, and I'm not the one to change the popen() implementation. At this point, I'm waiting for someone who understands the details to decide what happens (if anything) to the implementation before I check in any changes to the docs. My inclination is to fix popen() on Windows to do the right thing, but I don't know enough about pipes & process management on Windows to get into that fray. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From nascheme@enme.ucalgary.ca Thu Mar 9 19:37:31 2000 From: nascheme@enme.ucalgary.ca (nascheme@enme.ucalgary.ca) Date: Thu, 9 Mar 2000 12:37:31 -0700 Subject: [Python-Dev] finalization again Message-ID: <20000309123731.A3664@acs.ucalgary.ca> [Tim, explaining something I was thinking about more clearly than I ever could] >It's not obvious, but the SCCs can be found in linear time (via Tarjan's >algorithm, which is simple but subtle; Wow, it seems like it should be more expensive than that. What are the space requirements? Also, does the simple algorithm you used in Cyclops have a name? 
>If there are no safe nodes without predecessors, GC is stuck, >and for good reason: every object in the whole pile is reachable >from an object with a finalizer, which could change the topology >in near-arbitrary ways. The unsafe nodes without predecessors >(and again, by #4, there must be at least one) are the heart of >the problem, and this scheme identifies them precisely. Exactly. What is our policy on these unsafe nodes? Guido seems to feel that it is okay for the programmer to create them and Python should have a way of collecting them. Tim seems to feel that the programmer should not create them in the first place. I agree with Tim. If topological finalization is used, it is possible for the programmer to design their classes so that this problem does not happen. This is explained on Hans Boehm's finalization web page. If the programmer can or does not redesign their classes I don't think it is unreasonable to leak memory. We can link these cycles to a global list of garbage or print a debugging message. This is a large improvement over the current situation (ie. leaking memory with no debugging even for cycles without finalizers). Neil -- "If you're a great programmer, you make all the routines depend on each other, so little mistakes can really hurt you." -- Bill Gates, ca. 1985. From gstein@lyra.org Thu Mar 9 19:50:29 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 11:50:29 -0800 (PST) Subject: [Python-Dev] finalization again In-Reply-To: <20000309123731.A3664@acs.ucalgary.ca> Message-ID: On Thu, 9 Mar 2000 nascheme@enme.ucalgary.ca wrote: >... > If the programmer can or does not redesign their classes I don't > think it is unreasonable to leak memory. We can link these > cycles to a global list of garbage or print a debugging message. > This is a large improvement over the current situation (ie. > leaking memory with no debugging even for cycles without > finalizers). I think we throw an error (as a subclass of MemoryError). 
As an alternative, is it possible to move those cycles to the garbage list and then never look at them again? That would speed up future collection processing. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido@python.org Thu Mar 9 19:51:46 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 09 Mar 2000 14:51:46 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Thu, 09 Mar 2000 11:50:29 PST." References: Message-ID: <200003091951.OAA26184@eric.cnri.reston.va.us> > As an alternative, is it possible to move those cycles to the garbage list > and then never look at them again? That would speed up future collection > processing. With the current approach, that's almost automatic :-) I'd rather reclaim the memory too. --Guido van Rossum (home page: http://www.python.org/~guido/) From gmcm@hypernet.com Thu Mar 9 19:54:16 2000 From: gmcm@hypernet.com (Gordon McMillan) Date: Thu, 9 Mar 2000 14:54:16 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <000401bf89ea$e6e54180$79a0143f@tim> References: <38C7C732.D9086C34@interet.com> Message-ID: <1259490837-400325@hypernet.com> [Tim re popen on Windows] ... > the debugger -- it's hung inside an MS DLL. "dir" is not entirely arbitrary > here: for *some* cmds it works fine, for others not. The set of which work > appears to vary across Windows flavors. Sometimes you can worm around it by > wrapping "a bad" cmd in a .bat file, and popen'ing the latter instead; but > sometimes not. It doesn't work for commands builtin to whatever "shell" you're using. That's different between cmd and command, and the various flavors, versions and extensions thereof. FWIW, I gave up a long time ago. I use redirection and a tempfile. The few times I've wanted "interactive" control, I've used Win32Process, dup'ed, inherited handles... the whole 9 yards. Why? 
Look at all the questions about popen and child processes in general, on platforms where it *works*, (if it weren't for Donn Cave, nobody'd get it to work anywhere ). To reiterate Tim's point: *none* of the c runtime routines for process control on Windows are adequate (beyond os.system and living with a DOS box popping up). The raw Win32 CreateProcess does everything you could possibly want, but takes a week or more to understand, (if this arg is a that, then that arg is a whatsit, and the next is limited to the values X and Z unless...). your-brain-on-Windows-ly y'rs - Gordon From guido@python.org Thu Mar 9 19:55:23 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 09 Mar 2000 14:55:23 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Thu, 09 Mar 2000 04:40:26 EST." <000701bf89ab$80cb8e20$0d2d153f@tim> References: <000701bf89ab$80cb8e20$0d2d153f@tim> Message-ID: <200003091955.OAA26217@eric.cnri.reston.va.us> [Tim describes a more formal approach based on maximal strongly connected components (SCCs).] I like the SCC approach -- it's what I was struggling to invent but came short of discovering. However: [me] > > What's left on the first list then consists of finalizer-free garbage. > > We dispose of this garbage by clearing dicts and lists. Hopefully > > this makes the refcount of some of the finalizers go to zero -- those > > are finalized in the normal way. [Tim] > In Python it's even possible for a finalizer to *install* a __del__ method > that didn't previously exist, into the class of one of the objects on your > "first list". The scheme above is meant to be bulletproof in the face of > abuses even I can't conceive of . Are you *sure* your scheme deals with this? Let's look at an example. (Again, lowercase nodes have no finalizers.) Take G: a <=> b -> C This is G' (a and b are strongly connected): a' -> C' C is not reachable from any root node. We decide to clear a and b. Let's suppose we happen to clear b first. 
This removes the last reference to C, C's finalizer runs, and it installs a finalizer on a.__class__. So now a' has turned into A', and we're halfway to committing a crime we said we would never commit (touching cyclical trash with finalizers). I propose to disregard this absurd possibility, except to the extent that Python shouldn't crash -- but we make no guarantees to the user. > More mundanely, clearing an item on your first list can cause a chain of > events that runs a finalizer, which in turn can resurrect one of the objects > on your first list (and so it should *not* get reclaimed). Without doing > the SCC bit, I don't think you can out-think that (the reasoning above > showed that the finalizer can't resurrect something in the *same* SCC as the > object that started it all, but that argument cannot be extended to objects > in other safe SCCs: they're vulnerable). I don't think so. While my poor wording ("finalizer-free garbage") didn't make this clear, my references to earlier algorithms were intended to imply that this is garbage that consists of truly unreachable objects. I have three lists: let's call them T(rash), R(oot-reachable), and F(inalizer-reachable). The Schemenauer c.s. algorithm moves all reachable nodes to R. I then propose to move all finalizers to F, and to run another pass of Schemenauer c.s. to also move all finalizer-reachable (but not root-reachable) nodes to F. I truly believe that (barring the absurdity of installing a new __del__) the objects on T at this point cannot be resurrected by a finalizer that runs, since they aren't reachable from any finalizers: by virtue of Schemenauer c.s. (which computes a reachability closure given some roots) anything reachable from a finalizer is on F by now (if it isn't on R -- again, nothing on T is reachable from R, because R is calculated as a closure).
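[Editor's note: the T/R/F partition Guido just described can be sketched abstractly. This is an illustration of the idea only, not the Schemenauer implementation; the function and parameter names are invented:

```python
def partition(objs, roots, get_refs, has_finalizer):
    """Split candidate garbage into (T, R, F) per Guido's scheme.

    objs: set of candidate (possibly-garbage) objects
    roots: objects known to be reachable from outside the candidates
    get_refs(o): objects directly referenced by o
    has_finalizer(o): whether o has a __del__
    """
    def closure(seed):
        # Reachability closure within objs, starting from seed.
        seen = set(seed)
        todo = list(seen)
        while todo:
            o = todo.pop()
            for n in get_refs(o):
                if n in objs and n not in seen:
                    seen.add(n)
                    todo.append(n)
        return seen

    # First pass: everything root-reachable goes to R.
    R = closure(r for r in roots if r in objs)
    # Second pass: finalizers, and everything they reach, go to F.
    finalizers = [o for o in objs if has_finalizer(o) and o not in R]
    F = closure(finalizers)
    # What remains is truly unreachable, finalizer-free trash.
    T = set(objs) - R - F
    return T, R, F
```

Running it on the example graph above (a <=> b -> C, with only C finalizable and nothing rooted) puts a and b on T and C on F, matching the discussion.]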
So, unless there's still a bug in my thinking here, I think that as long as we only want to clear SCCs with 0 finalizers, T is exactly the set of nodes we're looking for. > This time the echo came back distorted : > > [Boehm] > Cycles involving one or more finalizable objects are never finalized. > > A<=>b is "a cycle involving one or more finalizable objects", so he won't > touch it. The scheme at the top doesn't either. If you handed him your > *derived* graph (but also without the self-loops), he would; me too. KISS! > > > Note that we're now calling finalizers on objects with a non-zero > > refcount. > > I don't know why you want to do this. As the next several paragraphs > confirm, it creates real headaches for the implementation, and I'm unclear > on what it buys in return. Is "we'll do something by magic for cycles with > no more than one finalizer" a major gain for the user over "we'll do > something by magic for cycles with no finalizer"? 0, 1 and infinity *are* > the only interesting numbers , but the difference between 0 and 1 > *here* doesn't seem to me worth signing up for any pain at all. I do have a reason: if a maximal SCC has only one finalizer, there can be no question about the ordering between finalizer calls. And isn't the whole point of this discussion to have predictable ordering of finalizer calls in the light of trash recycling? > I would have no objection to "__del__ called only once" if it weren't for > that Python currently does something different. I don't know whether people > rely on that now; if they do, it's a much more dangerous thing to change > than adding a new keyword (the compiler gives automatic 100% coverage of the > latter; but nothing mechanical can help people track down reliance-- whether > deliberate or accidental --on the former). [...] > But none of this self-sampling is going to comfort some guy in France who > has a megaline of code relying on it. Good *bet*, though . 
OK -- so your objection is purely about backwards compatibility. Apart from that, I strongly feel that the only-once rule is a good one. And I don't think that the compatibility issue weighs very strongly here (given all the other problems that typically exist with __del__). > I see Marc-Andre already declined to get sucked into the magical part of > this . Greg should speak for his scheme, and I haven't made time to > understand it fully; my best guess is to call x.__cleanup__ for every object > in the SCC (but there's no clear way to decide which order to call them in, > and unless they're more restricted than __del__ methods they can create all > the same problems __del__ methods can!). Yes, but at least since we're defining a new API (in a reserved portion of the method namespace) there are no previous assumptions to battle. > > Note that I'd like some implementation freedom: I may not want to > > bother with the graph reduction algorithm at first (which seems very > > hairy) so I'd like to have the right to use the __cleanup__ API > > as soon as I see finalizers in cyclical trash. I don't mind disposing > > of finalizer-free cycles first, but once I have more than one > > finalizer left in the remaining cycles, I'd like the right not to > > reduce the graph for topsort reasons -- that algorithm seems hard. > > I hate to be realistic , but modern GC algorithms are among the > hardest you'll ever see in any field; even the outer limits of what we've > talked about here is baby stuff. Sun's Java group (the one in Chelmsford, > MA, down the road from me) had a group of 4+ people (incl. the venerable Mr. > Steele) working full-time for over a year on the last iteration of Java's > GC. The simpler BDW is a megabyte of code spread over 100+ files. Etc -- > state of the art GC can be crushingly hard. > > So I've got nothing against taking shortcuts at first -- there's actually no > realistic alternative. 
I think we're overlooking the obvious one, though: > if any finalizer appears in any trash cycle, tough luck. Python 3000 -- > which may be a spelling of 1.7 , but doesn't *need* to be a spelling > of 1.6. Kind of sad though -- finally knowing about cycles and then not being able to do anything about them. > > So we're back to the __cleanup__ design. Strawman proposal: for all > > finalizers in a trash cycle, call their __cleanup__ method, in > > arbitrary order. After all __cleanup__ calls are done, if the objects > > haven't all disposed of themselves, they are all garbage-collected > > without calling __del__. (This seems to require another garbage > > collection cycle -- so perhaps there should also be a once-only rule > > for __cleanup__?) > > > > Separate question: what if there is no __cleanup__? This should > > probably be reported: "You have cycles with finalizers, buddy! What > > do you want to do about them?" This same warning could be given when > > there is a __cleanup__ but it doesn't break all cycles. > > If I *ever* have a trash cycle with a finalizer in my code (> 0 -- "exactly > 1" isn't special to me), I will consider it to be a bug. So I want a way to > get it back from gc, so I can see what the heck it is, so I can fix my code > (or harass whoever did it to me). __cleanup__ suffices for that, so the > very act of calling it is all I'm really after ("Python invoked __cleanup__ > == Tim has a bug"). > > But after I outgrow that , I'll certainly want the option to get > another kind of complaint if __cleanup__ doesn't break the cycles, and after > *that* I couldn't care less. I've given you many gracious invitations to > say that you don't mind leaking in the face of a buggy program , but > as you've declined so far, I take it that never hearing another gripe about > leaking is a Primary Life Goal. So collection without calling __del__ is > fine -- but so is collection with calling it!
If we're going to (at least > implicitly) approve of this stuff, it's probably better *to* call __del__, > if for no other reason than to catch your case of some poor innocent object > caught in a cycle not of its making that expects its __del__ to abort > starting World War III if it becomes unreachable . I suppose we can print some obnoxious message to stderr like """Your program has created cyclical trash involving one or more objects with a __del__ method; calling their __cleanup__ method didn't resolve the cycle(s). I'm going to call the __del__ method(s) but I can't guarantee that they will be called in a meaningful order, because of the cyclical dependencies.""" But I'd still like to reclaim the memory. If this is some long-running server process that is executing arbitrary Python commands sent to it by clients, it's not nice to leak, period. (Because of this, I will also need to trace functions, methods and modules -- these create massive cycles that currently require painful cleanup. Of course I also need to track down all the roots then... :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein@lyra.org Thu Mar 9 19:59:48 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 11:59:48 -0800 (PST) Subject: [Python-Dev] finalization again In-Reply-To: <200003091951.OAA26184@eric.cnri.reston.va.us> Message-ID: On Thu, 9 Mar 2000, Guido van Rossum wrote: > > As an alternative, is it possible to move those cycles to the garbage list > > and then never look at them again? That would speed up future collection > > processing. > > With the current approach, that's almost automatic :-) > > I'd rather reclaim the memory too. Well, yah. I would too :-) I'm at ApacheCon right now, so haven't read the thread in detail, but it seems that people saw my algorithm as a bit too complex. Bah. IMO, it's a pretty straightforward way for the interpreter to get cycles cleaned up. 
(whether the objects in the cycles are lists/dicts, class instances, or extension types!) Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Thu Mar 9 20:18:06 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 12:18:06 -0800 (PST) Subject: [Python-Dev] finalization again In-Reply-To: <200003091955.OAA26217@eric.cnri.reston.va.us> Message-ID: On Thu, 9 Mar 2000, Guido van Rossum wrote: >... > I don't think so. While my poor wording ("finalizer-free garbage") > didn't make this clear, my references to earlier algorithms were > intended to imply that this is garbage that consists of truly > unreachable objects. I have three lists: let's call them T(rash), > R(oot-reachable), and F(inalizer-reachable). The Schemenauer > c.s. algorithm moves all reachable nodes to R. I then propose to move > all finalizers to F, and to run another pass of Schemenauer c.s. to > also move all finalizer-reachable (but not root-reachable) nodes to F. >... > [Tim Peters] > > I see Marc-Andre already declined to get sucked into the magical part of > > this . Greg should speak for his scheme, and I haven't made time to > > understand it fully; my best guess is to call x.__cleanup__ for every object > > in the SCC (but there's no clear way to decide which order to call them in, > > and unless they're more restricted than __del__ methods they can create all > > the same problems __del__ methods can!). My scheme was to identify objects in F, but only those with a finalizer (not the closure). Then call __cleanup__ on each of them, in arbitrary order. If any are left after the sequence of __cleanup__ calls, then I call it an error. [ note that my proposal defined checking for a finalizer by calling tp_clean(TPCLEAN_CARE_CHECK); this accounts for class instances and for extension types with "heavy" processing in tp_dealloc ] The third step was to use tp_clean to try and clean all other objects in a safe fashion. 
Specifically: the objects have no finalizers, so there is no special care needed in finalizing, so this third step should nuke references that are stored in the object. This means object pointers are still valid (we haven't dealloc'd), but the insides have been emptied. If the third step does not remove all cycles, then one of the PyType objects did not remove all references during the tp_clean call. >... > > If I *ever* have a trash cycle with a finalizer in my code (> 0 -- "exactly > > 1" isn't special to me), I will consider it to be a bug. So I want a way to > > get it back from gc, so I can see what the heck it is, so I can fix my code > > (or harass whoever did it to me). __cleanup__ suffices for that, so the > > very act of calling it is all I'm really after ("Python invoked __cleanup__ > > == Tim has a bug"). Agreed. >... > I suppose we can print some obnoxious message to stderr like A valid alternative to raising an exception, but it falls into the whole trap of "where does stderr go?" >... > But I'd still like to reclaim the memory. If this is some > long-running server process that is executing arbitrary Python > commands sent to it by clients, it's not nice to leak, period. If an exception is raised, the top-level server loop can catch it, log the error, and keep going. But yes: it will leak. > (Because of this, I will also need to trace functions, methods and > modules -- these create massive cycles that currently require painful > cleanup. Of course I also need to track down all the roots > then... :-) Yes. It would be nice to have these participate in the "cleanup protocol" that I've described. It should help a lot at Python finalization time, effectively moving some special casing from import.c to the objects themselves. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jim@interet.com Thu Mar 9 20:20:23 2000 From: jim@interet.com (James C. 
Ahlstrom) Date: Thu, 09 Mar 2000 15:20:23 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <000401bf89ea$e6e54180$79a0143f@tim> Message-ID: <38C80787.7791A1A6@interet.com> Tim Peters wrote: > Screw the docs. Pretend you're a newbie and *try* it. I did try it.
>
> import os
> p = os.popen("dir")
> while 1:
>     line = p.readline()
>     if not line:
>         break
>     print line
>
> Type that in by hand, or stick it in a file & run it from a cmdline > python.exe (which is a Windows console program). Under Win95 the process > freezes solid, and even trying to close the DOS box doesn't work. You have > to bring up the task manager and kill it that way. I once traced this under Point on the curve: This program works perfectly on my machine running NT. > libc pipes are as flaky as libc popen under Windows, Jim! MarkH has the > only versions of these things that come close to working under Windows (he > wraps the native Win32 spellings of these things; MS's libc entry points > (which Python uses now) are much worse). I believe you when you say popen() is flakey. It is a little harder to believe it is not possible to write a _popen() replacement using pipes which works. Of course I wanted you to do it instead of me! Well, if I get any time before 1.6 comes out... JimA From gstein@lyra.org Thu Mar 9 20:31:38 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 12:31:38 -0800 (PST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C80787.7791A1A6@interet.com> Message-ID: On Thu, 9 Mar 2000, James C. Ahlstrom wrote: >... > > libc pipes are as flaky as libc popen under Windows, Jim! MarkH has the > > only versions of these things that come close to working under Windows (he > > wraps the native Win32 spellings of these things; MS's libc entry points > > (which Python uses now) are much worse). > > I believe you when you say popen() is flakey.
It is a little > harder to believe it is not possible to write a _popen() > replacement using pipes which works. > > Of course I wanted you to do it instead of me! Well, if > I get any time before 1.6 comes out... It *has* been done. Bill Tutt did it a long time ago. That's what win32pipe is all about. -g -- Greg Stein, http://www.lyra.org/ From jim@interet.com Thu Mar 9 21:04:59 2000 From: jim@interet.com (James C. Ahlstrom) Date: Thu, 09 Mar 2000 16:04:59 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: Message-ID: <38C811FB.B6096FA4@interet.com> Greg Stein wrote: > > On Thu, 9 Mar 2000, James C. Ahlstrom wrote: > > Of course I wanted you to do it instead of me! Well, if > > I get any time before 1.6 comes out... > > It *has* been done. Bill Tutt did it a long time ago. That's what > win32pipe is all about. Thanks for the heads up! Unfortunately, win32pipe is not in the core, and probably covers more ground than just popen() and so might be a maintenance problem. And popen() is not written in it anyway. So we are Not There Yet (TM). Which I guess was Tim's original point. JimA From mhammond@skippinet.com.au Thu Mar 9 21:36:14 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Fri, 10 Mar 2000 08:36:14 +1100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C80787.7791A1A6@interet.com> Message-ID: > Point on the curve: This program works perfectly on my > machine running NT. And running from Python.exe. I bet you didn't try it from a GUI. The situation is worse WRT Windows 95. MS has a knowledge base article describing the bug, and telling you how to work around it by using a dedicated .EXE. So, out of the box, popen works only on NT from a console - pretty sorry state of affairs :-( > I believe you when you say popen() is flakey.
It is a little > harder to believe it is not possible to write a _popen() > replacement using pipes which works. Which is what I believe win32pipe.popen* are. Mark. From guido@python.org Fri Mar 10 01:13:51 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 09 Mar 2000 20:13:51 -0500 Subject: [Python-Dev] writelines() not thread-safe Message-ID: <200003100113.UAA27337@eric.cnri.reston.va.us> Christian Tismer just did an exhaustive search for thread unsafe use of Python operations, and found two weaknesses. One is posix.listdir(), which I had already found; the other is file.writelines(). Here's a program that demonstrates the bug; basically, while writelines is walking down the list, another thread could truncate the list, causing PyList_GetItem() to fail or a string object to be deallocated while writelines is using it. On my Solaris 7 system it typically crashes in the first or second iteration. It's easy to fix: just don't release the interpreter lock (get rid of Py_BEGIN_ALLOW_THREADS c.s.). This would however prevent other threads from doing any work while this thread may be blocked for I/O. An alternative solution is to put Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS just around the fwrite() call. This is safe, but would require a lot of lock operations and would probably slow things down too much. Ideas? --Guido van Rossum (home page: http://www.python.org/~guido/)

import os
import sys
import thread
import random
import time
import tempfile

def good_guy(fp, list):
    t0 = time.time()
    fp.seek(0)
    fp.writelines(list)
    t1 = time.time()
    print fp.tell(), "bytes written"
    return t1-t0

def bad_guy(dt, list):
    time.sleep(random.random() * dt)
    del list[:]

def main():
    infn = "/usr/dict/words"
    if sys.argv[1:]:
        infn = sys.argv[1]
    print "reading %s..." % infn
    fp = open(infn)
    list = fp.readlines()
    fp.close()
    print "read %d lines" % len(list)
    tfn = tempfile.mktemp()
    fp = None
    try:
        fp = open(tfn, "w")
        print "calibrating..."
        dt = 0.0
        n = 3
        for i in range(n):
            dt = dt + good_guy(fp, list)
        dt = dt / n  # average time it took to write the list to disk
        print "dt =", round(dt, 3)
        i = 0
        while 1:
            i = i+1
            print "test", i
            copy = map(lambda x: x[1:], list)
            thread.start_new_thread(bad_guy, (dt, copy))
            good_guy(fp, copy)
    finally:
        if fp:
            fp.close()
        try:
            os.unlink(tfn)
        except os.error:
            pass

main()

From tim_one@email.msn.com Fri Mar 10 02:13:51 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 21:13:51 -0500 Subject: [Python-Dev] writelines() not thread-safe In-Reply-To: <200003100113.UAA27337@eric.cnri.reston.va.us> Message-ID: <000601bf8a36$46ebf880$58a2143f@tim> [Guido van Rossum] > Christian Tismer just did an exhaustive search for thread unsafe use > of Python operations, and found two weaknesses. One is > posix.listdir(), which I had already found; the other is > file.writelines(). Here's a program that demonstrates the bug; > basically, while writelines is walking down the list, another thread > could truncate the list, causing PyList_GetItem() to fail or a string > object to be deallocated while writelines is using it. On my Solaris > 7 system it typically crashes in the first or second iteration. > > It's easy to fix: just don't release the interpreter lock (get rid > of Py_BEGIN_ALLOW_THREADS c.s.). This would however prevent other > threads from doing any work while this thread may be blocked for I/O. > > An alternative solution is to put Py_BEGIN_ALLOW_THREADS and > Py_END_ALLOW_THREADS just around the fwrite() call. This is safe, but > would require a lot of lock operations and would probably slow things > down too much. > > Ideas? 2.5: 1: Before releasing the lock, make a shallow copy of the list. 1.5: As in #1, but iteratively peeling off "the next N" values, for some N balancing the number of lock operations against the memory burden (I don't care about the speed of a shallow copy here ...). 2.
Pull the same trick list.sort() uses: make the list object immutable for the duration (I know you think that's a hack, and it is , but it costs virtually nothing and would raise an appropriate error when they attempted the insane mutation). I actually like #2 best now, but won't in the future, because file_writelines() should really accept an argument of any sequence type. This makes 1.5 a better long-term hack. although-adding-1.5-to-1.6-is-confusing-ly y'rs - tim From tim_one@email.msn.com Fri Mar 10 02:52:26 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 21:52:26 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <1259490837-400325@hypernet.com> Message-ID: <000901bf8a3b$ab314660$58a2143f@tim> [Gordon McM, aspires to make sense of the mess] > It doesn't work for commands builtin to whatever "shell" you're > using. That's different between cmd and command, and the > various flavors, versions and extensions thereof. It's not that simple, either; e.g., old apps invoking the 16-bit subsystem can screw up too. Look at Tcl's man page for "exec" and just *try* to wrap your brain around all the caveats they were left with after throwing a few thousand lines of C at this under their Windows port . > FWIW, I gave up a long time ago. I use redirection and a > tempfile. The few times I've wanted "interactive" control, I've > used Win32Process, dup'ed, inherited handles... the whole 9 > yards. Why? Look at all the questions about popen and child > processes in general, on platforms where it *works*, (if it > weren't for Donn Cave, nobody'd get it to work anywhere ). Donn is downright scary that way. I stopped using 'em too, of course. > To reiterate Tim's point: *none* of the c runtime routines for > process control on Windows are adequate (beyond os.system > and living with a DOS box popping up).
No, os.system is a problem under command.com flavors of Windows too, as system spawns a new shell and command.com's exit code is *always* 0. So Python's os.system returns 0 no matter what app the user *thinks* they were running, and whether it worked or set the baby on fire. > The raw Win32 CreateProcess does everything you could possibly want, but > takes a week or more to understand, (if this arg is a that, then that arg > is a whatsit, and the next is limited to the values X and Z unless...). Except that CreateProcess doesn't handle shell metacharacters, right? Tcl is the only language I've seen that really works hard at making cmdline-style process control portable. so-all-we-need-to-do-is-a-single-createprocess-to-invoke-tcl-ly y'rs - tim From tim_one@email.msn.com Fri Mar 10 02:52:24 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 21:52:24 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <14535.46175.991970.135642@weyr.cnri.reston.va.us> Message-ID: <000801bf8a3b$aa0c4e60$58a2143f@tim> [Fred L. Drake, Jr.] > Tim (& others), > Would this additional text be sufficient for the os.popen() > documentation? > > \strong{Note:} This function behaves unreliably under Windows > due to the native implementation of \cfunction{popen()}. Yes, that's good! If Mark/Bill's alternatives don't make it in, would also be good to point to the PythonWin extensions (although MarkH will have to give us the Official Name for that). > If someone cares to explain what's weird about it, that might be > appropriate as well, but I've never used this under Windows. As the rest of this thread should have made abundantly clear by now <0.9 wink>, it's such a mess across various Windows flavors that nobody can explain it. 
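[Editor's note: the "redirection and a tempfile" workaround Gordon describes earlier in this thread can be sketched as follows. This is a hypothetical illustration -- run_command is an invented name, not part of any module under discussion -- and it sidesteps popen entirely at the cost of losing streaming output:

```python
import os
import tempfile

def run_command(cmd):
    # Hypothetical sketch: capture a command's output via a temp file
    # instead of a pipe.  (mktemp matches the era of this thread;
    # modern code would use tempfile.mkstemp.)
    path = tempfile.mktemp()
    try:
        status = os.system('%s > %s 2>&1' % (cmd, path))
        f = open(path)
        output = f.read()
        f.close()
    finally:
        try:
            os.unlink(path)
        except OSError:
            pass
    return status, output
```

On command.com flavors of Windows the returned status is still unreliable for the reason Tim gives (command.com always exits 0), but the captured output at least arrives intact.]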
From tim_one@email.msn.com Fri Mar 10 03:15:18 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 22:15:18 -0500 Subject: [Python-Dev] RE: finalization again In-Reply-To: <20000309123731.A3664@acs.ucalgary.ca> Message-ID: <000a01bf8a3e$dc8878c0$58a2143f@tim> Quickie: [Tim] >> It's not obvious, but the SCCs can be found in linear time (via Tarjan's >> algorithm, which is simple but subtle; [NeilS] > Wow, it seems like it should be more expensive than that. Oh yes! Many bright people failed to discover the trick; Tarjan didn't discover it until (IIRC) the early 70's, and it was a surprise. It's just a few lines of simple code added to an ordinary depth-first search. However, while the code is simple, a correctness proof is not. BTW, if it wasn't clear, when talking about graph algorithms "linear" is usually taken to mean "in the sum of the number of nodes and edges". Cyclops.py finds all the cycles in linear time in that sense, too (but does not find the SCCs in linear time, at least not in theory -- in practice you can't tell the difference ). > What are the space requirements? Same as depth-first search, plus a way to associate an SCC id with each node, plus a single global "id" vrbl. So it's worst-case linear (in the number of nodes) space. See, e.g., any of the books in Sedgewick's "Algorithms in [Language du Jour]" series for working code. > Also, does the simple algorithm you used in Cyclops have a name? Not officially, but it answers to "hey, dumb-ass!" . then-again-so-do-i-so-make-eye-contact-ly y'rs - tim From bwarsaw@cnri.reston.va.us Fri Mar 10 04:21:46 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 9 Mar 2000 23:21:46 -0500 (EST) Subject: [Python-Dev] finalization again References: <000701bf89ab$80cb8e20$0d2d153f@tim> <200003091955.OAA26217@eric.cnri.reston.va.us> Message-ID: <14536.30810.720836.886023@anthem.cnri.reston.va.us> Okay, I had a flash of inspiration on the way home from my gig tonight.
Of course, I'm also really tired so I'm sure Tim will shoot this down in his usual witty but humbling way. I just had to get this out or I wouldn't sleep tonight. What if you timestamp instances when you create them? Then when you have trash cycles with finalizers, you sort them and finalize in chronological order. The nice thing here is that the user can have complete control over finalization order by controlling object creation order. Some random thoughts: - Finalization order of cyclic finalizable trash is completely deterministic. - Given sufficient resolution of your system clock, you should never have two objects with the same timestamp. - You could reduce the memory footprint by only including a timestamp for objects whose classes have __del__'s at instance creation time. Sticking an __del__ into your class dynamically would have no effect on objects that are already created (and I wouldn't poke you with a pointy stick if even post-twiddle instances didn't get timestamped). Thus, such objects would never be finalized -- tough luck. - FIFO order /seems/ more natural to me than FILO, but then I rarely create cyclic objects, and almost never use __del__, so this whole argument has been somewhat academic to me :). - The rule seems easy enough to implement, describe, and understand. I think I came up with a few more points on the drive home, but my post jam, post lightbulb endorphodrenalin rush is quickly subsiding, so I leave the rest until tomorrow. its-simply-a-matter-of-time-ly y'rs, -Barry From Moshe Zadka Fri Mar 10 05:32:41 2000 From: Moshe Zadka (Moshe Zadka) Date: Fri, 10 Mar 2000 07:32:41 +0200 (IST) Subject: [Python-Dev] finalization again In-Reply-To: Message-ID: On Thu, 9 Mar 2000, Greg Stein wrote: > > But I'd still like to reclaim the memory. If this is some > > long-running server process that is executing arbitrary Python > > commands sent to it by clients, it's not nice to leak, period. 
> > If an exception is raised, the top-level server loop can catch it, log the > error, and keep going. But yes: it will leak. And Tim's version stops the leaking if the server is smart enough: occasionally, it will call gc.get_dangerous_cycles(), and nuke everything it finds there. (E.g., clean up dicts and lists). Some destructor raises an exception? Ignore it (or whatever). And no willy-nilly "but I'm using a silly OS which has hardly any concept of stderr" problems! If the server wants, it can just send a message to the log. rooting-for-tim-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From tim_one@email.msn.com Fri Mar 10 08:18:29 2000 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 10 Mar 2000 03:18:29 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <200003091955.OAA26217@eric.cnri.reston.va.us> Message-ID: <000001bf8a69$37d57b40$812d153f@tim> This is getting to be fun, but afraid I can only make time for the first easy one tonight: [Tim, conjures a horrid vision of finalizers installing new __del__ methods, then sez ... ] > The scheme above is meant to be bulletproof in the face of abuses even > I can't conceive of . [Guido] > Are you *sure* your scheme deals with this? Never said it did -- only that it *meant* to . Ya, you got me. The things I thought I had *proved* I put in the numbered list, and in a rush put the speculative stuff in the reply body. One practical thing I think I can prove today: after finding SCCs, and identifying the safe nodes without predecessors, all such nodes S1, S2, ... can be cleaned up without fear of resurrection, or of cleaning something in Si causing anything in Sj (i!=j) to get reclaimed either (at the time I wrote it, I could only prove that cleaning *one* Si was non-problematic). Barring, of course, this "__del__ from hell" pathology. 
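[The Tarjan trick Tim describes -- a few lines bolted onto an ordinary depth-first search -- is easy to show in Python. This is a generic sketch, not the code from Cyclops.py; the function and variable names are invented here:]

```python
def tarjan_scc(graph):
    """Strongly connected components of a directed graph
    (dict: node -> list of successors), in O(nodes + edges)
    time and worst-case linear space. Returns a list of SCCs."""
    index = {}              # discovery order of each node
    lowlink = {}            # smallest index reachable from node's subtree
    stack, on_stack = [], set()
    counter = [0]           # the single global "id" variable
    sccs = []

    def visit(v):
        index[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:      # v is the root of an SCC
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs

# A cycle a <=> b with an acyclic appendage c:
print(tarjan_scc({"a": ["b"], "b": ["a", "c"], "c": []}))
```

[A handy side effect: the SCCs come out in reverse topological order, so successor components are emitted before their predecessors.]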
Also suspect that this claim is isomorphic to your later elaboration on why the objects on T at this point cannot be resurrected by a finalizer that runs, since they aren't reachable from any finalizers That is, exactly the same is true of "the safe (SCC super)nodes without predecessors", so I expect we've just got two ways of identifying the same set here. Perhaps yours is bigger, though (I realize that isn't clear; later). > Let's look at an example. > (Again, lowercase nodes have no finalizers.) Take G: > > a <=> b -> C > > [and cleaning b can trigger C.__del__ which can create > a.__class__.__del__ before a is decref'ed ...] > > ... and we're halfway committing a crime we said we would never commit > (touching cyclical trash with finalizers). Wholly agreed. > I propose to disregard this absurd possibility, How come you never propose to just shoot people <0.9 wink>? > except to the extent that Python shouldn't crash -- but we make no > guarantees to the user. "Shouldn't crash" is essential, sure. Carry it another step: after C is finalized, we get back to the loop clearing b.__dict__, and the refcount on "a" falls to 0 next. So the new a.__del__ gets called. Since b was visible to a, it's possible for a.__del__ to resurrect b, which latter is now in some bizarre (from the programmer's POV) cleared state (or even in the bit bucket, if we optimistically reclaim b's memory "early"!). I can't (well, don't want to ) believe it will be hard to stop this. It's just irksome to need to think about it at all. making-java's-gc-look-easy?-ly y'rs - tim From guido@python.org Fri Mar 10 13:46:43 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 10 Mar 2000 08:46:43 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Thu, 09 Mar 2000 23:21:46 EST." 
<14536.30810.720836.886023@anthem.cnri.reston.va.us> References: <000701bf89ab$80cb8e20$0d2d153f@tim> <200003091955.OAA26217@eric.cnri.reston.va.us> <14536.30810.720836.886023@anthem.cnri.reston.va.us> Message-ID: <200003101346.IAA27847@eric.cnri.reston.va.us> > What if you timestamp instances when you create them? Then when you > have trash cycles with finalizers, you sort them and finalize in > chronological order. The nice thing here is that the user can have > complete control over finalization order by controlling object > creation order. > > Some random thoughts: > > - Finalization order of cyclic finalizable trash is completely > deterministic. > > - Given sufficient resolution of your system clock, you should never > have two objects with the same timestamp. Forget the clock -- just use a counter that is incremented on each allocation. > - You could reduce the memory footprint by only including a timestamp > for objects whose classes have __del__'s at instance creation time. > Sticking an __del__ into your class dynamically would have no effect > on objects that are already created (and I wouldn't poke you with a > pointy stick if even post-twiddle instances didn't get > timestamped). Thus, such objects would never be finalized -- tough > luck. > > - FIFO order /seems/ more natural to me than FILO, but then I rarely > create cyclic objects, and almost never use __del__, so this whole > argument has been somewhat academic to me :). Ai, there's the rub. Suppose I have a tree with parent and child links. And suppose I have a rule that children need to be finalized before their parents (maybe they represent a Unix directory tree, where you must rm the files before you can rmdir the directory). This suggests that we should choose LIFO: you must create the parents first (you have to create a directory before you can create files in it). However, now we add operations to move nodes around in the tree. Suddenly you can have a child that is older than its parent! 
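[The reparenting problem can be made concrete with a toy sketch. The Node class and its serial counter are invented for illustration; the counter plays the role of the proposed creation timestamp:]

```python
import itertools

_serial = itertools.count()

class Node:
    """Toy tree node stamped with an allocation serial number
    (a stand-in for the proposed creation timestamp)."""
    def __init__(self, name, parent=None):
        self.name = name
        self.serial = next(_serial)
        self.parent = parent

# Create a directory-like tree: parents first, children later.
root = Node("root")
a = Node("a", parent=root)
b = Node("b", parent=a)

# LIFO-by-creation finalizes children before parents -- so far so good:
assert b.serial > a.serial > root.serial

# Now move `a` under a freshly created directory:
new_home = Node("new_home", parent=root)
a.parent = new_home

# The child `a` is now *older* than its parent -- creation order
# no longer matches the containment relationship.
assert a.serial < new_home.serial
```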
Conclusion: the creation time is useless; the application logic and actual link relationships are needed. > - The rule seems easy enough to implement, describe, and understand. > > I think I came up with a few more points on the drive home, but my > post jam, post lightbulb endorphodrenalin rush is quickly subsiding, > so I leave the rest until tomorrow. > > its-simply-a-matter-of-time-ly y'rs, > -Barry Time flies like an arrow -- fruit flies like a banana. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Mar 10 15:06:48 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 10 Mar 2000 10:06:48 -0500 Subject: [Python-Dev] writelines() not thread-safe In-Reply-To: Your message of "Thu, 09 Mar 2000 21:13:51 EST." <000601bf8a36$46ebf880$58a2143f@tim> References: <000601bf8a36$46ebf880$58a2143f@tim> Message-ID: <200003101506.KAA28358@eric.cnri.reston.va.us> OK, here's a patch for writelines() that supports arbitrary sequences and fixes the lock problem using Tim's solution #1.5 (slicing 1000 items at a time). It contains a fast path for when the argument is a list, using PyList_GetSlice; otherwise it uses PyObject_GetItem and a fixed list. Please have a good look at this; I've only tested it lightly. --Guido van Rossum (home page: http://www.python.org/~guido/) Index: fileobject.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Objects/fileobject.c,v retrieving revision 2.70 diff -c -r2.70 fileobject.c *** fileobject.c 2000/02/29 13:59:28 2.70 --- fileobject.c 2000/03/10 14:55:47 *************** *** 884,923 **** PyFileObject *f; PyObject *args; { ! int i, n; if (f->f_fp == NULL) return err_closed(); ! if (args == NULL || !PyList_Check(args)) { PyErr_SetString(PyExc_TypeError, ! "writelines() requires list of strings"); return NULL; } ! n = PyList_Size(args); ! f->f_softspace = 0; ! Py_BEGIN_ALLOW_THREADS ! errno = 0; ! for (i = 0; i < n; i++) { ! 
PyObject *line = PyList_GetItem(args, i); ! int len; ! int nwritten; ! if (!PyString_Check(line)) { ! Py_BLOCK_THREADS ! PyErr_SetString(PyExc_TypeError, ! "writelines() requires list of strings"); return NULL; } ! len = PyString_Size(line); ! nwritten = fwrite(PyString_AsString(line), 1, len, f->f_fp); ! if (nwritten != len) { ! Py_BLOCK_THREADS ! PyErr_SetFromErrno(PyExc_IOError); ! clearerr(f->f_fp); ! return NULL; } } ! Py_END_ALLOW_THREADS Py_INCREF(Py_None); ! return Py_None; } static PyMethodDef file_methods[] = { --- 884,975 ---- PyFileObject *f; PyObject *args; { ! #define CHUNKSIZE 1000 ! PyObject *list, *line; ! PyObject *result; ! int i, j, index, len, nwritten, islist; ! if (f->f_fp == NULL) return err_closed(); ! if (args == NULL || !PySequence_Check(args)) { PyErr_SetString(PyExc_TypeError, ! "writelines() requires sequence of strings"); return NULL; } ! islist = PyList_Check(args); ! ! /* Strategy: slurp CHUNKSIZE lines into a private list, ! checking that they are all strings, then write that list ! without holding the interpreter lock, then come back for more. */ ! index = 0; ! if (islist) ! list = NULL; ! else { ! list = PyList_New(CHUNKSIZE); ! if (list == NULL) return NULL; + } + result = NULL; + + for (;;) { + if (islist) { + Py_XDECREF(list); + list = PyList_GetSlice(args, index, index+CHUNKSIZE); + if (list == NULL) + return NULL; + j = PyList_GET_SIZE(list); } ! else { ! for (j = 0; j < CHUNKSIZE; j++) { ! line = PySequence_GetItem(args, index+j); ! if (line == NULL) { ! if (PyErr_ExceptionMatches(PyExc_IndexError)) { ! PyErr_Clear(); ! break; ! } ! /* Some other error occurred. ! Note that we may lose some output. */ ! goto error; ! } ! if (!PyString_Check(line)) { ! PyErr_SetString(PyExc_TypeError, ! "writelines() requires sequences of strings"); ! goto error; ! } ! PyList_SetItem(list, j, line); ! } ! } ! if (j == 0) ! break; ! ! Py_BEGIN_ALLOW_THREADS ! f->f_softspace = 0; ! errno = 0; ! for (i = 0; i < j; i++) { ! 
line = PyList_GET_ITEM(list, i); ! len = PyString_GET_SIZE(line); ! nwritten = fwrite(PyString_AS_STRING(line), ! 1, len, f->f_fp); ! if (nwritten != len) { ! Py_BLOCK_THREADS ! PyErr_SetFromErrno(PyExc_IOError); ! clearerr(f->f_fp); ! Py_DECREF(list); ! return NULL; ! } } + Py_END_ALLOW_THREADS + + if (j < CHUNKSIZE) + break; + index += CHUNKSIZE; } ! Py_INCREF(Py_None); ! result = Py_None; ! error: ! Py_XDECREF(list); ! return result; } static PyMethodDef file_methods[] = { From skip@mojam.com (Skip Montanaro) Fri Mar 10 15:28:13 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 10 Mar 2000 09:28:13 -0600 Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler Message-ID: <200003101528.JAA15951@beluga.mojam.com> Consider the following snippet of code from MySQLdb.py: try: self._query(query % escape_row(args, qc)) except TypeError: self._query(query % escape_dict(args, qc)) It's not quite right. There are at least four reasons I can think of why the % operator might raise a TypeError: 1. query has not enough format specifiers 2. query has too many format specifiers 3. argument type mismatch between individual format specifier and corresponding argument 4. query expects dist-style interpolation The except clause only handles the last case. That leaves the other three cases mishandled. The above construct pretends that all TypeErrors possible are handled by calling escape_dict() instead of escape_row(). I stumbled on case 2 yesterday and got a fairly useless error message when the code in the except clause also bombed. Took me a few minutes of head scratching to see that I had an extra %s in my format string. 
A note to Andy Dustman, MySQLdb's author, yielded the following modified version: try: self._query(query % escape_row(args, qc)) except TypeError, m: if m.args[0] == "not enough arguments for format string": raise if m.args[0] == "not all arguments converted": raise self._query(query % escape_dict(args, qc)) This will do the trick for me for the time being. Note, however, that the only way for Andy to decide which of the cases occurred (case 3 still isn't handled above, but should occur very rarely in MySQLdb since it only uses the more accommodating %s as a format specifier) is to compare the string value of the message to see which of the four cases was raised. This strong coupling via the error message text between the exception being raised (in C code, in this case) and the place where it's caught seems bad to me and encourages authors to either not recover from errors or to recover from them in the crudest fashion. If Guido decides to tweak the TypeError message in any fashion, perhaps to include the count of arguments in the format string and argument tuple, this code will break. It makes me wonder if there's not a better mechanism waiting to be discovered. Would it be possible to publish an interface of some sort via the exceptions module that would allow symbolic names or dictionary references to be used to decide which case is being handled? I envision something like the following in exceptions.py: UNKNOWN_ERROR_CATEGORY = 0 TYP_SHORT_FORMAT = 1 TYP_LONG_FORMAT = 2 ... IND_BAD_RANGE = 1 message_map = { # leave (TypeError, ("not enough arguments for format string",)): TYP_SHORT_FORMAT, (TypeError, ("not all arguments converted",)): TYP_LONG_FORMAT, ... (IndexError, ("list index out of range",)): IND_BAD_RANGE, ... } This would isolate the raw text of exception strings to just a single place (well, just one place on the exception handling side of things). 
It would be used something like try: self._query(query % escape_row(args, qc)) except TypeError, m: from exceptions import * exc_case = message_map.get((TypeError, m.args), UNKNOWN_ERROR_CATEGORY) if exc_case in [UNKNOWN_ERROR_CATEGORY,TYP_SHORT_FORMAT, TYP_LONG_FORMAT]: raise self._query(query % escape_dict(args, qc)) This could be added to exceptions.py without breaking existing code. Does this (or something like it) seem like a reasonable enhancement for Py2K? If we can narrow things down to an implementable solution I'll create a patch. Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From guido@python.org Fri Mar 10 16:17:56 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 10 Mar 2000 11:17:56 -0500 Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler In-Reply-To: Your message of "Fri, 10 Mar 2000 09:28:13 CST." <200003101528.JAA15951@beluga.mojam.com> References: <200003101528.JAA15951@beluga.mojam.com> Message-ID: <200003101617.LAA28722@eric.cnri.reston.va.us> > Consider the following snippet of code from MySQLdb.py: Skip, I'm not familiar with MySQLdb.py, and I have no idea what your example is about. From the rest of the message I feel it's not about MySQLdb at all, but about string formatting, but the point escapes me because you never quite show what's in the format string and what error that gives. Could you give some examples based on first principles? A simple interactive session showing the various errors would be helpful...
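[For the record, the first-principles demonstration is short: all four mistakes raise a bare TypeError, distinguishable only by message text -- and the exact wording varies between interpreter versions, which is part of the problem:]

```python
# Four different formatting mistakes, one undifferentiated exception type.
cases = [
    ("%s", ("a", "b")),      # too many arguments for the format
    ("%s %s", "a"),          # not enough arguments
    ("%(a)s", ("a",)),       # format wants a mapping, got a tuple
    ("%d", {"a": 1}),        # argument type mismatch
]
for fmt, operand in cases:
    try:
        fmt % operand
    except TypeError as exc:
        # The *only* way to tell the cases apart is the message text:
        print("TypeError:", exc)
```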
--Guido van Rossum (home page: http://www.python.org/~guido/) From gward@cnri.reston.va.us Fri Mar 10 19:05:04 2000 From: gward@cnri.reston.va.us (Greg Ward) Date: Fri, 10 Mar 2000 14:05:04 -0500 Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler In-Reply-To: <200003101617.LAA28722@eric.cnri.reston.va.us>; from guido@python.org on Fri, Mar 10, 2000 at 11:17:56AM -0500 References: <200003101528.JAA15951@beluga.mojam.com> <200003101617.LAA28722@eric.cnri.reston.va.us> Message-ID: <20000310140503.A8619@cnri.reston.va.us> On 10 March 2000, Guido van Rossum said: > Skip, I'm not familiar with MySQLdb.py, and I have no idea what your > example is about. From the rest of the message I feel it's not about > MySQLdb at all, but about string formatting, but the point escapes me > because you never quite show what's in the format string and what > error that gives. Could you give some examples based on first > principles? A simple interactive session showing the various errors > would be helpful... I think Skip's point was just this: "TypeError" isn't expressive enough. If you catch TypeError on a statement with multiple possible type errors, you don't know which one you caught. Same holds for any exception type, really: a given statement could blow up with ValueError for any number of reasons. Etc., etc. One possible solution, and I think this is what Skip was getting at, is to add an "error code" to the exception object that identifies the error more reliably than examining the error message. It's just the errno/strerror dichotomy: strerror is for users, errno is for code. I think Skip is just saying that Python exception objects need an errno (although it doesn't have to be a number). It would probably only make sense to define error codes for exceptions that can be raised by Python itself, though.
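[The errno/strerror dichotomy is already visible on the exceptions raised for failed system calls, and a code-carrying exception is easy to sketch. Written in present-day syntax; FormatError and its code values are hypothetical, not an existing API:]

```python
import errno

# The existing precedent: OSError separates the code from the text.
try:
    open("/no/such/file/hopefully")
except OSError as exc:
    assert exc.errno == errno.ENOENT   # symbolic code, for programs
    assert exc.strerror                # message text, for humans

# The same split applied to a formatting error, per Greg's suggestion
# (FormatError and its code constants are made up for illustration):
class FormatError(TypeError):
    TOO_FEW, TOO_MANY = range(2)
    def __init__(self, code, message):
        TypeError.__init__(self, message)
        self.code = code

try:
    raise FormatError(FormatError.TOO_MANY, "not all arguments converted")
except TypeError as exc:               # old handlers still catch it
    assert exc.code == FormatError.TOO_MANY
```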
Greg From skip@mojam.com (Skip Montanaro) Fri Mar 10 20:17:30 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 10 Mar 2000 14:17:30 -0600 (CST) Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler In-Reply-To: <200003101617.LAA28722@eric.cnri.reston.va.us> References: <200003101528.JAA15951@beluga.mojam.com> <200003101617.LAA28722@eric.cnri.reston.va.us> Message-ID: <14537.22618.656740.296408@beluga.mojam.com> Guido> Skip, I'm not familiar with MySQLdb.py, and I have no idea what Guido> your example is about. From the rest of the message I feel it's Guido> not about MySQLdb at all, but about string formatting, My apologies. You're correct, it's really not about MySQLdb. It's about handling multiple cases raised by the same exception. First, a more concrete example that just uses simple string formats: code exception "%s" % ("a", "b") TypeError: 'not all arguments converted' "%s %s" % "a" TypeError: 'not enough arguments for format string' "%(a)s" % ("a",) TypeError: 'format requires a mapping' "%d" % {"a": 1} TypeError: 'illegal argument type for built-in operation' Let's presume hypothetically that it's possible to recover from some subset of the TypeErrors that are raised, but not all of them. Now, also presume that the format strings and the tuple, string or dict literals I've given above can be stored in variables (which they can). If we wrap the code in a try/except statement, we can catch the TypeError exception and try to do something sensible. This is precisely the trick that Andy Dustman uses in MySQLdb: first try expanding the format string using a tuple as the RH operand, then try with a dict if that fails. Unfortunately, as you can see from the above examples, there are four cases that need to be handled. To distinguish them currently, you have to compare the message you get with the exception to string literals that are generally defined in C code in the interpreter. 
Here's what Andy's original code looked like stripped of the MySQLdb-ese: try: x = format % tuple_generating_function(...) except TypeError: x = format % dict_generating_function(...) That doesn't handle the first two cases above. You have to inspect the message that raise sends out: try: x = format % tuple_generating_function(...) except TypeError, m: if m.args[0] == "not all arguments converted": raise if m.args[0] == "not enough arguments for format string": raise x = format % dict_generating_function(...) This comparison of except arguments with hard-coded strings (especially ones the programmer has no direct control over) seems fragile to me. If you decide to reword the error message strings, you break someone's code. In my previous message I suggested collecting this fragility in the exceptions module where it can be better isolated. My solution is a bit cumbersome, but could probably be cleaned up somewhat, but basically looks like try: x = format % tuple_generating_function(...) except TypeError, m: import exceptions msg_case = exceptions.message_map.get((TypeError, m.args), exceptions.UNKNOWN_ERROR_CATEGORY) # punt on the cases we can't recover from if msg_case == exceptions.TYP_SHORT_FORMAT: raise if msg_case == exceptions.TYP_LONG_FORMAT: raise if msg_case == exceptions.UNKNOWN_ERROR_CATEGORY: raise # handle the one we can x = format % dict_generating_function(...) In private email that crossed my original message, Andy suggested defining more standard exceptions, e.g.: class FormatError(TypeError): pass class TooManyElements(FormatError): pass class TooFewElements(FormatError): pass then raising the appropriate error based on the circumstance. Code that catches TypeError exceptions would still work. So there are two possible changes on the table: 1. define more standard exceptions so you can distinguish classes of errors on a more fine-grained basis using just the first argument of the except clause. 2. 
provide some machinery in exceptions.py to allow programmers a measure of uncoupling from using hard-coded strings to distinguish cases. Skip From skip@mojam.com (Skip Montanaro) Fri Mar 10 20:21:11 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 10 Mar 2000 14:21:11 -0600 (CST) Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler In-Reply-To: <20000310140503.A8619@cnri.reston.va.us> References: <200003101528.JAA15951@beluga.mojam.com> <200003101617.LAA28722@eric.cnri.reston.va.us> <20000310140503.A8619@cnri.reston.va.us> Message-ID: <14537.22839.664131.373727@beluga.mojam.com> Greg> One possible solution, and I think this is what Skip was getting Greg> at, is to add an "error code" to the exception object that Greg> identifies the error more reliably than examining the error Greg> message. It's just the errno/strerror dichotomy: strerror is for Greg> users, errno is for code. I think Skip is just saying that Greg> Pythone exception objets need an errno (although it doesn't have Greg> to be a number). It would probably only make sense to define Greg> error codes for exceptions that can be raised by Python itself, Greg> though. I'm actually allowing the string to be used as the error code. If you raise TypeError with "not all arguments converted" as the argument, then that string literal will appear in the definition of exceptions.message_map as part of a key. The programmer would only refer to the args attribute of the object being raised. 
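[Change 1 -- the finer-grained exception hierarchy Andy suggested -- can be sketched as follows. The checked_format wrapper and the message tests inside it are hypothetical glue (today the interpreter raises plain TypeError); the point is that the fragile string matching happens in exactly one place, and old handlers that catch TypeError keep working:]

```python
class FormatError(TypeError): pass
class TooManyElements(FormatError): pass
class TooFewElements(FormatError): pass

def checked_format(fmt, args):
    """Hypothetical wrapper: re-raise %-formatting errors as the
    finer-grained classes, so handlers can dispatch on the class
    instead of comparing message strings themselves."""
    try:
        return fmt % args
    except TypeError as exc:
        text = str(exc)
        if "not all arguments converted" in text:
            raise TooManyElements(text)
        if "not enough arguments" in text:
            raise TooFewElements(text)
        raise

# Dispatch on the class, not the message:
try:
    checked_format("%s %s", ("a",))
except TooFewElements:
    handled = "too few"
assert handled == "too few"

# Existing code that only knows about TypeError keeps working:
try:
    checked_format("%s", ("a", "b"))
except TypeError:
    handled = "legacy handler"
assert handled == "legacy handler"
```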
either-or-makes-no-real-difference-to-me-ly y'rs, Skip From bwarsaw@cnri.reston.va.us Fri Mar 10 20:56:45 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Fri, 10 Mar 2000 15:56:45 -0500 (EST) Subject: [Python-Dev] finalization again References: <000701bf89ab$80cb8e20$0d2d153f@tim> <200003091955.OAA26217@eric.cnri.reston.va.us> <14536.30810.720836.886023@anthem.cnri.reston.va.us> <200003101346.IAA27847@eric.cnri.reston.va.us> Message-ID: <14537.24973.579056.533282@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: >> Given sufficient resolution of your system >> clock, you should never have two objects with the same >> timestamp. GvR> Forget the clock -- just use a counter that is incremented on GvR> each allocation. Good idea. GvR> Suppose I have a tree with parent and child links. And GvR> suppose I have a rule that children need to be finalized GvR> before their parents (maybe they represent a Unix directory GvR> tree, where you must rm the files before you can rmdir the GvR> directory). This suggests that we should choose LIFO: you GvR> must create the parents first (you have to create a directory GvR> before you can create files in it). However, now we add GvR> operations to move nodes around in the tree. Suddenly you GvR> can have a child that is older than its parent! Conclusion: GvR> the creation time is useless; the application logic and GvR> actual link relationships are needed. One potential way to solve this is to provide an interface for refreshing the counter; for discussion purposes, I'll call this sys.gcrefresh(obj). Throws a TypeError if obj isn't a finalizable instance. Otherwise, it sets the "timestamp" to the current counter value and increments the counter. Thus, in your example, when the child node is reparented, you sys.gcrefresh(child) and now the parent is automatically older. Of course, what if the child has its own children? 
You've now got an age graph like this parent > child < grandchild with the wrong age relationship between the parent and grandchild. So when you refresh, you've got to walk down the containment tree making sure your grandkids are "younger" than yourself. E.g.: class Node: ... def __del__(self): ... def reparent(self, node): self.parent = node self.refresh() def refresh(self): sys.gcrefresh(self) for c in self.children: c.refresh() The point to all this is that it gives explicit control of the finalizable cycle reclamation order to the user, via a fairly easy to understand, and manipulate mechanism. twas-only-a-flesh-wound-but-waiting-for-the-next-stroke-ly y'rs, -Barry From jim@interet.com Fri Mar 10 21:14:45 2000 From: jim@interet.com (James C. Ahlstrom) Date: Fri, 10 Mar 2000 16:14:45 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <000801bf8a3b$aa0c4e60$58a2143f@tim> Message-ID: <38C965C4.B164C2D5@interet.com> Tim Peters wrote: > > [Fred L. Drake, Jr.] > > Tim (& others), > > Would this additional text be sufficient for the os.popen() > > documentation? > > > > \strong{Note:} This function behaves unreliably under Windows > > due to the native implementation of \cfunction{popen()}. > > Yes, that's good! If Mark/Bill's alternatives don't make it in, would also > be good to point to the PythonWin extensions (although MarkH will have to > give us the Official Name for that). Well, it looks like this thread has fizzled out. But what did we decide? Changing the docs to say popen() "doesn't work reliably" is a little weak. Maybe removing popen() is better, and demanding that Windows users use win32pipe. I played around with a patch to posixmodule.c which eliminates _popen() and implements os.popen() using CreatePipe(). It sort of works on NT and fails on 95. Anyway, I am stuck on how to make a Python file object from a pipe handle. 
Would it be a good idea to extract the Wisdom from win32pipe and re-implement os.popen() either in C or by using win32pipe directly? Using C is simple and to the point. I feel Tim's original complaint that popen() is a Problem still hasn't been fixed. JimA From Moshe Zadka Fri Mar 10 21:29:05 2000 From: Moshe Zadka (Moshe Zadka) Date: Fri, 10 Mar 2000 23:29:05 +0200 (IST) Subject: [Python-Dev] finalization again In-Reply-To: <14537.24973.579056.533282@anthem.cnri.reston.va.us> Message-ID: On Fri, 10 Mar 2000 bwarsaw@cnri.reston.va.us wrote: > One potential way to solve this is to provide an interface for > refreshing the counter; for discussion purposes, I'll call this > sys.gcrefresh(obj). Barry, there are other problems with your scheme, but I won't even try to point those out: having to call a function whose purpose can only be described in terms of a concrete implementation of a garbage collection scheme is simply unacceptable. I can almost see you shouting "Come back here, I'll bite your legs off" . > The point to all this is that it gives explicit control of the > finalizable cycle reclamation order to the user, via a fairly easy to > understand, and manipulate mechanism. Oh? This sounds like the most horrendous mechanism alive.... you-probably-jammed-a-*little*-too-loud-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From bwarsaw@cnri.reston.va.us Fri Mar 10 22:15:27 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Fri, 10 Mar 2000 17:15:27 -0500 (EST) Subject: [Python-Dev] finalization again References: <14537.24973.579056.533282@anthem.cnri.reston.va.us> Message-ID: <14537.29695.532507.197580@anthem.cnri.reston.va.us> Just throwing out ideas.
From DavidA@ActiveState.com Fri Mar 10 22:20:45 2000 From: DavidA@ActiveState.com (David Ascher) Date: Fri, 10 Mar 2000 14:20:45 -0800 Subject: [Python-Dev] finalization again In-Reply-To: Message-ID: Moshe, some _arguments_ backing your feelings might give them more weight... As they stand, they're just insults, and if I were Barry I'd ignore them. --david ascher Moshe Zadka: > Barry, there are other problems with your scheme, but I won't even try to > point those out: having to call a function whose purpose can only be > described in terms of a concrete implementation of a garbage collection > scheme is simply unacceptable. I can almost see you shouting "Come back > here, I'll bite your legs off" . > [...] > Oh? This sounds like the most horrendus mechanism alive.... From skip@mojam.com (Skip Montanaro) Fri Mar 10 22:40:02 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 10 Mar 2000 16:40:02 -0600 Subject: [Python-Dev] on the suitability of ideas tossed out to python-dev Message-ID: <200003102240.QAA07881@beluga.mojam.com> Folks, let's not forget that python-dev is a place where oftentimes half-baked ideas will get advanced. I came up with an idea about decoupling error handling from exception message strings. I don't expect my idea to be adopted as is. Similarly, Barry's ideas about object timestamps were admittedly conceived late at night in the thrill following an apparently good gig. (I like the idea that every object has a modtime, but for other reasons than Barry suggested.) My feeling is that bad ideas will get winnowed out or drastically modified quickly enough anyway. Think of these early ideas as little more than brainstorms. 
A lot of times if I have an idea, I feel I need to put it down on my virtual whiteboard quickly, because a) I often don't have a lot of time to pursue stuff (do it now or it won't get done), b) because bad ideas can be the catalyst for better ideas, and c) if I don't do it immediately, I'll probably forget the idea altogether, thus missing the opportunity for reason b altogether. Try and collect a bunch of ideas before shooting any down and see what falls out. The best ideas will survive. When people start proving things and using fancy diagrams like "a <=> b -> C", then go ahead and get picky... ;-) Have a relaxing, thought provoking weekend. I'm going to go see a movie this evening with my wife and youngest son, appropriately enough titled, "My Dog Skip". Enough Pythoneering for one day... bow-wow-ly y'rs, Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From guido@python.org Sat Mar 11 00:20:01 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 10 Mar 2000 19:20:01 -0500 Subject: [Python-Dev] Unicode patches checked in Message-ID: <200003110020.TAA17777@eric.cnri.reston.va.us> I've just checked in a massive patch from Marc-Andre Lemburg which adds Unicode support to Python. This work was financially supported by Hewlett-Packard. Marc-Andre has done a tremendous amount of work, for which I cannot thank him enough. We're still awaiting some more things: Marc-Andre gave me documentation patches which will be reviewed by Fred Drake before they are checked in; Fredrik Lundh has developed a new regular expression which is Unicode-aware and which should be checked in real soon now. Also, the documentation is probably incomplete and will be updated, and of course there may be bugs -- this should be considered alpha software. However, I believe it is quite good already, otherwise I wouldn't have checked it in! 
I'd like to invite everyone with an interest in Unicode or Python 1.6 to check out this new Unicode-aware Python, so that we can ensure a robust code base by the time Python 1.6 is released (planned release date: June 1, 2000). The download links are below. Links: http://www.python.org/download/cvs.html Instructions on how to get access to the CVS version. (David Ascher is making nightly tarballs of the CVS version available at http://starship.python.net/crew/da/pythondists/) http://starship.python.net/crew/lemburg/unicode-proposal.txt The latest version of the specification on which the Marc has based his implementation. http://www.python.org/sigs/i18n-sig/ Home page of the i18n-sig (Internationalization SIG), which has lots of other links about this and related issues. http://www.python.org/search/search_bugs.html The Python Bugs List. Use this for all bug reports. Note that next Tuesday I'm going on a 10-day trip, with limited time to read email and no time to solve problems. The usual crowd will take care of urgent updates. See you at the Intel Computing Continuum Conference in San Francisco or at the Python Track at Software Development 2000 in San Jose! --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one@email.msn.com Sat Mar 11 02:03:47 2000 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 10 Mar 2000 21:03:47 -0500 Subject: [Python-Dev] Finalization in Eiffel Message-ID: <000701bf8afe$0a0fd800$a42d153f@tim> Eiffel is Bertrand Meyer's "design by contract" OO language. Meyer took extreme care in its design, and has written extensively and articulately about the design -- agree with him or not, he's always worth reading! I used Eiffel briefly a few years ago, just out of curiosity. I didn't recall even bumping into a notion of destructors. Turns out it does have them, but they're appallingly (whether relative to Eiffel's usual clarity, or even relative to C++'s usual lack thereof <0.9 wink>) ill-specified. 
An Eiffel class can register a destructor by inheriting from the system MEMORY class and overriding the latter's "dispose()". This appears to be viewed as a low-level facility, and neither OOSC (2nd ed) nor "Eiffel: The Language" say much about its semantics. Within dispose, you're explicitly discouraged from invoking methods on *any* other object, and resurrection is right out the window. But the language doesn't appear to check for any of that, which is extremely un-Eiffel-like. Many msgs on comp.lang.eiffel from people who should know suggest that all but one Eiffel implementation pay no attention at all to reachability during gc, and that none support resurrection. If you need ordering during finalization, the advice is to write that part in C/C++. Violations of the vague rules appear to lead to random system damage(!). Looking at various Eiffel pkgs on the web, the sole use of dispose was in one-line bodies that released external resources (like memory & db connections) via calling an external C/C++ function. jealous-&-appalled-at-the-same-time-ly y'rs - tim From tim_one@email.msn.com Sat Mar 11 02:03:50 2000 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 10 Mar 2000 21:03:50 -0500 Subject: [Python-Dev] Conventional wisdom on finalization Message-ID: <000801bf8afe$0b3df7c0$a42d153f@tim> David Chase maintains a well-regarded GC FAQ, at http://www.iecc.com/gclist/GC-faq.html Interested folks should look it up. A couple highlights: On cycles with finalizers: In theory, of course, a cycle in the graph of objects to be finalized will prevent a topological sort from succeeding. In practice, the "right" thing to do appears to be to signal an error (at least when debugging) and let the programmer clean this up. 
People with experience on large systems report that such cycles are in fact exceedingly rare (note, however, that some languages define "finalizers" for almost every object, and that was not the case for the large systems studied -- there, finalizers were not too common).

On Java's "finalizer called only once" rule: if an object is revived in finalization, that is fine, but its finalizer will not run a second time. (It isn't clear if this is a matter of design, or merely an accident of the first implementation of the language, but it is in the specification now. Obviously, this encourages careful use of finalization, in much the same way that driving without seatbelts encourages careful driving.)

Until today, I had no idea I was so resolutely conventional .

seems-we're-trying-to-do-more-than-anyone-other-than-us-expects-ly y'rs - tim

From shichang@icubed.com

I would love to test the Python 1.6 (Unicode support) in the Chinese language aspect, but I don't know where I can get a copy of an OS that supports Chinese. Can anyone point me in a direction?

-----Original Message-----
From: Guido van Rossum [SMTP:guido@python.org]
Sent: Saturday, March 11, 2000 12:20 AM
To: Python mailing list; python-announce@python.org; python-dev@python.org; i18n-sig@python.org; string-sig@python.org
Cc: Marc-Andre Lemburg
Subject: Unicode patches checked in

[Guido's announcement, quoted here in full, trimmed -- see the original above.]

--
http://www.python.org/mailman/listinfo/python-list

From Moshe Zadka Sat Mar 11 09:10:12 2000
From: Moshe Zadka (Moshe Zadka)
Date: Sat, 11 Mar 2000 11:10:12 +0200 (IST)
Subject: [Python-Dev] Unicode: When Things Get Hairy
Message-ID:

The following "problem" is easy to fix. However, what I wanted to know is if people (Skip and Guido most importantly) think it is a problem:

>>> "a" in u"bbba"
1
>>> u"a" in "bbba"
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: string member test needs char left operand

Suggested fix: in stringobject.c, explicitly allow a unicode char left operand.

-- 
Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html

From mal@lemburg.com Sat Mar 11 10:24:26 2000
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sat, 11 Mar 2000 11:24:26 +0100
Subject: [Python-Dev] Unicode: When Things Get Hairy
References: Message-ID: <38CA1EDA.423F8A2C@lemburg.com>

Moshe Zadka wrote:
> 
> The following "problem" is easy to fix. However, what I wanted to know is
> if people (Skip and Guido most importantly) think it is a problem:
> 
> >>> "a" in u"bbba"
> 1
> >>> u"a" in "bbba"
> Traceback (innermost last):
>   File "<stdin>", line 1, in ?
> TypeError: string member test needs char left operand
> 
> Suggested fix: in stringobject.c, explicitly allow a unicode char left
> operand.

Hmm, this must have been introduced by your contains code... it did work before.

The normal action taken by the Unicode and the string code in these mixed type situations is to first convert everything to Unicode and then retry the operation. Strings are interpreted as UTF-8 during this conversion.

To simplify this task, I added method APIs to the Unicode object which do the conversion for you (they apply all the necessary coercion business to all arguments). I guess adding another PyUnicode_Contains() wouldn't hurt :-)

Perhaps I should also add a tp_contains slot to the Unicode object which then uses the above API as well.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From Moshe Zadka Sat Mar 11 11:05:48 2000
From: Moshe Zadka (Moshe Zadka)
Date: Sat, 11 Mar 2000 13:05:48 +0200 (IST)
Subject: [Python-Dev] Unicode: When Things Get Hairy
In-Reply-To: <38CA1EDA.423F8A2C@lemburg.com>
Message-ID:

On Sat, 11 Mar 2000, M.-A. Lemburg wrote:

> Hmm, this must have been introduced by your contains code...
> it did work before.
Nope: the string "in" semantics were forever special-cased. Guido beat me soundly for trying to change the semantics...

> The normal action taken by the Unicode and the string
> code in these mixed type situations is to first
> convert everything to Unicode and then retry the operation.
> Strings are interpreted as UTF-8 during this conversion.

Hmmm....PySequence_Contains doesn't do any conversion of the arguments. Should it? (Again, it didn't before). If it does, then the order of testing for seq_contains and seq_getitem and conversions

> Perhaps I should also add a tp_contains slot to the
> Unicode object which then uses the above API as well.

But that wouldn't help at all for

u"a" in "abbbb"

PySequence_Contains only dispatches on the container argument :-(

(BTW: I discovered it while contemplating adding a seq_contains (not tp_contains) to unicode objects to optimize the searching for a bit.)

PS: MAL: thanks for a great birthday present! I'm enjoying the unicode patch a lot.

-- 
Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html

From guido@python.org Sat Mar 11 12:16:06 2000
From: guido@python.org (Guido van Rossum)
Date: Sat, 11 Mar 2000 07:16:06 -0500
Subject: [Python-Dev] Unicode: When Things Get Hairy
In-Reply-To: Your message of "Sat, 11 Mar 2000 13:05:48 +0200."
References: Message-ID: <200003111216.HAA12651@eric.cnri.reston.va.us>

[Moshe discovers that u"a" in "bbba" raises TypeError]

[Marc-Andre]
> > Hmm, this must have been introduced by your contains code...
> > it did work before.
>
> Nope: the string "in" semantics were forever special-cased. Guido beat me
> soundly for trying to change the semantics...

But I believe that Marc-Andre added a special case for Unicode in PySequence_Contains. I looked for evidence, but the last snapshot that I actually saved and built before Moshe's code was checked in is from 2/18 and it isn't in there. Yet I believe Marc-Andre. The special case needs to be added back to string_contains in stringobject.c.
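Moshe's complaint that PySequence_Contains only dispatches on the container can be seen in a toy Python model of that function (names and structure are mine, not CPython's actual C code):

```python
def sequence_contains(container, element):
    # Toy model of PySequence_Contains: only the *container* type is
    # consulted -- first its own contains slot, else a linear scan.
    contains = getattr(type(container), "__contains__", None)
    if contains is not None:
        return 1 if contains(container, element) else 0
    for item in container:        # fallback: getitem-style scan
        if item == element:
            return 1
    return 0

# The element's type never gets a say, which is why the fix for
# u"a" in "bbba" has to live inside the string type's own slot.
assert sequence_contains("bbba", "a") == 1
assert sequence_contains([1, 2, 3], 4) == 0
```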
> > The normal action taken by the Unicode and the string
> > code in these mixed type situations is to first
> > convert everything to Unicode and then retry the operation.
> > Strings are interpreted as UTF-8 during this conversion.
>
> Hmmm....PySequence_Contains doesn't do any conversion of the arguments.
> Should it? (Again, it didn't before). If it does, then the order of
> testing for seq_contains and seq_getitem and conversions

Or it could be done this way.

> > Perhaps I should also add a tp_contains slot to the
> > Unicode object which then uses the above API as well.

Yes.

> But that wouldn't help at all for
>
> u"a" in "abbbb"

It could if PySequence_Contains would first look for a string and a unicode argument (in either order) and in that case convert the string to unicode.

> PySequence_Contains only dispatches on the container argument :-(
>
> (BTW: I discovered it while contemplating adding a seq_contains (not
> tp_contains) to unicode objects to optimize the searching for a bit.)

You may beat Marc-Andre to it, but I'll have to let him look at the code anyway -- I'm not sufficiently familiar with the Unicode stuff myself yet.

BTW, I added a tag "pre-unicode" to the CVS tree to the revisions before the Unicode changes were made.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From mal@lemburg.com Sat Mar 11 13:32:57 2000
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sat, 11 Mar 2000 14:32:57 +0100
Subject: [Python-Dev] Unicode: When Things Get Hairy
References: <200003111216.HAA12651@eric.cnri.reston.va.us>
Message-ID: <38CA4B08.7B13438D@lemburg.com>

Guido van Rossum wrote:
> 
> [Moshe discovers that u"a" in "bbba" raises TypeError]
> 
> [Marc-Andre]
> > > Hmm, this must have been introduced by your contains code...
> > > it did work before.
> >
> > Nope: the string "in" semantics were forever special-cased. Guido beat me
> > soundly for trying to change the semantics...
> 
> But I believe that Marc-Andre added a special case for Unicode in
> PySequence_Contains. I looked for evidence, but the last snapshot that
> I actually saved and built before Moshe's code was checked in is from
> 2/18 and it isn't in there. Yet I believe Marc-Andre. The special
> case needs to be added back to string_contains in stringobject.c.

Moshe was right: I had probably not checked the code because the obvious combinations worked out of the box... the only combination which doesn't work is "unicode in string". I'll fix it next week.

BTW, there's a good chance that the string/Unicode integration is not complete yet: just keep looking for them.

> > > The normal action taken by the Unicode and the string
> > > code in these mixed type situations is to first
> > > convert everything to Unicode and then retry the operation.
> > > Strings are interpreted as UTF-8 during this conversion.
> >
> > Hmmm....PySequence_Contains doesn't do any conversion of the arguments.
> > Should it? (Again, it didn't before). If it does, then the order of
> > testing for seq_contains and seq_getitem and conversions
>
> Or it could be done this way.
>
> > > Perhaps I should also add a tp_contains slot to the
> > > Unicode object which then uses the above API as well.
>
> Yes.
>
> > But that wouldn't help at all for
> >
> > u"a" in "abbbb"
>
> It could if PySequence_Contains would first look for a string and a
> unicode argument (in either order) and in that case convert the string
> to unicode.

I think the right way to do this is to add a special case to seq_contains in the string implementation. That's how most other auto-coercions work too. Instead of raising an error, the implementation would then delegate the work to PyUnicode_Contains().

> > PySequence_Contains only dispatches on the container argument :-(
> >
> > (BTW: I discovered it while contemplating adding a seq_contains (not
> > tp_contains) to unicode objects to optimize the searching for a bit.)
> > You may beat Marc-Andre to it, but I'll have to let him look at the > code anyway -- I'm not sufficiently familiar with the Unicode stuff > myself yet. I'll add that one too. BTW, Happy Birthday, Moshe :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Sat Mar 11 13:57:34 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 11 Mar 2000 14:57:34 +0100 Subject: [Python-Dev] Unicode: When Things Get Hairy References: <200003111216.HAA12651@eric.cnri.reston.va.us> <38CA4B08.7B13438D@lemburg.com> Message-ID: <38CA50CE.BEEFAB5E@lemburg.com> This is a multi-part message in MIME format. --------------56A130F1FCAC300009B200AD Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit I couldn't resist :-) Here's the patch... BTW, how should we proceed with future patches ? Should I wrap them together about once a week, or send them as soon as they are done ? 
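For readers skimming the C patch that follows, the core of the new PyUnicode_Contains can be transliterated into Python roughly like this (the helper name and the use of str() to stand in for the PyUnicode_FromObject coercion are mine):

```python
def unicode_contains(container, element):
    # Coerce both operands to Unicode, as the C code does via
    # PyUnicode_FromObject on each argument.
    u, v = str(container), str(element)
    # 1.6-era rule: the left operand of "in" must be a single character.
    if len(v) != 1:
        raise TypeError("string member test needs char left operand")
    # Linear scan, mirroring the while (p < e) loop in the C version.
    for ch in u:
        if ch == v:
            return 1
    return 0

assert unicode_contains(u"bdba", u"a") == 1
assert unicode_contains(u"bdb", u"a") == 0
```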
-- 
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

--------------56A130F1FCAC300009B200AD
Content-Type: text/plain; charset=us-ascii; name="Unicode-Implementation-2000-03-11.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="Unicode-Implementation-2000-03-11.patch"

diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Include/unicodeobject.h Python+Unicode/Include/unicodeobject.h
--- CVS-Python/Include/unicodeobject.h	Fri Mar 10 23:33:05 2000
+++ Python+Unicode/Include/unicodeobject.h	Sat Mar 11 14:45:59 2000
@@ -683,6 +683,17 @@
 	PyObject *args		/* Argument tuple or dictionary */
 	);
 
+/* Checks whether element is contained in container and return 1/0
+   accordingly.
+
+   element has to coerce to an one element Unicode string. -1 is
+   returned in case of an error. */
+
+extern DL_IMPORT(int) PyUnicode_Contains(
+    PyObject *container,	/* Container string */
+    PyObject *element		/* Element string */
+    );
+
 /* === Characters Type APIs =============================================== */
 
 /* These should not be used directly.	Use the Py_UNICODE_IS* and
diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_unicode.py Python+Unicode/Lib/test/test_unicode.py
--- CVS-Python/Lib/test/test_unicode.py	Sat Mar 11 00:23:20 2000
+++ Python+Unicode/Lib/test/test_unicode.py	Sat Mar 11 14:52:29 2000
@@ -219,6 +219,19 @@
 test('translate', u"abababc", u'iiic', {ord('a'):None, ord('b'):ord('i')})
 test('translate', u"abababc", u'iiix', {ord('a'):None, ord('b'):ord('i'), ord('c'):u'x'})
 
+# Contains:
+print 'Testing Unicode contains method...',
+assert ('a' in 'abdb') == 1
+assert ('a' in 'bdab') == 1
+assert ('a' in 'bdaba') == 1
+assert ('a' in 'bdba') == 1
+assert ('a' in u'bdba') == 1
+assert (u'a' in u'bdba') == 1
+assert (u'a' in u'bdb') == 0
+assert (u'a' in 'bdb') == 0
+assert (u'a' in 'bdba') == 1
+print 'done.'
+
 # Formatting:
 print 'Testing Unicode formatting strings...',
 assert u"%s, %s" % (u"abc", "abc") == u'abc, abc'
diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Misc/unicode.txt Python+Unicode/Misc/unicode.txt
--- CVS-Python/Misc/unicode.txt	Sat Mar 11 00:14:11 2000
+++ Python+Unicode/Misc/unicode.txt	Sat Mar 11 14:53:37 2000
@@ -743,8 +743,9 @@
 stream codecs as available through the codecs module should be used.
 
-XXX There should be a short-cut open(filename,mode,encoding) available which
-    also assures that mode contains the 'b' character when needed.
+The codecs module should provide a short-cut open(filename,mode,encoding)
+available which also assures that mode contains the 'b' character when
+needed.
 
 File/Stream Input:
 
@@ -810,6 +811,10 @@
 Introduction to Unicode (a little outdated by still nice to read):
 	http://www.nada.kth.se/i18n/ucs/unicode-iso10646-oview.html
+For comparison:
+	Introducing Unicode to ECMAScript --
+	http://www-4.ibm.com/software/developer/library/internationalization-support.html
+
 
 Encodings:
 
 Overview:
@@ -832,7 +837,7 @@
 History of this Proposal:
 -------------------------
-1.2: 
+1.2: Removed POD about codecs.open()
 1.1: Added note about comparisons and hash values. Added note about
      case mapping algorithms. Changed stream codecs .read() and
      .write() method to match the standard file-like object methods
diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Objects/stringobject.c Python+Unicode/Objects/stringobject.c
--- CVS-Python/Objects/stringobject.c	Sat Mar 11 10:55:09 2000
+++ Python+Unicode/Objects/stringobject.c	Sat Mar 11 14:47:45 2000
@@ -389,7 +389,9 @@
 {
 	register char *s, *end;
 	register char c;
-	if (!PyString_Check(el) || PyString_Size(el) != 1) {
+	if (!PyString_Check(el))
+		return PyUnicode_Contains(a, el);
+	if (PyString_Size(el) != 1) {
 		PyErr_SetString(PyExc_TypeError,
 		    "string member test needs char left operand");
 		return -1;
diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Objects/unicodeobject.c Python+Unicode/Objects/unicodeobject.c
--- CVS-Python/Objects/unicodeobject.c	Fri Mar 10 23:53:23 2000
+++ Python+Unicode/Objects/unicodeobject.c	Sat Mar 11 14:48:52 2000
@@ -2737,6 +2737,49 @@
     return -1;
 }
 
+int PyUnicode_Contains(PyObject *container,
+		       PyObject *element)
+{
+    PyUnicodeObject *u = NULL, *v = NULL;
+    int result;
+    register const Py_UNICODE *p, *e;
+    register Py_UNICODE ch;
+
+    /* Coerce the two arguments */
+    u = (PyUnicodeObject *)PyUnicode_FromObject(container);
+    if (u == NULL)
+	goto onError;
+    v = (PyUnicodeObject *)PyUnicode_FromObject(element);
+    if (v == NULL)
+	goto onError;
+
+    /* Check v in u */
+    if (PyUnicode_GET_SIZE(v) != 1) {
+	PyErr_SetString(PyExc_TypeError,
+			"string member test needs char left operand");
+	goto onError;
+    }
+    ch = *PyUnicode_AS_UNICODE(v);
+    p = PyUnicode_AS_UNICODE(u);
+    e = p + PyUnicode_GET_SIZE(u);
+    result = 0;
+    while (p < e) {
+	if (*p++ == ch) {
+	    result = 1;
+	    break;
+	}
+    }
+
+    Py_DECREF(u);
+    Py_DECREF(v);
+    return result;
+
+onError:
+    Py_XDECREF(u);
+    Py_XDECREF(v);
+    return -1;
+}
+
 /* Concat to string or Unicode object giving a new Unicode object. */
 
 PyObject *PyUnicode_Concat(PyObject *left,
@@ -3817,6 +3860,7 @@
     (intintargfunc) unicode_slice, 	/* sq_slice */
     0, 					/* sq_ass_item */
     0, 					/* sq_ass_slice */
+    (objobjproc)PyUnicode_Contains, 	/*sq_contains*/
 };
 
 static int
--------------56A130F1FCAC300009B200AD--

From tim_one@email.msn.com Sat Mar 11 20:10:23 2000
From: tim_one@email.msn.com (Tim Peters)
Date: Sat, 11 Mar 2000 15:10:23 -0500
Subject: [Python-Dev] finalization again
In-Reply-To: <14536.30810.720836.886023@anthem.cnri.reston.va.us>
Message-ID: <000e01bf8b95$d52939e0$c72d153f@tim>

[Barry A. Warsaw, jamming after hours]
> ...
> What if you timestamp instances when you create them? Then when you
> have trash cycles with finalizers, you sort them and finalize in
> chronological order.

Well, I strongly agree that would be better than finalizing them in increasing order of storage address .

> ...
> - FIFO order /seems/ more natural to me than FILO, Forget cycles for a moment, and consider just programs that manipulate *immutable* containers (the simplest kind to think about): at the time you create an immutable container, everything *contained* must already be in existence, so every pointer goes from a newer object (container) to an older one (containee). This is the "deep" reason for why, e.g., you can't build a cycle out of pure tuples in Python (if every pointer goes new->old, you can't get a loop, else each node in the loop would be (transitively) older than itself!). Then, since a finalizer can see objects pointed *to*, a finalizer can see only older objects. Since it's desirable that a finalizer see only wholly intact (unfinalized) objects, it is in fact the oldest object ("first in") that needs to be cleaned up last ("last out"). So, under the assumption of immutability, FILO is sufficient, but FIFO dangerous. So your muse inflamed you with an interesting tune, but you fingered the riff backwards . One problem is that it all goes out the window as soon as mutation is allowed. It's *still* desirable that a finalizer see only unfinalized objects, but in the presence of mutation that no longer bears any relationship to relative creation time. Another problem is in Guido's directory example, which we can twist to view as an "immutable container" problem that builds its image of the directory bottom-up, and where a finalizer on each node tries to remove the file (or delete the directory, whichever the node represents). In this case the physical remove/delete/unlink operations have to follow a *postorder* traversal of the container tree, so that "finalizer sees only unfinalized objects" is the opposite of what the app needs! The lesson to take from that is that the implementation can't possibly guess what ordering an app may need in a fancy finalizer. 
At best it can promise to follow a "natural" ordering based on the points-to relationship, and while "finalizer sees only unfinalized objects" is at least clear, it's quite possibly unhelpful (in Guido's particular case, it *can* be exploited, though, by adding a postorder remove/delete/unlink method to nodes, and explicitly calling it from __del__ -- "the rules" guarantee that the root of the tree will get finalized first, and the code can rely on that in its own explicit postorder traversal). > but then I rarely create cyclic objects, and almost never use __del__, > so this whole argument has been somewhat academic to me :). Well, not a one of us creates cycles often in CPython today, simply because we don't want to track down leaks <0.5 wink>. It seems that nobody here uses __del__ much, either; indeed, my primary use of __del__ is simply to call an explicit break_cycles() function from the header node of a graph! The need for that goes away as soon as Python reclaims cycles by itself, and I may never use __del__ at all then in the vast bulk of my code. It's because we've seen no evidence here (and also that I've seen none elsewhere either) that *anyone* is keen on mixing cycles with finalizers that I've been so persistent in saying "screw it -- let it leak, but let the user get at it if they insist on doing it". Seems we're trying to provide slick support for something nobody wants to do. If it happens by accident anyway, well, people sometimes divide by 0 by accident too <0.0 wink>: give them a way to know about it, but don't move heaven & earth trying to treat it like a normal case. 
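Tim's FILO argument can be put in runnable form. Here is a toy model (entirely invented here, not CPython's collector) in which every container is created after its contents, so that finalizing in reverse creation order guarantees a finalizer only ever sees intact, not-yet-finalized objects:

```python
creation_order = []

class Node:
    """Immutable-style container: children exist before the container."""
    def __init__(self, *children):
        self.children = children
        self.alive = True
        creation_order.append(self)

    def finalize(self):
        # A finalizer may look at anything it points to; under FILO
        # ordering those objects are older and not yet finalized.
        assert all(child.alive for child in self.children)
        self.alive = False

a = Node()          # oldest ("first in")
b = Node(a)
c = Node(a, b)      # newest: every pointer goes newer -> older

for obj in reversed(creation_order):    # "last out": newest first
    obj.finalize()

assert not any(obj.alive for obj in creation_order)
```

Note that Guido's directory example wants exactly the opposite (postorder: children removed before parents), which is Tim's point that no single built-in ordering can suit every app.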
if-it-were-easy-to-implement-i-wouldn't-care-ly y'rs - tim From Moshe Zadka Sat Mar 11 20:35:43 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 11 Mar 2000 22:35:43 +0200 (IST) Subject: [Python-Dev] finalization again In-Reply-To: <000e01bf8b95$d52939e0$c72d153f@tim> Message-ID: In a continuation (yes, a dangerous word in these parts) of the timbot's looks at the way other languages handle finalization, let me add something from the Sather manual I'm now reading (when I'm done with it, you'll see me begging for iterators here, and having some weird ideas in the types-sig): =============================== Finalization will only occur once, even if new references are created to the object during finalization. Because few guarantees can be made about the environment in which finalization occurs, finalization is considered dangerous and should only be used in the rare cases that conventional coding will not suffice. =============================== (Sather is garbage-collected, BTW) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From tim_one@email.msn.com Sat Mar 11 20:51:47 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 11 Mar 2000 15:51:47 -0500 Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler In-Reply-To: <200003101528.JAA15951@beluga.mojam.com> Message-ID: <001001bf8b9b$9e09d720$c72d153f@tim> [Skip Montanaro, with an expression that may raise TypeError for any of several distinct reasons, and wants to figure out which one after the fact] The existing exception machinery is sufficiently powerful for building a solution, so nothing new is needed in the language. What you really need here is an exhaustive list of all exceptions the language can raise, and when, and why, and a formally supported "detail" field (whether numeric id or string or whatever) that you can rely on to tell them apart at runtime. There are at least a thousand cases that need to be so documented and formalized. 
That's why not a one of them is now <0.9 wink>. If P3K is a rewrite from scratch, a rational scheme could be built in from the start. Else it would seem to require a volunteer with even less of a life than us . From tim_one@email.msn.com Sat Mar 11 20:51:49 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 11 Mar 2000 15:51:49 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C965C4.B164C2D5@interet.com> Message-ID: <001101bf8b9b$9f37f6e0$c72d153f@tim> [James C. Ahlstrom] > Well, it looks like this thread has fizzled out. But what did we > decide? Far as I could tell, nothing specific. > ... > I feel Tim's original complaint that popen() is a Problem > still hasn't been fixed. I was passing it on from MikeF's c.l.py posting. This isn't a new problem, of course, it just drags on year after year -- which is the heart of MikeF's gripe. People have code that *does* work, but for whatever reasons it never gets moved to the core. In the meantime, the Library Ref implies the broken code that is in the core does work. One or the other has to change, and it looks most likely to me that Fred will change the docs for 1.6. While not ideal, that would be a huge improvement over the status quo. luckily-few-people-expect-windows-to-work-anyway<0.9-wink>-ly y'rs - tim From mhammond@skippinet.com.au Mon Mar 13 03:50:35 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Mon, 13 Mar 2000 14:50:35 +1100 Subject: [Python-Dev] string.replace behaviour change since Unicode patch. 
Message-ID:

Hi,

After applying the Unicode changes string.replace() seems to have changed its behaviour:

Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import string
>>> string.replace("foo\nbar", "\n", "")
'foobar'
>>>

But since the Unicode update:

Python 1.5.2+ (#0, Feb 2 2000, 16:46:55) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import string
>>> string.replace("foo\nbar", "\n", "")
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "L:\src\python-cvs\lib\string.py", line 407, in replace
    return s.replace(old, new, maxsplit)
ValueError: empty replacement string
>>>

The offending check is stringmodule.c, line 1578:

	if (repl_len <= 0) {
		PyErr_SetString(PyExc_ValueError, "empty replacement string");
		return NULL;
	}

Changing the check to "< 0" fixes the immediate problem, but it is unclear why the check was added at all, so I didn't bother submitting a patch...

Mark.

From mal@lemburg.com Mon Mar 13 09:13:50 2000
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 13 Mar 2000 10:13:50 +0100
Subject: [Python-Dev] string.replace behaviour change since Unicode patch.
References: Message-ID: <38CCB14D.C07ACC26@lemburg.com>

Mark Hammond wrote:
> 
> Hi,
> After applying the Unicode changes string.replace() seems to have changed
> its behaviour:
> 
> Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32
> Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
> >>> import string
> >>> string.replace("foo\nbar", "\n", "")
> 'foobar'
> >>>
> 
> But since the Unicode update:
> 
> Python 1.5.2+ (#0, Feb 2 2000, 16:46:55) [MSC 32 bit (Intel)] on win32
> Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
> >>> import string
> >>> string.replace("foo\nbar", "\n", "")
> Traceback (innermost last):
>   File "<stdin>", line 1, in ?
>   File "L:\src\python-cvs\lib\string.py", line 407, in replace
>     return s.replace(old, new, maxsplit)
> ValueError: empty replacement string
> >>>
> 
> The offending check is stringmodule.c, line 1578:
> 	if (repl_len <= 0) {
> 		PyErr_SetString(PyExc_ValueError, "empty replacement string");
> 		return NULL;
> 	}
> 
> Changing the check to "< 0" fixes the immediate problem, but it is unclear
> why the check was added at all, so I didn't bother submitting a patch...

Dang. Must have been my mistake -- it should read:

	if (sub_len <= 0) {
		PyErr_SetString(PyExc_ValueError, "empty pattern string");
		return NULL;
	}

Thanks for reporting this... I'll include the fix in the next patch set.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From fdrake@acm.org Mon Mar 13 15:43:09 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 13 Mar 2000 10:43:09 -0500 (EST)
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
In-Reply-To: <001101bf8b9b$9f37f6e0$c72d153f@tim>
References: <38C965C4.B164C2D5@interet.com> <001101bf8b9b$9f37f6e0$c72d153f@tim>
Message-ID: <14541.3213.590243.359394@weyr.cnri.reston.va.us>

Tim Peters writes:
> code that is in the core does work. One or the other has to change, and it
> looks most likely to me that Fred will change the docs for 1.6. While not
> ideal, that would be a huge improvement over the status quo.

Actually, I just checked in my proposed change for the 1.5.2 doc update that I'm releasing soon. I'd like to remove it for 1.6, if the appropriate implementation is moved into the core.

-Fred

-- 
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From gvwilson@nevex.com Mon Mar 13 21:10:52 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Mon, 13 Mar 2000 16:10:52 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request Message-ID:

Once 1.6 is out the door, would people be willing to consider extending Python's token set to make HTML/XML-ish spellings using entity references legal? This would make the following 100% legal Python:

i = 0
while i &lt; 10:
    print i &amp; 1
    i = i + 1

which would in turn make it easier to embed Python in XML such as config-files-for-whatever-Software-Carpentry-produces-to-replace-make, PMZ, and so on.

Greg

From skip@mojam.com (Skip Montanaro) Mon Mar 13 21:23:17 2000 From: skip@mojam.com (Skip Montanaro) Date: Mon, 13 Mar 2000 15:23:17 -0600 (CST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: References: Message-ID: <14541.23621.89087.357783@beluga.mojam.com>

Greg> Once 1.6 is out the door, would people be willing to consider
Greg> extending Python's token set to make HTML/XML-ish spellings using
Greg> entity references legal? This would make the following 100% legal
Greg> Python:

Greg> i = 0
Greg> while i &lt; 10:
Greg>     print i &amp; 1
Greg>     i = i + 1

What makes it difficult to pump your Python code through cgi.escape when embedding it? There doesn't seem to be an inverse function to cgi.escape (at least not in the cgi module), but I suspect it could rather easily be written.

-- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/

From akuchlin@mems-exchange.org Mon Mar 13 21:23:29 2000 From: akuchlin@mems-exchange.org (Andrew M.
Kuchling) Date: Mon, 13 Mar 2000 16:23:29 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: References: Message-ID: <14541.23633.873411.86833@amarok.cnri.reston.va.us>

gvwilson@nevex.com writes:
> Once 1.6 is out the door, would people be willing to consider extending
> Python's token set to make HTML/XML-ish spellings using entity references
> legal? This would make the following 100% legal Python:
>
> i = 0
> while i &lt; 10:
>     print i &amp; 1
>     i = i + 1

I don't think that would be sufficient. What about user-defined entities, as in r&eacute;sultat = max(a,b)? (résultat, in French.) Would Python have to also parse a DTD from somewhere? What about other places where Python and XML syntax collide, as in this contrived example:

# Python code starts here
if a[index[1]]>b:
    print ...

Oops! The ]]> looks like the end of the CDATA section, but it's legal Python code. IMHO whatever tool is outputting the XML should handle escaping wacky characters in the Python code, which will be undone by the parser when the XML gets parsed. Users certainly won't be writing this XML by hand; writing 'if (i &lt; 10)' is very strange.

--
A.M. Kuchling    http://starship.python.net/crew/amk/
Art history is the nightmare from which art is struggling to awake. -- Robert Fulford

From gvwilson@nevex.com Mon Mar 13 21:58:27 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Mon, 13 Mar 2000 16:58:27 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: <14541.23633.873411.86833@amarok.cnri.reston.va.us> Message-ID:

> Greg Wilson wrote:
> ...would people be willing to consider extending
> Python's token set to make HTML/XML-ish spellings using entity references
> legal?
>
> i = 0
> while i &lt; 10:
>     print i &amp; 1
>     i = i + 1

> Skip Montanaro wrote:
> What makes it difficult to pump your Python code through cgi.escape when
> embedding it?

Most non-programmers use WYSIWYG editors, and many of these are moving toward XML-compliant formats.
Parsing the standard character entities seemed like a good first step toward catering to this (large) audience.

> Andrew Kuchling wrote:
> I don't think that would be sufficient. What about user-defined
> entities, as in r&eacute;sultat = max(a,b)? (résultat, in French.)
> Would Python have to also parse a DTD from somewhere?

Longer term, I believe that someone is going to come out with a programming language that (finally) leaves the flat-ASCII world behind, and lets people use the structuring mechanisms (e.g. XML) that we have developed for everyone else's data. I think it would be to Python's advantage to be first, and if I'm wrong, there's little harm done. User-defined entities, DTD's, and the like are probably part of that, but I don't think I know enough to know what to ask for. Escaping the standard entities seems like an easy start.

> Andrew Kuchling also wrote:
> What about other places when Python and XML syntax collide, as in this
> contrived example:
>
> # Python code starts here
> if a[index[1]]>b:
>     print ...
>
> Oops! The ]]> looks like the end of the CDATA section, but it's legal
> Python code.

Yup; that's one of the reasons I'd like to be able to write:

# Python code starts here
if a[index[1]]&gt;b:
    print ...

> Users certainly won't be writing this XML by hand; writing 'if (i &lt;
> 10)' is very strange.

I'd expect my editor to put '&lt;' in the file when I press the '<' key, and to display '<' on the screen when viewing the file.

thanks, Greg

From beazley@rustler.cs.uchicago.edu Mon Mar 13 22:35:24 2000 From: beazley@rustler.cs.uchicago.edu (David M. Beazley) Date: Mon, 13 Mar 2000 16:35:24 -0600 (CST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: References: Message-ID: <200003132235.QAA08031@rustler.cs.uchicago.edu>

gvwilson@nevex.com writes:
> Once 1.6 is out the door, would people be willing to consider extending
> Python's token set to make HTML/XML-ish spellings using entity references
> legal?
This would make the following 100% legal Python:
>
> i = 0
> while i &lt; 10:
>     print i &amp; 1
>     i = i + 1
>
> which would in turn make it easier to embed Python in XML such as
> config-files-for-whatever-Software-Carpentry-produces-to-replace-make,
> PMZ, and so on.

Sure, and while we're at it, maybe we can add support for C trigraph sequences as well. Maybe I'm missing the point, but why can't you just use a filter (cgi.escape() or something comparable)? I, for one, am *NOT* in favor of complicating the Python parser in this most bogus manner. Furthermore, with respect to the editor argument, I can't think of a single reason why any sane programmer would be writing programs in Microsoft Word or whatever it is that you're talking about. Therefore, I don't think that the Python parser should be modified in any way to account for XML tags, entities, or other extraneous markup that's not part of the core language. I know that I, for one, would be extremely pissed if I fired up emacs and had to maintain someone else's code that had all of this garbage in it. Just my 0.02.

-- Dave

From gvwilson@nevex.com Mon Mar 13 22:48:33 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Mon, 13 Mar 2000 17:48:33 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: <200003132235.QAA08031@rustler.cs.uchicago.edu> Message-ID:

> David M. Beazley wrote:
> ...and while we're at it, maybe we can add support for C trigraph
> sequences as well.

I don't know of any mass-market editors that generate C trigraphs.

> ...I can't think of a single reason why any sane programmer would be
> writing programs in Microsoft Word or whatever it is that you're
> talking about.

'S funny --- my non-programmer friends can't figure out why any sane person would use a glorified glass TTY like emacs... or why they should have to, just to program...
I just think that someone's going to do this for some language, some time soon, and I'd rather Python be in the lead than play catch-up.

Thanks, Greg

From Fredrik Lundh Message-ID: <00ca01bf8d42$6a154500$34aab5d4@hagrid>

Greg wrote:
> > ...I can't think of a single reason why any sane programmer would be
> > writing programs in Microsoft Word or whatever it is that you're
> > talking about.
>
> 'S funny --- my non-programmer friends can't figure out why any sane
> person would use a glorified glass TTY like emacs... or why they should
> have to, just to program... I just think that someone's going to do this
> for some language, some time soon, and I'd rather Python be in the lead
> than play catch-up.

I don't get it. the XML specification contains a lot of stuff, and I completely fail to see how adding support for a very small part of XML would make it possible to use XML editors to write Python code. what am I missing?

From DavidA@ActiveState.com Mon Mar 13 23:15:25 2000 From: DavidA@ActiveState.com (David Ascher) Date: Mon, 13 Mar 2000 15:15:25 -0800 Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID:

> 'S funny --- my non-programmer friends can't figure out why any sane
> person would use a glorified glass TTY like emacs... or why they should
> have to, just to program... I just think that someone's going to do this
> for some language, some time soon, and I'd rather Python be in the lead
> than play catch-up.

But the scheme you put forth causes major problems for current Python users who *are* using glass TTYs, so I don't think it'll fly for very basic political reasons nicely illustrated by Dave-the-diplomat's response. While storage of Python files in XML documents is a good thing, it's hard to see why XML should be viewed as the only storage format for Python files. I think a much richer XML schema could be useful in some distant future: ...
What might be more useful in the short term IMO is to define a _standard_ mechanism for Python-in-XML encoding/decoding, so that all code which encodes Python in XML is done the same way, and so that XML editors can figure out once and for all how to decode Python-in-CDATA.

Strawman Encoding # 1: replace < with &lt; and > with &gt; when not in strings, and vice versa on the decoding side.

Strawman Encoding # 2:
- do Strawman 1, AND
- replace space-determined indentation with { and } tokens or other INDENT and DEDENT markers using some rare Unicode characters to work around inevitable bugs in whitespace handling of XML processors.

--david

From gvwilson@nevex.com Mon Mar 13 23:14:43 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Mon, 13 Mar 2000 18:14:43 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID:

> David Ascher wrote:
> But the scheme you put forth causes major problems for current Python
> users who *are* using glass TTYs, so I don't think it'll fly for very
> basic political reasons nicely illustrated by Dave's response.

Understood. I thought that handling standard entities might be a useful first step toward storage of Python as XML, which in turn would help make Python more accessible to people who don't want to switch editors just to program. I felt that an all-or-nothing approach would be even less likely to get a favorable response than handling entities... :-)

Greg

From beazley@rustler.cs.uchicago.edu Mon Mar 13 23:12:55 2000 From: beazley@rustler.cs.uchicago.edu (David M. Beazley) Date: Mon, 13 Mar 2000 17:12:55 -0600 (CST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: References: <200003132235.QAA08031@rustler.cs.uchicago.edu> Message-ID: <200003132312.RAA08107@rustler.cs.uchicago.edu>

gvwilson@nevex.com writes:
> > 'S funny --- my non-programmer friends can't figure out why any sane
> person would use a glorified glass TTY like emacs...
or why they should > have to, just to program... Look, I'm all for CP4E and making programming more accessible to the masses, but as a professional programmer, I frankly do not care what non-programmers think about the tools that I (and most of the programming world) use to write software. Furthermore, if all of your non-programmer friends don't want to care about the underlying details, they certainly won't care how programs are represented---including a nice and *simple* text representation without markup, entities, and other syntax that is not an essential part of the language. However, as a professional, I most certainly DO care about how programs are represented--specifically, I want to be able to move them around between machines. Edit them with essentially any editor, transform them as I see fit, and be able to easily read them and have a sense of what is going on. Markup is just going to make this a huge pain in the butt. No, I'm not for this idea one bit. Sorry. > I just think that someone's going to do this > for some language, some time soon, and I'd rather Python be in the lead > than play catch-up. What gives you the idea that Python is behind? What is it playing catch up to? -- Dave From DavidA@ActiveState.com Mon Mar 13 23:36:54 2000 From: DavidA@ActiveState.com (David Ascher) Date: Mon, 13 Mar 2000 15:36:54 -0800 Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: > > David Ascher wrote: > > But the scheme you put forth causes major problems for current Python > > users who *are* using glass TTYs, so I don't think it'll fly for very > > basic political reasons nicely illustrated by Dave's response. > > Understood. I thought that handling standard entities might be a > useful first step toward storage of Python as XML, which in turn would > help make Python more accessible to people who don't want to switch > editors just to program. 
I felt that an all-or-nothing approach would be > even less likely to get a favorable response than handling entities... :-) > > Greg If you propose a transformation between Python Syntax and XML, then you potentially have something which all parties can agree to as being a good thing. Forcing one into the other is denying the history and current practices of both domains and user populations. You cannot ignore the fact that "I can read anyone's Python" is a key selling point of Python among its current practitioners, or that its cleanliness and lack of magic characters ($ is usually invoked, but < is just as magic/ugly) are part of its appeal/success. No XML editor is going to edit all XML documents without custom editors anyway! I certainly don't expect to be drawing SVG diagrams with a keyboard! That's what schemas and custom editors are for. Define a schema for 'encoded Python' (well, first, find a schema notation that will survive), write a plugin to your favorite XML editor, and then your (theoretical? =) users can use the same 'editor' to edit PythonXML or any other XML. Most XML probably won't be edited with a keyboard but with a pointing device or a speech recognizer anyway... IMO, you're being seduced by the apparent closeness between XML and Python-in-ASCII. It's only superficial... Think of Python-in-ASCII as a rendering of Python-in-XML, Dave will think of Python-in-XML as a rendering of Python-in-ASCII, and everyone will be happy (as long as everyone agrees on the one-to-one transformation). --david From paul@prescod.net Mon Mar 13 23:43:48 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 13 Mar 2000 15:43:48 -0800 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <38CD7D34.6569C1AA@prescod.net> You should use your entities in the XML files, and then whatever application actually launches Python (PMZ, your make engine, XMetaL) could decode the data and launch Python. This is already how it works in XMetaL. 
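[The decode-then-launch step Paul describes above -- and the inverse of cgi.escape that Skip noted the cgi module never provided -- can be sketched in a few lines. This is only an illustration of the idea, not code from XMetaL or the stdlib: the helper name `unescape` is invented, and the sketch uses present-day Python spelling so it actually runs.]

```python
def unescape(text):
    """Inverse of cgi.escape: decode the predefined XML entity references."""
    # "&amp;" must be decoded last, so that "&amp;lt;" round-trips to the
    # literal text "&lt;" instead of being decoded twice.
    for entity, char in (("&lt;", "<"), ("&gt;", ">"),
                         ("&quot;", '"'), ("&amp;", "&")):
        text = text.replace(entity, char)
    return text

# Python embedded in an XML document, with < escaped as an entity:
embedded = """\
i = 0
while i &lt; 10:
    i = i + 1
"""

source = unescape(embedded)   # the "decode the data" step...
exec(source)                  # ...and the "launch Python" step
```

The ordering of the replacements is the only subtle point: decoding `&amp;` first would corrupt doubly-escaped text.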
I've just reinstalled recently so I don't have my macro file. Therefore, please excuse the Javascript (not Python) example. This is in "journalist.mcr" in the "Macros" folder of XMetaL.

This already works fine for Python. You change lang="Python" and, thanks to the benevolence of Bill Gates and the hard work of Mark Hammond, you can use Python for XMetaL macros. It doesn't work perfectly: exceptions crash XMetaL, last I tried. As long as you don't make mistakes, everything works nicely. :) You can write XMetaL macros in Python and the whole thing is stored as XML. Still, XMetaL is not very friendly as a Python editor. It doesn't have nice whitespace handling!

-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Out of timber so crooked as that which man is made nothing entirely straight can be built. - Immanuel Kant

From paul@prescod.net Mon Mar 13 23:59:23 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 13 Mar 2000 15:59:23 -0800 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <38CD80DB.39150F33@prescod.net>

gvwilson@nevex.com wrote:
>
> 'S funny --- my non-programmer friends can't figure out why any sane
> person would use a glorified glass TTY like emacs... or why they should
> have to, just to program... I just think that someone's going to do this
> for some language, some time soon, and I'd rather Python be in the lead
> than play catch-up.

Your goal is worth pursuing but I agree with the others that the syntax change is not the right way. It _is_ possible to teach XMetaL to edit Python programs -- structurally -- just as it does XML. What you do is hook into the macro engine (which already supports Python) and use the Python tokenizer to build a parse tree. You copy that into a DOM using the same elements and attributes you would use if you were doing some kind of batch conversion. Then on "save" you reverse the process. Implementation time: ~3 days.
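[The pipeline Paul outlines -- tokenize the source, build a tree, reverse it on "save" -- can be roughed out with the standard tokenize module. A hedged sketch in present-day Python: the <python>/<tok> element names are a made-up schema for illustration, not anything XMetaL or Documentor defines.]

```python
import io
import tokenize
from xml.sax.saxutils import escape

def python_to_xml(source):
    """Encode Python source as a flat XML token stream (illustrative schema)."""
    parts = ["<python>"]
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Each token becomes one element; escape() handles < > & in operators.
        parts.append('  <tok type="%s">%s</tok>'
                     % (tokenize.tok_name[tok.type], escape(tok.string)))
    parts.append("</python>")
    return "\n".join(parts)

print(python_to_xml("while i < 10:\n    i = i + 1\n"))
```

The reverse direction is a matter of unescaping each <tok> body and re-joining the token strings; the standard library even ships an inverse for the token stream itself, tokenize.untokenize().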
The XMetaL competitor, Documentor, has an API specifically designed to make this sort of thing easy.

Making either of them into a friendly programmer's editor is a much larger task. I think this is where the majority of the R&D should occur, not at the syntax level. If one invents a fundamentally better way of working with the structures behind Python code, then it would be relatively easy to write code that maps that to today's Python syntax.

-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Out of timber so crooked as that which man is made nothing entirely straight can be built. - Immanuel Kant

From Moshe Zadka Tue Mar 14 01:14:09 2000 From: Moshe Zadka (Moshe Zadka) Date: Tue, 14 Mar 2000 03:14:09 +0200 (IST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID:

On Mon, 13 Mar 2000 gvwilson@nevex.com wrote:
> Once 1.6 is out the door, would people be willing to consider extending
> Python's token set to make HTML/XML-ish spellings using entity references
> legal? This would make the following 100% legal Python:
>
> i = 0
> while i &lt; 10:
>     print i &amp; 1
>     i = i + 1
>
> which would in turn make it easier to embed Python in XML such as
> config-files-for-whatever-Software-Carpentry-produces-to-replace-make,
> PMZ, and so on.

Why? Whatever XML parser you use will output "i&lt;1" as "i<1", so the Python that comes out of the XML parser is quite all right. Why change Python to do an XML parser job?

-- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From mhammond@skippinet.com.au Tue Mar 14 01:18:45 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Tue, 14 Mar 2000 12:18:45 +1100 Subject: [Python-Dev] unicode objects and C++ Message-ID:

I struck a bit of a snag with the Unicode support when trying to use the most recent update in a C++ source file.
The problem turned out to be that unicodeobject.h did a #include "wchar.h", but did it while an 'extern "C"' block was open. This upset the MSVC6 wchar.h, as it has special C++ support.

Attached below is a patch I made to unicodeobject.h that solved my problem and allowed my compilations to succeed. Theoretically the same problem could exist for wctype.h, and probably lots of other headers, but this is the immediate problem :-)

An alternative patch would be to #include "wchar.h" in PC\config.h outside of any 'extern "C"' blocks - wchar.h on Windows has guards that allow for multiple includes, so the unicodeobject.h include of that file will succeed, but not have the side-effect it has now.

I'm not sure what the preferred solution is - quite possibly the PC\config.h change, but I've included the unicodeobject.h patch anyway :-)

Mark.

*** unicodeobject.h	2000/03/13 23:22:24	2.2
--- unicodeobject.h	2000/03/14 01:06:57
***************
*** 85,91 ****
--- 85,101 ----
  #endif

  #ifdef HAVE_WCHAR_H
+
+ #ifdef __cplusplus
+ }	/* Close the 'extern "C"' before bringing in system headers */
+ #endif
+
  # include "wchar.h"
+
+ #ifdef __cplusplus
+ extern "C" {
+ #endif
+
  #endif

  #ifdef HAVE_USABLE_WCHAR_T

From mal@lemburg.com Mon Mar 13 23:31:30 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 14 Mar 2000 00:31:30 +0100 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <38CD7A52.5709DF5F@lemburg.com>

gvwilson@nevex.com wrote:
>
> > David Ascher wrote:
> > But the scheme you put forth causes major problems for current Python
> > users who *are* using glass TTYs, so I don't think it'll fly for very
> > basic political reasons nicely illustrated by Dave's response.
>
> Understood. I thought that handling standard entities might be a
> useful first step toward storage of Python as XML, which in turn would
> help make Python more accessible to people who don't want to switch
> editors just to program.
> I felt that an all-or-nothing approach would be
> even less likely to get a favorable response than handling entities... :-)

This should be easy to implement provided a hook for compile() is added to e.g. the sys-module which then gets used instead of calling the byte code compiler directly... Then you could redirect the compile() arguments to whatever codec you wish (e.g. an SGML entity codec) and the builtin compiler would only see the output of that codec.

Well, just a thought... I don't think encoding programs would make life as a programmer easier, but instead harder. It adds one more level of confusion on top of it all.

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

From mal@lemburg.com Tue Mar 14 09:45:49 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 14 Mar 2000 10:45:49 +0100 Subject: [Python-Dev] unicode objects and C++ References: Message-ID: <38CE0A4D.1209B830@lemburg.com>

Mark Hammond wrote:
>
> I struck a bit of a snag with the Unicode support when trying to use the
> most recent update in a C++ source file.
>
> The problem turned out to be that unicodeobject.h did a #include "wchar.h",
> but did it while an 'extern "C"' block was open. This upset the MSVC6
> wchar.h, as it has special C++ support.

Thanks for reporting this.

> Attached below is a patch I made to unicodeobject.h that solved my problem
> and allowed my compilations to succeed. Theoretically the same problem
> could exist for wctype.h, and probably lots of other headers, but this is
> the immediate problem :-)
>
> An alternative patch would be to #include "wchar.h" in PC\config.h outside
> of any 'extern "C"' blocks - wchar.h on Windows has guards that allow for
> multiple includes, so the unicodeobject.h include of that file will succeed,
> but not have the side-effect it has now.
> I'm not sure what the preferred solution is - quite possibly the PC\config.h
> change, but I've included the unicodeobject.h patch anyway :-)
>
> Mark.
>
> *** unicodeobject.h	2000/03/13 23:22:24	2.2
> --- unicodeobject.h	2000/03/14 01:06:57
> ***************
> *** 85,91 ****
> --- 85,101 ----
>   #endif
>
>   #ifdef HAVE_WCHAR_H
> +
> + #ifdef __cplusplus
> + }	/* Close the 'extern "C"' before bringing in system headers */
> + #endif
> +
>   # include "wchar.h"
> +
> + #ifdef __cplusplus
> + extern "C" {
> + #endif
> +
>   #endif
>
>   #ifdef HAVE_USABLE_WCHAR_T

I've included this patch (should solve the problem for all included system header files, since it wraps only the Unicode APIs in extern "C"):

--- /home/lemburg/clients/cnri/CVS-Python/Include/unicodeobject.h	Fri Mar 10 23:33:05 2000
+++ unicodeobject.h	Tue Mar 14 10:38:08 2000
@@ -1,10 +1,7 @@
 #ifndef Py_UNICODEOBJECT_H
 #define Py_UNICODEOBJECT_H
-#ifdef __cplusplus
-extern "C" {
-#endif
 
 /* Unicode implementation based on original code by Fredrik Lundh,
    modified by Marc-Andre Lemburg (mal@lemburg.com) according to the
@@ -167,10 +165,14 @@ typedef unsigned short Py_UNICODE;
 #define Py_UNICODE_MATCH(string, offset, substring)\
     (!memcmp((string)->str + (offset), (substring)->str,\
              (substring)->length*sizeof(Py_UNICODE)))
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* --- Unicode Type ------------------------------------------------------- */
 
 typedef struct {
     PyObject_HEAD
     int length;		/* Length of raw Unicode data in buffer */

I'll post a complete Unicode update patch by the end of the week for inclusion in CVS.
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From ping@lfw.org Tue Mar 14 11:19:59 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 14 Mar 2000 06:19:59 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: On Tue, 14 Mar 2000, Moshe Zadka wrote: > On Mon, 13 Mar 2000 gvwilson@nevex.com wrote: > > legal? This would make the following 100% legal Python: > > > > i = 0 > > while i < 10: > > print i & 1 > > i = i + 1 > > Why? Whatever XML parser you use will output "i<1" as "i<1", so > the Python that comes out of the XML parser is quite all right. Why change > Python to do an XML parser job? I totally agree. To me, this is the key issue: it is NOT the responsibility of the programming language to accommodate any particular encoding format. While we're at it, why don't we change Python to accept quoted-printable source code? Or base64-encoded source code? XML already defines a perfectly reasonable mechanism for escaping a plain stream of text -- adding this processing to Python adds nothing but confusion. The possible useful benefit from adding the proposed "feature" is exactly zero. -- ?!ng "This code is better than any code that doesn't work has any right to be." -- Roger Gregory, on Xanadu From ping@lfw.org Tue Mar 14 11:21:59 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 14 Mar 2000 06:21:59 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: On Mon, 13 Mar 2000, David Ascher wrote: > > If you propose a transformation between Python Syntax and XML, then you > potentially have something which all parties can agree to as being a good > thing. Indeed. 
I know that i wouldn't have any use for it at the moment, but i can see the potential for usefulness of a structured representation for Python source code (like an AST in XML) which could be directly edited in an XML editor, and processed (by an XSL stylesheet?) to produce actual runnable Python. But attempting to mix the two doesn't get you anywhere.

-- ?!ng

"This code is better than any code that doesn't work has any right to be." -- Roger Gregory, on Xanadu

From Fredrik Lundh Message-ID: <002201bf8dcb$ba9a11c0$34aab5d4@hagrid>

Greg:
> Understood. I thought that handling standard entities might be a
> useful first step toward storage of Python as XML, which in turn would
> help make Python more accessible to people who don't want to switch
> editors just to program. I felt that an all-or-nothing approach would be
> even less likely to get a favorable response than handling entities... :-)

well, I would find it easier to support a more aggressive proposal: make sure Python 1.7 can deal with source code written in Unicode, using any supported encoding. with that in place, you can plug in your favourite unicode encoding via the Unicode framework.

From Fredrik Lundh Message-ID: <000901bf8e03$abf88420$34aab5d4@hagrid>

> I've just checked in a massive patch from Marc-Andre Lemburg which
> adds Unicode support to Python.

massive, indeed.

didn't notice this before, but I just realized that after the latest round of patches, the python15.dll is now 700k larger than it was for 1.5.2 (more than twice the size). my original unicode DLL was 13k. hmm...

From akuchlin@mems-exchange.org Tue Mar 14 22:19:44 2000 From: akuchlin@mems-exchange.org (Andrew M.
Kuchling) Date: Tue, 14 Mar 2000 17:19:44 -0500 (EST) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <000901bf8e03$abf88420$34aab5d4@hagrid> References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> Message-ID: <14542.47872.184978.985612@amarok.cnri.reston.va.us>

Fredrik Lundh writes:
> didn't notice this before, but I just realized that after the
> latest round of patches, the python15.dll is now 700k larger
> than it was for 1.5.2 (more than twice the size).

Most of that is due to Modules/unicodedata.c, which is 2.1Mb of source code, and produces a 632168-byte .o file on my Sparc. (Will some compiler systems choke on a file that large? Could we read database info from a file instead, or mmap it into memory?)

-- A.M. Kuchling http://starship.python.net/crew/amk/ "Are you OK, dressed like that? You don't seem to notice the cold." "I haven't come ten thousand miles to discuss the weather, Mr Moberly." -- Moberly and the Doctor, in "The Seeds of Doom"

From mal@lemburg.com Wed Mar 15 08:32:29 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 09:32:29 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> Message-ID: <38CF4A9D.13A0080@lemburg.com>

"Andrew M. Kuchling" wrote:
>
> Fredrik Lundh writes:
> > didn't notice this before, but I just realized that after the
> > latest round of patches, the python15.dll is now 700k larger
> > than it was for 1.5.2 (more than twice the size).
>
> Most of that is due to Modules/unicodedata.c, which is 2.1Mb of source
> code, and produces a 632168-byte .o file on my Sparc. (Will some
> compiler systems choke on a file that large? Could we read database
> info from a file instead, or mmap it into memory?)

That is due to the unicodedata module being compiled into the DLL statically.
On Unix you can build it shared too -- there are no direct references to it in the implementation. I suppose that on Windows the same should be done... the question really is whether this is intended or not -- moving the module into a DLL is at least technically no problem (someone would have to supply a patch for the MSVC project files though).

Note that unicodedata is only needed by programs which do a lot of Unicode manipulations and in the future probably by some codecs too.

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

From pf@artcom-gmbh.de Wed Mar 15 10:42:26 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 15 Mar 2000 11:42:26 +0100 (MET) Subject: [Python-Dev] Unicode in Python and Tcl/Tk compared (was Unicode patches checked in...) In-Reply-To: <38CF4A9D.13A0080@lemburg.com> from "M.-A. Lemburg" at "Mar 15, 2000 9:32:29 am" Message-ID:

Hi!

> > Fredrik Lundh writes:
> > > didn't notice this before, but I just realized that after the
> > > latest round of patches, the python15.dll is now 700k larger
> > > than it was for 1.5.2 (more than twice the size).

> "Andrew M. Kuchling" wrote:
> > Most of that is due to Modules/unicodedata.c, which is 2.1Mb of source
> > code, and produces a 632168-byte .o file on my Sparc. (Will some
> > compiler systems choke on a file that large? Could we read database
> > info from a file instead, or mmap it into memory?)

> M.-A. Lemburg wrote:
> That is due to the unicodedata module being compiled
> into the DLL statically. On Unix you can build it shared too
> -- there are no direct references to it in the implementation.
> I suppose that on Windows the same should be done... the
> question really is whether this is intended or not -- moving
> the module into a DLL is at least technically no problem
> (someone would have to supply a patch for the MSVC project
> files though).
> > Note that unicodedata is only needed by programs which do > a lot of Unicode manipulations and in the future probably > by some codecs too. Now as the unicode patches were checked in and as Fredrik Lundh noticed a considerable increase of the size of the python-DLL, which was obviously mostly caused by those tables, I had some fear that a Python/Tcl/Tk based application could eat up much more memory, if we update from Python 1.5.2 and Tcl/Tk 8.0.5 to Python 1.6 and Tcl/Tk 8.3.0. As some of you certainly know, some kind of unicode support has also been added to Tcl/Tk since 8.1. So I did some research and would like to share what I have found out so far. Here are the compared sizes of the tcl/tk shared libs on Linux:

old:                   | new:                   | bloat increase in %:
-----------------------+------------------------+---------------------
libtcl8.0.so    533414 | libtcl8.3.so    610241 | 14.4 %
libtk8.0.so     714908 | libtk8.3.so     811916 | 13.6 %

The addition of unicode wasn't the only change to Tcl/Tk, so this seems reasonable. Unfortunately there is no python shared library, so a direct comparison of increased memory consumption is impossible. Nevertheless I have the following figures (stripped binary sizes of the Python interpreter):

1.5.2         382616
CVS_10-02-00  393668  (a month before unicode)
CVS_12-03-00  507448  (just after unicode)

That is an increase of "only" 111 kBytes. Not so bad, but nevertheless a "bloat increase" of 32.6 %. And additionally there is now

unicodedata.so    634940
_codecsmodule.so   38955

which (I guess) will also be loaded if the application starts using some of the new features. Since I didn't take care of unicode in the past, I feel unable to compare the implementations of unicode in both systems and what impact they will have on the real memory performance and even more important on the functionality of the combined use of both packages together with Tkinter.
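The "bloat increase" percentages in the tables above are easy to double-check; a quick sketch (modern Python, figures taken straight from the tables):

```python
# Double-check of the "bloat increase" percentages quoted above.
def bloat(old, new):
    """Size increase of `new` over `old`, in percent."""
    return (new - old) / float(old) * 100.0

print(round(bloat(533414, 610241), 1))  # libtcl 8.0 -> 8.3: 14.4
print(round(bloat(714908, 811916), 1))  # libtk 8.0 -> 8.3: 13.6
print(round(bloat(382616, 507448), 1))  # python 1.5.2 -> post-unicode CVS: 32.6
```

All three results agree with the percentages quoted in the mail.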
Tcl/Tk keeps around a sub-directory called 'encoding', which --I guess-- contains information somehow similar or related to that in 'unicodedata.so', but separated into several files? So below I included shortened excerpts from the 200k+ tcl8.3.0/changes and the tk8.3.0/changes files about unicode. Maybe someone else more involved with unicode can shed some light on this topic? Do we need some changes to Tkinter.py or _tkinter or both? ---- 8< ---- 8< ---- cut here ---- 8< ---- schnipp ---- 8< ---- schnapp ---- [...] ======== Changes for 8.1 go below this line ======== 6/18/97 (new feature) Tcl now supports international character sets: - All C APIs now accept UTF-8 strings instead of iso8859-1 strings, wherever you see "char *", unless explicitly noted otherwise. - All Tcl strings represented in UTF-8, which is a convenient multi-byte encoding of Unicode. Variable names, procedure names, and all other values in Tcl may include arbitrary Unicode characters. For example, the Tcl command "string length" returns how many Unicode characters are in the argument string. - For Java compatibility, embedded null bytes in C strings are represented as \xC080 in UTF-8 strings, but the null byte at the end of a UTF-8 string remains \0. Thus Tcl strings once again do not contain null bytes, except for termination bytes. - For Java compatibility, "\uXXXX" is used in Tcl to enter a Unicode character. "\u0000" through "\uffff" are acceptable Unicode characters. - "\xXX" is used to enter a small Unicode character (between 0 and 255) in Tcl. - Tcl automatically translates between UTF-8 and the normal encoding for the platform during interactions with the system. - The fconfigure command now supports a -encoding option for specifying the encoding of an open file or socket. Tcl will automatically translate between the specified encoding and UTF-8 during I/O.
See the directory library/encoding to find out what encodings are supported (eventually there will be an "encoding" command that makes this information more accessible). - There are several new C APIs that support UTF-8 and various encodings. See Utf.3 for procedures that translate between Unicode and UTF-8 and manipulate UTF-8 strings. See Encoding.3 for procedures that create new encodings and translate between encodings. See ToUpper.3 for procedures that perform case conversions on UTF-8 strings. [...] 1/16/98 (new feature) Tk now supports international characters sets: - Font display mechanism overhauled to display Unicode strings containing full set of international characters. You do not need Unicode fonts on your system in order to use tk or see international characters. For those familiar with the Japanese or Chinese patches, there is no "-kanjifont" option. Characters from any available fonts will automatically be used if the widget's originally selected font is not capable of displaying a given character. - Textual widgets are international aware. For instance, cursor positioning commands would now move the cursor forwards/back by 1 international character, not by 1 byte. - Input Method Editors (IMEs) work on Mac and Windows. Unix is still in progress. [...] 10/15/98 (bug fix) Changed regexp and string commands to properly handle case folding according to the Unicode character tables. (stanton) 10/21/98 (new feature) Added an "encoding" command to facilitate translations of strings between different character encodings. See the encoding.n manual entry for more details. (stanton) 11/3/98 (bug fix) The regular expression character classification syntax now includes Unicode characters in the supported classes. (stanton) [...] 11/17/98 (bug fix) "scan" now correctly handles Unicode characters. (stanton) [...] 11/19/98 (bug fix) Fixed menus and titles so they properly display Unicode characters under Windows. [Bug: 819] (stanton) [...] 
4/2/99 (new apis) Made various Unicode utility functions public. Tcl_UtfToUniCharDString, Tcl_UniCharToUtfDString, Tcl_UniCharLen, Tcl_UniCharNcmp, Tcl_UniCharIsAlnum, Tcl_UniCharIsAlpha, Tcl_UniCharIsDigit, Tcl_UniCharIsLower, Tcl_UniCharIsSpace, Tcl_UniCharIsUpper, Tcl_UniCharIsWordChar, Tcl_WinUtfToTChar, Tcl_WinTCharToUtf (stanton) [...] 4/5/99 (bug fix) Fixed handling of Unicode in text searches. The -count option was returning byte counts instead of character counts. [...] 5/18/99 (bug fix) Fixed clipboard code so it handles Unicode data properly on Windows NT and 95. [Bug: 1791] (stanton) [...] 6/3/99 (bug fix) Fixed selection code to handle Unicode data in COMPOUND_TEXT and STRING selections. [Bug: 1791] (stanton) [...] 6/7/99 (new feature) Optimized string index, length, range, and append commands. Added a new Unicode object type. (hershey) [...] 6/14/99 (new feature) Merged string and Unicode object types. Added new public Tcl API functions: Tcl_NewUnicodeObj, Tcl_SetUnicodeObj, Tcl_GetUnicode, Tcl_GetUniChar, Tcl_GetCharLength, Tcl_GetRange, Tcl_AppendUnicodeToObj. (hershey) [...] 6/23/99 (new feature) Updated Unicode character tables to reflect Unicode 2.1 data. (stanton) [...] --- Released 8.3.0, February 10, 2000 --- See ChangeLog for details --- ---- 8< ---- 8< ---- cut here ---- 8< ---- schnipp ---- 8< ---- schnapp ---- Sorry if this was boring old stuff for some of you. Best Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From Vladimir.Marangozov@inrialpes.fr Wed Mar 15 11:40:21 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Wed, 15 Mar 2000 12:40:21 +0100 (CET) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38CF4A9D.13A0080@lemburg.com> from "M.-A. Lemburg" at Mar 15, 2000 09:32:29 AM Message-ID: <200003151140.MAA30301@python.inrialpes.fr> M.-A. 
Lemburg wrote: > > Note that unicodedata is only needed by programs which do > a lot of Unicode manipulations and in the future probably > by some codecs too. Perhaps it would make sense to move the Unicode database on the Python side (write it in Python)? Or init the database dynamically in the unicodedata module on import? It's quite big, so if it's possible to avoid the static declaration (and if the unicodedata module is enabled by default), I'd vote for a dynamic initialization of the database from reference (Python ?) file(s). M-A, is something in this spirit doable? -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer@tismer.com Wed Mar 15 12:57:04 2000 From: tismer@tismer.com (Christian Tismer) Date: Wed, 15 Mar 2000 13:57:04 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> Message-ID: <38CF88A0.CF876A74@tismer.com> "M.-A. Lemburg" wrote: ... > Note that unicodedata is only needed by programs which do > a lot of Unicode manipulations and in the future probably > by some codecs too. Would it be possible to make the Unicode support configurable? My problem is that patches in the CVS are of different kinds. Some are error corrections and enhancements which I would definitely like to use. Others are brand new features like the Unicode support. Absolutely great stuff! But this will most probably change a number of times again, and I think it is a bad idea when I include it into my Stackless distribution. I'd appreciate it very much if I could use the same CVS tree for testing new stuff, and to build my distribution, with new features switched off. Please :-) ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From jim@digicool.com Wed Mar 15 13:35:48 2000 From: jim@digicool.com (Jim Fulton) Date: Wed, 15 Mar 2000 08:35:48 -0500 Subject: [Python-Dev] Finalizers considered questionable ;) Message-ID: <38CF91B4.A36C8C5@digicool.com> Here's my $0.02. I agree with the sentiments that use of finalizers should be discouraged. They are extremely helpful in cases like tempfile.TemporaryFileWrapper, so I think that they should be supported. I do think that the language should not promise a high level of service. Some observations: - I spent a little bit of time on the ANSI Smalltalk committee, where I naively advocated adding finalizers to the language. I was resoundingly told no. :) - Most of the Python objects I deal with these days are persistent. Their lifetimes are a lot more complicated that most Python objects. They get created once, but they get loaded into and out of memory many times. In fact, they can be in memory many times simultaneously. :) A couple of years ago I realized that it only made sense to call __init__ when an object was first created, not when it is subsequently (re)loaded into memory. This led to a change in Python pickling semantics and the deprecation of the loathsome __getinitargs__ protocol. :) For me, a similar case can be made against use of __del__ for persistent objects. For persistent objects, a __del__ method should only be used for cleaning up the most volatile of resources. A persistent object __del__ should not perform any semantically meaningful operations because __del__ has no semantic meaning. - Zope has a few uses of __del__. These are all for non-persistent objects. Interesting, in grepping for __del__, I found a lot of cases where __del__ was used and then commented out. 
Finalizers seem to be the sort of thing that people want initially and then get over. I'm inclined to essentially keep the current rules and simply not promise that __del__ will be able to run correctly. That is, Python should call __del__ and ignore exceptions raised (or provide some *optional* logging or other debugging facility). There is no reason for __del__ to fail unless it depends on cyclicly-related objects, which should be viewed as a design mistake. OTOH, __del__ should never fail because module globals go away. IMO, the current circular references involving module globals are unnecessary, but that's a different topic. ;) Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From mal@lemburg.com Wed Mar 15 15:00:14 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 16:00:14 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> Message-ID: <38CFA57E.21A3B3EF@lemburg.com> Christian Tismer wrote: > > "M.-A. Lemburg" wrote: > ... > > > Note that unicodedata is only needed by programs which do > > a lot of Unicode manipulations and in the future probably > > by some codecs too. > > Would it be possible to make the Unicode support configurable? This is currently not planned as the Unicode integration touches many different parts of the interpreter to enhance string/Unicode integration... sorry. 
Also, I'm not sure whether adding #ifdefs throughout the code would increase its elegance ;-) > My problem is that patches in the CVS are of different kinds. > Some are error corrections and enhancements which I would > definitely like to use. > Others are brand new features like the Unicode support. > Absolutely great stuff! But this will most probably change > a number of times again, and I think it is a bad idea when > I include it into my Stackless distribution. Why not ? All you have to do is rebuild the distribution every time you push a new version -- just like I did for the Unicode version before the CVS checkin was done. > I'd appreciate it very much if I could use the same CVS tree > for testing new stuff, and to build my distribution, with > new features switched off. Please :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Mar 15 14:57:13 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 15:57:13 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003151140.MAA30301@python.inrialpes.fr> Message-ID: <38CFA4C9.E6B8EB5D@lemburg.com> Vladimir Marangozov wrote: > > M.-A. Lemburg wrote: > > > > Note that unicodedata is only needed by programs which do > > a lot of Unicode manipulations and in the future probably > > by some codecs too. > > Perhaps it would make sense to move the Unicode database on the > Python side (write it in Python)? Or init the database dynamically > in the unicodedata module on import? It's quite big, so if it's > possible to avoid the static declaration (and if the unicodedata module > is enabled by default), I'd vote for a dynamic initialization of the > database from reference (Python ?) file(s). The unicodedatabase module contains the Unicode database as static C data - this makes it shareable among (Python) processes.
Python modules don't provide this feature: instead a dictionary would have to be built on import which would increase the heap size considerably. Those dicts would *not* be shareable. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tismer@tismer.com Wed Mar 15 15:20:06 2000 From: tismer@tismer.com (Christian Tismer) Date: Wed, 15 Mar 2000 16:20:06 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> Message-ID: <38CFAA26.2B2F0D01@tismer.com> "M.-A. Lemburg" wrote: > > Christian Tismer wrote: ... > > Absolutely great stuff! But this will most probably change > > a number of times again, and I think it is a bad idea when > > I include it into my Stackless distribution. > > Why not ? All you have to do is rebuild the distribution > every time you push a new version -- just like I did > for the Unicode version before the CVS checkin was done. But how can I then publish my source code, when I always pull Unicode into it? I don't like to be exposed to side effects like 700kb code bloat, just by chance, since it is in the dist right now (and will vanish again). I don't say there must be #ifdefs all and everywhere, but can I build without *using* Unicode? I don't want to introduce something new to my users that they didn't ask for. And I don't want to take care about their installations. Finally I will for sure not replace a 500k DLL by a 1.2M monster, so this is definitely not what I want at the moment. How do I build a dist that doesn't need to change a lot of stuff in the user's installation? Note that Stackless Python is a drop-in replacement, not a Python distribution. Or should it be?
ciao - chris (who really wants to get SLP 1.1 out) -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From Fredrik Lundh" <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> Message-ID: <014001bf8e98$35644480$34aab5d4@hagrid> CT: > How do I build a dist that doesn't need to change a lot of > stuff in the user's installation? somewhere in this thread, Guido wrote: > BTW, I added a tag "pre-unicode" to the CVS tree to the revisions > before the Unicode changes were made. maybe you could base SLP on that one? From Vladimir.Marangozov@inrialpes.fr Wed Mar 15 16:27:36 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Wed, 15 Mar 2000 17:27:36 +0100 (CET) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38CFA4C9.E6B8EB5D@lemburg.com> from "M.-A. Lemburg" at Mar 15, 2000 03:57:13 PM Message-ID: <200003151627.RAA32543@python.inrialpes.fr> > [me] > > > > Perhaps it would make sense to move the Unicode database on the > > Python side (write it in Python)? Or init the database dynamically > > in the unicodedata module on import? It's quite big, so if it's > > possible to avoid the static declaration (and if the unicodata module > > is enabled by default), I'd vote for a dynamic initialization of the > > database from reference (Python ?) file(s). [Marc-Andre] > > The unicodedatabase module contains the Unicode database > as static C data - this makes it shareable among (Python) > processes. The static data is shared if the module is a shared object (.so). 
If unicodedata is not a .so, then you'll have a separate copy of the database in each process. > > Python modules don't provide this feature: instead a dictionary > would have to be built on import which would increase the heap > size considerably. Those dicts would *not* be shareable. I haven't mentioned dicts, have I? I suggested that the entries in the C version of the database be rewritten in Python (or a text file). The unicodedata module would, in its init function, allocate memory for the database and would populate it before returning "import okay" to Python -- this is one way to init the db dynamically, among others. As to sharing the database among different processes, this is a classic IPC pb, which has nothing to do with the static C declaration of the db. Or, hmmm, one of us is royally confused. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer@tismer.com Wed Mar 15 16:22:42 2000 From: tismer@tismer.com (Christian Tismer) Date: Wed, 15 Mar 2000 17:22:42 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> <014001bf8e98$35644480$34aab5d4@hagrid> Message-ID: <38CFB8D2.537FCAD9@tismer.com> Fredrik Lundh wrote: > > CT: > > How do I build a dist that doesn't need to change a lot of > > stuff in the user's installation? > > somewhere in this thread, Guido wrote: > > > BTW, I added a tag "pre-unicode" to the CVS tree to the revisions > > before the Unicode changes were made. > > maybe you could base SLP on that one? I have no idea how this works. Would this mean that I cannot get patches which come after unicode? Meanwhile, I've looked into the sources.
It is easy for me to get rid of the problem by supplying my own unicodedata.c, where I replace all functions by some unimplemented exception. Furthermore, I wondered about the data format. Is the unicode database used in your re package as well? Otherwise, I see only references from unicodedata.c, and that means the data structure can be massively enhanced. At the moment, that baby is 64k entries long, with four bytes and an optional string. This is a big waste. The strings are almost all some distinct prefixes, together with a list of hex smallwords. This is done as strings, probably this makes 80 percent of the space. The only function that uses the "decomposition" field (namely the string) is unicodedata_decomposition. It does nothing more than to wrap it into a PyObject. We can do a little better here. I guess I can bring it down to a third of this space without much effort, just by using - binary encoding for the tags as enumeration - binary encoding of the hexed entries - omission of the spaces Instead of 64k of structures which contain pointers anyway, I can use a 64k pointer array with offsets into one packed table. The unicodedata access functions would change *slightly*, just building some hex strings and so on. I guess this is not a time critical section? Should I try this evening? :-) cheers - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From mal@lemburg.com Wed Mar 15 16:04:43 2000 From: mal@lemburg.com (M.-A.
Lemburg) Date: Wed, 15 Mar 2000 17:04:43 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> Message-ID: <38CFB49B.885B8B16@lemburg.com> Christian Tismer wrote: > > "M.-A. Lemburg" wrote: > > > > Christian Tismer wrote: > ... > > > Absolutely great stuff! But this will most probably change > > > a number of times again, and I think it is a bad idea when > > > I include it into my Stackless distribution. > > > > Why not ? All you have to do is rebuild the distribution > > every time you push a new version -- just like I did > > for the Unicode version before the CVS checkin was done. > > But how can I then publish my source code, when I always > pull Unicode into it. I don't like to be exposed to > side effects like 700kb code bloat, just by chance, since it > is in the dist right now (and will vanish again). All you have to do is build the unicodedata module shared and not statically bound into python.dll. This one module causes most of the code bloat... > I don't say there must be #ifdefs all and everywhere, but > can I build without *using* Unicode? I don't want to > introduce something new to my users that they didn't ask for. > And I don't want to take care about their installations. > Finally I will for sure not replace a 500k DLL by a 1.2M > monster, so this is definitely not what I want at the moment. > > How do I build a dist that doesn't need to change a lot of > stuff in the user's installation? I don't think that the Unicode stuff will disable the running environment... (haven't tried this though).
The unicodedata module is not used by the interpreter and the rest is imported on-the-fly, not during init time, so at least in theory, not using Unicode will result in Python not looking for e.g. the encodings package. > Note that Stackless Python is a drop-in replacement, > not a Python distribution. Or should it be? Probably... I think it's simply easier to install and probably also easier to maintain because it doesn't cause dependencies on other "default" installations. The user will then explicitly know that she is installing something a little different from the default distribution... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Mar 15 17:26:15 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 18:26:15 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> <014001bf8e98$35644480$34aab5d4@hagrid> <38CFB8D2.537FCAD9@tismer.com> Message-ID: <38CFC7B7.A1ABD51C@lemburg.com> Christian Tismer wrote: > > Fredrik Lundh wrote: > > > > CT: > > > How do I build a dist that doesn't need to change a lot of > > > stuff in the user's installation? > > > > somewhere in this thread, Guido wrote: > > > > > BTW, I added a tag "pre-unicode" to the CVS tree to the revisions > > > before the Unicode changes were made. > > > > maybe you could base SLP on that one? > > I have no idea how this works. Would this mean that I cannot > get patctes which come after unicode? > > Meanwhile, I've looked into the sources. 
It is easy for me > to get rid of the problem by supplying my own unicodedata.c, > where I replace all functions by some unimplemented exception. No need (see my other posting): simply disable the module altogether... this shouldn't hurt any part of the interpreter as the module is a user-land only module. > Furthermore, I wondered about the data format. Is the unicode > database used in your re package as well? Otherwise, I see > only references from unicodedata.c, and that means the data > structure can be massively enhanced. > At the moment, that baby is 64k entries long, with four bytes > and an optional string. > This is a big waste. The strings are almost all some distinct > prefixes, together with a list of hex smallwords. This > is done as strings, probably this makes 80 percent of the space. I have made no attempt to optimize the structure... (due to lack of time mostly) the current implementation is really not much different from a rewrite of the UnicodeData.txt file available at the unicode.org site. If you want to, I can mail you the marshalled Python dict version of that database to play with. > The only function that uses the "decomposition" field (namely > the string) is unicodedata_decomposition. It does nothing > more than to wrap it into a PyObject. > We can do a little better here. I guess I can bring it down > to a third of this space without much effort, just by using > - binary encoding for the tags as enumeration > - binary encoding of the hexed entries > - omission of the spaces > Instead of 64k of structures which contain pointers anyway, > I can use a 64k pointer array with offsets into one packed > table. > > The unicodedata access functions would change *slightly*, > just building some hex strings and so on. I guess this > is not a time critical section? It may be if these functions are used in codecs, so you should pay attention to speed too... > Should I try this evening? :-) Sure :-) go ahead...
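The packing scheme discussed in this exchange -- one packed byte table plus a 64k offset array, instead of per-character structs each pointing at its own decomposition string -- can be illustrated in a few lines. This is a toy sketch, not the actual unicodedata.c layout; `decomps`, `blob`, `offset`, and `decomposition` are illustrative names, and only two sample entries are shown:

```python
import struct

# Two sample UnicodeData.txt-style entries: code point -> decomposition string.
decomps = {0x00C0: "0041 0300", 0x00C1: "0041 0301"}

blob = bytearray()                # one packed table for all entries
offset = [0] * 0x10000            # one slot per 16-bit code point; 0 == "none"
for code in sorted(decomps):
    offset[code] = len(blob) + 1  # store offset+1 so that 0 can mean "no entry"
    parts = [int(h, 16) for h in decomps[code].split()]
    blob.append(len(parts))       # a length byte replaces spaces/terminators
    for cp in parts:
        blob += struct.pack(">H", cp)  # 2 binary bytes instead of 5 chars "0041 "

def decomposition(code):
    """Rebuild the original hex-string form on access."""
    off = offset[code]
    if not off:
        return ""
    n = blob[off - 1]
    values = struct.unpack_from(">%dH" % n, bytes(blob), off)
    return " ".join("%04X" % v for v in values)

print(decomposition(0x00C0))  # 0041 0300
```

The access function does exactly what Christian predicts: it just rebuilds "some hex strings" from the binary entries.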
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Mar 15 17:39:14 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 18:39:14 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003151627.RAA32543@python.inrialpes.fr> Message-ID: <38CFCAC2.7690DF55@lemburg.com> Vladimir Marangozov wrote: > > > [me] > > > > > > Perhaps it would make sense to move the Unicode database on the > > > Python side (write it in Python)? Or init the database dynamically > > > in the unicodedata module on import? It's quite big, so if it's > > > possible to avoid the static declaration (and if the unicodedata module > > > is enabled by default), I'd vote for a dynamic initialization of the > > > database from reference (Python ?) file(s). > > [Marc-Andre] > > > > The unicodedatabase module contains the Unicode database > > as static C data - this makes it shareable among (Python) > > processes. > > The static data is shared if the module is a shared object (.so). > If unicodedata is not a .so, then you'll have a separate copy of the > database in each process. Uhm, comparing the two versions Python 1.5 and the current CVS Python I get these figures on Linux: Executing: ./python -i -c '1/0' Python 1.5: 1208kB / 728 kB (resident/shared) Python CVS: 1280kB / 808 kB ("/") Not much of a change if you ask me and the CVS version has the unicodedata module linked statically... so there's got to be some sharing and load-on-demand going on behind the scenes: this is what I was referring to when I mentioned static C data. The OS can much better deal with these sharing techniques and delayed loads than anything we could implement on top of it in C or Python. But perhaps this is Linux-specific...
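Resident/shared figures like the ones above can be read on Linux from `/proc/self/statm` (it is an assumption that they were obtained this way; the sketch below simply returns None where /proc is unavailable):

```python
import os

try:
    PAGE = os.sysconf("SC_PAGE_SIZE")
except (AttributeError, ValueError, OSError):
    PAGE = 4096  # common page size, used only as a fallback

def resident_shared_kb():
    """(resident, shared) memory of this process in kB, or None without /proc."""
    try:
        with open("/proc/self/statm") as f:
            fields = f.read().split()
    except OSError:
        return None
    resident, shared = int(fields[1]), int(fields[2])  # statm counts pages
    return resident * PAGE // 1024, shared * PAGE // 1024

print(resident_shared_kb())
```

Pages mapped from a shared library's read-only data segment show up in the shared column, which is why statically compiled tables can still be shared between processes, as observed above.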
> > Python modules don't provide this feature: instead a dictionary > would have to be built on import which would increase the heap > size considerably. Those dicts would *not* be shareable. > > I haven't mentioned dicts, have I? I suggested that the entries in the > C version of the database be rewritten in Python (or a text file). > The unicodedata module would, in its init function, allocate memory > for the database and would populate it before returning "import okay" > to Python -- this is one way to init the db dynamically, among others. I'm leaving this as an exercise to the interested reader ;-) Really, if you have better ideas for the unicodedata module, please go ahead. > As to sharing the database among different processes, this is a classic > IPC pb, which has nothing to do with the static C declaration of the db. > Or, hmmm, one of us is royally confused. Could you check this on other platforms ? Perhaps Linux is doing more than other OSes are in this field. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Fredrik Lundh" <38CFCAC2.7690DF55@lemburg.com> Message-ID: <01f901bf8eab$a353e780$34aab5d4@hagrid> I just uploaded the first public SRE snapshot to: http://w1.132.telia.com/~u13208596/sre.htm -- this kit contains windows binaries only (make sure you have built the interpreter from a recent CVS version) -- the engine fully supports unicode target strings. (not sure about the pattern compiler, though...) -- it's probably buggy as hell. for things I'm working on at this very moment, see: http://w1.132.telia.com/~u13208596/sre/status.htm I hope to get around to fix the core dump (it crashes halfway through sre_fulltest.py, for no apparent reason) and the backreferencing problem later today. stay tuned. PS. note that "public" doesn't really mean "suitable for the c.l.python crowd", or "suitable for production use".
in other words, let's keep this one on this list for now. thanks! From tismer@tismer.com Wed Mar 15 18:15:27 2000 From: tismer@tismer.com (Christian Tismer) Date: Wed, 15 Mar 2000 19:15:27 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> <014001bf8e98$35644480$34aab5d4@hagrid> <38CFB8D2.537FCAD9@tismer.com> <38CFC7B7.A1ABD51C@lemburg.com> Message-ID: <38CFD33F.3C02BF43@tismer.com> "M.-A. Lemburg" wrote: > > Christian Tismer wrote: [the old data compression guy has been reanimated] > If you want to, I can mail you the marshalled Python dict version of > that database to play with. ... > > Should I try this evening? :-) > > Sure :-) go ahead... Thank you. Meanwhile I've heard that there is some well-known bot working on that under the hood, with a much better approach than mine. So I'll take your advice, and continue to write silly stackless enhancements. They say this is my destiny :-) ciao - continuous -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From DavidA@ActiveState.com Wed Mar 15 18:21:40 2000 From: DavidA@ActiveState.com (David Ascher) Date: Wed, 15 Mar 2000 10:21:40 -0800 Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38CFA4C9.E6B8EB5D@lemburg.com> Message-ID: > The unicodedatabase module contains the Unicode database > as static C data - this makes it shareable among (Python) > processes.
> > Python modules don't provide this feature: instead a dictionary > would have to be built on import which would increase the heap > size considerably. Those dicts would *not* be shareable. I know it's complicating things, but wouldn't an mmap'ed buffer allow inter-process sharing while keeping DLL size down and everything on-disk until needed? Yes, I know, mmap calls aren't uniform across platforms and mmap isn't supported on all platforms -- I still think that it's silly not to use it on those platforms where it is available, and I'd like to see mmap unification move forward, so this is as good a motivation as any to bite the bullet. Just a thought, --david From jim@digicool.com Wed Mar 15 18:24:53 2000 From: jim@digicool.com (Jim Fulton) Date: Wed, 15 Mar 2000 13:24:53 -0500 Subject: [Python-Dev] Allowing multiple socket maps in asyncore (and asynchat) Message-ID: <38CFD575.A0536439@digicool.com> I find asyncore to be quite useful; however, it is currently geared to having a single main loop. It uses a global socket map that all asyncore dispatchers register with. I have an application in which I want to have multiple socket maps. I propose that we start moving toward a model in which selection of a socket map and control of the asyncore loop is a bit more explicit. If no one objects, I'll work up some initial patches. Who should I submit these to? Sam? Should the medusa public CVS form the basis? Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats.
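[Editorial aside: Jim's proposal amounts to passing the socket map around explicitly instead of consulting a module-level global. The sketch below is a hypothetical minimal select-based version of that idea, not asyncore's actual API.]

```python
import select
import socket

def poll_once(socket_map, timeout=0.1):
    # Dispatch only over the sockets registered in *this* map -- the
    # explicit analogue of asyncore's single global socket map.
    readable, _, _ = select.select(list(socket_map), [], [], timeout)
    for sock in readable:
        socket_map[sock](sock)          # call the registered handler

# Two independent maps: servicing one never touches the other.
a, b = socket.socketpair()
received = []
map_one = {a: lambda s: received.append(s.recv(16))}
map_two = {}                            # e.g. a second loop's sockets

b.sendall(b"ping")
poll_once(map_one)                      # received is now [b"ping"]
a.close(); b.close()
```

With an explicit map, an application can run several loops (or one loop over a chosen map) without dispatchers interfering across subsystems.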
From jcw@equi4.com Wed Mar 15 19:39:37 2000 From: jcw@equi4.com (Jean-Claude Wippler) Date: Wed, 15 Mar 2000 20:39:37 +0100 Subject: [Python-Dev] Unicode patches checked in References: Message-ID: <38CFE6F9.3E8E9385@equi4.com> David Ascher wrote: [shareable unicodedatabase] > I know it's complicating things, but wouldn't an mmap'ed buffer allow > inter-process sharing while keeping DLL size down and everything > on-disk until needed? AFAIK, on platforms which support mmap, static data already gets mmap'ed in by the OS (just like all code), so this might have little effect. I'm more concerned by the distribution size increase. -jcw From bwarsaw@cnri.reston.va.us Wed Mar 15 18:41:00 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 15 Mar 2000 13:41:00 -0500 (EST) Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> <014001bf8e98$35644480$34aab5d4@hagrid> Message-ID: <14543.55612.969101.206695@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> somewhere in this thread, Guido wrote: >> BTW, I added a tag "pre-unicode" to the CVS tree to the >> revisions before the Unicode changes were made. FL> maybe you could base SLP on that one? /F's got it exactly right. Check out a new directory using a stable tag (maybe you want to base your changes on pre-unicode tag, or python 1.52?). Patch in that subtree and then eventually you'll have to merge your changes into the head of the branch. 
-Barry From rushing@nightmare.com Thu Mar 16 01:52:22 2000 From: rushing@nightmare.com (Sam Rushing) Date: Wed, 15 Mar 2000 17:52:22 -0800 (PST) Subject: [Python-Dev] Allowing multiple socket maps in asyncore (and asynchat) In-Reply-To: <38CFD575.A0536439@digicool.com> References: <38CFD575.A0536439@digicool.com> Message-ID: <14544.15958.546712.466506@seattle.nightmare.com> Jim Fulton writes: > I find asyncore to be quite useful, however, it is currently > geared to having a single main loop. It uses a global socket > map that all asyncore dispatchers register with. > > I have an application in which I want to have multiple > socket maps. But still only a single event loop, yes? Why do you need multiple maps? For a priority system of some kind? > I propose that we start moving toward a model in which selection of > a socket map and control of the asyncore loop is a bit more > explicit. > > If no one objects, I'll work up some initial patches. If it can be done in a backward-compatible fashion, that sounds fine; but it sounds tricky. Even the simple {:object...} change broke so many things that we're still using the old stuff at eGroups. > Who should I submit these to? Sam? > Should the medusa public CVS form the basis? Yup, yup. -Sam From tim_one@email.msn.com Thu Mar 16 07:06:23 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 16 Mar 2000 02:06:23 -0500 Subject: [Python-Dev] Finalizers considered questionable ;) In-Reply-To: <38CF91B4.A36C8C5@digicool.com> Message-ID: <000201bf8f16$237e5e80$662d153f@tim> [Jim Fulton] > ... > There is no reason for __del__ to fail unless it depends on > cyclicly-related objects, which should be viewed as a design > mistake. > > OTOH, __del__ should never fail because module globals go away. > IMO, the current circular references involving module globals are > unnecessary, but that's a different topic. ;) IOW, you view "the current circular references involving module globals" as "a design mistake" . And perhaps they are! 
I wouldn't call it a different topic, though: so long as people are *viewing* shutdown __del__ problems as just another instance of finalizers in cyclic trash, it makes the latter *seem* inescapably "normal", and so something that has to be catered to. If you have a way to take the shutdown problems out of the discussion, it would help clarify both topics, at the very least by deconflating them. it's-a-mailing-list-so-no-need-to-stay-on-topic-ly y'rs - tim From gstein@lyra.org Thu Mar 16 12:01:36 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 16 Mar 2000 04:01:36 -0800 (PST) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38CF88A0.CF876A74@tismer.com> Message-ID: On Wed, 15 Mar 2000, Christian Tismer wrote: >... > Would it be possible to make the Unicode support configurable? This might be interesting from the standpoint of those guys who are doing the tiny Python interpreter thingy for embedded systems. > My problem is that patches in the CVS are of different kinds. > Some are error corrections and enhancements which I would > definitely like to use. > Others are brand new features like the Unicode support. > Absolutely great stuff! But this will most probably change > a number of times again, and I think it is a bad idea when > I include it into my Stackless distribution. > > I'd appreciate it very much if I could use the same CVS tree > for testing new stuff, and to build my distribution, with > new features switched off. Please :-) But! I find this reason completely off the mark. In essence, you're arguing that we should not put *any* new feature into the CVS repository because it might mess up what *you* are doing. Sorry, but that just irks me. If you want a stable Python, then don't use the CVS version. Or base it off a specific tag in CVS. Or something. Just don't ask for development to be stopped.
Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Thu Mar 16 12:08:43 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 16 Mar 2000 04:08:43 -0800 (PST) Subject: [Python-Dev] const data (was: Unicode patches checked in) In-Reply-To: <200003151627.RAA32543@python.inrialpes.fr> Message-ID: On Wed, 15 Mar 2000, Vladimir Marangozov wrote: > > [me] > > > > > > Perhaps it would make sense to move the Unicode database on the > > > Python side (write it in Python)? Or init the database dynamically > > > in the unicodedata module on import? It's quite big, so if it's > > > possible to avoid the static declaration (and if the unicodedata module > > > is enabled by default), I'd vote for a dynamic initialization of the > > > database from reference (Python ?) file(s). > > [Marc-Andre] > > > > The unicodedatabase module contains the Unicode database > > as static C data - this makes it shareable among (Python) > > processes. > > The static data is shared if the module is a shared object (.so). > If unicodedata is not a .so, then you'll have a separate copy of the > database in each process. Nope. A shared module means that multiple executables can share the code. Whether the const data resides in an executable or a .so, the OS will map it into readonly memory and share it across all processes. > > Python modules don't provide this feature: instead a dictionary > > would have to be built on import which would increase the heap > > size considerably. Those dicts would *not* be shareable. > > I haven't mentioned dicts, have I? I suggested that the entries in the > C version of the database be rewritten in Python (or a text file) > The unicodedata module would, in its init function, allocate memory > for the database and would populate it before returning "import okay" > to Python -- this is one way to init the db dynamically, among others. This would place all that data into the per-process heap.
Definitely not shared, and definitely a big hit for each Python process. > As to sharing the database among different processes, this is a classic > IPC pb, which has nothing to do with the static C declaration of the db. > Or, hmmm, one of us is royally confused . This isn't IPC. It is sharing of some constant data. The most effective way to manage this is through const C data. The OS will properly manage it. And sorry, David, but mmap'ing a file will simply add complexity. As jcw mentioned, the OS is pretty much doing this anyhow when it deals with a const data segment in your executable. I don't believe this is Linux specific. This kind of stuff has been done for a *long* time on the platforms, too. Side note: the most effective way of exposing this const data up to Python (without shoving it onto the heap) is through buffers created via: PyBuffer_FromMemory(ptr, size) This allows the data to reside in const, shared memory while it is also exposed up to Python. Cheers, -g -- Greg Stein, http://www.lyra.org/ From Vladimir.Marangozov@inrialpes.fr Thu Mar 16 12:39:42 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Thu, 16 Mar 2000 13:39:42 +0100 (CET) Subject: [Python-Dev] const data (was: Unicode patches checked in) In-Reply-To: from "Greg Stein" at Mar 16, 2000 04:08:43 AM Message-ID: <200003161239.NAA01671@python.inrialpes.fr> Greg Stein wrote: > > [me] > > The static data is shared if the module is a shared object (.so). > > If unicodedata is not a .so, then you'll have a seperate copy of the > > database in each process. > > Nope. A shared module means that multiple executables can share the code. > Whether the const data resides in an executable or a .so, the OS will map > it into readonly memory and share it across all procsses. I must have been drunk yesterday. You're right. > I don't believe this is Linux specific. This kind of stuff has been done > for a *long* time on the platforms, too. Yes. 
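[Editorial aside: PyBuffer_FromMemory() was the 1.5/2.x-era C API Greg refers to. As a rough pure-Python analogue of the same zero-copy idea (not the C call itself, and with a made-up stand-in table), a buffer view exposes existing constant bytes without copying them onto the heap:]

```python
# Stand-in for a large const table compiled into the interpreter.
DATABASE = bytes(range(256)) * 16

view = memoryview(DATABASE)     # zero-copy, read-only window onto the data
sub = view[10:14]               # sub-views still copy nothing

assert view.readonly
assert bytes(sub) == b"\x0a\x0b\x0c\x0d"
```

The point either way is that Python-level access need not duplicate the underlying storage, so the OS-shared pages stay shared.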
> > Side note: the most effective way of exposing this const data up to Python > (without shoving it onto the heap) is through buffers created via: > PyBuffer_FromMemory(ptr, size) > This allows the data to reside in const, shared memory while it is also > exposed up to Python. And to avoid the size increase of the Python library, perhaps unicodedata needs to be commented out by default in Setup.in (for the release, not now). As M-A pointed out, the module isn't necessary for the normal operation of the interpreter. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gstein@lyra.org Thu Mar 16 12:56:21 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 16 Mar 2000 04:56:21 -0800 (PST) Subject: [Python-Dev] Finalizers considered questionable ;) In-Reply-To: <000201bf8f16$237e5e80$662d153f@tim> Message-ID: On Thu, 16 Mar 2000, Tim Peters wrote: >... > IOW, you view "the current circular references involving module globals" as > "a design mistake" . And perhaps they are! I wouldn't call it a > different topic, though: so long as people are *viewing* shutdown __del__ > problems as just another instance of finalizers in cyclic trash, it makes > the latter *seem* inescapably "normal", and so something that has to be > catered to. If you have a way to take the shutdown problems out of the > discussion, it would help clarify both topics, at the very least by > deconflating them. Bah. Module globals are easy. My tp_clean suggestion handles them quite easily at shutdown. No more special code in import.c. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tismer@tismer.com Thu Mar 16 12:53:46 2000 From: tismer@tismer.com (Christian Tismer) Date: Thu, 16 Mar 2000 13:53:46 +0100 Subject: [Python-Dev] Unicode patches checked in References: Message-ID: <38D0D95A.B13EC17E@tismer.com> Greg Stein wrote: > > On Wed, 15 Mar 2000, Christian Tismer wrote: > >...
> > Would it be possible to make the Unicode support configurable? > > This might be interesting from the standpoint of those guys who are doing > the tiny Python interpreter thingy for embedded systems. > > > My problem is that patches in the CVS are of different kinds. > > Some are error corrections and enhancements which I would > > definitely like to use. > > Others are brand new features like the Unicode support. > > Absolutely great stuff! But this will most probably change > > a number of times again, and I think it is a bad idea when > > I include it into my Stackless distribution. > > > > I'd appreciate it very much if I could use the same CVS tree > > for testing new stuff, and to build my distribution, with > > new features switched off. Please :-) > > But! I find this reason completely off the mark. In essence, you're > arguing that we should not put *any* new feature into the CVS repository > because it might mess up what *you* are doing. No, this is your interpretation, and a reduction which I can't follow. There are improvements and features in the CVS version which I need. I prefer to build against it, instead of the old 1.5.2. What's wrong with that? I want to find a way that gives me the least trouble in doing so. > Sorry, but that just irks me. If you want a stable Python, then don't use > the CVS version. Or base it off a specific tag in CVS. Or something. Just > don't ask for development to be stopped. No, I ask for development to be stopped. Code freeze until Y3k :-) Why are you trying to put such nonsense into my mouth? You know that I know that you know better. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From tismer@tismer.com Thu Mar 16 13:25:48 2000 From: tismer@tismer.com (Christian Tismer) Date: Thu, 16 Mar 2000 14:25:48 +0100 Subject: [Python-Dev] const data (was: Unicode patches checked in) References: <200003161239.NAA01671@python.inrialpes.fr> Message-ID: <38D0E0DC.B997F836@tismer.com> Vladimir Marangozov wrote: > > Greg Stein wrote: > > Side note: the most effective way of exposing this const data up to Python > > (without shoving it onto the heap) is through buffers created via: > > PyBuffer_FromMemory(ptr, size) > > This allows the data to reside in const, shared memory while it is also > > exposed up to Python. > > And to avoid the size increase of the Python library, perhaps unicodedata > needs to be uncommented by default in Setup.in (for the release, not now). > As M-A pointed out, the module isn't isn't necessary for the normal > operation of the interpreter. Sounds like a familiar idea. :-) BTW., yesterday evening I wrote an analysis script, to see how far this data is compactable without going into real compression, just redundancy folding and byte/short indexing was used. If I'm not wrong, this reduces the size of the database to less than 25kb. That small amount of extra data would make the uncommenting feature quite unimportant, except for the issue of building tiny Pythons. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From gstein@lyra.org Thu Mar 16 13:06:46 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 16 Mar 2000 05:06:46 -0800 (PST) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38D0D95A.B13EC17E@tismer.com> Message-ID: On Thu, 16 Mar 2000, Christian Tismer wrote: > Greg Stein wrote: >... > > Sorry, but that just irks me. If you want a stable Python, then don't use > > the CVS version. Or base it off a specific tag in CVS. Or something. Just > > don't ask for development to be stopped. > > No, I ask for development to be stopped. Code freeze until Y3k :-) > Why are you trying to put such a nonsense into my mouth? > You know that I know that you know better. Simply because that is what it sounds like on this side of my monitor :-) I'm seeing your request as asking for people to make special considerations in their patches for your custom distribution. While I don't have a problem with making Python more flexible to distro maintainers, it seemed like you were approaching it from the "wrong" angle. Like I said, making Unicode optional for the embedded space makes sense; making it optional so it doesn't bloat your distro didn't :-) Not a big deal... it is mostly a perception on my part. I also tend to dislike things that hold development back. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal@lemburg.com Fri Mar 17 18:53:39 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 17 Mar 2000 19:53:39 +0100 Subject: [Python-Dev] Unicode Update 2000-03-17 Message-ID: <38D27F33.4055A942@lemburg.com> This is a multi-part message in MIME format. --------------A764B515049AA0B5F7643A5B Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Attached you find an update of the Unicode implementation. 
The patch is against the current CVS version. I would appreciate if someone with CVS checkin permissions could check the changes in. The patch contains all bugs and patches sent this week and also fixes a leak in the codecs code and a bug in the free list code for Unicode objects (which only shows up when compiling Python with Py_DEBUG; thanks to MarkH for spotting this one). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ --------------A764B515049AA0B5F7643A5B Content-Type: text/plain; charset=us-ascii; name="Unicode-Implementation-2000-03-17.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="Unicode-Implementation-2000-03-17.patch" Only in CVS-Python/Doc/tools: anno-api.py diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Include/unicodeobject.h Python+Unicode/Include/unicodeobject.h --- CVS-Python/Include/unicodeobject.h Fri Mar 17 15:24:30 2000 +++ Python+Unicode/Include/unicodeobject.h Tue Mar 14 10:38:08 2000 @@ -1,8 +1,5 @@ #ifndef Py_UNICODEOBJECT_H #define Py_UNICODEOBJECT_H -#ifdef __cplusplus -extern "C" { -#endif /* @@ -109,8 +106,9 @@ /* --- Internal Unicode Operations ---------------------------------------- */ /* If you want Python to use the compiler's wctype.h functions instead - of the ones supplied with Python, define WANT_WCTYPE_FUNCTIONS. - This reduces the interpreter's code size. */ + of the ones supplied with Python, define WANT_WCTYPE_FUNCTIONS or + configure Python using --with-ctype-functions. This reduces the + interpreter's code size. 
*/ #if defined(HAVE_USABLE_WCHAR_T) && defined(WANT_WCTYPE_FUNCTIONS) @@ -169,6 +167,10 @@ (!memcmp((string)->str + (offset), (substring)->str,\ (substring)->length*sizeof(Py_UNICODE))) +#ifdef __cplusplus +extern "C" { +#endif + /* --- Unicode Type ------------------------------------------------------- */ typedef struct { @@ -647,7 +649,7 @@ int direction /* Find direction: +1 forward, -1 backward */ ); -/* Count the number of occurances of substr in str[start:end]. */ +/* Count the number of occurrences of substr in str[start:end]. */ extern DL_IMPORT(int) PyUnicode_Count( PyObject *str, /* String */ @@ -656,7 +658,7 @@ int end /* Stop index */ ); -/* Replace at most maxcount occurances of substr in str with replstr +/* Replace at most maxcount occurrences of substr in str with replstr and return the resulting Unicode object. */ extern DL_IMPORT(PyObject *) PyUnicode_Replace( diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/codecs.py Python+Unicode/Lib/codecs.py --- CVS-Python/Lib/codecs.py Sat Mar 11 00:20:43 2000 +++ Python+Unicode/Lib/codecs.py Mon Mar 13 14:33:54 2000 @@ -55,7 +55,7 @@ """ def encode(self,input,errors='strict'): - """ Encodes the object intput and returns a tuple (output + """ Encodes the object input and returns a tuple (output object, length consumed). errors defines the error handling to apply. 
It defaults to diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/encodings/__init__.py Python+Unicode/Lib/encodings/__init__.py --- CVS-Python/Lib/encodings/__init__.py Sat Mar 11 00:17:18 2000 +++ Python+Unicode/Lib/encodings/__init__.py Mon Mar 13 14:30:33 2000 @@ -30,13 +30,13 @@ import string,codecs,aliases _cache = {} -_unkown = '--unkown--' +_unknown = '--unknown--' def search_function(encoding): # Cache lookup - entry = _cache.get(encoding,_unkown) - if entry is not _unkown: + entry = _cache.get(encoding,_unknown) + if entry is not _unknown: return entry # Import the module diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_string.py Python+Unicode/Lib/test/test_string.py --- CVS-Python/Lib/test/test_string.py Sat Mar 11 10:52:43 2000 +++ Python+Unicode/Lib/test/test_string.py Mon Mar 13 10:12:46 2000 @@ -143,6 +143,7 @@ test('translate', 'xyz', 'xyz', table) test('replace', 'one!two!three!', 'one@two!three!', '!', '@', 1) +test('replace', 'one!two!three!', 'onetwothree', '!', '') test('replace', 'one!two!three!', 'one@two@three!', '!', '@', 2) test('replace', 'one!two!three!', 'one@two@three@', '!', '@', 3) test('replace', 'one!two!three!', 'one@two@three@', '!', '@', 4) diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_unicode.py Python+Unicode/Lib/test/test_unicode.py --- CVS-Python/Lib/test/test_unicode.py Fri Mar 17 15:24:31 2000 +++ 
Python+Unicode/Lib/test/test_unicode.py Mon Mar 13 10:13:05 2000 @@ -108,6 +108,7 @@ test('translate', u'xyz', u'xyz', table) test('replace', u'one!two!three!', u'one@two!three!', u'!', u'@', 1) +test('replace', u'one!two!three!', u'onetwothree', '!', '') test('replace', u'one!two!three!', u'one@two@three!', u'!', u'@', 2) test('replace', u'one!two!three!', u'one@two@three@', u'!', u'@', 3) test('replace', u'one!two!three!', u'one@two@three@', u'!', u'@', 4) diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Misc/unicode.txt Python+Unicode/Misc/unicode.txt --- CVS-Python/Misc/unicode.txt Sat Mar 11 00:14:11 2000 +++ Python+Unicode/Misc/unicode.txt Fri Mar 17 16:55:11 2000 @@ -743,8 +743,9 @@ stream codecs as available through the codecs module should be used. -XXX There should be a short-cut open(filename,mode,encoding) available which - also assures that mode contains the 'b' character when needed. +The codecs module should provide a short-cut open(filename,mode,encoding) +available which also assures that mode contains the 'b' character when +needed. File/Stream Input: @@ -810,6 +811,10 @@ Introduction to Unicode (a little outdated by still nice to read): http://www.nada.kth.se/i18n/ucs/unicode-iso10646-oview.html +For comparison: + Introducing Unicode to ECMAScript -- + http://www-4.ibm.com/software/developer/library/internationalization-support.html + Encodings: Overview: @@ -832,7 +837,7 @@ History of this Proposal: ------------------------- -1.2: +1.2: Removed POD about codecs.open() 1.1: Added note about comparisons and hash values. Added note about case mapping algorithms. 
Changed stream codecs .read() and .write() method to match the standard file-like object methods diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Modules/stropmodule.c Python+Unicode/Modules/stropmodule.c --- CVS-Python/Modules/stropmodule.c Wed Mar 1 10:22:53 2000 +++ Python+Unicode/Modules/stropmodule.c Mon Mar 13 14:33:23 2000 @@ -1054,7 +1054,7 @@ strstr replacement for arbitrary blocks of memory. - Locates the first occurance in the memory pointed to by MEM of the + Locates the first occurrence in the memory pointed to by MEM of the contents of memory pointed to by PAT. Returns the index into MEM if found, or -1 if not found. If len of PAT is greater than length of MEM, the function returns -1. diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Objects/stringobject.c Python+Unicode/Objects/stringobject.c --- CVS-Python/Objects/stringobject.c Tue Mar 14 00:14:17 2000 +++ Python+Unicode/Objects/stringobject.c Mon Mar 13 14:33:24 2000 @@ -1395,7 +1395,7 @@ strstr replacement for arbitrary blocks of memory. - Locates the first occurance in the memory pointed to by MEM of the + Locates the first occurrence in the memory pointed to by MEM of the contents of memory pointed to by PAT. Returns the index into MEM if found, or -1 if not found. If len of PAT is greater than length of MEM, the function returns -1. 
@@ -1578,7 +1578,7 @@ return NULL; if (sub_len <= 0) { - PyErr_SetString(PyExc_ValueError, "empty replacement string"); + PyErr_SetString(PyExc_ValueError, "empty pattern string"); return NULL; } new_s = mymemreplace(str,len,sub,sub_len,repl,repl_len,count,&out_len); Only in CVS-Python/Objects: stringobject.c.orig diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Objects/unicodeobject.c Python+Unicode/Objects/unicodeobject.c --- CVS-Python/Objects/unicodeobject.c Tue Mar 14 00:14:17 2000 +++ Python+Unicode/Objects/unicodeobject.c Wed Mar 15 10:49:19 2000 @@ -83,7 +83,7 @@ all objects on the free list having a size less than this limit. This reduces malloc() overhead for small Unicode objects. - At worse this will result in MAX_UNICODE_FREELIST_SIZE * + At worst this will result in MAX_UNICODE_FREELIST_SIZE * (sizeof(PyUnicodeObject) + STAYALIVE_SIZE_LIMIT + malloc()-overhead) bytes of unused garbage. 
@@ -180,7 +180,7 @@ unicode_freelist = *(PyUnicodeObject **)unicode_freelist; unicode_freelist_size--; unicode->ob_type = &PyUnicode_Type; - _Py_NewReference(unicode); + _Py_NewReference((PyObject *)unicode); if (unicode->str) { if (unicode->length < length && _PyUnicode_Resize(unicode, length)) { @@ -199,16 +199,19 @@ unicode->str = PyMem_NEW(Py_UNICODE, length + 1); } - if (!unicode->str) { - PyMem_DEL(unicode); - PyErr_NoMemory(); - return NULL; - } + if (!unicode->str) + goto onError; unicode->str[length] = 0; unicode->length = length; unicode->hash = -1; unicode->utf8str = NULL; return unicode; + + onError: + _Py_ForgetReference((PyObject *)unicode); + PyMem_DEL(unicode); + PyErr_NoMemory(); + return NULL; } static @@ -224,7 +227,6 @@ *(PyUnicodeObject **)unicode = unicode_freelist; unicode_freelist = unicode; unicode_freelist_size++; - _Py_ForgetReference(unicode); } else { free(unicode->str); @@ -489,7 +491,7 @@ } else { PyErr_Format(PyExc_ValueError, - "UTF-8 decoding error; unkown error handling code: %s", + "UTF-8 decoding error; unknown error handling code: %s", errors); return -1; } @@ -611,7 +613,7 @@ else { PyErr_Format(PyExc_ValueError, "UTF-8 encoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -733,7 +735,7 @@ } else { PyErr_Format(PyExc_ValueError, - "UTF-16 decoding error; unkown error handling code: %s", + "UTF-16 decoding error; unknown error handling code: %s", errors); return -1; } @@ -921,7 +923,7 @@ else { PyErr_Format(PyExc_ValueError, "Unicode-Escape decoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1051,6 +1053,10 @@ */ +static const Py_UNICODE *findchar(const Py_UNICODE *s, + int size, + Py_UNICODE ch); + static PyObject *unicodeescape_string(const Py_UNICODE *s, int size, @@ -1069,9 +1075,6 @@ p = q = PyString_AS_STRING(repr); if (quotes) { - static const Py_UNICODE *findchar(const Py_UNICODE *s, - int size, 
- Py_UNICODE ch); *p++ = 'u'; *p++ = (findchar(s, size, '\'') && !findchar(s, size, '"')) ? '"' : '\''; @@ -1298,7 +1301,7 @@ else { PyErr_Format(PyExc_ValueError, "Latin-1 encoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1369,7 +1372,7 @@ else { PyErr_Format(PyExc_ValueError, "ASCII decoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1431,7 +1434,7 @@ else { PyErr_Format(PyExc_ValueError, "ASCII encoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1502,7 +1505,7 @@ else { PyErr_Format(PyExc_ValueError, "charmap decoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1618,7 +1621,7 @@ else { PyErr_Format(PyExc_ValueError, "charmap encoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1750,7 +1753,7 @@ else { PyErr_Format(PyExc_ValueError, "translate error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Python/codecs.c Python+Unicode/Python/codecs.c --- CVS-Python/Python/codecs.c Fri Mar 10 23:57:27 2000 +++ Python+Unicode/Python/codecs.c Wed Mar 15 11:27:54 2000 @@ -93,9 +93,14 @@ PyObject *_PyCodec_Lookup(const char *encoding) { - PyObject *result, *args = NULL, *v; + PyObject *result, *args = NULL, *v = NULL; int i, len; + if (_PyCodec_SearchCache == NULL || _PyCodec_SearchPath == NULL) { + PyErr_SetString(PyExc_SystemError, + "codec module not properly initialized"); + goto onError; + } if (!import_encodings_called) import_encodings(); @@ -109,6 +114,7 @@ 
result = PyDict_GetItem(_PyCodec_SearchCache, v); if (result != NULL) { Py_INCREF(result); + Py_DECREF(v); return result; } @@ -121,6 +127,7 @@ if (args == NULL) goto onError; PyTuple_SET_ITEM(args,0,v); + v = NULL; for (i = 0; i < len; i++) { PyObject *func; @@ -146,7 +153,7 @@ if (i == len) { /* XXX Perhaps we should cache misses too ? */ PyErr_SetString(PyExc_LookupError, - "unkown encoding"); + "unknown encoding"); goto onError; } @@ -156,6 +163,7 @@ return result; onError: + Py_XDECREF(v); Py_XDECREF(args); return NULL; } @@ -378,5 +386,7 @@ void _PyCodecRegistry_Fini() { Py_XDECREF(_PyCodec_SearchPath); + _PyCodec_SearchPath = NULL; Py_XDECREF(_PyCodec_SearchCache); + _PyCodec_SearchCache = NULL; } --------------A764B515049AA0B5F7643A5B-- From bwarsaw@cnri.reston.va.us Fri Mar 17 19:16:02 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 17 Mar 2000 14:16:02 -0500 (EST) Subject: [Python-Dev] Unicode Update 2000-03-17 References: <38D27F33.4055A942@lemburg.com> Message-ID: <14546.33906.771022.916209@anthem.cnri.reston.va.us> >>>>> "M" == M writes: M> The patch is against the current CVS version. I would M> appreciate if someone with CVS checkin permissions could check M> the changes in. Hi MAL, I just tried to apply your patch against the tree, however patch complains that the Lib/codecs.py patch is reversed. I haven't looked closely at it, but do you have any ideas? Or why don't you just send me Lib/codecs.py and I'll drop it in place. Everything else patched cleanly. -Barry From ping@lfw.org Fri Mar 17 14:06:13 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 08:06:13 -0600 (CST) Subject: [Python-Dev] Boolean type for Py3K? Message-ID: I wondered to myself today while reading through the Python tutorial whether it would be a good idea to have a separate boolean type in Py3K. Would this help catch common mistakes? 
I won't presume to truly understand the new-to-Python experience, but one might *guess* that

>>> 5 > 3
true

would make a little more sense to a beginner than

>>> 5 > 3
1

Of course this means introducing "true" and "false" as keywords (or built-in values like None -- perhaps they should be spelled True and False?) and completely changing the way a lot of code runs by introducing a bunch of type checking, so it may be too radical a change, but -- And i don't know if it's already been discussed a lot, but -- I thought it wouldn't hurt just to raise the question. -- ?!ng From ping@lfw.org Fri Mar 17 14:06:55 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 08:06:55 -0600 (CST) Subject: [Python-Dev] Should None be a keyword? Message-ID: Related to my last message: should None become a keyword in Py3K? -- ?!ng From bwarsaw@cnri.reston.va.us Fri Mar 17 20:49:24 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 17 Mar 2000 15:49:24 -0500 (EST) Subject: [Python-Dev] Boolean type for Py3K? References: Message-ID: <14546.39508.312796.221069@anthem.cnri.reston.va.us> >>>>> "KY" == Ka-Ping Yee writes: KY> I wondered to myself today while reading through the Python KY> tutorial whether it would be a good idea to have a separate KY> boolean type in Py3K. Would this help catch common mistakes? Almost a year ago, I mused about a boolean type in c.l.py, and came up with this prototype in Python.
-------------------- snip snip --------------------
class Boolean:
    def __init__(self, flag=0):
        self.__flag = not not flag

    def __str__(self):
        return self.__flag and 'true' or 'false'

    def __repr__(self):
        return self.__str__()

    def __nonzero__(self):
        return self.__flag == 1

    def __cmp__(self, other):
        if (self.__flag and other) or (not self.__flag and not other):
            return 0
        else:
            return 1

    def __rcmp__(self, other):
        return -self.__cmp__(other)

true = Boolean(1)
false = Boolean()
-------------------- snip snip --------------------

I think it makes sense to augment Python's current truth rules with a built-in boolean type and True and False values. But unless it's tied in more deeply (e.g. comparisons return one of these instead of integers -- and what are the implications of that?) then it's pretty much just syntactic sugar <0.75 lick>. -Barry From bwarsaw@cnri.reston.va.us Fri Mar 17 20:50:00 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 17 Mar 2000 15:50:00 -0500 (EST) Subject: [Python-Dev] Should None be a keyword? References: Message-ID: <14546.39544.673335.378797@anthem.cnri.reston.va.us> >>>>> "KY" == Ka-Ping Yee writes: KY> Related to my last message: should None become a keyword in KY> Py3K? Why? Just to reserve it? -Barry From Moshe Zadka Fri Mar 17 20:52:29 2000 From: Moshe Zadka (Moshe Zadka) Date: Fri, 17 Mar 2000 22:52:29 +0200 (IST) Subject: [Python-Dev] Boolean type for Py3K? In-Reply-To: <14546.39508.312796.221069@anthem.cnri.reston.va.us> Message-ID: On Fri, 17 Mar 2000, Barry A. Warsaw wrote: > Almost a year ago, I mused about a boolean type in c.l.py, and came up > with this prototype in Python. Cool prototype! However, I think I have a problem with the proposed semantics:

> def __cmp__(self, other):
>     if (self.__flag and other) or (not self.__flag and not other):
>         return 0
>     else:
>         return 1

This means:

    true == 1
    true == 2

But 1 != 2. I have some difficulty with == not being an equivalence relation...
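Moshe's point is easy to demonstrate concretely. Below is a minimal sketch of the prototype's comparison rule restated for a current interpreter (an assumption throughout: `__eq__` and `__bool__` stand in for `__cmp__`, `__rcmp__`, and `__nonzero__`, which modern Python no longer calls; everything else follows Barry's class):

```python
class Boolean:
    # Sketch of Barry's prototype, restated for a modern interpreter.
    def __init__(self, flag=0):
        self.__flag = not not flag

    def __repr__(self):
        return 'true' if self.__flag else 'false'

    def __bool__(self):
        return self.__flag

    def __eq__(self, other):
        # Equal whenever both sides have the same truth value --
        # exactly the rule encoded by the __cmp__ above.
        return self.__flag == bool(other)

true = Boolean(1)

# Moshe's objection: == stops being an equivalence relation.
assert true == 1
assert true == 2
assert 1 != 2      # ...yet both 1 and 2 compare equal to `true`
```

Run as-is, the three assertions all pass, which is precisely the non-transitivity being objected to.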
> I think it makes sense to augment Python's current truth rules with a > built-in boolean type and True and False values. Right on! Except for the built-in...why not have it like exceptions.py, Python code necessary for the interpreter? Languages which compile themselves are not unheard of. > But unless it's tied > in more deeply (e.g. comparisons return one of these instead of > integers -- and what are the implications of that?) Breaking loads of horrible code. Unacceptable for the 1.x series, but perfectly fine in Py3K. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From Fredrik Lundh" <14546.39544.673335.378797@anthem.cnri.reston.va.us> Message-ID: <004e01bf9055$79012000$34aab5d4@hagrid> Barry A. Warsaw wrote:
> >>>>> "KY" == Ka-Ping Yee writes:
> 
> KY> Related to my last message: should None become a keyword in
> KY> Py3K?
> 
> Why? Just to reserve it?

to avoid errors like:

    def foo():
        result = None
        # two screenfuls of code
        None, a, b = mytuple # perlish unpacking

which gives an interesting error on the first line, instead of a syntax error on the last. From guido@python.org Fri Mar 17 21:20:05 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 17 Mar 2000 16:20:05 -0500 Subject: [Python-Dev] Should None be a keyword? In-Reply-To: Your message of "Fri, 17 Mar 2000 08:06:55 CST." References: Message-ID: <200003172120.QAA09045@eric.cnri.reston.va.us> Yes. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Mar 17 21:20:36 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 17 Mar 2000 16:20:36 -0500 Subject: [Python-Dev] Boolean type for Py3K? In-Reply-To: Your message of "Fri, 17 Mar 2000 08:06:13 CST." References: Message-ID: <200003172120.QAA09115@eric.cnri.reston.va.us> Yes. True and False make sense.
--Guido van Rossum (home page: http://www.python.org/~guido/) From python-dev@python.org Fri Mar 17 21:17:06 2000 From: python-dev@python.org (Peter Funk) Date: Fri, 17 Mar 2000 22:17:06 +0100 (MET) Subject: [Python-Dev] Should None be a keyword? In-Reply-To: <14546.39544.673335.378797@anthem.cnri.reston.va.us> from "Barry A. Warsaw" at "Mar 17, 2000 3:50: 0 pm" Message-ID: > >>>>> "KY" == Ka-Ping Yee writes: > > KY> Related to my last message: should None become a keyword in > KY> Py3K? Barry A. Warsaw schrieb: > Why? Just to reserve it? This is related to the general type checking discussion. IMO the suggested

>>> 1 > 0
True

wouldn't buy us much, as long as the following behaviour stays in Py3K:

>>> a = '2' ; b = 3
>>> a < b
0
>>> a > b
1

This is irritating to newcomers (at least from my rather short experience as a member of python-help)! And this is especially irritating, since you can't do

>>> c = a + b
Traceback (innermost last):
  File "", line 1, in ?
TypeError: illegal argument type for built-in operation

IMO this difference is far more difficult to catch for newcomers than the far more often discussed 5/3 == 1 behaviour. Have a nice weekend and don't forget to hunt for remaining bugs in Fred's upcoming 1.5.2p2 docs ;-), Peter. -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From ping@lfw.org Fri Mar 17 15:53:38 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 09:53:38 -0600 (CST) Subject: [Python-Dev] list.shift() Message-ID: Has list.shift() been proposed?

    # pretend lists are implemented in Python and 'self' is a list
    def shift(self):
        item = self[0]
        del self[:1]
        return item

This would make queues read nicely... use "append" and "pop" for a stack, "append" and "shift" for a queue. (This is while on the thought-train of "making built-in types do more, rather than introducing more special types", as you'll see in my next message.)
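For reference, the proposed behaviour can be sketched like this (an assumption: `shift` is written as a stand-alone function mirroring the pretend implementation above, since no such list method exists):

```python
def shift(items):
    # same body as the pretend implementation, acting on a passed-in list
    item = items[0]
    del items[:1]
    return item

# queue: append at the back, shift from the front (FIFO)
queue = []
queue.append('a')
queue.append('b')
queue.append('c')
assert shift(queue) == 'a'
assert queue == ['b', 'c']

# stack: append at the back, pop from the back (LIFO)
stack = []
stack.append(1)
stack.append(2)
assert stack.pop() == 2
assert stack == [1]
```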
-- ?!ng From gvanrossum@beopen.com Fri Mar 17 22:00:18 2000 From: gvanrossum@beopen.com (Guido van Rossum) Date: Fri, 17 Mar 2000 17:00:18 -0500 Subject: [Python-Dev] list.shift() References: Message-ID: <38D2AAF2.CFBF3A2@beopen.com> Ka-Ping Yee wrote:
> 
> Has list.shift() been proposed?
> 
> # pretend lists are implemented in Python and 'self' is a list
> def shift(self):
>     item = self[0]
>     del self[:1]
>     return item
> 
> This would make queues read nicely... use "append" and "pop" for
> a stack, "append" and "shift" for a queue.
> 
> (This is while on the thought-train of "making built-in types do
> more, rather than introducing more special types", as you'll see
> in my next message.)

You can do this using list.pop(0). I don't think the name "shift" is very intuitive (smells of sh and Perl :-). Do we need a new function? --Guido From ping@lfw.org Fri Mar 17 16:08:37 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 10:08:37 -0600 (CST) Subject: [Python-Dev] Using lists as sets Message-ID: A different way to provide sets in Python, which occurred to me on Wednesday at Guido's talk in Mountain View (hi Guido!), is to just make lists work better. Someone asked Guido a question about the ugliness of using dicts in a certain way, and it was clear that what he wanted was a real set. Guido's objection to introducing more core data types is that it makes it more difficult to choose which data type to use, and opens the possibility of using entirely the wrong one -- a very well-taken point, i thought. (That recently-mentioned study of scripting vs. system language performance seems relevant here: a few of the C programs submitted were much *slower* than the ones in Python or Perl just because people had to choose and implement their own data structures, and so they were able to completely shoot themselves in both feet and lose a leg or two in the process.) So...
Hypothesis: The only real reason people might want a separate set type, or have to use dicts as sets, is that linear search on a list is too slow.

Therefore: All we have to do is speed up "in" on lists, and now we have a set type that is nice to read and write, and already has nice spellings for set semantics like "in".

Implementation possibilities:

+ Whip up a hash table behind the scenes if "in" gets used a lot on a particular list and all its members are hashable. This makes "in" no longer O(n), which is most of the battle. remove() can also be cheap -- though you have to do a little more bookkeeping to take care of multiple copies of elements.

+ Or, add a couple of methods, e.g. take() appends an item to a list if it's not there already, drop() removes all copies of an item from a list. These tip us off: the first time one of these methods gets used, we make the hash table then.

I think the semantics would be pretty understandable and simple to explain, which is the main thing. Any thoughts? -- ?!ng From ping@lfw.org Fri Mar 17 16:12:22 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 10:12:22 -0600 (CST) Subject: [Python-Dev] list.shift() In-Reply-To: <38D2AAF2.CFBF3A2@beopen.com> Message-ID: On Fri, 17 Mar 2000, Guido van Rossum wrote:
> You can do this using list.pop(0). I don't think the name "shift" is very
> intuitive (smells of sh and Perl :-). Do we need a new function?

Oh -- sorry, that's my ignorance showing. I didn't know pop() took an argument (of course it would -- duh...). No need to add anything more, then, i think. Sorry! Fred et al. on doc-sig: it would be really good for the tutorial to show a queue example and a stack example in the section where list methods are introduced.
In-Reply-To: <200003172120.QAA09115@eric.cnri.reston.va.us> Message-ID: Guido: (re None being a keyword)
> Yes.

Guido: (re booleans)
> Yes. True and False make sense.

Astounding. I don't think i've ever seen such quick agreement on anything! And twice in one day! I think i'm going to go lie down. :) :) -- ?!ng From DavidA@ActiveState.com Fri Mar 17 22:23:53 2000 From: DavidA@ActiveState.com (David Ascher) Date: Fri, 17 Mar 2000 14:23:53 -0800 Subject: [Python-Dev] Using lists as sets In-Reply-To: Message-ID: > I think the semantics would be pretty understandable and simple to > explain, which is the main thing. > > Any thoughts? Would

    (a,b) in Set

return true if (a,b) was a subset of Set, or if (a,b) was an element of Set? --david From mal@lemburg.com Fri Mar 17 22:41:46 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 17 Mar 2000 23:41:46 +0100 Subject: [Python-Dev] Boolean type for Py3K? References: <200003172120.QAA09115@eric.cnri.reston.va.us> Message-ID: <38D2B4AA.2EE933BD@lemburg.com> Guido van Rossum wrote: > > Yes. True and False make sense. mx.Tools defines these as new builtins... and they correspond to the C level singletons Py_True and Py_False.

    # Truth constants
    True = (1==1)
    False = (1==0)

I'm not sure whether breaking the idiom of True == 1 and False == 0 (or in other words: truth values are integers) would be such a good idea. Nothing against adding name bindings in __builtins__ though... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From ping@lfw.org Fri Mar 17 16:53:12 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 10:53:12 -0600 (CST) Subject: [Python-Dev] Boolean type for Py3K? In-Reply-To: <14546.39508.312796.221069@anthem.cnri.reston.va.us> Message-ID: On Fri, 17 Mar 2000, Barry A. Warsaw wrote: > Almost a year ago, I mused about a boolean type in c.l.py, and came up > with this prototype in Python.
> > -------------------- snip snip -------------------- > class Boolean: [...] > > I think it makes sense to augment Python's current truth rules with a > built-in boolean type and True and False values. But unless it's tied > in more deeply (e.g. comparisons return one of these instead of > integers -- and what are the implications of that?) then it's pretty > much just syntactic sugar <0.75 lick>. Yeah, and the whole point *is* the change in semantics, not the syntactic sugar. I'm hoping we can gain some safety from the type checking... though i can't seem to think of a good example off the top of my head. It's easier to think of examples if things like 'if', 'and', 'or', etc. only accept booleans as conditional arguments -- but i can't imagine going that far, as that would just be really annoying. Let's see. Specifically, the following would probably return booleans:

    magnitude comparisons:       <, >, <=, >=   (and __cmp__)
    value equality comparisons:  ==, !=
    identity comparisons:        is, is not
    containment tests:           in, not in     (and __contains__)

... and booleans would be different from integers in that arithmetic would be illegal... but that's about it. (?) Booleans are still storable immutable values; they could be keys to dicts but not lists; i don't know what else. Maybe this wouldn't actually buy us anything except for the nicer spelling of "True" and "False", which might not be worth it. ... Hmm. Can anyone think of common cases where this could help? -- n!?g From ping@lfw.org Fri Mar 17 16:59:17 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 10:59:17 -0600 (CST) Subject: [Python-Dev] Using lists as sets In-Reply-To: Message-ID: On Fri, 17 Mar 2000, David Ascher wrote: > > I think the semantics would be pretty understandable and simple to > > explain, which is the main thing. > > > > Any thoughts? > > Would > > (a,b) in Set > > return true of (a,b) was a subset of Set, or if (a,b) was an element of Set?
This would return true if (a, b) was an element of the set -- exactly the same semantics as we currently have for lists. Ideally it would also be kind of nice to use < > <= >= as subset/superset operators, but that requires revising the way we do comparisons, and you know, it might not really be used all that often anyway. -, |, and & could operate on lists sensibly when we use them as sets -- just define a few simple rules for ordering and you should be fine. e.g.

    c = a - b   is equivalent to    c = a
                                    for item in b: c.drop(item)

    c = a | b   is equivalent to    c = a
                                    for item in b: c.take(item)

    c = a & b   is equivalent to    c = []
                                    for item in a:
                                        if item in b: c.take(item)

where

    c.take(item)   is equivalent to    if item not in c: c.append(item)

    c.drop(item)   is equivalent to    while item in c: c.remove(item)

The above is all just semantics, of course, to make the point that the semantics can be simple. The implementation could do different things that are much faster when there's a hash table helping out. -- ?!ng From gvwilson@nevex.com Fri Mar 17 23:28:05 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Fri, 17 Mar 2000 18:28:05 -0500 (EST) Subject: [Python-Dev] Boolean type for Py3K? In-Reply-To: Message-ID: > Guido: (re None being a keyword) > > Yes. > Guido: (re booleans) > > Yes. True and False make sense. > Ka-Ping: > Astounding. I don't think i've ever seen such quick agreement on > anything! And twice in one day! I'm think i'm going to go lie down.
Perhaps this also suggests "exclude" instead of "drop". -- ?!ng From klm@digicool.com Sat Mar 18 00:32:56 2000 From: klm@digicool.com (Ken Manheimer) Date: Fri, 17 Mar 2000 19:32:56 -0500 (EST) Subject: [Python-Dev] Using lists as sets In-Reply-To: Message-ID: On Fri, 17 Mar 2000, Ka-Ping Yee wrote: > On Fri, 17 Mar 2000, David Ascher wrote: > > > I think the semantics would be pretty understandable and simple to > > > explain, which is the main thing. > > > > > > Any thoughts? > > > > Would > > > > (a,b) in Set > > > > return true of (a,b) was a subset of Set, or if (a,b) was an element of Set? > > This would return true if (a, b) was an element of the set -- > exactly the same semantics as we currently have for lists. I really like the idea of using dynamically-tuned lists provide set functionality! I often wind up needing something like set functionality, and implementing little convenience routines (unique, difference, etc) repeatedly. I don't mind that so much, but the frequency signifies that i, at least, would benefit from built-in support for sets... I guess the question is whether it's practical to come up with a reasonably adequate, reasonably general dynamic optimization strategy. Seems like an interesting challenge - is there prior art? As ping says, maintaining the existing list semantics handily answers challenges like david's question. New methods, like [].subset('a', 'b'), could provide the desired additional functionality - and contribute to biasing the object towards set optimization, etc. Neato! Ken klm@digicool.com From ping@lfw.org Fri Mar 17 19:02:13 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 13:02:13 -0600 (CST) Subject: [Python-Dev] Using lists as sets In-Reply-To: Message-ID: On Fri, 17 Mar 2000, Ken Manheimer wrote: > > I really like the idea of using dynamically-tuned lists provide set > functionality! 
I often wind up needing something like set functionality, > and implementing little convenience routines (unique, difference, etc) > repeatedly. I don't mind that so much, but the frequency signifies that > i, at least, would benefit from built-in support for sets... Greg asked about how to ensure that a given item only appears once in each list when used as a set, and whether i would flag the list as "i'm now operating as a set". My answer is no -- i don't want there to be any visible state on the list. (It can internally decide to optimize its behaviour for a particular purpose, but in no event should this decision ever affect the semantics of its manifested behaviour.) Externally visible state puts us back right where we started -- now the user has to decide what type of thing she wants to use, and that's more decisions and loaded guns pointing at feet that we were trying to avoid in the first place. There's something very nice about there being just two mutable container types in Python. As Guido said, the first two types you learn are lists and dicts, and it's pretty obvious which one to pick for your purposes, and you can't really go wrong. I'd like to copy my reply to Greg here because it exposes some of the philosophy i'm attempting with this proposal: You'd trust the client to use take() (or should i say include()) instead of append(). But, in the end, this wouldn't make any difference to the result of "in". In fact, you could do multisets since lists already have count(). What i'm trying to do is to put together a few very simple pieces to get all the behaviour necessary to work with sets, if you want it. I don't want the object itself to have any state that manifests itself as "now i'm a set", or "now i'm a list". You just pick the methods you want to use. It's just like stacks and queues. There's no state on the list that says "now i'm a stack, so read from the end" or "now i'm a queue, so read from the front". 
You decide where you want to read items by picking the appropriate method, and this lets you get the best of both worlds -- flexibility and simplicity. Back to Ken:

> I guess the question is whether it's practical to come up with a
> reasonably adequate, reasonably general dynamic optimization strategy.
> Seems like an interesting challenge - is there prior art?

I'd be quite happy with just turning on set optimization when include() and exclude() get used (nice and predictable). Maybe you could provide a set() built-in that would construct you a list with set optimization turned on, but i'm not too sure if we really want to expose it that way. -- ?!ng From Moshe Zadka Sat Mar 18 05:27:13 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 18 Mar 2000 07:27:13 +0200 (IST) Subject: [Python-Dev] list.shift() In-Reply-To: Message-ID: On Fri, 17 Mar 2000, Ka-Ping Yee wrote:
> 
> Has list.shift() been proposed?
> 
> # pretend lists are implemented in Python and 'self' is a list
> def shift(self):
>     item = self[0]
>     del self[:1]
>     return item
> 
> This would make queues read nicely... use "append" and "pop" for
> a stack, "append" and "shift" for a queue.

Actually, I once thought about writing a Deque in Python for a couple of hours (I later wrote it, and then threw it away because I had nothing to do with it, but that isn't my point). So I did write "shift" (though I'm certain I didn't call it that). It's not as easy to write a maintainable yet efficient "shift": I got stuck with a pointer to the beginning of the "real list" which I incremented on a "shift", and a complex heuristic for when lists de- and re-allocate. I think the tradeoffs are shaky enough that it is better to write it in pure Python rather than having more functions in C (whether in an old builtin type rather than a new one). Anyone needing to treat a list as a Deque would just construct one:

    l = Deque(l)

built-in-functions:-just-say-no-ly y'rs, Z. -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From artcom0!pf@artcom-gmbh.de Fri Mar 17 22:43:35 2000 From: artcom0!pf@artcom-gmbh.de (artcom0!pf@artcom-gmbh.de) Date: Fri, 17 Mar 2000 23:43:35 +0100 (MET) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: <38D2AAF2.CFBF3A2@beopen.com> from Guido van Rossum at "Mar 17, 2000 5: 0:18 pm" Message-ID: Ka-Ping Yee wrote: [...]
> > # pretend lists are implemented in Python and 'self' is a list
> > def shift(self):
> >     item = self[0]
> >     del self[:1]
> >     return item
[...] Guido van Rossum:
> You can do this using list.pop(0). I don't think the name "shift" is very
> intuitive (smells of sh and Perl :-). Do we need a new function?

I think no. But what about this one?:

    # pretend self and dict are dictionaries:
    def supplement(self, dict):
        for k, v in dict.items():
            if not self.data.has_key(k):
                self.data[k] = v

Note the similarities to {}.update(dict), but update replaces existing entries in self, which is sometimes not desired. I know that supplement can also be simulated with:

    tmp = dict.copy()
    tmp.update(self)
    self.data = d

But this is still a little ugly. IMO a builtin method to supplement (complete?) a dictionary with default values from another dictionary would sometimes be a useful tool. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From ping@lfw.org Sat Mar 18 18:48:10 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Sat, 18 Mar 2000 10:48:10 -0800 (PST) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: Message-ID: On Fri, 17 Mar 2000 artcom0!pf@artcom-gmbh.de wrote:
> 
> I think no. But what about this one?:
> 
>     # pretend self and dict are dictionaries:
>     def supplement(self, dict):
>         for k, v in dict.items():
>             if not self.data.has_key(k):
>                 self.data[k] = v

I'd go for that.
It would be nice to have a non-overwriting update(). The only issue is the choice of verb; "supplement" sounds pretty reasonable to me. -- ?!ng "If I have not seen as far as others, it is because giants were standing on my shoulders." -- Hal Abelson From python-dev@python.org Sat Mar 18 19:23:37 2000 From: python-dev@python.org (Peter Funk) Date: Sat, 18 Mar 2000 20:23:37 +0100 (MET) Subject: [Python-Dev] dict.supplement() In-Reply-To: from Ka-Ping Yee at "Mar 18, 2000 10:48:10 am" Message-ID: Hi!

> > # pretend self and dict are dictionaries:
> > def supplement(self, dict):
> >     for k, v in dict.items():
> >         if not self.data.has_key(k):
> >             self.data[k] = v

Ka-Ping Yee schrieb:
> I'd go for that. It would be nice to have a non-overwriting update().
> The only issue is the choice of verb; "supplement" sounds pretty
> reasonable to me.

In German we have the verb "ergänzen" which translates either into "supplement" or "complete" (from my dictionary). "supplement" has the disadvantage of being rather long for the name of a builtin method. Nevertheless I've used this in my class derived from UserDict.UserDict. Now let's switch topics to the recent discussion about a Set type: you all certainly know that something similar has been done before by Aaron Watters? see: Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From gvwilson@nevex.com Mon Mar 20 14:52:12 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Mon, 20 Mar 2000 09:52:12 -0500 (EST) Subject: [Python-Dev] re: Using lists as sets Message-ID: [After discussion with Ping, and weekend thought] I would like to vote against using lists as sets:

1. It blurs Python's categorization of containers. The rest of the world thinks of sets as unordered, associative, and binary-valued (a term I just made up to mean "containing 0 or 1 instance of X").
Lists, on the other hand, are ordered, positionally-indexed, and multi-valued. While a list is always a legal queue or stack (although lists permit state transitions that are illegal for queues or stacks), most lists are not legal sets.

2. Python has, in dictionaries, a much more logical starting point for sets. A set is exactly a dictionary whose keys matter, and whose values don't. Adding operations to dictionaries to insert keys, etc., without having to supply a value, naively appears no harder than adding operations to lists, and would probably be much easier to explain when teaching a class.

3. (Long-term speculation) Even if P3K isn't written in C++, many modules for it will be. It would therefore seem sensible to design P3K in a C++-friendly way --- in particular, to align Python's container hierarchy with that used in the Standard Template Library. Using lists as a basis for sets would give Python a very different container type hierarchy than the STL, which could make it difficult for automatic tools like SWIG to map STL-based things to Python and vice versa. Using dictionaries as a basis for sets would seem to be less problematic. (Note that if Wadler et al's Generic Java proposal becomes part of that language, an STL clone will almost certainly become part of that language, and require JPython interfacing.)

On a semi-related note, can someone explain why programs are not allowed to iterate directly through the elements of a dictionary:

    for (key, value) in dict:
        ...body...

Thanks, Greg "No XML entities were harmed in the production of this message."
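The dictionary-based approach in point 2 can be sketched in a handful of lines. This is a hypothetical wrapper (the class and method names are illustrative, not an existing module): set members are stored as dictionary keys, and the values are ignored.

```python
class Set:
    # Hypothetical sketch: a set as a dictionary whose keys matter
    # and whose values don't.
    def __init__(self, items=()):
        self._d = {}
        for item in items:
            self._d[item] = 1      # the value is a dummy

    def insert(self, item):
        self._d[item] = 1

    def remove(self, item):
        del self._d[item]

    def __contains__(self, item):
        return item in self._d     # O(1), unlike a linear scan of a list

    def __len__(self):
        return len(self._d)

    def __iter__(self):
        return iter(self._d)       # iterating the keys = iterating the members

s = Set(['a', 'b', 'a'])
assert len(s) == 2                 # duplicates collapse, as a set should
assert 'a' in s and 'c' not in s
```

Membership, insertion, and removal all inherit the dictionary's hashed lookup, which is exactly the property the lists-as-sets proposal has to work to recover.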
From Moshe Zadka Mon Mar 20 15:03:47 2000 From: Moshe Zadka (Moshe Zadka) Date: Mon, 20 Mar 2000 17:03:47 +0200 (IST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: Message-ID: On Mon, 20 Mar 2000 gvwilson@nevex.com wrote: > [After discussion with Ping, and weekend thought] > > I would like to vote against using lists as sets: I'd like to object too, but for slightly different reasons: 20-something lines of Python can implement a set (I just checked it) with the new __contains__. We can just supply it in the standard library (Set module?) and be over and done with. Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From jcw@equi4.com Mon Mar 20 15:37:19 2000 From: jcw@equi4.com (Jean-Claude Wippler) Date: Mon, 20 Mar 2000 16:37:19 +0100 Subject: [Python-Dev] re: Using lists as sets References: Message-ID: <38D645AF.661CA335@equi4.com> gvwilson@nevex.com wrote: > > [After discussion with Ping, and weekend thought] [good stuff] Allow me to offer yet another perspective on this. I'll keep it short. Python has sequences (indexable collections) and maps (associative collections). C++'s STL has vectors, sets, multi-sets, maps, and multi-maps. I find the distinction between these puzzling, and hereby offer another, somewhat relational-database minded, categorization as food for thought:

- collections consist of objects, each of them with attributes
- the first N attributes form the "key", the rest is the "residue"
- there is also an implicit position attribute, which I'll call "#"
- so an object consists of attributes: (K1,K2,...KN,#,R1,R2,...,RM)
- one more bit of specification is needed: whether # is part of the key

Let me mark the position between key attributes and residue with ":", so everything before the colon marks the uniquely identifying attributes.
A vector (sequence) is:  #:R1,R2,...,RM
A set is:                K1,K2,...KN:
A multi-set is:          K1,K2,...KN,#:
A map is:                K1,K2,...KN:#,R1,R2,...,RM
A multi-map is:          K1,K2,...KN,#:R1,R2,...,RM

And a somewhat esoteric member of this classification:

A singleton is:          :R1,R2,...,RM

I have no idea what this means for Python, but merely wanted to show how a relational, eh, "view" on all this might perhaps simplify the issues. -jcw From fdrake@acm.org Mon Mar 20 16:55:59 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 20 Mar 2000 11:55:59 -0500 (EST) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: References: <38D2AAF2.CFBF3A2@beopen.com> Message-ID: <14550.22559.550660.403909@weyr.cnri.reston.va.us> artcom0!pf@artcom-gmbh.de writes:
> Note the similarities to {}.update(dict), but update replaces existing
> entries in self, which is sometimes not desired. I know that supplement
> can also be simulated with:

Peter, I like this!

> tmp = dict.copy()
> tmp.update(self)
> self.data = d

I presume you mean "self.data = tmp"; "self.data.update(tmp)" would be just a little more robust, at the cost of an additional update. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From tismer@tismer.com Mon Mar 20 17:10:34 2000 From: tismer@tismer.com (Christian Tismer) Date: Mon, 20 Mar 2000 18:10:34 +0100 Subject: [Python-Dev] re: Using lists as sets References: <38D645AF.661CA335@equi4.com> Message-ID: <38D65B8A.50B81D08@tismer.com> Jean-Claude Wippler wrote: [relational notation]
> A vector (sequence) is: #:R1,R2,...,RM
> A set is:               K1,K2,...KN:
> A multi-set is:         K1,K2,...KN,#:
> A map is:               K1,K2,...KN:#,R1,R2,...,RM
> A multi-map is:         K1,K2,...KN,#:R1,R2,...,RM

This is a nice classification! To my understanding, why not

A map is: K1,K2,...KN:R1,R2,...,RM

Where is a # in a map? And what do you mean by N and M? Is K1..KN one key, made up of N sub keys, or do you mean the whole set of keys, where each one is mapped somehow.
I guess not, the notation looks like I should think of tuples. No, that would imply that N and M were fixed, but they are not. But you say "- collections consist of objects, each of them with attributes". Ok, N and M seem to be individual for each object, right? But when defining a map for instance, and we're talking of the objects, then the map is the set of these objects, and I have to think of K[0]..K(N(o)):R[0]..R(M(o)) where N and M are functions of the individual object o, right? Isn't it then better to think different of these objects, saying they can produce some key object and some value object of any shape, and a position, where each of these can be missing? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From jeremy@cnri.reston.va.us Mon Mar 20 17:28:28 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Mon, 20 Mar 2000 12:28:28 -0500 (EST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: References: Message-ID: <14550.24508.341533.908941@goon.cnri.reston.va.us> >>>>> "GVW" == gvwilson writes: GVW> On a semi-related note, can someone explain why programs are GVW> not allowed to iterate directly through the elements of a GVW> dictionary: GVW> for (key, value) in dict: ...body... Pythonic design rules #2: Explicit is better than implicit. There are at least three "natural" ways to interpret "for ... in dict:" In addition to the version that strikes you as most natural, some people also imagine that a for loop should iterate over the keys or the values. Instead of guessing, Python provides explicit methods for each possibility: items, keys, values. 
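Those three explicit spellings are easy to show in a quick sketch (modern Python shown; bare iteration over a dict was eventually defined to mean the keys, but the explicit methods remain):

```python
d = {"a": 1, "b": 2}

keys = [k for k in d.keys()]            # iterate over the keys
values = [v for v in d.values()]        # iterate over the values
pairs = [(k, v) for k, v in d.items()]  # iterate over (key, value) pairs

print(sorted(keys))    # -> ['a', 'b']
print(sorted(pairs))   # -> [('a', 1), ('b', 2)]
```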
Yet another possibility, implemented in early versions of JPython and later removed, was to treat a dictionary exactly like a list: Call __getitem__(0), then 1, ..., until a KeyError was raised. In other words, a dictionary could behave like a list provided that it had integer keys. Jeremy From jcw@equi4.com Mon Mar 20 17:56:44 2000 From: jcw@equi4.com (Jean-Claude Wippler) Date: Mon, 20 Mar 2000 18:56:44 +0100 Subject: [Python-Dev] re: Using lists as sets References: <38D645AF.661CA335@equi4.com> <38D65B8A.50B81D08@tismer.com> Message-ID: <38D6665C.ECDE09DE@equi4.com> Christian, > A map is: K1,K2,...KN:R1,R2,...,RM Yes, my list was inconsistent. > Is K1..KN one key, made up of N sub keys, or do you mean the > whole set of keys, where each one is mapped somehow. [...] > Ok, N and M seem to be individual for each object, right? [...] > Isn't it then better to think different of these objects, saying > they can produce some key object and some value object of any > shape, and a position, where each of these can be missing? Depends on your perspective. In the relational world, the (K1,...,KN) attributes identify the object, but they are not themselves considered an object. In OO-land, (K1,...,KN) is an object, and a map takes such as an object as input and delivers (R1,...,RM) as result. This tension shows the boundary of both relational and OO models, IMO. I wish it'd be possible to unify them, but I haven't figured it out. -jcw, concept maverick / fool on the hill - pick one :) From pf@artcom-gmbh.de Mon Mar 20 18:28:17 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Mon, 20 Mar 2000 19:28:17 +0100 (MET) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: <14550.22559.550660.403909@weyr.cnri.reston.va.us> from "Fred L. Drake, Jr." at "Mar 20, 2000 11:55:59 am" Message-ID: I wrote: > > Note the similarities to {}.update(dict), but update replaces existing > > entries in self, which is sometimes not desired. 
I know, that supplement
> > can also be simulated with:

> Fred L. Drake, Jr.:
> Peter,
> I like this!
>
> > tmp = dict.copy()
> > tmp.update(self)
> > self.data = d
>
> I presume you mean "self.data = tmp"; "self.data.update(tmp)" would
> be just a little more robust, at the cost of an additional update.

Ouppss... I should have tested this before posting. But currently I use the more explicit (and probably slower) version in my code:

class ConfigDict(UserDict.UserDict):
    def supplement(self, defaults):
        for k, v in defaults.items():
            if not self.data.has_key(k):
                self.data[k] = v

Works fine so far, although it usually requires an additional copy operation. Consider another example, where arbitrary instance attributes should be specified as keyword arguments to the constructor:

>>> class Example:
...     _defaults = {'a': 1, 'b': 2}
...     _config = _defaults
...     def __init__(self, **kw):
...         if kw:
...             self._config = self._defaults.copy()
...             self._config.update(kw)
...
>>> A = Example(a=12345)
>>> A._config
{'b': 2, 'a': 12345}
>>> B = Example(c=3)
>>> B._config
{'b': 2, 'c': 3, 'a': 1}

If 'supplement' were a dictionary builtin method, this would become simply:

    kw.supplement(self._defaults)
    self._config = kw

Unfortunately this can't be achieved using a wrapper class like UserDict, since the **kw argument is always a builtin dictionary object. Regards, Peter -- Peter Funk, Oldenburger Str.86, 27777 Ganderkesee, Tel: 04222 9502 70, Fax: -60 From ping@lfw.org Mon Mar 20 12:36:34 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Mon, 20 Mar 2000 06:36:34 -0600 (CST) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: Message-ID: On Mon, 20 Mar 2000, Peter Funk wrote: > Consider another example, where arbitrary instance attributes should be > specified as keyword arguments to the constructor: > > >>> class Example: > ... _defaults = {'a': 1, 'b': 2} > ... _config = _defaults > ... def __init__(self, **kw): > ... if kw: > ...
self._config = self._defaults.copy() > ... self._config.update(kw) Yes! I do this all the time. I wrote a user-interface module to take care of exactly this kind of hassle when creating lots of UI components. When you're making UI, you can easily drown in keyword arguments and default values if you're not careful. -- ?!ng From fdrake@acm.org Mon Mar 20 19:02:48 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 20 Mar 2000 14:02:48 -0500 (EST) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: References: <14550.22559.550660.403909@weyr.cnri.reston.va.us> Message-ID: <14550.30168.129259.356581@weyr.cnri.reston.va.us> Peter Funk writes: > Ouppss... I should have tested this before posting. But currently I use > the more explicit (and probably slower) version in my code: The performance is based entirely on the size of each; in the (probably typical) case of smallish dictionaries (<50 entries), it's probably cheaper to use a temporary dict and do the update. For large dicts (on the defaults side), it may make more sense to reduce the number of objects that need to be created:

    target = ...
    has_key = target.has_key
    for key in defaults.keys():
        if not has_key(key):
            target[key] = defaults[key]

This saves the construction of len(defaults) 2-tuples. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From Moshe Zadka Mon Mar 20 19:23:01 2000 From: Moshe Zadka (Moshe Zadka) Date: Mon, 20 Mar 2000 21:23:01 +0200 (IST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: <14550.24508.341533.908941@goon.cnri.reston.va.us> Message-ID: On Mon, 20 Mar 2000, Jeremy Hylton wrote: > Yet another possibility, implemented in early versions of JPython and > later removed, was to treat a dictionary exactly like a list: Call > __getitem__(0), then 1, ..., until a KeyError was raised. In other > words, a dictionary could behave like a list provided that it had > integer keys.
Two remarks: Jeremy meant "consecutive natural keys starting with 0", (yes, I've managed to learn mind-reading from the timbot) and that (the following is considered a misfeature):

    import UserDict
    a = UserDict.UserDict()
    a[0] = "hello"
    a[1] = "world"
    for word in a: print word

Will print "hello", "world", and then die with KeyError. I realize why this is happening, and realize it could only be fixed in Py3K. However, a temporary (though not 100% backwards compatible) fix is that "for" will catch LookupError, rather than IndexError. Any comments? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mhammond@skippinet.com.au Mon Mar 20 19:39:31 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Mon, 20 Mar 2000 11:39:31 -0800 Subject: [Python-Dev] Unicode and Windows Message-ID: I would like to discuss Unicode on the Windows platform, and how it relates to MBCS that Windows uses. My main goal here is to ensure that Unicode on Windows can make a round-trip to and from native Unicode stores. As an example, let's take the registry - a Windows user should be able to read a Unicode value from the registry then write it back. The value written back should be _identical_ to the value read. Ditto for the file system: If the filesystem is Unicode, then I would expect the following code:

    for fname in os.listdir():
        f = open(fname + ".tmp", "w")

To create filenames on the filesystem with the exact base name even when the basename contains non-ascii characters. However, the Unicode patches do not appear to make this possible. open() uses PyArg_ParseTuple(args, "s..."); PyArg_ParseTuple() will automatically convert a Unicode object to UTF-8, so we end up passing a UTF-8 encoded string to the C runtime fopen function. The end result of all this is that we end up with UTF-8 encoded names in the registry/on the file system.
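The round-trip failure described above can be demonstrated in a few lines; here cp1252 stands in, purely as an assumption, for whatever MBCS code page the active Windows locale selects:

```python
name = "caf\u00e9"                    # a filename containing one non-ASCII char

utf8_bytes = name.encode("utf-8")     # what the implicit UTF-8 conversion produces
# A consumer running under a cp1252 locale sees two characters, not one:
garbled = utf8_bytes.decode("cp1252")

print(garbled)                        # -> cafÃ©
print(garbled == name)                # -> False: the name no longer round-trips
```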
It does not seem possible to get a true Unicode string onto either the file system or in the registry. Unfortunately, I'm not experienced enough to know the full ramifications, but it _appears_ that on Windows the default "unicode to string" translation should be done via the WideCharToMultiByte() API. This will then pass an MBCS encoded ascii string to Windows, and the "right thing" should magically happen. Unfortunately, MBCS encoding is dependent on the current locale (ie, one MBCS sequence will mean completely different things depending on the locale). I don't see a portability issue here, as the documentation could state that "Unicode->ASCII conversions use the most appropriate conversion for the platform. If the platform is not Unicode aware, then UTF-8 will be used." This issue is the final one before I release the win32reg module. It seems _critical_ to me that if Python supports Unicode and the platform supports Unicode, then Python unicode values must be capable of being passed to the platform. For the win32reg module I could quite possibly hack around the problem, but the more general problem (categorized by the open() example above) still remains... Any thoughts? Mark. From jeremy@cnri.reston.va.us Mon Mar 20 19:51:28 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Mon, 20 Mar 2000 14:51:28 -0500 (EST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: References: <14550.24508.341533.908941@goon.cnri.reston.va.us> Message-ID: <14550.33088.110785.78631@goon.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: MZ> On Mon, 20 Mar 2000, Jeremy Hylton wrote: >> Yet another possibility, implemented in early versions of JPython >> and later removed, was to treat a dictionary exactly like a list: >> Call __getitem__(0), then 1, ..., until a KeyError was raised. >> In other words, a dictionary could behave like a list provided >> that it had integer keys.
MZ> Two remarks: Jeremy meant "consecutive natural keys starting MZ> with 0", (yes, I've managed to learn mind-reading from the MZ> timbot) I suppose I meant that (perhaps you can read my mind as well as I can); I also meant using values of Python's integer datatype :-). and that (the following is considered a misfeature): MZ> import UserDict MZ> a = UserDict.UserDict() MZ> a[0]="hello" MZ> a[1]="world" MZ> for word in a: print word MZ> Will print "hello", "world", and then die with KeyError. I MZ> realize why this is happening, and realize it could only be MZ> fixed in Py3K. However, a temporary (though not 100% backwards MZ> compatible) fix is that "for" will catch LookupError, rather MZ> then IndexError. I'm not sure what you mean by "fix." (Please read your mind for me .) I think by fix you mean, "allow the broken code above to execute without raising an exception." Yuck! As far as I can tell, the problem is caused by the special way that a for loop uses the __getitem__ protocol. There are two related issues that lead to confusion. In cases other than for loops, __getitem__ is invoked when the syntactic construct x[i] is used. This means either lookup in a list or in a dict depending on the type of x. If it is a list, the index must be an integer and IndexError can be raised. If it is a dict, the index can be anything (even an unhashable type; TypeError is only raised by insertion for this case) and KeyError can be raised. In a for loop, the same protocol (__getitem__) is used, but with the special convention that the object should be a sequence. Python will detect when you try to use a builtin type that is not a sequence, e.g. a dictionary. If the for loop iterates over an instance type rather than a builtin type, there is no way to check whether the __getitem__ protocol is being implemented by a sequence or a mapping. 
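The clash Jeremy describes can be made concrete with a sketch (the class names Seq and Map are hypothetical): the for-loop fallback calls __getitem__(0), 1, ... and stops only on IndexError, so a mapping-style __getitem__ that raises KeyError instead dies mid-loop.

```python
class Seq:
    """Sequence-style __getitem__: IndexError terminates the for loop."""
    def __getitem__(self, i):
        if i >= 3:
            raise IndexError(i)
        return i * 10

print([x for x in Seq()])       # -> [0, 10, 20]

class Map:
    """Mapping-style __getitem__: KeyError is NOT the loop's stop signal."""
    def __getitem__(self, key):
        return {0: "hello", 1: "world"}[key]

try:
    for word in Map():
        print(word)             # prints "hello", then "world" ...
except KeyError:
    print("... then the loop dies with KeyError")
```

This reproduces exactly the UserDict misfeature Moshe showed earlier in the thread.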
The right solution, I think, is to allow a means for stating explicitly whether a class with an __getitem__ method is a sequence or a mapping (or both?). Then UserDict can declare itself to be a mapping and using it in a for loop will raise the TypeError, "loop over non-sequence" (which has a standard meaning defined in Skip's catalog <0.8 wink>). I believe this is where types-vs.-classes meets subtyping-vs.-inheritance. I suspect that the right solution, circa Py3K, is that classes must explicitly state what types they are subtypes of or what interfaces they implement. Jeremy From Moshe Zadka Mon Mar 20 20:13:20 2000 From: Moshe Zadka (Moshe Zadka) Date: Mon, 20 Mar 2000 22:13:20 +0200 (IST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: <14550.33088.110785.78631@goon.cnri.reston.va.us> Message-ID: On Mon, 20 Mar 2000, Jeremy Hylton wrote: > I'm not sure what you mean by "fix." I mean any sane behaviour -- either failing on TypeError at the beginning, like "for" does, or executing without raising an exception. Raising an exception in the middle which is imminent is definitely (for the right values of definitely) a surprising behaviour (I know it surprised me!). > I think by fix you mean, "allow the broken code above to > execute without raising an exception." Yuck! I agree it is yucky -- it is all a weird echo of the yuckiness of the type/class dichotomy. What I suggested is a temporary patch... > As far as I can tell, the problem is caused by the special > way that a for loop uses the __getitem__ protocol. Well, my take is that it is caused by the fact __getitem__ is used both for the sequence protocol and the mapping protocol (well, I'm cheating through my teeth here, but you understand what I mean ) Agreed though, that the whole iteration protocol should be revisited -- but that is a subject for another post.
> The right solution, I think, is to allow a means for stating > explicitly whether a class with an __getitem__ method is a sequence or > a mapping (or both?). And this is the fix I wanted for Py3K (details to be debated, still). See? You read my mind perfectly. > I suspect that the right solution, circa > Py3K, is that classes must explicitly state what types they are > subtypes of or what interfaces they implement. Exactly. And have subclassable built-in classes in the same fell swoop. getting-all-excited-for-py3k-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping@lfw.org Mon Mar 20 14:34:12 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Mon, 20 Mar 2000 08:34:12 -0600 (CST) Subject: [Python-Dev] Set options Message-ID: I think that at this point the possibilities for doing sets come down to four options:

1. use lists
   visible changes:   new methods l.include, l.exclude
   invisible changes: faster 'in'
   usage: s = [1, 2], s.include(3), s.exclude(3),
          if item in s, for item in s

2. use dicts
   visible changes:   for/if x in dict means keys
                      accept dicts without values (e.g. {1, 2})
                      new special non-printing value "Present"
                      new method d.insert(x) means d[x] = Present
   invisible changes: none
   usage: s = {1, 2}, s.insert(3), del s[3],
          if item in s, for item in s

3. new type
   visible changes:   set() built-in, new type with methods .insert, .remove
   invisible changes: none
   usage: s = set(1, 2), s.insert(3), s.remove(3),
          if item in s, for item in s

4. do nothing
   visible changes:   none
   invisible changes: none
   usage: s = {1: 1, 2: 1}, s[3] = 1, del s[3],
          if s.has_key(item), for item in s.keys()

Let me say a couple of things about #1 and #2. I'm happy with both. I quite like the idea of using dicts this way (#2), in fact -- i think it was the first idea i remember chatting about.
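Option 3's interface can be prototyped in pure Python on top of a dictionary (a rough sketch of Moshe's Set-module suggestion, not a proposed implementation):

```python
class Set:
    """Minimal set prototype backed by a dict; the members are the keys."""
    def __init__(self, *items):
        self._members = {}
        for item in items:
            self._members[item] = 1
    def insert(self, item):
        self._members[item] = 1
    def remove(self, item):
        del self._members[item]
    def __contains__(self, item):       # drives "if item in s"
        return item in self._members
    def __iter__(self):                 # drives "for item in s"
        return iter(self._members)
    def __len__(self):
        return len(self._members)
    def __repr__(self):                 # prints like {1, 3}, hiding the values
        return "{" + ", ".join(map(repr, self._members)) + "}"

s = Set(1, 2)
s.insert(3)
s.remove(2)
print(3 in s, len(s))    # -> True 2
```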
If i remember correctly, Guido's objection to #2 was that "in" on a dictionary would work on the keys, which isn't consistent with the fact that "in" on a list works on the values. However, this doesn't really bother me at all. It's a very simple rule, especially when you think of how people understand dictionaries. If you hand someone a *real* dictionary, and ask them Is the word "python" in the dictionary? they'll go look up "python" in the *keys* of the dictionary (the words), not the values (the definitions). So i'm quite all right with saying for x in dict: and having that loop over the keys, or saying if x in dict: and having that check whether x is a valid key. It makes perfect sense to me. My main issue with #2 was that sets would print like {"Alice": 1, "Bob": 1, "Ted": 1} and this would look weird. However, as Greg explained to me, it would be possible to introduce a default value to go with set members that just says "i'm here", such as 'Present' (read as: "Alice" is present in the set) or 'Member' or even 'None', and this value wouldn't print out -- thus s = {"Bob"} s.include("Alice") print s would produce {"Alice", "Bob"} representing a dictionary that actually contained {"Alice": Present, "Bob": Present} You'd construct set constants like this too: {2, 4, 7} Using dicts this way (rather than having a separate set type that just happened to be spelled with {}) avoids the parsing issue: no need for look-ahead; you just toss in "Present" when the text doesn't supply a colon, and move on. I'd be okay with this, though i'm not sure everyone would; and together with Guido's initial objection, that's what motivated me to propose the lists-as-sets thing: fewer changes all around, no ambiguities introduced -- just two new methods, and we're done. Hmm. I know someone who's just learning Python. I will attempt to ask some questions about what she would find natural, and see if that reveals anything interesting. 
-- ?!ng From bwarsaw@cnri.reston.va.us Mon Mar 20 22:01:00 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Mon, 20 Mar 2000 17:01:00 -0500 (EST) Subject: [Python-Dev] re: Using lists as sets References: <14550.24508.341533.908941@goon.cnri.reston.va.us> <14550.33088.110785.78631@goon.cnri.reston.va.us> Message-ID: <14550.40860.72418.648591@anthem.cnri.reston.va.us> >>>>> "JH" == Jeremy Hylton writes: JH> As far as I can tell, the problem is caused by the special way JH> that a for loop uses the __getitem__ protocol. There are two JH> related issues that lead to confusion. >>>>> "MZ" == Moshe Zadka writes: MZ> Well, my look is that it is caused by the fact __getitem__ is MZ> used both for the sequence protocol and the mapping protocol Right. MZ> Agreed though, that the whole iteration protocol should be MZ> revisited -- but that is a subject for another post. Yup. JH> The right solution, I think, is to allow a means for stating JH> explicitly whether a class with an __getitem__ method is a JH> sequence or a mapping (or both?). Or should the two protocol use different method names (code breakage!). JH> I believe this is where types-vs.-classes meets JH> subtyping-vs.-inheritance. meets protocols-vs.-interfaces. From Moshe Zadka Tue Mar 21 05:16:00 2000 From: Moshe Zadka (Moshe Zadka) Date: Tue, 21 Mar 2000 07:16:00 +0200 (IST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: <14550.40860.72418.648591@anthem.cnri.reston.va.us> Message-ID: On Mon, 20 Mar 2000, Barry A. Warsaw wrote: > MZ> Agreed though, that the whole iteration protocol should be > MZ> revisited -- but that is a subject for another post. > > Yup. (Go Stackless, go!?) > JH> I believe this is where types-vs.-classes meets > JH> subtyping-vs.-inheritance. > > meets protocols-vs.-interfaces. It took me 5 minutes of intensive thinking just to understand what Barry meant. Just wait until we introduce Sather-like "supertypes" (which are pretty Pythonic, IMHO) -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From Moshe Zadka Tue Mar 21 05:21:24 2000 From: Moshe Zadka (Moshe Zadka) Date: Tue, 21 Mar 2000 07:21:24 +0200 (IST) Subject: [Python-Dev] Set options In-Reply-To: Message-ID: On Mon, 20 Mar 2000, Ka-Ping Yee wrote: > I think that at this point the possibilities for doing sets > come down to four options: > > > 1. use lists > 2. use dicts > 3. new type > 4. do nothing 5. new Python module with a class "Set" (The issues are similar to #3, but this has the advantage of not changing the interpreter) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal@lemburg.com Tue Mar 21 00:25:09 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 21 Mar 2000 01:25:09 +0100 Subject: [Python-Dev] Unicode and Windows References: Message-ID: <38D6C165.EEF58232@lemburg.com> Mark Hammond wrote: > > I would like to discuss Unicode on the Windows platform, and how it relates > to MBCS that Windows uses. > > My main goal here is to ensure that Unicode on Windows can make a round-trip > to and from native Unicode stores. As an example, let's take the registry - > a Windows user should be able to read a Unicode value from the registry then > write it back. The value written back should be _identical_ to the value > read. Ditto for the file system: If the filesystem is Unicode, then I would > expect the following code: > for fname in os.listdir(): > f = open(fname + ".tmp", "w") > > To create filenames on the filesystem with the exact base name even when the > basename contains non-ascii characters. > > However, the Unicode patches do not appear to make this possible. open() > uses PyArg_ParseTuple(args, "s..."); PyArg_ParseTuple() will automatically > convert a Unicode object to UTF-8, so we end up passing a UTF-8 encoded > string to the C runtime fopen function. Right. 
The idea with open() was to write a special version (using #ifdefs) for use on Windows platforms which does all the needed magic to convert Unicode to whatever the native format and locale is... Using parser markers for this is obviously *not* the right way to get to the core of the problem. Basically, you will have to write a helper which takes a string, Unicode or some other "t" compatible object as name object and then converts it to the system's view of things. I think we had a private discussion about this a few months ago: there was some way to convert Unicode to a platform independent format which then got converted to MBCS -- don't remember the details though. > The end result of all this is that we end up with UTF-8 encoded names in the > registry/on the file system. It does not seem possible to get a true > Unicode string onto either the file system or in the registry. > > Unfortunately, Im not experienced enough to know the full ramifications, but > it _appears_ that on Windows the default "unicode to string" translation > should be done via the WideCharToMultiByte() API. This will then pass an > MBCS encoded ascii string to Windows, and the "right thing" should magically > happen. Unfortunately, MBCS encoding is dependant on the current locale > (ie, one MBCS sequence will mean completely different things depending on > the locale). I dont see a portability issue here, as the documentation > could state that "Unicode->ASCII conversions use the most appropriate > conversion for the platform. If the platform is not Unicode aware, then > UTF-8 will be used." No, no, no... :-) The default should be (and is) UTF-8 on all platforms -- whether the platform supports Unicode or not. If a platform uses a different encoding, an encoder should be used which applies the needed transformation. > This issue is the final one before I release the win32reg module. 
It seems > _critical_ to me that if Python supports Unicode and the platform supports > Unicode, then Python unicode values must be capable of being passed to the > platform. For the win32reg module I could quite possibly hack around the > problem, but the more general problem (categorized by the open() example > above) still remains... > > Any thoughts? Can't you use the wchar_t interfaces for the task (see the unicodeobject.h file for details) ? Perhaps you can first transfer Unicode to wchar_t and then on to MBCS using a win32 API ?! -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Tue Mar 21 09:27:56 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 21 Mar 2000 10:27:56 +0100 Subject: [Python-Dev] Set options References: Message-ID: <38D7409C.169B0C42@lemburg.com> Moshe Zadka wrote: > > On Mon, 20 Mar 2000, Ka-Ping Yee wrote: > > > I think that at this point the possibilities for doing sets > > come down to four options: > > > > > > 1. use lists > > 2. use dicts > > 3. new type > > 4. do nothing > > 5. new Python module with a class "Set" > (The issues are similar to #3, but this has the advantage of not changing > the interpreter) Perhaps someone could take Aaron's kjbuckets and write a Python emulation for it (I think he's even already done something like this for gadfly). Then the emulation could go into the core and if people want speed they can install his extension (the emulation would have to detect this and use the real thing then). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack@oratrix.nl Tue Mar 21 11:54:30 2000 From: jack@oratrix.nl (Jack Jansen) Date: Tue, 21 Mar 2000 12:54:30 +0100 Subject: [Python-Dev] Unicode and Windows In-Reply-To: Message by "M.-A. 
Lemburg" , Tue, 21 Mar 2000 01:25:09 +0100 , <38D6C165.EEF58232@lemburg.com> Message-ID: <20000321115430.88A11370CF2@snelboot.oratrix.nl> I guess we need another format specifier than "s" here. "s" does the conversion to standard-python-utf8 for wide strings, and we'd need another format for conversion to current-local-os-convention-8-bit-encoding-of-unicode-strings. I assume that that would also come in handy for MacOS, where we'll have the same problem (filenames are in Apple's proprietary 8bit encoding). -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal@lemburg.com Tue Mar 21 12:14:54 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 21 Mar 2000 13:14:54 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000321115430.88A11370CF2@snelboot.oratrix.nl> Message-ID: <38D767BE.C45F8286@lemburg.com> Jack Jansen wrote: > > I guess we need another format specifier than "s" here. "s" does the > conversion to standard-python-utf8 for wide strings, Actually, "t" does the UTF-8 conversion... "s" will give you the raw internal UTF-16 representation in platform byte order. > and we'd need another > format for conversion to current-local-os-convention-8-bit-encoding-of-unicode-strings. I'd suggest adding some kind of generic

    PyOS_FilenameFromObject(PyObject *v, void *buffer, int buffer_len)

API for the conversion of strings, Unicode and text buffers to an OS dependent filename buffer. And/or perhaps specific APIs for each OS... e.g.

    PyOS_MBCSFromObject()  (only on WinXX)
    PyOS_AppleFromObject() (only on Mac ;)

> I assume that that would also come in handy for MacOS, where we'll have the > same problem (filenames are in Apple's proprietary 8bit encoding). Is that encoding already supported by the encodings package ? If not, could you point me to a map file for the encoding ?
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake@acm.org Tue Mar 21 14:56:47 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 21 Mar 2000 09:56:47 -0500 (EST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: <38D767BE.C45F8286@lemburg.com> References: <20000321115430.88A11370CF2@snelboot.oratrix.nl> <38D767BE.C45F8286@lemburg.com> Message-ID: <14551.36271.33825.841965@weyr.cnri.reston.va.us> M.-A. Lemburg writes: > And/or perhaps sepcific APIs for each OS... e.g. > > PyOS_MBCSFromObject() (only on WinXX) > PyOS_AppleFromObject() (only on Mac ;) Another approach may be to add some format modifiers: te -- text in an encoding specified by a C string (somewhat similar to O&) tE -- text, encoding specified by a Python object (probably a string passed as a parameter or stored from some other call) (I'd prefer the [eE] before the t, but the O modifiers follow, so consistency requires this ugly construct.) This brings up the issue of using a hidden conversion function which may create a new object that needs the same lifetime guarantees as the real parameters; we discussed this issue a month or two ago. Somewhere, there's a call context that includes the actual parameter tuple. PyArg_ParseTuple() could have access to a "scratch" area where it could place objects constructed during parameter parsing. This area could just be a hidden tuple. When the C call returns, the scratch area can be discarded. The difficulty is in giving PyArg_ParseTuple() access to the scratch area, but I don't know how hard that would be off the top of my head. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From jeremy@cnri.reston.va.us Tue Mar 21 17:14:07 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Tue, 21 Mar 2000 12:14:07 -0500 (EST) Subject: [Python-Dev] Set options In-Reply-To: <38D7409C.169B0C42@lemburg.com> References: <38D7409C.169B0C42@lemburg.com> Message-ID: <14551.44511.805860.808811@goon.cnri.reston.va.us> >>>>> "MAL" == M -A Lemburg writes: MAL> Perhaps someone could take Aaron's kjbuckets and write a Python MAL> emulation for it (I think he's even already done something like MAL> this for gadfly). Then the emulation could go into the core and MAL> if people want speed they can install his extension (the MAL> emulation would have to detect this and use the real thing MAL> then). I've been waiting for Tim Peters to say something about sets, but I'll chime in with what I recall him saying last time a discussion like this came up on c.l.py. (I may misremember, in which case I'll at least draw him into the discussion in order to correct me <0.5 wink>.) The problem with a set module is that there are a number of different ways to implement them -- in C using kjbuckets is one example. Each approach is appropriate for some applications, but not for every one. A set is pretty simple to build from a list or a dictionary, so we leave it to application writers to write the one that is appropriate for their application. Jeremy From skip@mojam.com (Skip Montanaro) Tue Mar 21 17:25:57 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 21 Mar 2000 11:25:57 -0600 (CST) Subject: [Python-Dev] Set options In-Reply-To: <38D7409C.169B0C42@lemburg.com> References: <38D7409C.169B0C42@lemburg.com> Message-ID: <14551.45221.447838.534003@beluga.mojam.com> Marc> Perhaps someone could take Aaron's kjbuckets and write a Python Marc> emulation for it ... Any reason why kjbuckets and friends have never been placed in the core? 
If, as it seems from the discussion, a set type is a good thing to add to the core, it seems to me that Aaron's code would be a good candidate implementation/foundation. -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From bwarsaw@cnri.reston.va.us Tue Mar 21 17:47:49 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 21 Mar 2000 12:47:49 -0500 (EST) Subject: [Python-Dev] Set options References: <38D7409C.169B0C42@lemburg.com> <14551.45221.447838.534003@beluga.mojam.com> Message-ID: <14551.46533.918688.13801@anthem.cnri.reston.va.us> >>>>> "SM" == Skip Montanaro writes: SM> Any reason why kjbuckets and friends have never been placed in SM> the core? If, as it seems from the discussion, a set type is SM> a good thing to add to the core, it seems to me that Aaron's SM> code would be a good candidate implementation/foundation. It would seem to me that distutils is a better way to go for kjbuckets. The core already has basic sets (via dictionaries). We're pretty much just quibbling about efficiency, API, and syntax, aren't we? -Barry From mhammond@skippinet.com.au Tue Mar 21 17:48:06 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Tue, 21 Mar 2000 09:48:06 -0800 Subject: [Python-Dev] Unicode and Windows In-Reply-To: <38D6C165.EEF58232@lemburg.com> Message-ID: > > Right. The idea with open() was to write a special version (using > #ifdefs) for use on Windows platforms which does all the needed > magic to convert Unicode to whatever the native format and locale > is... That works for open() - but what about other extension modules? This seems to imply that any Python extension on Windows that wants to pass a Unicode string to an external function can not use PyArg_ParseTuple() with anything other than "O", and perform the magic themselves. This just seems a little back-to-front to me. Platforms that have _no_ native Unicode support have useful utilities for working with Unicode. 
Platforms that _do_ have native Unicode support can not make use of these utilities. Is this by design, or simply a sad side-effect of the design? So - it is trivial to use Unicode on platforms that dont support it, but quite difficult on platforms that do. > > Using parser markers for this is obviously *not* the right way > > to get to the core of the problem. Basically, you will have to > > write a helper which takes a string, Unicode or some other > > "t" compatible object as name object and then converts it to > > the system's view of things. Why "obviously"? What on earth does the existing mechanism buy me on Windows, other than grief that I can not use it? > > I think we had a private discussion about this a few months ago: > > there was some way to convert Unicode to a platform independent > > format which then got converted to MBCS -- don't remember the details > > though. There is a Win32 API function for this. However, as you succinctly pointed out, not many people are going to be aware of its name, or how to use the multitude of flags offered by these conversion functions, or know how to deal with the memory management, etc. > > Can't you use the wchar_t interfaces for the task (see > > the unicodeobject.h file for details) ? Perhaps you can > > first transfer Unicode to wchar_t and then on to MBCS > > using a win32 API ?! Sure - I can. But can everyone who writes interfaces to Unicode functions? You wrote the Python Unicode support but dont know its name - pity the poor Joe Average trying to write an extension. It seems to me that, on Windows, the Python Unicode support as it stands is really internal. I can not think of a single time that an extension writer on Windows would ever want to use the "t" markers - am I missing something? I dont believe that a single Unicode-aware function in the Windows extensions (of which there are _many_) could be changed to use the "t" markers.
It still seems to me that the Unicode support works well on platforms with no Unicode support, and is fairly useless on platforms with the support. I dont believe that any extension on Windows would want to use the "t" marker - so, as Fred suggested, how about providing something for us that can help us interface to the platform's Unicode? This is getting too hard for me - I will release my windows registry module without Unicode support, and hope that in the future someone cares enough to address it, and to add a large number of LOC that will be needed simply to get Unicode talking to Unicode... Mark. From skip@mojam.com (Skip Montanaro) Tue Mar 21 18:04:11 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 21 Mar 2000 12:04:11 -0600 (CST) Subject: [Python-Dev] Set options In-Reply-To: <14551.46533.918688.13801@anthem.cnri.reston.va.us> References: <38D7409C.169B0C42@lemburg.com> <14551.45221.447838.534003@beluga.mojam.com> <14551.46533.918688.13801@anthem.cnri.reston.va.us> Message-ID: <14551.47515.648064.969034@beluga.mojam.com> BAW> It would seem to me that distutils is a better way to go for BAW> kjbuckets. The core already has basic sets (via dictionaries). BAW> We're pretty much just quibbling about efficiency, API, and syntax, BAW> aren't we? Yes (though I would quibble with your use of the word "quibbling" ;-). If new syntax is in the offing as some have proposed, why not go for a more efficient implementation at the same time? I believe Aaron has maintained that kjbuckets is generally more efficient than Python's dictionary object. Skip From mal@lemburg.com Tue Mar 21 17:44:11 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 21 Mar 2000 18:44:11 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000321115430.88A11370CF2@snelboot.oratrix.nl> <38D767BE.C45F8286@lemburg.com> <14551.36271.33825.841965@weyr.cnri.reston.va.us> Message-ID: <38D7B4EB.66DAEBF3@lemburg.com> "Fred L. Drake, Jr." wrote: > > M.-A. 
Lemburg writes: > > And/or perhaps sepcific APIs for each OS... e.g. > > > > PyOS_MBCSFromObject() (only on WinXX) > > PyOS_AppleFromObject() (only on Mac ;) > > Another approach may be to add some format modifiers: > > te -- text in an encoding specified by a C string (somewhat > similar to O&) > tE -- text, encoding specified by a Python object (probably a > string passed as a parameter or stored from some other > call) > > (I'd prefer the [eE] before the t, but the O modifiers follow, so > consistency requires this ugly construct.) > This brings up the issue of using a hidden conversion function which > may create a new object that needs the same lifetime guarantees as the > real parameters; we discussed this issue a month or two ago. > Somewhere, there's a call context that includes the actual parameter > tuple. PyArg_ParseTuple() could have access to a "scratch" area where > it could place objects constructed during parameter parsing. This > area could just be a hidden tuple. When the C call returns, the > scratch area can be discarded. > The difficulty is in giving PyArg_ParseTuple() access to the scratch > area, but I don't know how hard that would be off the top of my head. Some time ago, I considered adding "U+" with builtin auto-conversion to the tuple parser... after some discussion about the error handling issues involved with this I quickly dropped that idea again and used the standard "O" approach plus a call to a helper function which then applied the conversion. (Note the "+" behind "U": this was intended to indicate that the returned object has had the refcount incremented and that the caller must take care of decrementing it again.) The "O" + helper approach is a little clumsy, but works just fine. Plus it doesn't add any more overhead to the already convoluted PyArg_ParseTuple(). BTW, what other external char formats are we talking about ? E.g. how do you handle MBCS or DBCS under WinXX ? 
Are there routines to have wchar_t buffers converted into the two ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gmcm@hypernet.com Tue Mar 21 18:25:43 2000 From: gmcm@hypernet.com (Gordon McMillan) Date: Tue, 21 Mar 2000 13:25:43 -0500 Subject: [Python-Dev] Set options In-Reply-To: <14551.44511.805860.808811@goon.cnri.reston.va.us> References: <38D7409C.169B0C42@lemburg.com> Message-ID: <1258459347-36172889@hypernet.com> Jeremy wrote: > The problem with a set module is that there are a number of different > ways to implement them -- in C using kjbuckets is one example. Nah. Sets are pretty unambiguous. They're also easy, and boring. The interesting stuff is graphs and operations like composition, closure and transpositions. That's also where stuff gets ambiguous. E.g., what's the right behavior when you invert {'a':1,'b':1}? Hint: any answer you give will be met by the wrath of God. I would love this stuff, and as a faithful worshipper of Our Lady of Corrugated Ironism, I could probably live with whatever rules are arrived at; but I'm afraid I would have to considerably enlarge my kill file. - Gordon From gstein@lyra.org Tue Mar 21 18:40:20 2000 From: gstein@lyra.org (Greg Stein) Date: Tue, 21 Mar 2000 10:40:20 -0800 (PST) Subject: [Python-Dev] Set options In-Reply-To: <14551.44511.805860.808811@goon.cnri.reston.va.us> Message-ID: On Tue, 21 Mar 2000, Jeremy Hylton wrote: > >>>>> "MAL" == M -A Lemburg writes: > MAL> Perhaps someone could take Aaron's kjbuckets and write a Python > MAL> emulation for it (I think he's even already done something like > MAL> this for gadfly). Then the emulation could go into the core and > MAL> if people want speed they can install his extension (the > MAL> emulation would have to detect this and use the real thing > MAL> then). 
> > I've been waiting for Tim Peters to say something about sets, but I'll > chime in with what I recall him saying last time a discussion like > this came up on c.l.py. (I may misremember, in which case I'll at > least draw him into the discussion in order to correct me <0.5 wink>.) > > The problem with a set module is that there are a number of different > ways to implement them -- in C using kjbuckets is one example. Each > approach is appropriate for some applications, but not for every one. > A set is pretty simple to build from a list or a dictionary, so we > leave it to application writers to write the one that is appropriate > for their application. Yah... +1 on what Jeremy said. Leave them out of the distro since we can't do them Right for all people. Cheers, -g -- Greg Stein, http://www.lyra.org/ From Moshe Zadka Tue Mar 21 18:34:56 2000 From: Moshe Zadka (Moshe Zadka) Date: Tue, 21 Mar 2000 20:34:56 +0200 (IST) Subject: [Python-Dev] Set options In-Reply-To: <14551.47515.648064.969034@beluga.mojam.com> Message-ID: On Tue, 21 Mar 2000, Skip Montanaro wrote: > BAW> It would seem to me that distutils is a better way to go for > BAW> kjbuckets. The core already has basic sets (via dictionaries). > BAW> We're pretty much just quibbling about efficiency, API, and syntax, > BAW> aren't we? > > If new syntax is in the offing as some have proposed, FWIW, I'm against new syntax. The core-language has changed quite a lot between 1.5.2 and 1.6 -- * strings have grown methods * there are unicode strings * "in" operator overloadable The second change even includes a syntax change (u"some string") whose variants I'm still not familiar enough to comment on (ru"some\string"? ur"some\string"? Both legal?). 
I feel too many changes destabilize the language (this might seem a bit extreme, considering I pushed towards one of the changes), and we should try to improve on things other than the core -- one of these is a more hierarchical standard library, and a standard distribution mechanism, to rival CPAN -- then anyone could import data.sets.kjbuckets With only a trivial >>> import dist >>> dist.install("data.sets.kjbuckets") > why not go for a more efficient implementation at the same time? Because Python dicts are "pretty efficient", and it is not a trivial question to check optimality in this area: tests can be rigged to prove almost anything with the right test-cases, and there's no promise we'll choose the "right ones". -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From skip@mojam.com (Skip Montanaro) Tue Mar 21 18:42:55 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 21 Mar 2000 12:42:55 -0600 (CST) Subject: [Python-Dev] Set options In-Reply-To: References: <14551.47515.648064.969034@beluga.mojam.com> Message-ID: <14551.49839.377385.99637@beluga.mojam.com> Skip> If new syntax is in the offing as some have proposed, Moshe> FWIW, I'm against new syntax.
The core-language has changed quite Moshe> a lot between 1.5.2 and 1.6 -- I thought we were talking about Py3K, where syntax changes are somewhat more expected. Just to make things clear, the syntax change I was referring to was the value-less dict syntax that someone proposed a few days ago: myset = {"a", "b", "c"} Note that I wasn't necessarily supporting the proposal, only acknowledging that it had been made. In general, I think we need to keep straight where people feel various proposals are going to fit. When a thread goes for more than a few messages it's easy to forget. -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From ping@lfw.org Tue Mar 21 13:07:51 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 21 Mar 2000 07:07:51 -0600 (CST) Subject: [Python-Dev] Set options In-Reply-To: <14551.46533.918688.13801@anthem.cnri.reston.va.us> Message-ID: Jeremy Hylton wrote: > The problem with a set module is that there are a number of different > ways to implement them -- in C using kjbuckets is one example. Each > approach is appropriate for some applications, but not for every one. For me, anyway, this is not about trying to engineer a universally perfect solution into Python -- it's about providing some simple, basic, easy-to-understand functionality that takes care of the common case. For example, dictionaries are simple, their workings are easy enough to understand, and they aren't written to efficiently support things like inversion and composition because most of the time no one needs to do these things. The same holds true for sets. All i would want is something i can put things into, and take things out of, and ask about what's inside. Barry Warsaw wrote: > It would seem to me that distutils is a better way to go for > kjbuckets. The core already has basic sets (via dictionaries). We're > pretty much just quibbling about efficiency, API, and syntax, aren't we? 
Efficiency: Hashtables have proven quite adequate for dicts, so i think they're quite adequate for sets. API and syntax: I believe the goal is obvious, because Python already has very nice notation ("in", "not in") -- it just doesn't work quite the way one would want. It works semantically right on lists, but they're a little slow. It doesn't work on dicts, but we can make it so. Here is where my "explanation metric" comes into play. How much additional explaining do you have to do in each case to answer the question "what do i do when i need a set"? 1. Use lists. Explain that "include()" means "append if not already present", and "exclude()" means "remove if present". You are done. 2. Use dicts. Explain that "for x in dict" iterates over the keys, and "if x in dict" looks for a key. Explain what happens when you write "{1, 2, 3}", and the special non-printing value constant. Explain how to add elements to a set and remove elements from a set. 3. Create a new type. Explain that there exists another type "set" with methods "insert" and "remove". Explain how to construct sets. Explain how "in" and "not in" work, where this type fits in with the other types, and when to choose this type over other types. 4. Do nothing. Explain that dictionaries can be used as sets if you assign keys a dummy value, use "del" to remove keys, iterate over "dict.keys()", and use "dict.has_key()" to test membership. This is what motivated my proposal for using lists: it requires by far the least explanation. This is no surprise because a lot of things about lists have been explained already. My preference in terms of elegance is about equal for 1, 2, 3, with 4 distinctly behind; but my subjective ranking of "explanation complexity" (as in "how to get there from here") is 1 < 4 < 3 < 2. 
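[Editorial note: Ka-Ping's option 1 can be made concrete in a few lines. The include()/exclude() names are the hypothetical helpers proposed in the thread, not an existing list API.]

```python
# Option 1 above: plain lists as sets, with the proposed
# include()/exclude() helpers (hypothetical names from this thread).

def include(lst, x):
    """Append x only if it is not already present."""
    if x not in lst:
        lst.append(x)

def exclude(lst, x):
    """Remove x if present; do nothing otherwise."""
    if x in lst:
        lst.remove(x)

s = []
for item in ["a", "b", "a", "c"]:
    include(s, item)
print(s)            # ['a', 'b', 'c'] -- duplicates collapse

exclude(s, "b")
exclude(s, "z")     # absent element: silently ignored
print(s)            # ['a', 'c']
print("a" in s)     # True -- the familiar "in" notation just works
```

As the message argues, nothing new needs explaining here beyond the two helpers; membership tests are linear-time, which is the trade-off against the dict-based options.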
-- ?!ng From tismer@tismer.com Tue Mar 21 20:13:38 2000 From: tismer@tismer.com (Christian Tismer) Date: Tue, 21 Mar 2000 21:13:38 +0100 Subject: [Python-Dev] Unicode Database Compression Message-ID: <38D7D7F2.14A2FBB5@tismer.com> Hi, I have spent the last four days on compressing the Unicode database. With little decoding effort, I can bring the data down to 25kb. This would still be very fast, since codes are randomly accessible, although there are some simple shifts and masks. With a bit more effort, this can be squeezed down to 15kb by some more aggressive techniques like common prefix elimination. Speed would be *slightly* worse, since a small loop (average 8 cycles) is performed to obtain a character from a packed nybble. This is just all the data which is in Marc's unicodedatabase.c file. I checked efficiency by creating a delimited file like the original database text file with only these columns and ran PkZip over it. The result was 40kb. This says that I found a lot of correlations which automatic compressors cannot see. Now, before generating the final C code, I'd like to ask some questions: What is more desirable: Low compression and blinding speed? Or high compression and less speed, since we always want to unpack a whole code page? Then, what about the other database columns? There are a couple of extra attributes which I find coded as switch statements elsewhere. Should I try to pack these codes into my squeezy database, too? And last: There are also two quite elaborate columns with textual descriptions of the codes (the uppercase blah version of character x). Do we want these at all? And if so, should I try to compress them as well? Should these perhaps go into a different source file as a dynamic module, since they will not be used so often? waiting for directives - ly y'rs - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From Moshe Zadka Wed Mar 22 05:44:00 2000 From: Moshe Zadka (Moshe Zadka) Date: Wed, 22 Mar 2000 07:44:00 +0200 (IST) Subject: [1.x] Re: [Python-Dev] Set options In-Reply-To: <14551.49839.377385.99637@beluga.mojam.com> Message-ID: On Tue, 21 Mar 2000, Skip Montanaro wrote: > Skip> If new syntax is in the offing as some have proposed, > > Moshe> FWIW, I'm against new syntax. The core-language has changed quite > Moshe> a lot between 1.5.2 and 1.6 -- > > I thought we were talking about Py3K My argument was strictly a 1.x argument. I'm hoping to get sets in 1.7 or 1.8. > In general, I think we need to keep straight where people feel various > proposals are going to fit. You're right. I'll start prefixing my posts accordingly. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal@lemburg.com Wed Mar 22 10:11:25 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 22 Mar 2000 11:11:25 +0100 Subject: [Python-Dev] Re: Unicode Database Compression References: <38D7D7F2.14A2FBB5@tismer.com> Message-ID: <38D89C4D.370C19D@lemburg.com> Christian Tismer wrote: > > Hi, > > I have spent the last four days on compressing the > Unicode database. Cool :-) > With little decoding effort, I can bring the data down to 25kb. > This would still be very fast, since codes are randomly > accessible, although there are some simple shifts and masks. > > With a bit more effort, this can be squeezed down to 15kb > by some more aggressive techniques like common prefix > elimination. Speed would be *slightly* worse, since a small > loop (average 8 cycles) is performed to obtain a character > from a packed nybble. > > This is just all the data which is in Marc's unicodedatabase.c
I checked efficiency by creating a delimited file like > the original database text file with only these columns and > ran PkZip over it. The result was 40kb. This says that I found > a lot of correlations which automatic compressors cannot see. Not bad ;-) > Now, before generating the final C code, I'd like to ask some > questions: > > What is more desirable: Low compression and blinding speed? > Or high compression and less speed, since we always want to > unpack a whole code page? I'd say high speed and less compression. The reason is that the Asian codecs will need fast access to the database. With their large mapping tables size the few more kB don't hurt, I guess. > Then, what about the other database columns? > There are a couple of extra atrributes which I find coded > as switch statements elsewhere. Should I try to pack these > codes into my squeezy database, too? You basically only need to provide the APIs (and columns) defined in the unicodedata Python API, e.g. the character description column is not needed. > And last: There are also two quite elaborated columns with > textual descriptions of the codes (the uppercase blah version > of character x). Do we want these at all? And if so, should > I try to compress them as well? Should these perhaps go > into a different source file as a dynamic module, since they > will not be used so often? I guess you are talking about the "Unicode 1.0 Name" and the "10646 comment field" -- see above, there's no need to include these descriptions in the database... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Mar 22 11:04:32 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 22 Mar 2000 12:04:32 +0100 Subject: [Python-Dev] Unicode and Windows References: Message-ID: <38D8A8C0.66123F2C@lemburg.com> Mark Hammond wrote: > > > > > Right. 
The idea with open() was to write a special version (using > > #ifdefs) for use on Windows platforms which does all the needed > > magic to convert Unicode to whatever the native format and locale > > is... > > That works for open() - but what about other extension modules? > > This seems to imply that any Python extension on Windows that wants to pass > a Unicode string to an external function can not use PyArg_ParseTuple() with > anything other than "O", and perform the magic themselves. > > This just seems a little back-to-front to me. Platforms that have _no_ > native Unicode support have useful utilities for working with Unicode. > Platforms that _do_ have native Unicode support can not make use of these > utilities. Is this by design, or simply a sad side-effect of the design? > > So - it is trivial to use Unicode on platforms that dont support it, but > quite difficult on platforms that do. The problem is that Windows seems to use a completely different internal Unicode format than most of the rest of the world. As I've commented on in a different post, the only way to have PyArg_ParseTuple() perform auto-conversion is by allowing it to return objects which are garbage collected by the caller. The problem with this is error handling, since PyArg_ParseTuple() will have to keep track of all objects it created until the call returns successfully. An alternative approach is sketched below. Note that *all* platforms will have to use this approach... not only Windows or other platforms with Unicode support. > > Using parser markers for this is obviously *not* the right way > > to get to the core of the problem. Basically, you will have to > > write a helper which takes a string, Unicode or some other > > "t" compatible object as name object and then converts it to > > the system's view of things. > > Why "obviously"? What on earth does the existing mechamism buy me on > Windows, other than grief that I can not use it? 
Sure, you can :-) Just fetch the object, coerce it to Unicode and then encode it according to your platform needs (PyUnicode_FromObject() takes care of the coercion part for you). > > I think we had a private discussion about this a few months ago: > > there was some way to convert Unicode to a platform independent > > format which then got converted to MBCS -- don't remember the details > > though. > > There is a Win32 API function for this. However, as you succinctly pointed > out, not many people are going to be aware of its name, or how to use the > multitude of flags offered by these conversion functions, or know how to > deal with the memory management, etc. > > > Can't you use the wchar_t interfaces for the task (see > > the unicodeobject.h file for details) ? Perhaps you can > > first transfer Unicode to wchar_t and then on to MBCS > > using a win32 API ?! > > Sure - I can. But can everyone who writes interfaces to Unicode functions? > You wrote the Python Unicode support but dont know its name - pity the poor > Joe Average trying to write an extension. Hey, Mark... I'm not a Windows geek. How can I know which APIs are available and which of them to use ? And that's my point: add conversion APIs and codecs for the different OSes which make the extension writer life easier. > It seems to me that, on Windows, the Python Unicode support as it stands is > really internal. I can not think of a single time that an extension writer > on Windows would ever want to use the "t" markers - am I missing something? > I dont believe that a single Unicode-aware function in the Windows > extensions (of which there are _many_) could be changed to use the "t" > markers. "t" is intended to return a text representation of a buffer interface aware type... this happens to be UTF-8 for Unicode objects -- what other encoding would you have expected ? 
> It still seems to me that the Unicode support works well on platforms with > no Unicode support, and is fairly useless on platforms with the support. I > dont believe that any extension on Windows would want to use the "t" > marker - so, as Fred suggested, how about providing something for us that > can help us interface to the platform's Unicode? That's exactly what I'm talking about all the time... there currently are PyUnicode_AsWideChar() and PyUnicode_FromWideChar() to interface to the compiler's wchar_t type. I have no problem adding more of these APIs for the various OSes -- but they would have to be coded by someone with Unicode skills on each of those platforms, e.g. PyUnicode_AsMBCS() and PyUnicode_FromMBCS() on Windows. > This is getting too hard for me - I will release my windows registry module > without Unicode support, and hope that in the future someone cares enough to > address it, and to add a large number of LOC that will be needed simply to > get Unicode talking to Unicode... Mark. I think you're getting this wrong: I'm not arguing against adding better support for Windows. The only way I can think of using parser markers in this context would be by having PyArg_ParseTuple() *copy* data into a given data buffer rather than only passing a reference to it. This would enable PyArg_ParseTuple() to apply whatever conversion is needed while still keeping the temporary objects internal. Hmm, sketching a little: "es#",&encoding,&buffer,&buffer_len -- could mean: coerce the object to Unicode, then encode it using the given encoding and then copy at most buffer_len bytes of data into buffer and update buffer_len to the number of bytes copied This costs some cycles for copying data, but gets rid of the problems involved in cleaning up after errors. The caller will have to ensure that the buffer is large enough and that the encoding fits the application's needs.
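[Editorial note: the "es#" semantics sketched above can be modelled at the Python level. A rough sketch; convert_es() is a stand-in for the proposed C parser-marker behaviour, not a real API.]

```python
# Python-level model of the proposed "es#" parser marker: coerce the
# argument to a (unicode) string, encode it with the caller-supplied
# encoding, and copy at most buffer_len bytes into a caller-owned
# buffer, treating truncation as an error. convert_es() is purely
# illustrative; the real proposal is a PyArg_ParseTuple() format unit.

def convert_es(obj, encoding, buffer, buffer_len):
    data = str(obj).encode(encoding)    # coerce, then encode
    if len(data) > buffer_len:
        raise ValueError("encoded data would be truncated")
    buffer[:len(data)] = data
    return len(data)                    # the updated buffer_len

buf = bytearray(8)
n = convert_es(u"h\xe4llo", "utf-8", buf, len(buf))
print(n, bytes(buf[:n]))                # 6 b'h\xc3\xa4llo'

try:
    convert_es(u"h\xe4llo", "utf-8", buf, 3)  # buffer too small
except ValueError as exc:
    print(exc)                          # error rather than silent truncation
```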
Error handling will be poor since the caller can't take any action other than to pass on the error generated by PyArg_ParseTuple(). Thoughts ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Mar 22 13:40:23 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 22 Mar 2000 14:40:23 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000322113129.5E67C370CF2@snelboot.oratrix.nl> Message-ID: <38D8CD47.E573A246@lemburg.com> Jack Jansen wrote: > > > "es#",&encoding,&buffer,&buffer_len > > -- could mean: coerce the object to Unicode, then > > encode it using the given encoding and then > > copy at most buffer_len bytes of data into > > buffer and update buffer_len to the number of bytes > > copied > > This is a possible solution, but I think I would really prefer to also have > "eS", &encoding, &buffer_ptr > -- coerce the object to Unicode, then encode it using the given > encoding, malloc() a buffer to put the result in and return that. > > I don't mind doing something like
>
>     {
>         char *filenamebuffer = NULL;
>
>         if ( PyArg_ParseTuple(args, "eS", &macencoding, &filenamebuffer)
>             ...
>         open(filenamebuffer, ....);
>         PyMem_XDEL(filenamebuffer);
>         ...
>     }
>
> I think this would be much less error-prone than having fixed-length buffers > all over the place. PyArg_ParseTuple() should probably raise an error in case the data doesn't fit into the buffer. > And if this is indeed going to be used mainly in open() > calls and such the cost of the extra malloc()/free() is going to be dwarfed by > what the underlying OS call is going to use. Good point. You'll still need the buffer_len output parameter though -- otherwise you wouldn't be able to tell the size of the allocated buffer (the returned data may not be terminated).
How about this: "es#", &encoding, &buffer, &buffer_len -- both buffer and buffer_len are in/out parameters -- if **buffer is non-NULL, copy the data into it (at most buffer_len bytes) and update buffer_len on output; truncation produces an error -- if **buffer is NULL, malloc() a buffer of size buffer_len and return it through *buffer; if buffer_len is -1, the allocated buffer should be large enough to hold all data; again, truncation is an error -- apply coercion and encoding as described above (could be that I've got the '*'s wrong, but you get the picture...:) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack@oratrix.nl Wed Mar 22 13:46:50 2000 From: jack@oratrix.nl (Jack Jansen) Date: Wed, 22 Mar 2000 14:46:50 +0100 Subject: [Python-Dev] Unicode and Windows In-Reply-To: Message by "M.-A. Lemburg" , Wed, 22 Mar 2000 14:40:23 +0100 , <38D8CD47.E573A246@lemburg.com> Message-ID: <20000322134650.ED1BD370CF2@snelboot.oratrix.nl> > > [on the user-supplies-buffer interface] > > I think this would be much less error-prone than having fixed-length buffers > > all over the place. > > PyArg_ParseTuple() should probably raise an error in case the > data doesn't fit into the buffer. Ah, that's right, that solves most of that problem. > > [on the malloced interface] > Good point. You'll still need the buffer_len output parameter > though -- otherwise you wouldn't be able to tell the size of the > allocated buffer (the returned data may not be terminated). Are you sure? I would expect the "eS" format to be used to obtain 8-bit data in some local encoding, and I would expect that all 8-bit encodings of unicode data would still allow for null-termination. Or are there 8-bit encodings out there where a zero byte is a normal occurrence and where it can't be used as terminator?
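Whatever the answer for strictly 8-bit encodings, wider encodings make the point about the length parameter vividly: a modern-Python sketch showing that embedded zero bytes are routine in some encoded output, so NUL-termination alone can't recover the data's size.

```python
# Zero bytes appear inside perfectly ordinary encoded text here,
# so only an explicit length (a buffer_len-style out parameter)
# tells you where the data really ends.
data = "abc".encode("utf-16-le")
assert b"\x00" in data   # embedded NULs are normal occurrences
assert len(data) == 6    # the real size; strlen() would stop early
```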
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal@lemburg.com Wed Mar 22 16:31:26 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 22 Mar 2000 17:31:26 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000322134650.ED1BD370CF2@snelboot.oratrix.nl> Message-ID: <38D8F55E.6E324281@lemburg.com> Jack Jansen wrote: > > > > [on the user-supplies-buffer interface] > > > I think this would be much less error-prone than having fixed-length buffers > > > all over the place. > > > > PyArg_ParseTuple() should probably raise an error in case the > > data doesn't fit into the buffer. > > Ah, that's right, that solves most of that problem. > > > > [on the malloced interface] > > Good point. You'll still need the buffer_len output parameter > > though -- otherwise you wouldn't be able to tell the size of the > > allocated buffer (the returned data may not be terminated). > > Are you sure? I would expect the "eS" format to be used to obtain 8-bit data > in some local encoding, and I would expect that all 8-bit encodings of unicode > data would still allow for null-termination. Or are there 8-bit encodings out > there where a zero byte is a normal occurrence and where it can't be used as > terminator? Not sure whether these exist or not, but they are certainly a possibility to keep in mind. Perhaps adding "es#" and "es" (with 0-byte check) would be ideal ?! -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From pf@artcom-gmbh.de Wed Mar 22 16:54:42 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 22 Mar 2000 17:54:42 +0100 (MET) Subject: [Python-Dev] Nitpicking on UserList implementation Message-ID: Hi!
Please have a look at the following method cited from Lib/UserList.py:

    def __radd__(self, other):
        if isinstance(other, UserList):                    # <-- ?
            return self.__class__(other.data + self.data)  # <-- ?
        elif isinstance(other, type(self.data)):
            return self.__class__(other + self.data)
        else:
            return self.__class__(list(other) + self.data)

The reference manual tells about the __r*__ methods: """These functions are only called if the left operand does not support the corresponding operation.""" So if the left operand is a UserList instance, it should always have a __add__ method, which will be called instead of the right operand's __radd__. So I think the condition 'isinstance(other, UserList)' in __radd__ above will always evaluate to False and so the two lines marked with '# <-- ?' seem to be superfluous. But 'UserList' is so mature: Please tell me what I've overlooked before I make a fool of myself and submit a patch removing these two lines. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From gvwilson@nevex.com Thu Mar 23 17:10:16 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Thu, 23 Mar 2000 12:10:16 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods Message-ID: [The following passed the Ping test, so I'm posting it here] If None becomes a keyword, I would like to ask whether it could be used to signal that a method is a class method, as opposed to an instance method:

    class Ping:

        def __init__(self, arg):
            ...as usual...

        def method(self, arg):
            ...no change...

        def classMethod(None, arg):
            ...equivalent of C++ 'static'...
    p = Ping("thinks this is cool")   # as always
    p.method("who am I to argue?")    # as always
    Ping.classMethod("hey, cool!")    # no 'self'
    p.classMethod("hey, cool!")       # also selfless

I'd also like to ask (separately) that assignment to None be defined as a no-op, so that programmers can write: year, month, None, None, None, None, weekday, None, None = gmtime(time()) instead of having to create throw-away variables to fill in slots in tuples that they don't care about. I think both behaviors are readable; the first provides genuinely new functionality, while I often found the second handy when I was doing logic programming. Greg From jim@digicool.com Thu Mar 23 17:18:29 2000 From: jim@digicool.com (Jim Fulton) Date: Thu, 23 Mar 2000 12:18:29 -0500 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA51E5.B39D3E7B@digicool.com> gvwilson@nevex.com wrote: > > [The following passed the Ping test, so I'm posting it here] > > If None becomes a keyword, I would like to ask whether it could be used to > signal that a method is a class method, as opposed to an instance method: > > class Ping: > > def __init__(self, arg): > ...as usual... > > def method(self, arg): > ...no change... > > def classMethod(None, arg): > ...equivalent of C++ 'static'... (snip) As a point of jargon, please let's call this thing a "static method" (or an instance function, or something) rather than a "class method". The distinction between "class methods" and "static methods" has been discussed at length in the types sig (over a year ago). If this proposal goes forward and the name "class method" is used, I'll have to argue strenuously, and I really don't want to do that. :] So, if you can live with the term "static method", you could save us a lot of trouble by just saying "static method". Jim -- Jim Fulton mailto:jim@digicool.com Technical Director (888) 344-4332 Python Powered!
Digital Creations http://www.digicool.com http://www.python.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From gvwilson@nevex.com Thu Mar 23 17:21:48 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Thu, 23 Mar 2000 12:21:48 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <38DA51E5.B39D3E7B@digicool.com> Message-ID: > As a point of jargon, please let's call this thing a "static method" > (or an instance function, or something) rather than a "class method". I'd call it a penguin if that was what it took to get something like this implemented... :-) greg From jim@digicool.com Thu Mar 23 17:28:25 2000 From: jim@digicool.com (Jim Fulton) Date: Thu, 23 Mar 2000 12:28:25 -0500 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA5439.F5FE8FE6@digicool.com> gvwilson@nevex.com wrote: > > > As a point of jargon, please let's call this thing a "static method" > > (or an instance function, or something) rather than a "class method". > > I'd call it a penguin if that was what it took to get something like this > implemented... :-) That's a great name. Let's go with penguin. :) Jim -- Jim Fulton mailto:jim@digicool.com Technical Director (888) 344-4332 Python Powered! Digital Creations http://www.digicool.com http://www.python.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats.
From mhammond@skippinet.com.au Thu Mar 23 17:29:53 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Thu, 23 Mar 2000 09:29:53 -0800 Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: ... > If None becomes a keyword, I would like to ask whether it could be used to > signal that a method is a class method, as opposed to an instance method: > > def classMethod(None, arg): > ...equivalent of C++ 'static'... ... > I'd also like to ask (separately) that assignment to None be defined as a > no-op, so that programmers can write: > > year, month, None, None, None, None, weekday, None, None = > gmtime(time()) In the vernacular of a certain Mr Stein... +2 on both of these :-) [Although I do believe "static method" is a better name than "penguin" :-] Mark. From ping@lfw.org Thu Mar 23 17:47:47 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Thu, 23 Mar 2000 09:47:47 -0800 (PST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: On Thu, 23 Mar 2000 gvwilson@nevex.com wrote: > > If None becomes a keyword, I would like to ask whether it could be used to > signal that a method is a class method, as opposed to an instance method: > > class Ping: [...] Ack! I've been reduced to a class with just three methods. Oh well, i never really considered it such a bad thing to be called "simple-minded". :) > def classMethod(None, arg): > ...equivalent of C++ 'static'... Yeah, i agree with Jim; you might as well call this a "static method" as opposed to a "class method". I like the way "None" is explicitly stated here, so there's no confusion about what the method does. (Without it, there's the question of whether the first argument will get thrown in, or what...) Hmm... i guess this also means one should ask what def function(None, arg): ... does outside a class definition. I suppose that should simply be illegal.
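The selfless method being debated here is, give or take the spelling, what later Pythons provide as staticmethod (added in 2.2; the decorator syntax came in 2.4) — without making None a keyword in the argument list. A sketch of the behaviour gvwilson asks for, in today's terms:

```python
class Ping:
    def method(self, arg):
        # ordinary instance method; the instance is passed as self
        return ("instance", arg)

    @staticmethod
    def static_method(arg):
        # no instance (and no class) is passed in -- "selfless"
        return ("static", arg)

p = Ping()
assert Ping.static_method("hey, cool!") == ("static", "hey, cool!")  # no 'self'
assert p.static_method("hey, cool!") == ("static", "hey, cool!")     # also selfless
```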
> I'd also like to ask (separately) that assignment to None be defined as a > no-op, so that programmers can write: > > year, month, None, None, None, None, weekday, None, None = gmtime(time()) > > instead of having to create throw-away variables to fill in slots in > tuples that they don't care about. For what it's worth, i sometimes use "_" for this purpose (shades of Prolog!) but i can't make much of an argument for its readability... -- ?!ng I never dreamt that i would get to be The creature that i always meant to be But i thought, in spite of dreams, You'd be sitting somewhere here with me. From fdrake@acm.org Thu Mar 23 18:11:39 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 23 Mar 2000 13:11:39 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: References: Message-ID: <14554.24155.948286.451340@weyr.cnri.reston.va.us> gvwilson@nevex.com writes: > p.classMethod("hey, cool!") # also selfless This is the example that I haven't seen before (I'm not on the types-sig, so it may have been presented there), and I think this is what makes it interesting; a method in a module isn't quite sufficient here, since a subclass can override or extend the penguin this way. (Er, if we *do* go with penguin, does this mean it only works on Linux? ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From pf@artcom-gmbh.de Thu Mar 23 18:25:57 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Thu, 23 Mar 2000 19:25:57 +0100 (MET) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: from "gvwilson@nevex.com" at "Mar 23, 2000 12:10:16 pm" Message-ID: Hi! 
gvwilson@nevex.com: > I'd also like to ask (separately) that assignment to None be defined as a > no-op, so that programmers can write: > > year, month, None, None, None, None, weekday, None, None = gmtime(time()) You can already do this today with 1.5.2, if you use a 'del None' statement:

    Python 1.5.2 (#1, Jul 23 1999, 06:38:16) [GCC egcs-2.91.66 19990314/Linux (egcs- on linux2
    Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
    >>> from time import time, gmtime
    >>> year, month, None, None, None, None, weekday, None, None = gmtime(time())
    >>> print year, month, None, weekday
    2000 3 0 3
    >>> del None
    >>> print year, month, None, weekday
    2000 3 None 3
    >>>

If None becomes a keyword in Py3K, this idiom would better be written as

    year, month, None, None, None, None, ... = ...
    if sys.version[0] == '1':
        del None

or

    try:
        del None
    except SyntaxError:
        pass # Wow running Py3K here!

I wonder how much existing code the None --> keyword change would break. Regards, Peter From paul@prescod.net Thu Mar 23 18:47:55 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 10:47:55 -0800 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA66DB.635E8731@prescod.net> gvwilson@nevex.com wrote: > > [The following passed the Ping test, so I'm posting it here] > > If None becomes a keyword, I would like to ask whether it could be used to > signal that a method is a class method, as opposed to an instance method: +1 Idea is good, but I'm not really happy with any of the proposed terminology...Python doesn't really have static anything. I would vote at the same time to make self a keyword and signal an error if the first argument is not one of None or self. Even now, one of my most common Python mistakes is in forgetting self. I expect it happens to anyone who shifts between other languages and Python. Why does None have an upper case "N"? Maybe the keyword version should be lower-case...
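As it turned out, assignment to None never became a no-op; the throwaway-slot convention that stuck is the "_" name Ping mentions elsewhere in this thread, which needs no special casing at all:

```python
from time import gmtime, time

# The idiom that won: a conventional throwaway name, not magic None.
year, month, _, _, _, _, weekday, _, _ = gmtime(time())
assert year >= 2000
assert 0 <= weekday <= 6
```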
-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From bwarsaw@cnri.reston.va.us Thu Mar 23 18:57:00 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 23 Mar 2000 13:57:00 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <14554.26876.514559.320219@anthem.cnri.reston.va.us> >>>>> "gvwilson" == writes: gvwilson> If None becomes a keyword, I would like to ask whether gvwilson> it could be used to signal that a method is a class gvwilson> method, as opposed to an instance method: It still seems mildly weird that None would be a special kind of keyword, one that has a value and is used in ways that no other keyword is used. Greg gives an example, and here's a few more:

    def baddaboom(x, y, z=None):
        ...
        if z is None:
            ...

try substituting `else' for `None' in these examples. ;) Putting that issue aside, Greg's suggestion for static method definitions is interesting.

    class Ping:
        # would this be a SyntaxError?
        def __init__(None, arg):
            ...
        def staticMethod(None, arg):
            ...

    p = Ping()
    Ping.staticMethod(p, 7)   # TypeError
    Ping.staticMethod(7)      # This is fine
    p.staticMethod(7)         # So's this
    Ping.staticMethod(p)      # and this !!

-Barry From paul@prescod.net Thu Mar 23 18:52:25 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 10:52:25 -0800 Subject: [Python-Dev] dir() Message-ID: <38DA67E9.AA593B7A@prescod.net> Can someone explain why dir(foo) does not return all of foo's methods? I know it's documented that way, I just don't know why it *is* that way. I'm also not clear why instances don't have auto-populated __methods__ and __members__ members? If there isn't a good reason (there probably is) then I would advocate that these functions and members should be more comprehensive.
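Paul's wish was eventually granted: in later Pythons, dir() on an instance walks the class (and its bases) too, so methods do show up. A quick check in a modern interpreter:

```python
class Foo:
    def bar(self):
        return 42

f = Foo()
# Modern dir() reports class attributes and methods as well,
# not just the instance __dict__ as in Python 1.5.
assert "bar" in dir(f)
assert "bar" in dir(Foo)
```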
-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From bwarsaw@cnri.reston.va.us Thu Mar 23 19:00:57 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 23 Mar 2000 14:00:57 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <14554.27113.546575.170565@anthem.cnri.reston.va.us> >>>>> "PF" == Peter Funk writes:

    | try:
    |     del None
    | except SyntaxError:
    |     pass # Wow running Py3K here!

I know how to break your Py3K code: stick None=None somewhere higher up :) PF> I wonder how much existing code the None --> keyword change PF> would break. Me too. -Barry From gvwilson@nevex.com Thu Mar 23 19:01:06 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Thu, 23 Mar 2000 14:01:06 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.26876.514559.320219@anthem.cnri.reston.va.us> Message-ID: > class Ping: > # would this be a SyntaxError? > def __init__(None, arg): > ... Absolutely a syntax error; ditto any of the other special names (e.g. __add__). Greg From akuchlin@mems-exchange.org Thu Mar 23 19:06:33 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Thu, 23 Mar 2000 14:06:33 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.27113.546575.170565@anthem.cnri.reston.va.us> References: <14554.27113.546575.170565@anthem.cnri.reston.va.us> Message-ID: <14554.27449.69043.924322@amarok.cnri.reston.va.us> Barry A. Warsaw writes: >>>>>> "PF" == Peter Funk writes: > PF> I wonder how much existing code the None --> keyword change > PF> would break. >Me too. I can't conceive of anyone using None as a function name or a variable name, except through a bug or thinking that 'None, useful, None = 1,2,3' works. Even though None isn't a fixed constant, it might as well be.
How much C code have you seen lately that starts with int function(void *NULL) ? Being able to do "None = 2" also smacks a bit of those legendary Fortran compilers that let you accidentally change 2 into 4. +1 on this change for Py3K, and I doubt it would cause breakage even if introduced into 1.x. -- A.M. Kuchling http://starship.python.net/crew/amk/ Principally I played pedants, idiots, old fathers, and drunkards. As you see, I had a narrow escape from becoming a professor. -- Robertson Davies, "Shakespeare over the Port" From paul@prescod.net Thu Mar 23 19:02:33 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 11:02:33 -0800 Subject: [Python-Dev] Unicode character names Message-ID: <38DA6A49.A60E405B@prescod.net> Here's a feature I like from Perl's Unicode support: """ Support for interpolating named characters The new \N escape interpolates named characters within strings. For example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a unicode smiley face at the end. """ I get really tired of looking up the Unicode character for "ndash" or "right dagger". Does our Unicode database have enough information to make something like this possible? Obviously using the official (English) name is only really helpful for people who speak English, so we should not remove the numeric option. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From tismer@tismer.com Thu Mar 23 19:27:53 2000 From: tismer@tismer.com (Christian Tismer) Date: Thu, 23 Mar 2000 20:27:53 +0100 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA7039.B7CDC6FF@tismer.com> Mark Hammond wrote: > > ... > > If None becomes a keyword, I would like to ask whether it could be used to > > signal that a method is a class method, as opposed to an instance method: > > > > def classMethod(None, arg): > > ...equivalent of C++ 'static'...
> ... > > > I'd also like to ask (separately) that assignment to None be defined as a > > no-op, so that programmers can write: > > > > year, month, None, None, None, None, weekday, None, None = > > gmtime(time()) > > In the vernacular of a certain Mr Stein... > > +2 on both of these :-) me 2, uh, 1.5... The assignment no-op seems to be ok. Having None as a placeholder for static methods creates the problem that we lose compatibility with ordinary functions. What I would propose instead is: make the parameter name "self" mandatory for methods, and turn everything else into a static method. This does not change function semantics, but just the way the method binding works. > [Although I do believe "static method" is a better name than "penguin" :-] pynguin -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From gvwilson@nevex.com Thu Mar 23 19:33:47 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Thu, 23 Mar 2000 14:33:47 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <38DA7039.B7CDC6FF@tismer.com> Message-ID: Hi, Christian; thanks for your mail. > What I would propose instead is: > make the parameter name "self" mandatory for methods, and turn > everything else into a static method. In my experience, significant omissions (i.e. something being important because it is *not* there) often give beginners trouble. For example, in C++, you can't tell whether:

    int foo::bar(int bah)
    {
        return 0;
    }

belongs to instances, or to the class as a whole, without referring back to the header file [1]. To quote the immortal Jeremy Hylton: Pythonic design rules #2: Explicit is better than implicit.
Also, people often ask why 'self' is required as a method argument in Python, when it is not in C++ or Java; this proposal would (retroactively) answer that question... Greg [1] I know this isn't a problem in Java or Python; I'm just using it as an illustration. From skip@mojam.com (Skip Montanaro) Thu Mar 23 20:09:00 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Thu, 23 Mar 2000 14:09:00 -0600 (CST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.27449.69043.924322@amarok.cnri.reston.va.us> References: <14554.27113.546575.170565@anthem.cnri.reston.va.us> <14554.27449.69043.924322@amarok.cnri.reston.va.us> Message-ID: <14554.31196.387213.472302@beluga.mojam.com> AMK> +1 on this change for Py3K, and I doubt it would cause breakage AMK> even if introduced into 1.x. Or if it did, it's probably code that's marginally broken already... -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From tismer@tismer.com Thu Mar 23 20:21:09 2000 From: tismer@tismer.com (Christian Tismer) Date: Thu, 23 Mar 2000 21:21:09 +0100 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA7CB5.87D62E14@tismer.com> Yo, gvwilson@nevex.com wrote: > > Hi, Christian; thanks for your mail. > > > What I would propose instead is: > > make the parameter name "self" mandatory for methods, and turn > > everything else into a static method. > > In my experience, significant omissions (i.e. something being important > because it is *not* there) often give beginners trouble. For example, > in C++, you can't tell whether:
>
>     int foo::bar(int bah)
>     {
>         return 0;
>     }
>
> belongs to instances, or to the class as a whole, without referring back > to the header file [1]. To quote the immortal Jeremy Hylton: > > Pythonic design rules #2: > Explicit is better than implicit. Sure. I am explicitly *not* using self if I want no self.
:-) > Also, people often ask why 'self' is required as a method argument in > Python, when it is not in C++ or Java; this proposal would (retroactively) > answer that question... You prefer to use the explicit keyword None? How would you then deal with

    def outside(None, blah):
        pass # stuff

I believe one answer about the explicit "self" is that it should be simple and compatible with ordinary functions. Guido just had to add the semantics that in methods the first parameter automatically binds to the instance. The None gives me a bit of trouble, but not much. What I would like to spell is

    ordinary functions (as it is now)
    functions which are instance methods (with the immortal self)
    functions which are static methods ???
    functions which are class methods !!!

Static methods can work either with the "1st param==None" rule or with the "1st paramname!=self" rule or whatever. But how would you do class methods, which IMHO should have their class passed in as first parameter? Do you see a clean syntax for this? I thought of some weirdness like

    def meth(self, ...
    def static(self=None, ...   # eek
    def classm(self=class, ...  # ahem

but this breaks the rule of default argument order. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From akuchlin@mems-exchange.org Thu Mar 23 20:27:41 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Thu, 23 Mar 2000 15:27:41 -0500 (EST) Subject: [Python-Dev] Unicode character names In-Reply-To: <38DA6A49.A60E405B@prescod.net> References: <38DA6A49.A60E405B@prescod.net> Message-ID: <14554.32317.730574.967165@amarok.cnri.reston.va.us> Paul Prescod writes: >The new \N escape interpolates named characters within strings. For >example, "Hi!
\N{WHITE SMILING FACE}" evaluates to a string with a >unicode smiley face at the end. Cute idea, and it certainly means you can avoid looking up Unicode numbers. (You can look up names instead. :) ) Note that this means the Unicode database is no longer optional if this is done; it has to be around at code-parsing time. Python could import it automatically, as exceptions.py is imported. Christian's work on compressing unicodedatabase.c is therefore really important. (Is Perl5.6 actually dragging around the Unicode database in the binary, or is it read out of some external file or data structure?) -- A.M. Kuchling http://starship.python.net/crew/amk/ About ten days later, it being the time of year when the National collected down and outs to walk on and understudy I arrived at the head office of the National Theatre in Aquinas Street in Waterloo. -- Tom Baker, in his autobiography From bwarsaw@cnri.reston.va.us Thu Mar 23 20:39:43 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 23 Mar 2000 15:39:43 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods References: <38DA7039.B7CDC6FF@tismer.com> Message-ID: <14554.33039.4390.591036@anthem.cnri.reston.va.us> >>>>> "gvwilson" == writes: gvwilson> belongs to instances, or to the class as a whole, gvwilson> without referring back to the header file [1]. 
To quote gvwilson> the immortal Jeremy Hylton: Not to take anything away from Jeremy, who has contributed some wonderfully Pythonic quotes of his own, but this one is taken from Tim Peters' Zen of Python http://www.python.org/doc/Humor.html#zen timbot-is-the-only-one-who's-gonna-outlive-his-current-chip-set- around-here-ly y'rs, -Barry From jeremy-home@cnri.reston.va.us Thu Mar 23 20:55:25 2000 From: jeremy-home@cnri.reston.va.us (Jeremy Hylton) Date: Thu, 23 Mar 2000 15:55:25 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: References: <38DA7039.B7CDC6FF@tismer.com> Message-ID: <14554.33590.844200.145871@walden> >>>>> "GVW" == gvwilson writes: GVW> To quote the immortal Jeremy Hylton: GVW> Pythonic design rules #2: GVW> Explicit is better than implicit. I wish I could take credit for that :-). Tim Peters posted a list of 20 Pythonic theses to comp.lang.python under the title "The Python Way." I'll collect them all here in hopes of future readers mistaking me for Tim again.

    Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced.
    In the face of ambiguity, refuse the temptation to guess.
    There should be one-- and preferably only one --obvious way to do it.
    Although that way may not be obvious at first unless you're Dutch.
    Now is better than never.
    Although never is often better than *right* now.
    If the implementation is hard to explain, it's a bad idea.
    If the implementation is easy to explain, it may be a good idea.
    Namespaces are one honking great idea -- let's do more of those!

See http://x27.deja.com/getdoc.xp?AN=485548918&CONTEXT=953844380.1254555688&hitnum=9 for the full post.
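Tim's theses later shipped with the interpreter itself as the `this` module, which stores the canonical text ROT13-encoded, so the exact wording is always one decode away:

```python
import codecs
import this  # printing the Zen on import is this module's only side effect

# this.s holds the ROT13-encoded text of the Zen.
zen = codecs.decode(this.s, "rot13")
assert "Explicit is better than implicit." in zen
assert "Namespaces are one honking great idea" in zen
```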
to-be-immortal-i'd-need-to-be-a-bot-ly y'rs Jeremy From hylton@jagunet.com Thu Mar 23 21:01:01 2000 From: hylton@jagunet.com (Jeremy Hylton) Date: Thu, 23 Mar 2000 16:01:01 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: References: Message-ID: <14554.34037.232728.670271@walden> >>>>> "GVW" == gvwilson writes: GVW> I'd also like to ask (separately) that assignment to None be GVW> defined as a no-op, so that programmers can write: GVW> year, month, None, None, None, None, weekday, None, None = GVW> gmtime(time()) GVW> instead of having to create throw-away variables to fill in GVW> slots in tuples that they don't care about. I think both GVW> behaviors are readable; the first provides genuinely new GVW> functionality, while I often found the second handy when I was GVW> doing logic programming. -1 on this proposal Pythonic design rule #8: Special cases aren't special enough to break the rules. I think it's confusing to have assignment mean pop the top of the stack for the special case that the name is None. If Py3K makes None a keyword, then it would also be the only keyword that can be used in an assignment. Finally, we'd need to explain to the rare newbie who used None as a variable name why they assigned 12 to None but that its value was its name when it was later referenced. (Think 'print None'.) When I need to ignore some of the return values, I use the name nil. year, month, nil, nil, nil, nil, weekday, nil, nil = gmtime(time()) I think that's just as clear, only a whisker less efficient, and requires no special cases. Heck, it's even less typing <0.5 wink>.
Jeremy From gvwilson@nevex.com Thu Mar 23 20:59:41 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Thu, 23 Mar 2000 15:59:41 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.33590.844200.145871@walden> Message-ID: > GVW> To quote the immortal Jeremy Hylton: > GVW> Pythonic design rules #2: > GVW> Explicit is better than implicit. > > I wish I could take credit for that :-). Tim Peters posted a list of > 20 Pythonic theses to comp.lang.python under the title "The Python > Way."

Traceback (innermost last):
  File "", line 1, in ?
AttributionError: insight incorrectly ascribed

From paul@prescod.net Thu Mar 23 21:26:42 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 13:26:42 -0800 Subject: [Python-Dev] None as a keyword / class methods References: <14554.34037.232728.670271@walden> Message-ID: <38DA8C12.DFFD63D5@prescod.net> Jeremy Hylton wrote: > > ... > year, month, nil, nil, nil, nil, weekday, nil, nil = gmtime(time()) So you're proposing nil as a new keyword? I like it. +2 -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "No, I'm not QUITE that stupid", Paul Prescod From pf@artcom-gmbh.de Thu Mar 23 21:46:49 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Thu, 23 Mar 2000 22:46:49 +0100 (MET) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.27113.546575.170565@anthem.cnri.reston.va.us> from "Barry A. Warsaw" at "Mar 23, 2000 2: 0:57 pm" Message-ID: Hi Barry!

> >>>>> "PF" == Peter Funk writes:
> > | try:
> |     del None
> | except SyntaxError:
> |     pass # Wow running Py3K here!

Barry A. Warsaw: > I know how to break your Py3K code: stick None=None somewhere higher > up :) Hmm.... I must admit that I don't understand your argument. In Python <= 1.5.2 'del None' works fine, iff it follows any assignment to None in the same scope, regardless of whether there has been a None=None in the surrounding scope or in the same scope before this.
Since something like 'del for' or 'del import' raises a SyntaxError exception in Py152, I expect 'del None' to raise the same exception in Py3K, after None has become a keyword. Right? Regards, Peter From andy@reportlab.com Thu Mar 23 21:54:23 2000 From: andy@reportlab.com (Andy Robinson) Date: Thu, 23 Mar 2000 21:54:23 GMT Subject: [Python-Dev] Unicode Character Names In-Reply-To: <20000323202533.ABDB31CEF8@dinsdale.python.org> References: <20000323202533.ABDB31CEF8@dinsdale.python.org> Message-ID: <38da90b4.756297@post.demon.co.uk> >Message: 20 >From: "Andrew M. Kuchling" >Date: Thu, 23 Mar 2000 15:27:41 -0500 (EST) >To: "python-dev@python.org" >Subject: Re: [Python-Dev] Unicode character names > >Paul Prescod writes: >>The new \N escape interpolates named characters within strings. For >>example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a >>unicode smiley face at the end. > >Cute idea, and it certainly means you can avoid looking up Unicode >numbers. (You can look up names instead. :) ) Note that this means the >Unicode database is no longer optional if this is done; it has to be >around at code-parsing time. Python could import it automatically, as >exceptions.py is imported. Christian's work on compressing >unicodedatabase.c is therefore really important. (Is Perl5.6 actually >dragging around the Unicode database in the binary, or is it read out >of some external file or data structure?) I agree - the names are really useful. If you are doing conversion work, often you want to know what a character is, but don't have a complete Unicode font handy. Being able to get the description for a Unicode character is useful, as well as being able to use the description as a constructor for it. Also, there are some language specific things that might make it useful to have the full character descriptions in Christian's database.
For example, we'll have an (optional, not in the standard library) Japanese module with functions like isHalfWidthKatakana(), isFullWidthKatakana() to help normalize things. Parsing the database and looking for strings in the descriptions is one way to build this - not the only one, but it might be useful. So I'd vote to put names in at first, and give us a few weeks to see how useful they are before a final decision. - Andy Robinson From paul@prescod.net Thu Mar 23 22:09:42 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 14:09:42 -0800 Subject: [Python-Dev] Unicode character names References: <38DA6A49.A60E405B@prescod.net> <14554.32317.730574.967165@amarok.cnri.reston.va.us> Message-ID: <38DA9626.8B62DB77@prescod.net> "Andrew M. Kuchling" wrote: > > Paul Prescod writes: > >The new \N escape interpolates named characters within strings. For > >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > >unicode smiley face at the end. > > Cute idea, and it certainly means you can avoid looking up Unicode > numbers. (You can look up names instead. :) ) More important, though, the code is "self-documenting". You never have to go from the number back to the name. > Note that this means the > Unicode database is no longer optional if this is done; it has to be > around at code-parsing time. I don't like the idea enough to exclude support for small machines or anything like that. We should weigh the costs of requiring the Unicode database at compile time. > (Is Perl5.6 actually > dragging around the Unicode database in the binary, or is it read out > of some external file or data structure?) I have no idea.
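Andy's name-scanning idea above -- building classifiers like his hypothetical isHalfWidthKatakana() by searching the character descriptions -- can be sketched against the unicodedata module as it later shipped with Python (the helper name and approach are his suggestion, not a real library API):

```python
import unicodedata

def is_halfwidth_katakana(ch):
    # Hypothetical helper in the spirit of the proposed Japanese module:
    # classify a character by scanning its Unicode database name.
    try:
        return unicodedata.name(ch).startswith("HALFWIDTH KATAKANA")
    except ValueError:  # characters with no name in the database
        return False

assert is_halfwidth_katakana("\uff76")   # HALFWIDTH KATAKANA LETTER KA
assert not is_halfwidth_katakana("A")
```

Name-based lookup also works in the other direction, e.g. `unicodedata.lookup("WHITE SMILING FACE")`.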
-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From pf@artcom-gmbh.de Thu Mar 23 22:12:25 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Thu, 23 Mar 2000 23:12:25 +0100 (MET) Subject: [Python-Dev] Py3K: True and False builtin or keyword? Message-ID: Regarding the discussion about None becoming a keyword in Py3K: Recently the truth values True and False have been mentioned. Should they become builtin values --like None is now-- or should they become keywords? Nevertheless: for the time being I came up with the following weird idea: If you put this in front of the main module of a Python app:

#!/usr/bin/env python
if __name__ == "__main__":
    import sys
    if sys.version[0] <= '1':
        __builtins__.True = 1
        __builtins__.False = 0
    del sys
# --- continue with your app from here: ---
import foo, bar, ...
....

Now you can start to use False and True in any imported module as if they were already builtins. Of course this is no surprise here and Python is really fun, Peter. From mal@lemburg.com Thu Mar 23 21:07:35 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 23 Mar 2000 22:07:35 +0100 Subject: [Python-Dev] Unicode character names References: <38DA6A49.A60E405B@prescod.net> <14554.32317.730574.967165@amarok.cnri.reston.va.us> Message-ID: <38DA8797.F16301E4@lemburg.com> "Andrew M. Kuchling" wrote: > > Paul Prescod writes: > >The new \N escape interpolates named characters within strings. For > >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > >unicode smiley face at the end. > > Cute idea, and it certainly means you can avoid looking up Unicode > numbers. (You can look up names instead. :) ) Note that this means the > Unicode database is no longer optional if this is done; it has to be > around at code-parsing time. Python could import it automatically, as > exceptions.py is imported.
Christian's work on compressing > unicodedatabase.c is therefore really important. (Is Perl5.6 actually > dragging around the Unicode database in the binary, or is it read out > of some external file or data structure?) Sorry to disappoint you guys, but the Unicode name and comments are *not* included in the unicodedatabase.c file Christian is currently working on. The reason is simple: it would add huge amounts of string data to the file. So this is a no-no for the core distribution... Still, the above is easily possible by inventing a new encoding, say unicode-with-smileys, which then reads in a file containing the Unicode names and applies the necessary magic to decode/encode data as Paul described above. Would probably make a cool fun-project for someone who wants to dive into writing codecs. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From bwarsaw@cnri.reston.va.us Thu Mar 23 23:02:06 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Thu, 23 Mar 2000 18:02:06 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods References: <14554.27113.546575.170565@anthem.cnri.reston.va.us> Message-ID: <14554.41582.688247.569547@anthem.cnri.reston.va.us> Hi Peter! >>>>> "PF" == Peter Funk writes: PF> Since something like 'del for' or 'del import' raises a PF> SyntaxError exception in Py152, I expect 'del None' to raise PF> the same exception in Py3K, after None has become a keyword. PF> Right? I misread your example the first time through, but it still doesn't quite parse on my second read. 
-------------------- snip snip --------------------
pyvers = '2k'
try:
    del import
except SyntaxError:
    pyvers = '3k'
-------------------- snip snip --------------------
% python /tmp/foo.py
  File "/tmp/foo.py", line 3
    del import
            ^
SyntaxError: invalid syntax
-------------------- snip snip --------------------

See, you can't catch that SyntaxError because it doesn't happen at run-time. Maybe you meant to wrap the try suite in an exec? Here's a code sample that ought to work with 1.5.2 and the mythical Py3K-with-a-None-keyword.

-------------------- snip snip --------------------
pyvers = '2k'
try:
    exec "del None"
except SyntaxError:
    pyvers = '3k'
except NameError:
    pass
print pyvers
-------------------- snip snip --------------------

Cheers, -Barry From klm@digicool.com Thu Mar 23 23:05:08 2000 From: klm@digicool.com (Ken Manheimer) Date: Thu, 23 Mar 2000 18:05:08 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: On Thu, 23 Mar 2000 pf@artcom-gmbh.de wrote: > Hi Barry! > > > >>>>> "PF" == Peter Funk writes: > > > > | try: > > | del None > > | except SyntaxError: > > | pass # Wow running Py3K here! > > Barry A. Warsaw: > > I know how to break your Py3K code: stick None=None some where higher > > up :) Huh. Does anyone really think we're going to catch SyntaxError at runtime, ever? Seems like the code fragment above wouldn't work in the first place. But i suppose, with most of a millennium to emerge, py3k could have more fundamental changes than i could even imagine...-) Ken klm@digicool.com From pf@artcom-gmbh.de Thu Mar 23 22:53:34 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Thu, 23 Mar 2000 23:53:34 +0100 (MET) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.27449.69043.924322@amarok.cnri.reston.va.us> from "Andrew M. Kuchling" at "Mar 23, 2000 2: 6:33 pm" Message-ID: Hi! > Barry A.
Warsaw writes: > >>>>>> "PF" == Peter Funk writes: > > PF> I wonder, how much existing code the None --> keyword change > > PF> would break. > >Me too. Andrew M. Kuchling: > I can't conceive of anyone using None as a function name or a variable > name, except through a bug or thinking that 'None, useful, None = > 1,2,3' works. Even though None isn't a fixed constant, it might as > well be. How much C code have you seen lately that starts with int > function(void *NULL) ? I agree. urban legend: Once upon a time someone found the following neat snippet of C source hidden in some header file of a very, very huge software system, after he had spent some nights trying to figure out why some simple edits he made in order to make the code more readable broke the system:

#ifdef TRUE /* eat this: you arrogant Quiche Eaters */
#undef TRUE
#undef FALSE
#define TRUE (0)
#define FALSE (1)
#endif

Obviously the poor guy would have found this particular small piece of evil code much earlier, if he had simply 'grep'ed for comments... there were not so many in this system. ;-) > Being able to do "None = 2" also smacks a bit of those legendary > Fortran compilers that let you accidentally change 2 into 4. +1 on > this change for Py3K, and I doubt it would cause breakage even if > introduced into 1.x. We'll see: those "Real Programmers" never die. Fortunately they prefer Perl over Python. <0.5 grin> Regards, Peter From klm@digicool.com Thu Mar 23 23:15:42 2000 From: klm@digicool.com (Ken Manheimer) Date: Thu, 23 Mar 2000 18:15:42 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.41582.688247.569547@anthem.cnri.reston.va.us> Message-ID: On Thu, 23 Mar 2000 bwarsaw@cnri.reston.va.us wrote: > See, you can't catch that SyntaxError because it doesn't happen at > run-time. Maybe you meant to wrap the try suite in an exec? Here's a Huh.
Guess i should have read barry's re-response before i posted mine: Desperately desiring to redeem myself, and contribute something to the discussion, i'll settle the class/static method naming quandary with the obvious alternative: > > p.classMethod("hey, cool!") # also selfless These should be called buddha methods - no self, samadhi, one with everything, etc. There, now i feel better. :-) Ken klm@digicool.com A Zen monk walks up to a hotdog vendor and says "make me one with everything." Ha. But that's not all. He gets the hot dog and pays with a ten. After several moments waiting, he says to the vendor, "i was expecting change", and the vendor says, "you of all people should know, change comes from inside." That's all. From bwarsaw@cnri.reston.va.us Thu Mar 23 23:19:28 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 23 Mar 2000 18:19:28 -0500 (EST) Subject: [Python-Dev] Py3K: True and False builtin or keyword? References: Message-ID: <14554.42624.213027.854942@anthem.cnri.reston.va.us> >>>>> "PF" == Peter Funk writes: PF> Now you can start to use False and True in any imported PF> module as if they were already builtins. Of course this is no PF> surprise here and Python is really fun, Peter. You /can/ do this, but that doesn't mean you /should/ :) Mucking with builtins is fun the way huffing dry erase markers is fun. Things are very pretty at first, but eventually the brain cell lossage will more than outweigh that cheap thrill. I've seen a few legitimate uses for hacking builtins. In Zope, I believe Jim hacks get_transaction() or somesuch into builtins because that way it's easy to get at without passing it through the call tree. And in Zope it makes sense since this is a fancy database application and your current transaction is a central concept. I've occasionally wrapped an existing builtin because I needed to extend its functionality while keeping its semantics and API unchanged.
An example of this was my pre-Python-1.5.2 open_ex() in Mailman's CGI driver script. Before builtin open() would print the failing file name, my open_ex() -- shown below -- would hack that into the exception object. But one of the things about Python that I /really/ like is that YOU KNOW WHERE THINGS COME FROM. If I suddenly start seeing True and False in your code, I'm going to look for function locals and args, then module globals, then from ... import *'s. If I don't see it in any of those, I'm going to put down my dry erase markers, look again, and then utter a loud "huh?" :) -Barry

realopen = open
def open_ex(filename, mode='r', bufsize=-1, realopen=realopen):
    from Mailman.Utils import reraise
    try:
        return realopen(filename, mode, bufsize)
    except IOError, e:
        strerror = e.strerror + ': ' + filename
        e.strerror = strerror
        e.filename = filename
        e.args = (e.args[0], strerror)
        reraise(e)

import __builtin__
__builtin__.__dict__['open'] = open_ex

From pf@artcom-gmbh.de Thu Mar 23 23:23:57 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Fri, 24 Mar 2000 00:23:57 +0100 (MET) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: from Ken Manheimer at "Mar 23, 2000 6: 5: 8 pm" Message-ID: Hi! > > > | try: > > > | del None > > > | except SyntaxError: > > > | pass # Wow running Py3K here! > > > > Barry A. Warsaw: > > > I know how to break your Py3K code: stick None=None some where higher > > > up :) > Ken Manheimer: > Huh. Does anyone really think we're going to catch SyntaxError at > runtime, ever? Seems like the code fragment above wouldn't work in the > first place. Ouuppps... Unfortunately I had no chance to test this with Py3K before making a fool of myself by posting this silly example. Now I understand what Barry meant. So if None really becomes a keyword in Py3K we can be sure to catch all those imaginary 'del None' statements very quickly.
Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From billtut@microsoft.com Fri Mar 24 02:46:06 2000 From: billtut@microsoft.com (Bill Tutt) Date: Thu, 23 Mar 2000 18:46:06 -0800 Subject: [Python-Dev] Re: Unicode character names Message-ID: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> MAL wrote: >Andrew M. Kuchling" wrote: >> >> Paul Prescod writes: >>>The new \N escape interpolates named characters within strings. For >>>example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a >>>unicode smiley face at the end. >> >> Cute idea, and it certainly means you can avoid looking up Unicode >> numbers. (You can look up names instead. :) ) Note that this means the >> Unicode database is no longer optional if this is done; it has to be >> around at code-parsing time. Python could import it automatically, as >> exceptions.py is imported. Christian's work on compressing >> unicodedatabase.c is therefore really important. (Is Perl5.6 actually >> dragging around the Unicode database in the binary, or is it read out >> of some external file or data structure?) > > Sorry to disappoint you guys, but the Unicode name and comments > are *not* included in the unicodedatabase.c file Christian > is currently working on. The reason is simple: it would add > huge amounts of string data to the file. So this is a no-no > for the core distribution... > Ok, now you're just being silly. It's possible to put the character names in a separate structure so that they don't automatically get paged in with the normal unicode character property data. If you never use it, it won't get paged in, it's that simple.... Looking up the Unicode code value from the Unicode character name smells like a good time to use gperf to generate a perfect hash function for the character names. Esp. for the Unicode 3.0 character namespace.
Then you can just store the hashkey -> Unicode character mapping, and hardly ever need to page in the actual full character name string itself. I haven't looked at what the comment field contains, so I have no idea how useful that info is. *waits while gperf crunches through the ~10,550 Unicode characters where this would be useful* Bill From akuchlin@mems-exchange.org Fri Mar 24 02:51:25 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 23 Mar 2000 21:51:25 -0500 (EST) Subject: [Python-Dev] 1.6 job list Message-ID: <200003240251.VAA19921@newcnri.cnri.reston.va.us> I've written up a list of things that need to get done before 1.6 is finished. This is my vision of what needs to be done, and doesn't have an official stamp of approval from GvR or anyone else. So it's very probably wrong. http://starship.python.net/crew/amk/python/1.6-jobs.html Here's the list formatted as text. The major outstanding things at the moment seem to be sre and Distutils; once they go in, you could probably release an alpha, because the other items are relatively minor.

Still to do

* XXX Revamped import hooks (or is this a post-1.6 thing?)
* Update the documentation to match 1.6 changes.
* Document more undocumented modules
* Unicode: Add Unicode support for open() on Windows
* Unicode: Compress the size of unicodedatabase
* Unicode: Write \N{SMILEY} codec for Unicode
* Unicode: the various XXX items in Misc/unicode.txt
* Add module: Distutils
* Add module: Jim Ahlstrom's zipfile.py
* Add module: PyExpat interface
* Add module: mmapfile
* Add module: sre
* Drop cursesmodule and package it separately. (Any other obsolete modules that should go?)
* Delete obsolete subdirectories in Demo/ directory
* Refurbish Demo subdirectories to be properly documented, match modern coding style, etc.
* Support Unicode strings in PyExpat interface
* Fix ./ld_so_aix installation problem on AIX
* Make test.regrtest.py more usable outside of the Python test suite
* Conservative garbage collection of cycles (maybe?)
* Write friendly "What's New in 1.6" document/article

Done

Nothing at the moment.

After 1.7

* Rich comparisons
* Revised coercions
* Parallel for loop (for i in L; j in M: ...),
* Extended slicing for all sequences.
* GvR: "I've also been thinking about making classes be types (not as huge a change as you think, if you don't allow subclassing built-in types), and adding a built-in array type suitable for use by NumPy."

--amk From esr@thyrsus.com Fri Mar 24 03:30:53 2000 From: esr@thyrsus.com (Eric S. Raymond) Date: Thu, 23 Mar 2000 22:30:53 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003240251.VAA19921@newcnri.cnri.reston.va.us>; from Andrew Kuchling on Thu, Mar 23, 2000 at 09:51:25PM -0500 References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> Message-ID: <20000323223053.J28880@thyrsus.com> Andrew Kuchling : > * Drop cursesmodule and package it separately. (Any other obsolete > modules that should go?) Annoyingly enough, I may need this to stay in, for use by the new Linux-kernel configuration system I'm writing. Why is it on the hit list? -- Eric S. Raymond Still, if you will not fight for the right when you can easily win without bloodshed, if you will not fight when your victory will be sure and not so costly, you may come to the moment when you will have to fight with all the odds against you and only a precarious chance for survival. There may be a worse case. You may have to fight when there is no chance of victory, because it is better to perish than to live as slaves. --Winston Churchill From dan@cgsoftware.com Fri Mar 24 03:52:54 2000 From: dan@cgsoftware.com (Daniel Berlin+list.python-dev) Date: 23 Mar 2000 22:52:54 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: "Eric S.
Raymond"'s message of "Thu, 23 Mar 2000 22:30:53 -0500" References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> Message-ID: <4s9x6n3d.fsf@dan.resnet.rochester.edu> "Eric S. Raymond" writes: > Andrew Kuchling : > > * Drop cursesmodule and package it separately. (Any other obsolete > > modules that should go?) > > Annoyingly enough, I may need this to stay in, for use by the new Linux-kernel > configuration system I'm writing. Why is it on the hit list? IIRC, it's because nobody really maintains it, and those that care about it, use a different one (either ncurses module, or a newer cursesmodule). So from what i understand, you get complaints, but no real advantage to having it there. I'm just trying to summarize, not fall on either side (some people get touchy about issues like this). --Dan From esr@thyrsus.com Fri Mar 24 04:11:37 2000 From: esr@thyrsus.com (Eric S. Raymond) Date: Thu, 23 Mar 2000 23:11:37 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <4s9x6n3d.fsf@dan.resnet.rochester.edu>; from Daniel Berlin+list.python-dev on Thu, Mar 23, 2000 at 10:52:54PM -0500 References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> <4s9x6n3d.fsf@dan.resnet.rochester.edu> Message-ID: <20000323231137.U28880@thyrsus.com> Daniel Berlin+list.python-dev : > > Andrew Kuchling : > > > * Drop cursesmodule and package it separately. (Any other obsolete > > > modules that should go?) > > > > Annoyingly enough, I may need this to stay in, for use by the new Linux-kernel > > configuration system I'm writing. Why is it on the hit list? > > IIRC, it's because nobody really maintains it, and those that care > about it, use a different one (either ncurses module, or a newer cursesmodule). > So from what i understand, you get complaints, but no real advantage > to having it there. OK. 
Then what I guess I'd like is for a maintained equivalent of this to join the core -- the ncurses module you referred to, for choice. I'm not being random. I'm trying to replace the mess that currently constitutes the kbuild system -- but I'll need to support an equivalent of menuconfig. -- Eric S. Raymond "The state calls its own violence `law', but that of the individual `crime'" -- Max Stirner From akuchlin@mems-exchange.org Fri Mar 24 04:33:24 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 23 Mar 2000 23:33:24 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <20000323231137.U28880@thyrsus.com> References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> <4s9x6n3d.fsf@dan.resnet.rochester.edu> <20000323231137.U28880@thyrsus.com> Message-ID: <14554.61460.311650.599253@newcnri.cnri.reston.va.us> Eric S. Raymond writes: >OK. Then what I guess I'd like is for a maintained equivalent of this >to join the core -- the ncurses module you referred to, for choice. See the "Whither cursesmodule" thread in the python-dev archives: http://www.python.org/pipermail/python-dev/2000-February/003796.html One possibility was to blow off backward compatibility; are there any systems that only have BSD curses, not SysV curses / ncurses? Given that Pavel Curtis announced he was dropping BSD curses maintainance some years ago, I expect even the *BSDs use ncurses these days. However, Oliver Andrich doesn't seem interested in maintaining his ncurses module, and someone just started a SWIG-generated interface (http://pyncurses.sourceforge.net), so it's not obvious which one you'd use. (I *would* be willing to take over maintaining Andrich's code; maintaining the BSD curses version just seems pointless these days.) 
--amk From dan@cgsoftware.com Fri Mar 24 04:43:51 2000 From: dan@cgsoftware.com (Daniel Berlin+list.python-dev) Date: 23 Mar 2000 23:43:51 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Andrew Kuchling's message of "Thu, 23 Mar 2000 23:33:24 -0500 (EST)" References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> <4s9x6n3d.fsf@dan.resnet.rochester.edu> <20000323231137.U28880@thyrsus.com> <14554.61460.311650.599253@newcnri.cnri.reston.va.us> Message-ID: Andrew Kuchling writes: > Eric S. Raymond writes: > >OK. Then what I guess I'd like is for a maintained equivalent of this > >to join the core -- the ncurses module you referred to, for choice. > > See the "Whither cursesmodule" thread in the python-dev archives: > http://www.python.org/pipermail/python-dev/2000-February/003796.html > > One possibility was to blow off backward compatibility; are there any > systems that only have BSD curses, not SysV curses / ncurses? Given > that Pavel Curtis announced he was dropping BSD curses maintenance > some years ago, I expect even the *BSDs use ncurses these days. Yes, they do.

ls /usr/src/lib/libncurses/
Makefile  ncurses_cfg.h  pathnames.h  termcap.c
grep 5\.0 /usr/src/contrib/ncurses/*

At least, this is FreeBSD. So there is no need for BSD curses anymore, on FreeBSD's account. > --amk > From esr@thyrsus.com Fri Mar 24 04:47:56 2000 From: esr@thyrsus.com (Eric S. Raymond) Date: Thu, 23 Mar 2000 23:47:56 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <14554.61460.311650.599253@newcnri.cnri.reston.va.us>; from Andrew Kuchling on Thu, Mar 23, 2000 at 11:33:24PM -0500 References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> <4s9x6n3d.fsf@dan.resnet.rochester.edu> <20000323231137.U28880@thyrsus.com> <14554.61460.311650.599253@newcnri.cnri.reston.va.us> Message-ID: <20000323234756.A29775@thyrsus.com> Andrew Kuchling : > Eric S. Raymond writes: > >OK.
Then what I guess I'd like is for a maintained equivalent of this > >to join the core -- the ncurses module you referred to, for choice. > > See the "Whither cursesmodule" thread in the python-dev archives: > http://www.python.org/pipermail/python-dev/2000-February/003796.html > > One possibility was to blow off backward compatibility; are there any > systems that only have BSD curses, not SysV curses / ncurses? Given > that Pavel Curtis announced he was dropping BSD curses maintenance > some years ago, I expect even the *BSDs use ncurses these days. BSD curses was officially declared dead by its maintainer, Keith Bostic, in early 1995. Keith and I conspired to kill it off in favor of ncurses :-). -- Eric S. Raymond If gun laws in fact worked, the sponsors of this type of legislation should have no difficulty drawing upon long lists of examples of criminal acts reduced by such legislation. That they cannot do so after a century and a half of trying -- that they must sweep under the rug the southern attempts at gun control in the 1870-1910 period, the northeastern attempts in the 1920-1939 period, the attempts at both Federal and State levels in 1965-1976 -- establishes the repeated, complete and inevitable failure of gun laws to control serious crime. -- Senator Orrin Hatch, in a 1982 Senate Report From andy@reportlab.com Fri Mar 24 10:14:44 2000 From: andy@reportlab.com (Andy Robinson) Date: Fri, 24 Mar 2000 10:14:44 GMT Subject: [Python-Dev] Unicode character names In-Reply-To: <20000324024913.B8C3A1CF22@dinsdale.python.org> References: <20000324024913.B8C3A1CF22@dinsdale.python.org> Message-ID: <38db3fc6.7370137@post.demon.co.uk> On Thu, 23 Mar 2000 21:49:13 -0500 (EST), you wrote: >Sorry to disappoint you guys, but the Unicode name and comments >are *not* included in the unicodedatabase.c file Christian >is currently working on. The reason is simple: it would add >huge amounts of string data to the file. So this is a no-no >for the core distribution...
You're right about what is compiled into the core. I have to keep reminding myself to distinguish three places functionality can live:

1. What is compiled into the Python core
2. What is in the standard Python library relating to encodings.
3. Completely separate add-on packages, maintained outside of Python, to provide extra functionality for (e.g.) Asian encodings.

It is clear that both the Unicode database, and the mapping tables and other files at unicode.org, are a great resource; but they could be placed in (2) or (3) easily, along with scripts to unpack them. It probably makes sense for the i18n-sig to kick off a separate 'CodecKit' project for now, and we can see what good emerges from it before thinking about what should go into the library. >Still, the above is easily possible by inventing a new >encoding, say unicode-with-smileys, which then reads in >a file containing the Unicode names and applies the necessary >magic to decode/encode data as Paul described above. >Would probably make a cool fun-project for someone who wants >to dive into writing codecs. Yup. Prime candidate for CodecKit. - Andy From mal@lemburg.com Fri Mar 24 08:52:36 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 09:52:36 +0100 Subject: [Python-Dev] Re: Unicode character names References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> Message-ID: <38DB2CD4.CAD9F0E2@lemburg.com> Bill Tutt wrote: > > MAL wrote: > > >Andrew M. Kuchling" wrote: > >> > >> Paul Prescod writes: > >>>The new \N escape interpolates named characters within strings. For > >>>example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > >>>unicode smiley face at the end. > >> > >> Cute idea, and it certainly means you can avoid looking up Unicode > >> numbers. (You can look up names instead. :) ) Note that this means the > >> Unicode database is no longer optional if this is done; it has to be > >> around at code-parsing time.
Python could import it automatically, as > >> exceptions.py is imported. Christian's work on compressing > >> unicodedatabase.c is therefore really important. (Is Perl5.6 actually > >> dragging around the Unicode database in the binary, or is it read out > >> of some external file or data structure?) > > > > Sorry to disappoint you guys, but the Unicode name and comments > > are *not* included in the unicodedatabase.c file Christian > > is currently working on. The reason is simple: it would add > > huge amounts of string data to the file. So this is a no-no > > for the core distribution... > > > > Ok, now you're just being silly. It's possible to put the character names in > a separate structure so that they don't automatically get paged in with the > normal unicode character property data. If you never use it, it won't get > paged in, it's that simple.... Sure, but it would still cause the interpreter binary or DLL to increase in size considerably... that caused some major noise a few days ago due to the fact that the unicodedata module adds some 600kB to the interpreter -- even though it would only get swapped in when needed (the interpreter itself doesn't use it). > Looking up the Unicode code value from the Unicode character name smells > like a good time to use gperf to generate a perfect hash function for the > character names. Esp. for the Unicode 3.0 character namespace. Then you can > just store the hashkey -> Unicode character mapping, and hardly ever need to > page in the actual full character name string itself. Great idea, but why not put this into a separate codec module? > I haven't looked at what the comment field contains, so I have no idea how > useful that info is. Probably not worth looking at...
> *waits while gperf crunches through the ~10,550 Unicode characters where > this would be useful* -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Fri Mar 24 10:37:53 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 11:37:53 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000322134650.ED1BD370CF2@snelboot.oratrix.nl> <38D8F55E.6E324281@lemburg.com> Message-ID: <38DB4581.EB5315E0@lemburg.com> Ok, I've just added two new parser markers to PyArg_ParseTuple() which will hopefully make life a little easier for extension writers. The new code will be in the next patch set which I will release early next week. Here are the docs:

Internal Argument Parsing:
--------------------------

These markers are used by the PyArg_ParseTuple() APIs:

"U": Check for Unicode object and return a pointer to it

"s": For Unicode objects: auto convert them to the default encoding and return a pointer to the object's buffer.

"s#": Access to the Unicode object via the bf_getreadbuf buffer interface (see Buffer Interface); note that the length relates to the buffer length, not the Unicode string length (this may be different depending on the Internal Format).

"t#": Access to the Unicode object via the bf_getcharbuf buffer interface (see Buffer Interface); note that the length relates to the buffer length, not necessarily to the Unicode string length (this may be different depending on the default encoding).

"es": Takes two parameters: encoding (const char **) and buffer (char **). The input object is first coerced to Unicode in the usual way and then encoded into a string using the given encoding. On output, a buffer of the needed size is allocated and returned through *buffer as a NULL-terminated string. The encoded string may not contain embedded NULL characters. The caller is responsible for free()ing the allocated *buffer after usage.
"es#": Takes three parameters: encoding (const char **), buffer (char **) and buffer_len (int *). The input object is first coerced to Unicode in the usual way and then encoded into a string using the given encoding. If *buffer is non-NULL, *buffer_len must be set to the size of the buffer on input. Output is then copied to *buffer. If *buffer is NULL, a buffer of the needed size is allocated and output copied into it. *buffer is then updated to point to the allocated memory area. The caller is responsible for free()ing *buffer after usage. In both cases *buffer_len is updated to the number of characters written (excluding the trailing NULL-byte). The output buffer is assured to be NULL-terminated.

Examples:

Using "es#" with auto-allocation:

    static PyObject *
    test_parser(PyObject *self, PyObject *args)
    {
        PyObject *str;
        const char *encoding = "latin-1";
        char *buffer = NULL;
        int buffer_len = 0;

        if (!PyArg_ParseTuple(args, "es#:test_parser",
                              &encoding, &buffer, &buffer_len))
            return NULL;
        if (!buffer) {
            PyErr_SetString(PyExc_SystemError, "buffer is NULL");
            return NULL;
        }
        str = PyString_FromStringAndSize(buffer, buffer_len);
        free(buffer);
        return str;
    }

Using "es" with auto-allocation returning a NULL-terminated string:

    static PyObject *
    test_parser(PyObject *self, PyObject *args)
    {
        PyObject *str;
        const char *encoding = "latin-1";
        char *buffer = NULL;

        if (!PyArg_ParseTuple(args, "es:test_parser",
                              &encoding, &buffer))
            return NULL;
        if (!buffer) {
            PyErr_SetString(PyExc_SystemError, "buffer is NULL");
            return NULL;
        }
        str = PyString_FromString(buffer);
        free(buffer);
        return str;
    }

Using "es#" with a pre-allocated buffer:

    static PyObject *
    test_parser(PyObject *self, PyObject *args)
    {
        PyObject *str;
        const char *encoding = "latin-1";
        char _buffer[10];
        char *buffer = _buffer;
        int buffer_len = sizeof(_buffer);

        if (!PyArg_ParseTuple(args, "es#:test_parser",
                              &encoding, &buffer, &buffer_len))
            return NULL;
        if (!buffer) {
            PyErr_SetString(PyExc_SystemError, "buffer is NULL");
            return NULL;
        }
        str = PyString_FromStringAndSize(buffer, buffer_len);
        return str;
    }

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein@lyra.org Fri Mar 24 10:54:02 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 02:54:02 -0800 (PST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: <38DB4581.EB5315E0@lemburg.com> Message-ID: On Fri, 24 Mar 2000, M.-A. Lemburg wrote: >... > "s": For Unicode objects: auto convert them to the default encoding > and return a pointer to the object's buffer. Guess that I didn't notice this before, but it seems weird that "s" and "s#" return different encodings. Why? > "es": > Takes two parameters: encoding (const char **) and > buffer (char **). >... > "es#": > Takes three parameters: encoding (const char **), > buffer (char **) and buffer_len (int *). I see no reason to make the encoding (const char **) rather than (const char *). We are never returning a value, so this just makes it harder to pass the encoding into ParseTuple. There is precedent for passing in single-ref pointers. For example: PyArg_ParseTuple(args, "O!", &s, PyString_Type) I would recommend using just one pointer level for the encoding. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal@lemburg.com Fri Mar 24 11:29:12 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 12:29:12 +0100 Subject: [Python-Dev] Unicode and Windows References: Message-ID: <38DB5188.AA580652@lemburg.com> Greg Stein wrote: > > On Fri, 24 Mar 2000, M.-A. Lemburg wrote: > >... > > "s": For Unicode objects: auto convert them to the default encoding > > and return a pointer to the object's buffer. > > Guess that I didn't notice this before, but it seems weird that "s" and > "s#" return different encodings. > > Why? This is due to the buffer interface being used for "s#". Since "s#" refers to the getreadbuf slot, it returns raw data.
In this case this is UTF-16 in platform dependent byte order. "s" relies on NULL-terminated strings and doesn't use the buffer interface at all. Thus "s" returns NULL-terminated UTF-8 (UTF-16 is full of NULLs). "t#" uses the getcharbuf slot and thus should return character data. UTF-8 is the right encoding here. > > "es": > > Takes two parameters: encoding (const char **) and > > buffer (char **). > >... > > "es#": > > Takes three parameters: encoding (const char **), > > buffer (char **) and buffer_len (int *). > > I see no reason to make the encoding (const char **) rather than > (const char *). We are never returning a value, so this just makes it > harder to pass the encoding into ParseTuple. > > There is precedent for passing in single-ref pointers. For example: > > PyArg_ParseTuple(args, "O!", &s, PyString_Type) > > I would recommend using just one pointer level for the encoding. You have a point there... even though it breaks the notion of prepending all parameters with an '&' (ok, except the type check one). OTOH, it would allow passing the encoding right with the PyArg_ParseTuple() call which probably makes more sense in this context. I'll change it... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tismer@tismer.com Fri Mar 24 13:13:02 2000 From: tismer@tismer.com (Christian Tismer) Date: Fri, 24 Mar 2000 14:13:02 +0100 Subject: [Python-Dev] Unicode character names References: <38DA6A49.A60E405B@prescod.net> <14554.32317.730574.967165@amarok.cnri.reston.va.us> <38DA8797.F16301E4@lemburg.com> Message-ID: <38DB69DE.6D04B084@tismer.com> "M.-A. Lemburg" wrote: > > "Andrew M. Kuchling" wrote: > > > > Paul Prescod writes: > > >The new \N escape interpolates named characters within strings. For > > >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > > >unicode smiley face at the end. 
> > > > Cute idea, and it certainly means you can avoid looking up Unicode > > numbers. (You can look up names instead. :) ) Note that this means the > > Unicode database is no longer optional if this is done; it has to be > > around at code-parsing time. Python could import it automatically, as > > exceptions.py is imported. Christian's work on compressing > > unicodedatabase.c is therefore really important. (Is Perl5.6 actually > > dragging around the Unicode database in the binary, or is it read out > > of some external file or data structure?) > > Sorry to disappoint you guys, but the Unicode name and comments > are *not* included in the unicodedatabase.c file Christian > is currently working on. The reason is simple: it would add > huge amounts of string data to the file. So this is a no-no > for the core distribution... This is not settled, still an open question. What I have for non-textual data: 25 kb with dumb compression 15 kb with enhanced compression What amounts of data am I talking about? - The whole unicode database text file has size 632 kb. - With PkZip this goes down to 96 kb. Now, I produced another text file with just the currently used data in it, and this sounds so: - the stripped unicode text file has size 216 kb. - PkZip melts this down to 40 kb. Please compare that to my results above: I can do at least twice as good. I hope I can compete for the text sections as well (since this is something where zip is *good* at), but just let me try. Let's target 60 kb for the whole crap, and I'd be very pleased. Then, there is still the question where to put the data. Having one file in the dll and another externally would be an option. I could also imagine to use a binary external file all the time, with maximum possible compression. By loading this structure, this would be partially expanded to make it fast. An advantage is that the compressed Unicode database could become a stand-alone product. 
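The name data being sized up here is what makes lookups like the following possible; sketched with the unicodedata module and the \N{...} escape as they exist in current Python, not with the 2000-era compressed database under discussion:

```python
# What the name/comment columns enable: name lookups in both directions,
# plus Paul's \N{...} escape.  Uses the unicodedata module shipped with
# modern Python.
import unicodedata

s = "Hi! \N{WHITE SMILING FACE}"
print(repr(s[-1]))                       # the smiley, U+263A
print(unicodedata.name(s[-1]))           # WHITE SMILING FACE
print(unicodedata.lookup("WHITE SMILING FACE") == "\u263a")  # True
```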
The size is in fact so crazy small, that I'd like to make this available to any other language. > Still, the above is easily possible by inventing a new > encoding, say unicode-with-smileys, which then reads in > a file containing the Unicode names and applies the necessary > magic to decode/encode data as Paul described above. That sounds reasonable. Compression makes sense as well here, since the expanded stuff makes quite an amount of kb, compared to what it is "worth", compared to, say, the Python dll. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From mal@lemburg.com Fri Mar 24 13:41:27 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 14:41:27 +0100 Subject: [Python-Dev] Unicode character names References: <38DA6A49.A60E405B@prescod.net> <14554.32317.730574.967165@amarok.cnri.reston.va.us> <38DA8797.F16301E4@lemburg.com> <38DB69DE.6D04B084@tismer.com> Message-ID: <38DB7087.1B105AC7@lemburg.com> Christian Tismer wrote: > > "M.-A. Lemburg" wrote: > > > > "Andrew M. Kuchling" wrote: > > > > > > Paul Prescod writes: > > > >The new \N escape interpolates named characters within strings. For > > > >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > > > >unicode smiley face at the end. > > > > > > Cute idea, and it certainly means you can avoid looking up Unicode > > > numbers. (You can look up names instead. :) ) Note that this means the > > > Unicode database is no longer optional if this is done; it has to be > > > around at code-parsing time. Python could import it automatically, as > > > exceptions.py is imported. Christian's work on compressing > > > unicodedatabase.c is therefore really important. 
(Is Perl5.6 actually > > > dragging around the Unicode database in the binary, or is it read out > > > of some external file or data structure?) > > > > Sorry to disappoint you guys, but the Unicode name and comments > > are *not* included in the unicodedatabase.c file Christian > > is currently working on. The reason is simple: it would add > > huge amounts of string data to the file. So this is a no-no > > for the core distribution... > > This is not settled, still an open question. Well, ok, depends on how much you can squeeze out of the text columns ;-) I still think that it's better to leave these gimmicks out of the core and put them into some add-on, though. > What I have for non-textual data: > 25 kb with dumb compression > 15 kb with enhanced compression Looks good :-) With these sizes I think we could even integrate the unicodedatabase.c + API into the core interpreter and only have the unicodedata module to access the database from within Python. > What amounts of data am I talking about? > - The whole unicode database text file has size > 632 kb. > - With PkZip this goes down to > 96 kb. > > Now, I produced another text file with just the currently > used data in it, and this sounds so: > - the stripped unicode text file has size > 216 kb. > - PkZip melts this down to > 40 kb. > > Please compare that to my results above: I can do at least > twice as good. I hope I can compete for the text sections > as well (since this is something where zip is *good* at), > but just let me try. > Let's target 60 kb for the whole crap, and I'd be very pleased. > > Then, there is still the question where to put the data. > Having one file in the dll and another externally would > be an option. I could also imagine to use a binary external > file all the time, with maximum possible compression. > By loading this structure, this would be partially expanded > to make it fast. > An advantage is that the compressed Unicode database > could become a stand-alone product.
The size is in fact > so crazy small, that I'd like to make this available > to any other language. You could take the unicodedatabase.c file (+ header file) and use it everywhere... I don't think it needs to contain any Python specific code. The API names would have to follow the Python naming schemes though. > > Still, the above is easily possible by inventing a new > > encoding, say unicode-with-smileys, which then reads in > > a file containing the Unicode names and applies the necessary > > magic to decode/encode data as Paul described above. > > That sounds reasonable. Compression makes sense as well here, > since the expanded stuff makes quite an amount of kb, compared > to what it is "worth", compared to, say, the Python dll. With 25kB for the non-text columns, I'd suggest simply adding the file to the core. Text columns could then go into a separate module. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@python.org Fri Mar 24 14:14:51 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 09:14:51 -0500 Subject: [Python-Dev] Hi -- I'm back! Message-ID: <200003241414.JAA11740@eric.cnri.reston.va.us> I'm back from ten days on the road. I'll try to dig through the various mailing list archives over the next few days, but it would be more efficient if you are waiting for me to take action or express an opinion on a particular issue (in *any* Python-related mailing list) to mail me a summary or at least a pointer. --Guido van Rossum (home page: http://www.python.org/~guido/) From jack@oratrix.nl Fri Mar 24 15:01:25 2000 From: jack@oratrix.nl (Jack Jansen) Date: Fri, 24 Mar 2000 16:01:25 +0100 Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message by Ka-Ping Yee , Thu, 23 Mar 2000 09:47:47 -0800 (PST) , Message-ID: <20000324150125.7144A370CF2@snelboot.oratrix.nl> > Hmm... 
i guess this also means one should ask what > > def function(None, arg): > ... > > does outside a class definition. I suppose that should simply > be illegal. No, it forces you to call the function with keyword arguments! (initially meant jokingly, but thinking about it for a couple of seconds there might actually be cases where this is useful) -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From skip@mojam.com (Skip Montanaro) Fri Mar 24 15:14:11 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 24 Mar 2000 09:14:11 -0600 (CST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003240251.VAA19921@newcnri.cnri.reston.va.us> References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> Message-ID: <14555.34371.749039.946891@beluga.mojam.com> AMK> I've written up a list of things that need to get done before 1.6 AMK> is finished. This is my vision of what needs to be done, and AMK> doesn't have an official stamp of approval from GvR or anyone else. AMK> So it's very probably wrong. Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules of general usefulness (this is at least generally useful for anyone writing web spiders ;-) shouldn't live in Tools, because it's not always available and users need to do extra work to make them available. I'd be happy to write up some documentation for it and twiddle the module to include doc strings. -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From fdrake@acm.org Fri Mar 24 15:20:03 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Fri, 24 Mar 2000 10:20:03 -0500 (EST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: References: <38DB4581.EB5315E0@lemburg.com> Message-ID: <14555.34723.841426.504538@weyr.cnri.reston.va.us> Greg Stein writes: > There is precedent for passing in single-ref pointers. For example: > > PyArg_ParseTuple(args, "O!", &s, PyString_Type) ^^^^^^^^^^^^^^^^^ Feeling ok? I *suspect* these are reversed. :) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake@acm.org Fri Mar 24 15:24:13 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 10:24:13 -0500 (EST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: <38DB5188.AA580652@lemburg.com> References: <38DB5188.AA580652@lemburg.com> Message-ID: <14555.34973.303273.716146@weyr.cnri.reston.va.us> M.-A. Lemburg writes: > You have a point there... even though it breaks the notion > of prepending all parameters with an '&' (ok, except the I've never heard of this notion; I hope I didn't just miss it in the docs! The O& also doesn't require a & in front of the name of the conversion function, you just pass the right value. So there are at least two cases where you *typically* don't use a &. (Other cases in the 1.5.2 API are probably just plain weird if they don't!) Changing it to avoid the extra machinery is the Right Thing; you get to feel good today. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal@lemburg.com Fri Mar 24 16:38:06 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 17:38:06 +0100 Subject: [Python-Dev] Unicode and Windows References: <38DB5188.AA580652@lemburg.com> <14555.34973.303273.716146@weyr.cnri.reston.va.us> Message-ID: <38DB99EE.F5949889@lemburg.com> "Fred L. Drake, Jr." wrote: > > M.-A. Lemburg writes: > > You have a point there... 
even though it breaks the notion > > of prepending all parameters with an '&' (ok, except the > > I've never heard of this notion; I hope I didn't just miss it in the > docs! If you scan the parameters list in getargs.c you'll come to this conclusion and thus my notion: I've been programming like this for years now :-) > The O& also doesn't require a & in front of the name of the > conversion function, you just pass the right value. So there are at > least two cases where you *typically* don't use a &. (Other cases in > the 1.5.2 API are probably just plain weird if they don't!) > Changing it to avoid the extra machinery is the Right Thing; you get > to feel good today. ;) Ok, feeling good now ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@python.org Fri Mar 24 20:44:02 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 15:44:02 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 09:14:11 CST." <14555.34371.749039.946891@beluga.mojam.com> References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <14555.34371.749039.946891@beluga.mojam.com> Message-ID: <200003242044.PAA00677@eric.cnri.reston.va.us> > Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules > of general usefulness (this is at least generally useful for anyone writing > web spiders ;-) shouldn't live in Tools, because it's not always available > and users need to do extra work to make them available. > > I'd be happy to write up some documentation for it and twiddle the module to > include doc strings. Deal. Soon as we get the docs we'll move it to Lib. 
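(The move Skip proposed did eventually happen; in today's Python the module lives at urllib.robotparser, and in the 2.x line it was the top-level robotparser. A minimal sketch of its use, feeding it rules directly rather than fetching a robots.txt:)

```python
# Sketch of the robotparser module under discussion, using the name it
# carries in current Python (urllib.robotparser).
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])
print(rp.can_fetch("MySpider", "http://example.com/private/page.html"))  # False
print(rp.can_fetch("MySpider", "http://example.com/index.html"))         # True
```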
--Guido van Rossum (home page: http://www.python.org/~guido/) From gstein@lyra.org Fri Mar 24 20:50:43 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 12:50:43 -0800 (PST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: <14555.34723.841426.504538@weyr.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Fred L. Drake, Jr. wrote: > Greg Stein writes: > > There is precedent for passing in single-ref pointers. For example: > > > > PyArg_ParseTuple(args, "O!", &s, PyString_Type) > ^^^^^^^^^^^^^^^^^ > > Feeling ok? I *suspect* these are reversed. :) I just checked the code to ensure that it took a single pointer rather than a double-pointer. I guess that I didn't verify the order :-) Concept is valid, tho... the params do not necessarily require an ampersand. oop! Actually... this does require an ampersand: PyArg_ParseTuple(args, "O!", &PyString_Type, &s) Don't want to pass the whole structure... Well, regardless: I would much prefer to see the encoding passed as a constant string, rather than having to shove the sucker into a variable first, just so that I can insert a useless address-of operator. Cheers, -g -- Greg Stein, http://www.lyra.org/ From akuchlin@mems-exchange.org Fri Mar 24 20:51:56 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Fri, 24 Mar 2000 15:51:56 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242044.PAA00677@eric.cnri.reston.va.us> References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <14555.34371.749039.946891@beluga.mojam.com> <200003242044.PAA00677@eric.cnri.reston.va.us> Message-ID: <14555.54636.811100.254309@amarok.cnri.reston.va.us> Guido van Rossum writes: >> Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules >Deal. Soon as we get the docs we'll move it to Lib. What about putting it in a package like 'www' or 'web'? Packagizing the existing library is hard because of backward compatibility, but there's no such constraint for new modules. -- A.M. 
Kuchling http://starship.python.net/crew/amk/ One need not be a chamber to be haunted; / One need not be a house; / The brain has corridors surpassing / Material place. -- Emily Dickinson, "Time and Eternity" From gstein@lyra.org Fri Mar 24 21:00:25 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 13:00:25 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.54636.811100.254309@amarok.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Andrew M. Kuchling wrote: > Guido van Rossum writes: > >> Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules > >Deal. Soon as we get the docs we'll move it to Lib. > > What about putting it in a package like 'www' or 'web'? Packagizing > the existing library is hard because of backward compatibility, but > there's no such constraint for new modules. Or in the "network" package that was suggested a month ago? And why *can't* we start on repackaging old modules? I think the only reason that somebody came up with to NOT do it was "well, if we don't repackage the whole thing, then we should repackage nothing." Which, IMO, is totally bogus. We'll never get anywhere operating under that principle. Cheers, -g -- Greg Stein, http://www.lyra.org/ From fdrake@acm.org Fri Mar 24 21:00:19 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 16:00:19 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: References: <14555.54636.811100.254309@amarok.cnri.reston.va.us> Message-ID: <14555.55139.484135.602894@weyr.cnri.reston.va.us> Greg Stein writes: > Or in the "network" package that was suggested a month ago? +1 > And why *can't* we start on repackaging old modules? I think the only > reason that somebody came up with to NOT do it was "well, if we don't > repackage the whole thing, then we should repackage nothing." Which, IMO, > is totally bogus. We'll never get anywhere operating under that principle.
That doesn't bother me, but I tend to be a little conservative (though usually not as conservative as Guido on such matters). I *would* like to decide that 1.7 will be fully packagized, and not wait until 2.0. As long as 1.7 is a "testing the evolutionary path" release, I think that's the right thing to do. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido@python.org Fri Mar 24 21:03:54 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:03:54 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead Message-ID: <200003242103.QAA03288@eric.cnri.reston.va.us> Someone noticed that socket.connect() and a few related functions (connect_ex() and bind()) take either a single (host, port) tuple or two separate arguments, but that only the tuple is documented. Similar to append(), I'd like to close this gap, and I've made the necessary changes. This will probably break lots of code. Similar to append(), I'd like people to fix their code rather than whine -- two-arg connect() has never been documented, although it's found in much code (even the socket module test code :-( ). Similar to append(), I may revert the change if it is shown to cause too much pain during beta testing... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Mar 24 21:05:57 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:05:57 -0500 Subject: [Python-Dev] Unicode and Windows In-Reply-To: Your message of "Fri, 24 Mar 2000 12:50:43 PST." References: Message-ID: <200003242105.QAA03543@eric.cnri.reston.va.us> > Well, regardless: I would much prefer to see the encoding passed as a > constant string, rather than having to shove the sucker into a variable > first, just so that I can insert a useless address-of operator. Of course. Use & for output args, not as a matter of principle.
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Mar 24 21:11:25 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:11:25 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 13:00:25 PST." References: Message-ID: <200003242111.QAA04208@eric.cnri.reston.va.us> [Greg] > And why *can't* we start on repackaging old modules? I think the only > reason that somebody came up with to NOT do it was "well, if we don't > repackage the whole thing, then we should repackage nothing." Which, IMO, > is totally bogus. We'll never get anywhere operating under that principle. The reason is backwards compatibility. Assume we create a package "web" and move all web related modules into it: httplib, urllib, htmllib, etc. Now for backwards compatibility, we add the web directory to sys.path, so one can write either "import web.urllib" or "import urllib". But that loads the same code twice! And in this (carefully chosen :-) example, urllib actually has some state which shouldn't be replicated. Plus, it's too much work -- I'd rather focus on getting 1.6 out of the door, and there's a lot of other stuff I need to do besides moving modules around. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Mar 24 21:15:00 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:15:00 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 16:00:19 EST." <14555.55139.484135.602894@weyr.cnri.reston.va.us> References: <14555.54636.811100.254309@amarok.cnri.reston.va.us> <14555.55139.484135.602894@weyr.cnri.reston.va.us> Message-ID: <200003242115.QAA04648@eric.cnri.reston.va.us> > Greg Stein writes: > > Or in the "network" package that was suggested a month ago? [Fred] > +1 Which reminds me of another reason to wait: coming up with the right package hierarchy is hard. (E.g.
I find network too long; plus, does htmllib belong there?) > That doesn't bother me, but I tend to be a little conservative > (though usually not as conservative as Guido on such matters). I > *would* like to decided theat 1.7 will be fully packagized, and not > wait until 2.0. As long as 1.7 is a "testing the evolutionary path" > release, I think that's the right thing to do. Agreed. At the SD conference I gave a talk about the future of Python, and there was (again) a good suggestion about forwards compatibility. Starting with 1.7 (if not sooner), several Python 3000 features that necessarily have to be incompatible (like 1/2 yielding 0.5 instead of 0) could issue warnings when (or unless?) Python is invoked with a compatibility flag. --Guido van Rossum (home page: http://www.python.org/~guido/) From bwarsaw@cnri.reston.va.us Fri Mar 24 21:21:54 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 24 Mar 2000 16:21:54 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead References: <200003242103.QAA03288@eric.cnri.reston.va.us> Message-ID: <14555.56434.974884.832078@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> Someone noticed that socket.connect() and a few related GvR> functions (connect_ex() and bind()) take either a single GvR> (host, port) tuple or two separate arguments, but that only GvR> the tuple is documented. GvR> Similar to append(), I'd like to close this gap, and I've GvR> made the necessary changes. This will probably break lots of GvR> code. I don't agree that socket.connect() and friends need this fix. Yes, obviously append() needed fixing because of the application of Tim's Twelfth Enlightenment to the semantic ambiguity. But socket.connect() has no such ambiguity; you may spell it differently, but you know exactly what you mean. My suggestion would be to not break any code, but extend connect's interface to allow an optional second argument. 
Thus all of these calls would be legal:

    sock.connect(addr)
    sock.connect(addr, port)
    sock.connect((addr, port))

One nit on the documentation of the socket module. The second entry
says:

    bind (address)
        Bind the socket to address. The socket must not already be
        bound. (The format of address depends on the address family --
        see above.)

Huh? What "above" part should I see? Note that I'm reading this doc
off the web!

-Barry

From gstein@lyra.org Fri Mar 24 21:27:57 2000
From: gstein@lyra.org (Greg Stein)
Date: Fri, 24 Mar 2000 13:27:57 -0800 (PST)
Subject: [Python-Dev] 1.6 job list
In-Reply-To: <200003242111.QAA04208@eric.cnri.reston.va.us>
Message-ID:

On Fri, 24 Mar 2000, Guido van Rossum wrote:
> [Greg]
> > And why *can't* we start on repackaging old modules? I think the only
> > reason that somebody came up with to NOT do it was "well, if we don't
> > repackage the whole thing, then we should repackage nothing." Which, IMO,
> > is totally bogus. We'll never get anywhere operating under that principle.
>
> The reason is backwards compatibility. Assume we create a package
> "web" and move all web related modules into it: httplib, urllib,
> htmllib, etc. Now for backwards compatibility, we add the web
> directory to sys.path, so one can write either "import web.urllib" or
> "import urllib". But that loads the same code twice! And in this
> (carefully chosen :-) example, urllib actually has some state which
> shouldn't be replicated.

We don't add it to the path. Instead, we create new modules that look
like:

---- httplib.py ----
from web.httplib import *
----

The only backwards-compat issue with this approach is that people who
poke values into the module will have problems. I don't believe that
any of the modules were designed for that, anyhow, so it would seem
acceptable to (effectively) disallow that behavior.

> Plus, it's too much work -- I'd rather focus on getting 1.6 out of the
> door, and there's a lot of other stuff I need to do besides moving
> modules around.
Stuff that *you* need to do, sure. But there *are* a lot of us who can help here, and some who desire to spend their time moving modules. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Fri Mar 24 21:32:14 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 13:32:14 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242115.QAA04648@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: > > Greg Stein writes: > > > Or in the "network" package that was suggested a month ago? > > [Fred] > > +1 > > Which reminds me of another reason to wait: coming up with the right > package hierarchy is hard. (E.g. I find network too long; plus, does > htmllib belong there?) htmllib does not go there. Where does it go? Dunno. Leave it unless/until somebody comes up with a place for it. We package up obvious ones. We don't have to design a complete hierarchy. There seemed to be a general "good feeling" around some kind of network (protocol) package. Call it "net" if "network" is too long. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido@python.org Fri Mar 24 21:27:51 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:27:51 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: Your message of "Fri, 24 Mar 2000 16:21:54 EST." <14555.56434.974884.832078@anthem.cnri.reston.va.us> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> Message-ID: <200003242127.QAA06269@eric.cnri.reston.va.us> > >>>>> "GvR" == Guido van Rossum writes: > > GvR> Someone noticed that socket.connect() and a few related > GvR> functions (connect_ex() and bind()) take either a single > GvR> (host, port) tuple or two separate arguments, but that only > GvR> the tuple is documented. > > GvR> Similar to append(), I'd like to close this gap, and I've > GvR> made the necessary changes. This will probably break lots of > GvR> code. 
>
> I don't agree that socket.connect() and friends need this fix. Yes,
> obviously append() needed fixing because of the application of Tim's
> Twelfth Enlightenment to the semantic ambiguity. But socket.connect()
> has no such ambiguity; you may spell it differently, but you know
> exactly what you mean.
>
> My suggestion would be to not break any code, but extend connect's
> interface to allow an optional second argument. Thus all of these
> calls would be legal:
>
>     sock.connect(addr)
>     sock.connect(addr, port)
>     sock.connect((addr, port))

You probably meant:

    sock.connect(addr)
    sock.connect(host, port)
    sock.connect((host, port))

since (host, port) is equivalent to (addr).

> One nit on the documentation of the socket module. The second entry
> says:
>
>     bind (address)
>         Bind the socket to address. The socket must not already be
>         bound. (The format of address depends on the address family --
>         see above.)
>
> Huh? What "above" part should I see? Note that I'm reading this doc
> off the web!

Fred typically directs latex2html to break all sections apart. It's in
the previous section:

    Socket addresses are represented as a single string for the
    AF_UNIX address family and as a pair (host, port) for the AF_INET
    address family, where host is a string representing either a
    hostname in Internet domain notation like 'daring.cwi.nl' or an IP
    address like '100.50.200.5', and port is an integral port number.
    Other address families are currently not supported. The address
    format required by a particular socket object is automatically
    selected based on the address family specified when the socket
    object was created.

This also explains the reason for requiring a single argument: when
using AF_UNIX, the second argument makes no sense!

Frankly, I'm not sure what to do here -- it's more correct to require
a single address argument always, but it's more convenient to allow
two sometimes.
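[The two calling conventions being debated above can be reconciled with a small normalization shim. This sketch is not code from the thread and not the real socket module's implementation; the function name is illustrative, and it is written in modern Python for clarity.]

```python
# Accept either the documented single-address form connect(addr) or the
# undocumented legacy form connect(host, port), and canonicalize both to
# a single address object, as the thread discusses.
def normalize_address(*args):
    if len(args) == 1:
        # connect(addr) -- the documented form; for AF_INET the addr is
        # itself a (host, port) tuple, for AF_UNIX it is a string.
        return args[0]
    if len(args) == 2:
        # legacy connect(host, port) -- fold into the tuple form
        host, port = args
        return (host, port)
    raise TypeError("connect() takes an address "
                    "(or a legacy host, port pair)")

print(normalize_address(("localhost", 80)))  # ('localhost', 80)
print(normalize_address("localhost", 80))    # ('localhost', 80)
print(normalize_address("/tmp/sock"))        # '/tmp/sock' (AF_UNIX-style)
```

Note how the AF_UNIX case is exactly what makes the two-argument form awkward: a string address has no second half to split out.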
Note that sendto(data, addr) only accepts the tuple form: you cannot
write sendto(data, host, port).

--Guido van Rossum (home page: http://www.python.org/~guido/)

From fdrake@acm.org Fri Mar 24 21:28:32 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 24 Mar 2000 16:28:32 -0500 (EST)
Subject: [Python-Dev] 1.6 job list
In-Reply-To:
References: <200003242111.QAA04208@eric.cnri.reston.va.us>
Message-ID: <14555.56832.336242.378838@weyr.cnri.reston.va.us>

Greg Stein writes:
> Stuff that *you* need to do, sure. But there *are* a lot of us who can
> help here, and some who desire to spend their time moving modules.

Would it make sense for one of these people with time on their hands
to propose a specific mapping from old->new names? I think that would
be a good first step, regardless of the implementation timing.

-Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From guido@python.org Fri Mar 24 21:29:44 2000
From: guido@python.org (Guido van Rossum)
Date: Fri, 24 Mar 2000 16:29:44 -0500
Subject: [Python-Dev] 1.6 job list
In-Reply-To: Your message of "Fri, 24 Mar 2000 13:27:57 PST."
References:
Message-ID: <200003242129.QAA06510@eric.cnri.reston.va.us>

> We don't add it to the path. Instead, we create new modules that look
> like:
>
> ---- httplib.py ----
> from web.httplib import *
> ----
>
> The only backwards-compat issue with this approach is that people who poke
> values into the module will have problems. I don't believe that any of the
> modules were designed for that, anyhow, so it would seem acceptable to
> (effectively) disallow that behavior.

OK, that's reasonable. I'll have to invent a different reason why I
don't want this -- because I really don't!

> > Plus, it's too much work -- I'd rather focus on getting 1.6 out of the
> > door, and there's a lot of other stuff I need to do besides moving
> > modules around.
>
> Stuff that *you* need to do, sure.
But there *are* a lot of us who can
> help here, and some who desire to spend their time moving modules.

Hm. Moving modules requires painful and arcane CVS manipulations that
can only be done by the few of us here at CNRI -- and I'm the only one
left who's full time on Python. I'm still not convinced that it's a
good plan.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From fdrake@acm.org Fri Mar 24 21:32:39 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 24 Mar 2000 16:32:39 -0500 (EST)
Subject: [Python-Dev] Heads up: socket.connect() breakage ahead
In-Reply-To: <14555.56434.974884.832078@anthem.cnri.reston.va.us>
References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us>
Message-ID: <14555.57079.187670.916002@weyr.cnri.reston.va.us>

Barry A. Warsaw writes:
> I don't agree that socket.connect() and friends need this fix. Yes,
> obviously append() needed fixing because of the application of Tim's
> Twelfth Enlightenment to the semantic ambiguity. But socket.connect()
> has no such ambiguity; you may spell it differently, but you know
> exactly what you mean.

Crock. The address representations have been fairly well defined for
quite a while. Be explicit.

> sock.connect(addr)

This is the only legal signature. (host, port) is simply the form of
addr for a particular address family.

> One nit on the documentation of the socket module. The second entry
> says:
>
>     bind (address)
>         Bind the socket to address. The socket must not already be
>         bound. (The format of address depends on the address family --
>         see above.)
>
> Huh? What "above" part should I see? Note that I'm reading this doc
> off the web!

Definitely written for the paper document! Remind me about this again
in a month and I'll fix it, but I don't want to play games with this
little stuff until the 1.5.2p2 and 1.6 trees have been merged.
Harrumph.

-Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From gstein@lyra.org Fri Mar 24 21:37:41 2000
From: gstein@lyra.org (Greg Stein)
Date: Fri, 24 Mar 2000 13:37:41 -0800 (PST)
Subject: [Python-Dev] delegating (was: 1.6 job list)
In-Reply-To:
Message-ID:

On Fri, 24 Mar 2000, Greg Stein wrote:
>...
> > Plus, it's too much work -- I'd rather focus on getting 1.6 out of the
> > door, and there's a lot of other stuff I need to do besides moving
> > modules around.
>
> Stuff that *you* need to do, sure. But there *are* a lot of us who can
> help here, and some who desire to spend their time moving modules.

I just want to emphasize this point some more.

Python 1.6 has a defined timeline, with a defined set of minimal
requirements. However! I don't believe that a corollary of that says
we MUST ignore everything else. If those other options fit within the
required timeline, then why not? (assuming we have adequate testing
and doc to go with the changes)

There are ample people who have time and inclination to contribute. If
those contributions add positive benefit, then I see no reason to
exclude them (other than on pure merit, of course).

Note that some of the problems stem from CVS access. Much Guido-time
could be saved by a commit-then-review model, rather than a
review-then-Guido-commits model. Fred does this very well with the
Doc/ area.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From gstein@lyra.org Fri Mar 24 21:38:48 2000
From: gstein@lyra.org (Greg Stein)
Date: Fri, 24 Mar 2000 13:38:48 -0800 (PST)
Subject: [Python-Dev] 1.6 job list
In-Reply-To: <200003242129.QAA06510@eric.cnri.reston.va.us>
Message-ID:

On Fri, 24 Mar 2000, Guido van Rossum wrote:
>...
> > We don't add it to the path. Instead, we create new modules that look
> > like:
> >
> > ---- httplib.py ----
> > from web.httplib import *
> > ----
> >
> > The only backwards-compat issue with this approach is that people who poke
> > values into the module will have problems.
I don't believe that any of the
> > modules were designed for that, anyhow, so it would seem acceptable to
> > (effectively) disallow that behavior.
>
> OK, that's reasonable. I'll have to invent a different reason why I
> don't want this -- because I really don't!

Fair enough.

> > > Plus, it's too much work -- I'd rather focus on getting 1.6 out of the
> > > door, and there's a lot of other stuff I need to do besides moving
> > > modules around.
> >
> > Stuff that *you* need to do, sure. But there *are* a lot of us who can
> > help here, and some who desire to spend their time moving modules.
>
> Hm. Moving modules requires painful and arcane CVS manipulations that
> can only be done by the few of us here at CNRI -- and I'm the only one
> left who's full time on Python. I'm still not convinced that it's a
> good plan.

There are a number of ways to do this, and I'm familiar with all of
them. It is a continuing point of strife in the Apache CVS
repositories :-)

But... it is premised on accepting the desire to move them, of course.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From guido@python.org Fri Mar 24 21:38:51 2000
From: guido@python.org (Guido van Rossum)
Date: Fri, 24 Mar 2000 16:38:51 -0500
Subject: [Python-Dev] delegating (was: 1.6 job list)
In-Reply-To: Your message of "Fri, 24 Mar 2000 13:37:41 PST."
References:
Message-ID: <200003242138.QAA07621@eric.cnri.reston.va.us>

> Note that some of the problems stem from CVS access. Much Guido-time could
> be saved by a commit-then-review model, rather than review-then-Guido-
> commits model. Fred does this very well with the Doc/ area.

Actually, I'm experimenting with this already: Unicode, list.append()
and socket.connect() are done in this way! For renames it is really
painful though, even if someone else at CNRI can do it.

I'd like to see a draft package hierarchy please? Also, if you have
some time, please review the bugs in the bugs list.
Patches submitted with a corresponding PR# will be treated with priority! --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Fri Mar 24 21:40:48 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 22:40:48 +0100 Subject: [Python-Dev] Unicode Patch Set 2000-03-24 Message-ID: <38DBE0E0.76A298FE@lemburg.com> This is a multi-part message in MIME format. --------------16C56446D7F83349DECA84A2 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Attached you find the latest update of the Unicode implementation. The patch is against the current CVS version. It includes the fix I posted yesterday for the core dump problem in codecs.c (was introduced by my previous patch set -- sorry), adds more tests for the codecs and two new parser markers "es" and "es#". -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ --------------16C56446D7F83349DECA84A2 Content-Type: text/plain; charset=us-ascii; name="Unicode-Implementation-2000-03-24.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="Unicode-Implementation-2000-03-24.patch" Only in CVS-Python/Doc/tools: anno-api.py diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/codecs.py Python+Unicode/Lib/codecs.py --- CVS-Python/Lib/codecs.py Thu Mar 23 23:58:41 2000 +++ Python+Unicode/Lib/codecs.py Fri Mar 17 23:51:01 2000 @@ -46,7 +46,7 @@ handling schemes by providing the errors argument. 
These string values are defined: - 'strict' - raise an error (or a subclass) + 'strict' - raise a ValueError error (or a subclass) 'ignore' - ignore the character and continue with the next 'replace' - replace with a suitable replacement character; Python will use the official U+FFFD REPLACEMENT diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/output/test_unicode Python+Unicode/Lib/test/output/test_unicode --- CVS-Python/Lib/test/output/test_unicode Fri Mar 24 22:21:26 2000 +++ Python+Unicode/Lib/test/output/test_unicode Sat Mar 11 00:23:21 2000 @@ -1,5 +1,4 @@ test_unicode Testing Unicode comparisons... done. -Testing Unicode contains method... done. Testing Unicode formatting strings... done. Testing unicodedata module... done. diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_unicode.py Python+Unicode/Lib/test/test_unicode.py --- CVS-Python/Lib/test/test_unicode.py Thu Mar 23 23:58:47 2000 +++ Python+Unicode/Lib/test/test_unicode.py Fri Mar 24 00:29:43 2000 @@ -293,3 +293,33 @@ assert unicodedata.combining(u'\u20e1') == 230 print 'done.' 
+ +# Test builtin codecs +print 'Testing builtin codecs...', + +assert unicode('hello','ascii') == u'hello' +assert unicode('hello','utf-8') == u'hello' +assert unicode('hello','utf8') == u'hello' +assert unicode('hello','latin-1') == u'hello' + +assert u'hello'.encode('ascii') == 'hello' +assert u'hello'.encode('utf-8') == 'hello' +assert u'hello'.encode('utf8') == 'hello' +assert u'hello'.encode('utf-16-le') == 'h\000e\000l\000l\000o\000' +assert u'hello'.encode('utf-16-be') == '\000h\000e\000l\000l\000o' +assert u'hello'.encode('latin-1') == 'hello' + +u = u''.join(map(unichr, range(1024))) +for encoding in ('utf-8', 'utf-16', 'utf-16-le', 'utf-16-be', + 'raw_unicode_escape', 'unicode_escape', 'unicode_internal'): + assert unicode(u.encode(encoding),encoding) == u + +u = u''.join(map(unichr, range(256))) +for encoding in ('latin-1',): + assert unicode(u.encode(encoding),encoding) == u + +u = u''.join(map(unichr, range(128))) +for encoding in ('ascii',): + assert unicode(u.encode(encoding),encoding) == u + +print 'done.' diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Misc/unicode.txt Python+Unicode/Misc/unicode.txt --- CVS-Python/Misc/unicode.txt Thu Mar 23 23:58:48 2000 +++ Python+Unicode/Misc/unicode.txt Fri Mar 24 22:29:35 2000 @@ -715,21 +715,126 @@ These markers are used by the PyArg_ParseTuple() APIs: - 'U': Check for Unicode object and return a pointer to it + "U": Check for Unicode object and return a pointer to it - 's': For Unicode objects: auto convert them to the + "s": For Unicode objects: auto convert them to the and return a pointer to the object's buffer. 
- 's#': Access to the Unicode object via the bf_getreadbuf buffer interface + "s#": Access to the Unicode object via the bf_getreadbuf buffer interface (see Buffer Interface); note that the length relates to the buffer length, not the Unicode string length (this may be different depending on the Internal Format). - 't#': Access to the Unicode object via the bf_getcharbuf buffer interface + "t#": Access to the Unicode object via the bf_getcharbuf buffer interface (see Buffer Interface); note that the length relates to the buffer length, not necessarily to the Unicode string length (this may be different depending on the ). + "es": + Takes two parameters: encoding (const char *) and + buffer (char **). + + The input object is first coerced to Unicode in the usual way + and then encoded into a string using the given encoding. + + On output, a buffer of the needed size is allocated and + returned through *buffer as NULL-terminated string. + The encoded may not contain embedded NULL characters. + The caller is responsible for free()ing the allocated *buffer + after usage. + + "es#": + Takes three parameters: encoding (const char *), + buffer (char **) and buffer_len (int *). + + The input object is first coerced to Unicode in the usual way + and then encoded into a string using the given encoding. + + If *buffer is non-NULL, *buffer_len must be set to sizeof(buffer) + on input. Output is then copied to *buffer. + + If *buffer is NULL, a buffer of the needed size is + allocated and output copied into it. *buffer is then + updated to point to the allocated memory area. The caller + is responsible for free()ing *buffer after usage. + + In both cases *buffer_len is updated to the number of + characters written (excluding the trailing NULL-byte). + The output buffer is assured to be NULL-terminated. 
+ +Examples: + +Using "es#" with auto-allocation: + + static PyObject * + test_parser(PyObject *self, + PyObject *args) + { + PyObject *str; + const char *encoding = "latin-1"; + char *buffer = NULL; + int buffer_len = 0; + + if (!PyArg_ParseTuple(args, "es#:test_parser", + encoding, &buffer, &buffer_len)) + return NULL; + if (!buffer) { + PyErr_SetString(PyExc_SystemError, + "buffer is NULL"); + return NULL; + } + str = PyString_FromStringAndSize(buffer, buffer_len); + free(buffer); + return str; + } + +Using "es" with auto-allocation returning a NULL-terminated string: + + static PyObject * + test_parser(PyObject *self, + PyObject *args) + { + PyObject *str; + const char *encoding = "latin-1"; + char *buffer = NULL; + + if (!PyArg_ParseTuple(args, "es:test_parser", + encoding, &buffer)) + return NULL; + if (!buffer) { + PyErr_SetString(PyExc_SystemError, + "buffer is NULL"); + return NULL; + } + str = PyString_FromString(buffer); + free(buffer); + return str; + } + +Using "es#" with a pre-allocated buffer: + + static PyObject * + test_parser(PyObject *self, + PyObject *args) + { + PyObject *str; + const char *encoding = "latin-1"; + char _buffer[10]; + char *buffer = _buffer; + int buffer_len = sizeof(_buffer); + + if (!PyArg_ParseTuple(args, "es#:test_parser", + encoding, &buffer, &buffer_len)) + return NULL; + if (!buffer) { + PyErr_SetString(PyExc_SystemError, + "buffer is NULL"); + return NULL; + } + str = PyString_FromStringAndSize(buffer, buffer_len); + return str; + } + File/Stream Output: ------------------- @@ -837,6 +942,7 @@ History of this Proposal: ------------------------- +1.3: Added new "es" and "es#" parser markers 1.2: Removed POD about codecs.open() 1.1: Added note about comparisons and hash values. Added note about case mapping algorithms. 
Changed stream codecs .read() and Only in CVS-Python/Objects: .#stringobject.c.2.59 Only in CVS-Python/Objects: stringobject.c.orig diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Python/getargs.c Python+Unicode/Python/getargs.c --- CVS-Python/Python/getargs.c Sat Mar 11 10:55:21 2000 +++ Python+Unicode/Python/getargs.c Fri Mar 24 20:22:26 2000 @@ -178,6 +178,8 @@ } else if (level != 0) ; /* Pass */ + else if (c == 'e') + ; /* Pass */ else if (isalpha(c)) max++; else if (c == '|') @@ -654,6 +656,122 @@ break; } + case 'e': /* encoded string */ + { + char **buffer; + const char *encoding; + PyObject *u, *s; + int size; + + /* Get 'e' parameter: the encoding name */ + encoding = (const char *)va_arg(*p_va, const char *); + if (encoding == NULL) + return "(encoding is NULL)"; + + /* Get 's' parameter: the output buffer to use */ + if (*format != 's') + return "(unkown parser marker combination)"; + buffer = (char **)va_arg(*p_va, char **); + format++; + if (buffer == NULL) + return "(buffer is NULL)"; + + /* Convert object to Unicode */ + u = PyUnicode_FromObject(arg); + if (u == NULL) + return "string, unicode or text buffer"; + + /* Encode object; use default error handling */ + s = PyUnicode_AsEncodedString(u, + encoding, + NULL); + Py_DECREF(u); + if (s == NULL) + return "(encoding failed)"; + if (!PyString_Check(s)) { + Py_DECREF(s); + return "(encoder failed to return a string)"; + } + size = PyString_GET_SIZE(s); + + /* Write output; output is guaranteed to be + 0-terminated */ + if (*format == '#') { + /* Using buffer length parameter '#': + + - if *buffer is NULL, a new buffer + of the needed size is allocated and + the data copied into it; *buffer is + updated to point to the new buffer; + the caller is responsible for + free()ing it after usage + + - if *buffer is 
not NULL, the data + is copied to *buffer; *buffer_len + has to be set to the size of the + buffer on input; buffer overflow is + signalled with an error; buffer has + to provide enough room for the + encoded string plus the trailing + 0-byte + + - in both cases, *buffer_len is + updated to the size of the buffer + /excluding/ the trailing 0-byte + + */ + int *buffer_len = va_arg(*p_va, int *); + + format++; + if (buffer_len == NULL) + return "(buffer_len is NULL)"; + if (*buffer == NULL) { + *buffer = PyMem_NEW(char, size + 1); + if (*buffer == NULL) { + Py_DECREF(s); + return "(memory error)"; + } + } else { + if (size + 1 > *buffer_len) { + Py_DECREF(s); + return "(buffer overflow)"; + } + } + memcpy(*buffer, + PyString_AS_STRING(s), + size + 1); + *buffer_len = size; + } else { + /* Using a 0-terminated buffer: + + - the encoded string has to be + 0-terminated for this variant to + work; if it is not, an error raised + + - a new buffer of the needed size + is allocated and the data copied + into it; *buffer is updated to + point to the new buffer; the caller + is responsible for free()ing it + after usage + + */ + if (strlen(PyString_AS_STRING(s)) != size) + return "(encoded string without "\ + "NULL bytes)"; + *buffer = PyMem_NEW(char, size + 1); + if (*buffer == NULL) { + Py_DECREF(s); + return "(memory error)"; + } + memcpy(*buffer, + PyString_AS_STRING(s), + size + 1); + } + Py_DECREF(s); + break; + } + case 'S': /* string object */ { PyObject **p = va_arg(*p_va, PyObject **); --------------16C56446D7F83349DECA84A2-- From fdrake@acm.org Fri Mar 24 21:40:38 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 16:40:38 -0500 (EST) Subject: [Python-Dev] delegating (was: 1.6 job list) In-Reply-To: References: Message-ID: <14555.57558.939236.363358@weyr.cnri.reston.va.us> Greg Stein writes: > Note that some of the problems stem from CVS access. 
Much Guido-time could
> be saved by a commit-then-review model, rather than review-then-Guido-

This is a non-problem; I'm willing to do the arcane CVS manipulations
if the issue is Guido's time. What I will *not* do is do it piecemeal
without a cohesive plan that Guido approves of at least 95%, and I'll
be really careful to do that last 5% when he's not in the office. ;)

> commits model. Fred does this very well with the Doc/ area.

Thanks for the vote of confidence! The model that I use for the Doc/
area is more like "Fred reviews, Fred commits, and Guido can read it
on python.org like everyone else." Works for me! ;)

-Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From bwarsaw@cnri.reston.va.us Fri Mar 24 21:45:38 2000
From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw)
Date: Fri, 24 Mar 2000 16:45:38 -0500 (EST)
Subject: [Python-Dev] 1.6 job list
References: <200003242115.QAA04648@eric.cnri.reston.va.us>
Message-ID: <14555.57858.824301.693390@anthem.cnri.reston.va.us>

One thing you can definitely do now which breaks no code: propose a
package hierarchy for the standard library.

From akuchlin@mems-exchange.org Fri Mar 24 21:46:28 2000
From: akuchlin@mems-exchange.org (Andrew M. Kuchling)
Date: Fri, 24 Mar 2000 16:46:28 -0500 (EST)
Subject: [Python-Dev] Unicode charnames impl.
In-Reply-To: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50>
References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50>
Message-ID: <14555.57908.151946.182639@amarok.cnri.reston.va.us>

Here's a strawman codec for doing the \N{NULL} thing. Questions:

0) Is the code below correct?

1) What the heck would this encoding be called?

2) What does .encode() do? (Right now it escapes \N as \N{BACKSLASH}N.)

3) How can we store all those names? The resulting dictionary makes a
361K .py file; Python dumps core trying to parse it. (Another bug...)

4) What do you do with the error \N{...... no closing right bracket.
Right now it stops at that point, and never advances any farther.
Maybe it should assume it's an error if there's no } within the next
200 chars or some similar limit?

5) Do we need StreamReader/Writer classes, too?

I've also added a script that parses the names out of the NamesList.txt
file at ftp://ftp.unicode.org/Public/UNIDATA/.

--amk

namecodec.py:
=============

import codecs

#from _namedict import namedict
namedict = {'NULL': 0, 'START OF HEADING' : 1, 'BACKSLASH':ord('\\')}

class NameCodec(codecs.Codec):
    def encode(self,input,errors='strict'):
        # XXX what should this do? Escape the
        # sequence \N as '\N{BACKSLASH}N'?
        return input.replace( '\\N', '\\N{BACKSLASH}N' )

    def decode(self,input,errors='strict'):
        output = unicode("")
        last = 0
        index = input.find( u'\\N{' )
        while index != -1:
            output = output + unicode( input[last:index] )
            used = index
            r_bracket = input.find( '}', index)
            if r_bracket == -1:
                # No closing bracket; bail out...
                break
            name = input[index + 3 : r_bracket]
            code = namedict.get( name )
            if code is not None:
                output = output + unichr(code)
            elif errors == 'strict':
                raise ValueError, 'Unknown character name %s' % repr(name)
            elif errors == 'ignore':
                pass
            elif errors == 'replace':
                output = output + unichr( 0xFFFD )
            last = r_bracket + 1
            index = input.find( '\\N{', last)
        else:
            # Finally failed gently, no longer finding a \N{...
            output = output + unicode( input[last:] )
            return len(input), output

        # Otherwise, we hit the break for an unterminated \N{...}
        return index, output

if __name__ == '__main__':
    c = NameCodec()
    for s in [ r'b\lah blah \N{NULL} asdf',
               r'b\l\N{START OF HEADING}\N{NU' ]:
        used, s2 = c.decode(s)
        print repr( s2 )

        s3 = c.encode(s)
        _, s4 = c.decode(s3)
        print repr(s3)
        assert s4 == s

    print repr( c.decode(r'blah blah \N{NULLsadf} asdf' ,
                         errors='replace' ))
    print repr( c.decode(r'blah blah \N{NULLsadf} asdf' ,
                         errors='ignore' ))

makenamelist.py
===============

# Hack to extract character names from NamesList.txt
# Output the repr() of the resulting dictionary

import re, sys, string

namedict = {}

while 1:
    L = sys.stdin.readline()
    if L == "":
        break

    m = re.match('([0-9a-fA-F]){4}(?:\t(.*)\s*)', L)
    if m is not None:
        last_char = int(m.group(1), 16)
        if m.group(2) is not None:
            name = string.upper( m.group(2) )
            if name not in ['', '']:
                namedict[ name ] = last_char
                # print name, last_char

    m = re.match('\t=\s*(.*)\s*(;.*)?', L)
    if m is not None:
        name = string.upper( m.group(1) )
        names = string.split(name, ',')
        names = map(string.strip, names)
        for n in names:
            namedict[ n ] = last_char
            # print n, last_char

# XXX and do what with this dictionary?
print namedict

From mal@lemburg.com Fri Mar 24 21:50:19 2000
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 24 Mar 2000 22:50:19 +0100
Subject: [Python-Dev] Unicode Patch Set 2000-03-24
References: <38DBE0E0.76A298FE@lemburg.com>
Message-ID: <38DBE31B.BCB342CA@lemburg.com>

Oops, sorry, the patch file wasn't supposed to go to python-dev.
Anyway, Greg's wish is included in there and MarkH should be happy now
-- at least I hope he is ;-)

--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From Jasbahr@origin.EA.com Fri Mar 24 21:49:35 2000
From: Jasbahr@origin.EA.com (Asbahr, Jason)
Date: Fri, 24 Mar 2000 15:49:35 -0600
Subject: [Python-Dev] Memory Management
Message-ID: <11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com>

Greetings!

We're working on integrating our own memory manager into our project
and the current challenge is figuring out how to make it play nice
with Python (and SWIG). The approach we're currently taking is to
patch 1.5.2 and augment the PyMem* macros to call external memory
allocation functions that we provide. The idea is to easily allow the
addition of third party memory management facilities to Python.
Assuming 1) we get it working :-), and 2) we sync to the latest Python
CVS and patch that, would this be a useful patch to give back to the
community? Has anyone run up against this before?

Thanks,

Jason Asbahr
Origin Systems, Inc.
jasbahr@origin.ea.com

From bwarsaw@cnri.reston.va.us Fri Mar 24 21:53:01 2000
From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us)
Date: Fri, 24 Mar 2000 16:53:01 -0500 (EST)
Subject: [Python-Dev] Heads up: socket.connect() breakage ahead
References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> <200003242127.QAA06269@eric.cnri.reston.va.us>
Message-ID: <14555.58301.790774.159381@anthem.cnri.reston.va.us>

>>>>> "GvR" == Guido van Rossum writes:

GvR> You probably meant:
| sock.connect(addr)
| sock.connect(host, port)
| sock.connect((host, port))
GvR> since (host, port) is equivalent to (addr).

Doh, yes. :)

GvR> Fred typically directs latex2html to break all sections
GvR> apart.
It's in the previous section: I know, I was being purposefully dense for effect :) Fred, is there some way to make the html contain a link to the previous section for the "see above" text? That would solve the problem I think. GvR> This also explains the reason for requiring a single GvR> argument: when using AF_UNIX, the second argument makes no GvR> sense! GvR> Frankly, I'm not sure what do here -- it's more correct to GvR> require a single address argument always, but it's more GvR> convenient to allow two sometimes. GvR> Note that sendto(data, addr) only accepts the tuple form: you GvR> cannot write sendto(data, host, port). Hmm, that /does/ complicate things -- it makes explaining the API more difficult. Still, in this case I think I'd lean toward liberal acceptance of input parameters. :) -Barry From bwarsaw@cnri.reston.va.us Fri Mar 24 21:57:01 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 24 Mar 2000 16:57:01 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: <14555.58541.207868.496747@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> OK, that's reasonable. I'll have to invent a different GvR> reason why I don't want this -- because I really don't! Tim's Fifth Enlightenment is all the reason you'd need, /if/ you can't be persuaded to change your mind :) -Barry From fdrake@acm.org Fri Mar 24 22:10:41 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Fri, 24 Mar 2000 17:10:41 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <14555.58301.790774.159381@anthem.cnri.reston.va.us> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> <200003242127.QAA06269@eric.cnri.reston.va.us> <14555.58301.790774.159381@anthem.cnri.reston.va.us> Message-ID: <14555.59361.460705.258859@weyr.cnri.reston.va.us> bwarsaw@cnri.reston.va.us writes: > I know, I was being purposefully dense for effect :) Fred, is there > some way to make the html contain a link to the previous section for > the "see above" text? That would solve the problem I think. No. I expect this to no longer be a problem when we push to SGML/XML, so I won't waste any time hacking around it. On the other hand, lots of places in the documentation refer to "above" and "below" in the traditional sense used in paper documents, and that doesn't work well for hypertext, even in the strongly traditional book-derivation way the Python manuals are done. As soon as it's not in the same HTML file, "above" and "below" break for a lot of people. So it still should be adjusted at an appropriate time. > Hmm, that /does/ complicate things -- it makes explaining the API more > difficult. Still, in this case I think I'd lean toward liberal > acceptance of input parameters. :) No -- all the more reason to be strict and keep the descriptions as simple as reasonable. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido@python.org Fri Mar 24 22:10:32 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 17:10:32 -0500 Subject: [Python-Dev] Memory Management In-Reply-To: Your message of "Fri, 24 Mar 2000 15:49:35 CST." 
<11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> References: <11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> Message-ID: <200003242210.RAA11434@eric.cnri.reston.va.us> > We're working on integrating our own memory manager into our project > and the current challenge is figuring out how to make it play nice > with Python (and SWIG). The approach we're currently taking is to > patch 1.5.2 and augment the PyMem* macros to call external memory > allocation functions that we provide. The idea is to easily allow > the addition of third party memory management facilities to Python. > Assuming 1) we get it working :-), and 2) we sync to the latest Python > CVS and patch that, would this be a useful patch to give back to the > community? Has anyone run up against this before? Check out the archives for patches@python.org looking for posts by Vladimir Marangozov. Vladimir has produced several rounds of patches with a very similar goal in mind. We're still working out some details -- but it shouldn't be too long, and I hope that his patches are also suitable for you. If not, discussion is required! --Guido van Rossum (home page: http://www.python.org/~guido/) From bwarsaw@cnri.reston.va.us Fri Mar 24 22:12:35 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Fri, 24 Mar 2000 17:12:35 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> <200003242127.QAA06269@eric.cnri.reston.va.us> <14555.58301.790774.159381@anthem.cnri.reston.va.us> <14555.59361.460705.258859@weyr.cnri.reston.va.us> Message-ID: <14555.59475.802130.434345@anthem.cnri.reston.va.us> >>>>> "Fred" == Fred L Drake, Jr writes: Fred> No -- all the more reason to be strict and keep the Fred> descriptions as simple as reasonable. At the expense of (IMO unnecessarily) breaking existing code? 
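The two connect() spellings being debated differ only in how the address is packed. A minimal sketch in modern Python syntax -- normalize_address is a hypothetical helper used purely for illustration, not part of the socket module, which does this argument juggling internally -- shows what "liberal acceptance of input parameters" forces every such function to do:

```python
def normalize_address(*args):
    # Accept either the documented tuple form, connect((host, port)),
    # or the undocumented two-argument form, connect(host, port),
    # and reduce both to the canonical (host, port) tuple.
    if len(args) == 1:
        host, port = args[0]      # tuple form: connect((host, port))
    elif len(args) == 2:
        host, port = args         # two-arg form: connect(host, port)
    else:
        raise TypeError('expected (host, port) or host, port')
    return (host, int(port))

# Both spellings collapse to the same address:
assert normalize_address(('localhost', 8080)) == ('localhost', 8080)
assert normalize_address('localhost', 8080) == ('localhost', 8080)
```

Supporting both forms means every address-taking function needs this kind of dispatch -- and, as noted above for sendto(data, addr), the dispatch cannot even be applied uniformly. That implementation cost is what is being weighed against breaking existing callers.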
From mal@lemburg.com Fri Mar 24 22:13:04 2000
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 24 Mar 2000 23:13:04 +0100
Subject: [Python-Dev] Unicode charnames impl.
References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50>
	<14555.57908.151946.182639@amarok.cnri.reston.va.us>
Message-ID: <38DBE870.D88915B5@lemburg.com>

"Andrew M. Kuchling" wrote:
> 
> Here's a strawman codec for doing the \N{NULL} thing.  Questions:
> 
> 0) Is the code below correct?

Some comments below.

> 1) What the heck would this encoding be called?

Ehm, 'unicode-with-smileys' I guess... after all that's what
motivated the thread ;-)

Seriously, I'd go with 'unicode-named'. You can then stack it on
top of 'unicode-escape' and get the best of both worlds...

> 2) What does .encode() do?  (Right now it escapes \N as
> \N{BACKSLASH}N.)

.encode() should translate Unicode to a string. Since the
named char thing is probably only useful on input, I'd say:
don't do anything, except maybe return input.encode('unicode-escape').

> 3) How can we store all those names?  The resulting dictionary makes a
> 361K .py file; Python dumps core trying to parse it.  (Another bug...)

I've made the same experience with the large Unicode mapping
tables... the trick is to split the dictionary definition in
chunks and then use dict.update() to paste them together again.

> 4) What do you with the error \N{...... no closing right bracket.
> Right now it stops at that point, and never advances any farther.
> Maybe it should assume it's an error if there's no } within the
> next 200 chars or some similar limit?

I'd suggest to take the upper bound of all Unicode name
lengths as limit.

> 5) Do we need StreamReader/Writer classes, too?

If you plan to have it registered with a codec search function,
yes.
No big deal though, because you can use the Codec class as basis
for them:

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return (Codec().encode,Codec().decode,StreamReader,StreamWriter)

Then drop the scripts into the encodings package dir and it
should be useable via unicode(r'\N{SMILEY}','unicode-named')
and u":-)".encode('unicode-named').

> I've also add a script that parses the names out of the NameList.txt
> file at ftp://ftp.unicode.org/Public/UNIDATA/.
> 
> --amk
> 
> namecodec.py:
> =============
> 
> import codecs
> 
> #from _namedict import namedict
> namedict = {'NULL': 0, 'START OF HEADING' : 1,
>             'BACKSLASH':ord('\\')}
> 
> class NameCodec(codecs.Codec):
>     def encode(self,input,errors='strict'):
>         # XXX what should this do?  Escape the
>         # sequence \N as '\N{BACKSLASH}N'?
>         return input.replace( '\\N', '\\N{BACKSLASH}N' )

You should return a string on output... input will be a Unicode
object and the return value too if you don't add e.g. an
.encode('unicode-escape').

>     def decode(self,input,errors='strict'):
>         output = unicode("")
>         last = 0
>         index = input.find( u'\\N{' )
>         while index != -1:
>             output = output + unicode( input[last:index] )
>             used = index
>             r_bracket = input.find( '}', index)
>             if r_bracket == -1:
>                 # No closing bracket; bail out...
>                 break
> 
>             name = input[index + 3 : r_bracket]
>             code = namedict.get( name )
>             if code is not None:
>                 output = output + unichr(code)
>             elif errors == 'strict':
>                 raise ValueError, 'Unknown character name %s' % repr(name)

This could also be UnicodeError (it's a subclass of ValueError).

>             elif errors == 'ignore': pass
>             elif errors == 'replace':
>                 output = output + unichr( 0xFFFD )

'\uFFFD' would save a call.

>             last = r_bracket + 1
>             index = input.find( '\\N{', last)
>         else:
>             # Finally failed gently, no longer finding a \N{...
>             output = output + unicode( input[last:] )
>             return len(input), output
> 
>         # Otherwise, we hit the break for an unterminated \N{...}
>         return index, output

Note that .decode() must only return the decoded data.
The "bytes read" integer was removed in order to make
the Codec APIs compatible with the standard file object
APIs.

> if __name__ == '__main__':
>     c = NameCodec()
>     for s in [ r'b\lah blah \N{NULL} asdf',
>                r'b\l\N{START OF HEADING}\N{NU' ]:
>         used, s2 = c.decode(s)
>         print repr( s2 )
> 
>         s3 = c.encode(s)
>         _, s4 = c.decode(s3)
>         print repr(s3)
>         assert s4 == s
> 
>     print repr( c.decode(r'blah blah \N{NULLsadf} asdf' , errors='replace' ))
>     print repr( c.decode(r'blah blah \N{NULLsadf} asdf' , errors='ignore' ))
> 
> makenamelist.py
> ===============
> 
> # Hack to extract character names from NamesList.txt
> # Output the repr() of the resulting dictionary
> 
> import re, sys, string
> 
> namedict = {}
> 
> while 1:
>     L = sys.stdin.readline()
>     if L == "": break
> 
>     m = re.match('([0-9a-fA-F]{4})(?:\t(.*)\s*)', L)
>     if m is not None:
>         last_char = int(m.group(1), 16)
>         if m.group(2) is not None:
>             name = string.upper( m.group(2) )
>             if name not in ['',
>                             '']:
>                 namedict[ name ] = last_char
>                 # print name, last_char
> 
>     m = re.match('\t=\s*(.*)\s*(;.*)?', L)
>     if m is not None:
>         name = string.upper( m.group(1) )
>         names = string.split(name, ',')
>         names = map(string.strip, names)
>         for n in names:
>             namedict[ n ] = last_char
>             # print n, last_char
> 
> # XXX and do what with this dictionary?
> print namedict
> 
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> http://www.python.org/mailman/listinfo/python-dev

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/

From fdrake@acm.org Fri Mar 24 22:12:42 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 24 Mar 2000 17:12:42 -0500 (EST) Subject: [Python-Dev] Memory Management In-Reply-To: <11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> References: <11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> Message-ID: <14555.59482.61317.992089@weyr.cnri.reston.va.us> Asbahr, Jason writes: > community? Has anyone run up against this before? You should talk to Vladimir Marangozov; he's done a fair bit of work dealing with memory management in Python. You probably want to read the chapter he contributed to the Python/C API document for the release earlier this week. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From skip@mojam.com (Skip Montanaro) Fri Mar 24 22:19:50 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 24 Mar 2000 16:19:50 -0600 (CST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242115.QAA04648@eric.cnri.reston.va.us> References: <14555.54636.811100.254309@amarok.cnri.reston.va.us> <14555.55139.484135.602894@weyr.cnri.reston.va.us> <200003242115.QAA04648@eric.cnri.reston.va.us> Message-ID: <14555.59910.631130.241930@beluga.mojam.com> Guido> Which reminds me of another reason to wait: coming up with the Guido> right package hierarchy is hard. (E.g. I find network too long; Guido> plus, does htmllib belong there?) Ah, another topic for python-dev. Even if we can't do the packaging right away, we should be able to hash out the structure. Skip From guido@python.org Fri Mar 24 22:25:01 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 17:25:01 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: Your message of "Fri, 24 Mar 2000 17:10:41 EST." 
<14555.59361.460705.258859@weyr.cnri.reston.va.us> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> <200003242127.QAA06269@eric.cnri.reston.va.us> <14555.58301.790774.159381@anthem.cnri.reston.va.us> <14555.59361.460705.258859@weyr.cnri.reston.va.us> Message-ID: <200003242225.RAA13408@eric.cnri.reston.va.us> > bwarsaw@cnri.reston.va.us writes: > > I know, I was being purposefully dense for effect :) Fred, is there > > some way to make the html contain a link to the previous section for > > the "see above" text? That would solve the problem I think. [Fred] > No. I expect this to no longer be a problem when we push to > SGML/XML, so I won't waste any time hacking around it. > On the other hand, lots of places in the documentation refer to > "above" and "below" in the traditional sense used in paper documents, > and that doesn't work well for hypertext, even in the strongly > traditional book-derivation way the Python manuals are done. As soon > as it's not in the same HTML file, "above" and "below" break for a lot > of people. So it still should be adjusted at an appropriate time. My approach to this: put more stuff on the same page! I personally favor putting an entire chapter on one page; even if you split the top-level subsections this wouldn't have happened. --Guido van Rossum (home page: http://www.python.org/~guido/) From klm@digicool.com Fri Mar 24 22:40:54 2000 From: klm@digicool.com (Ken Manheimer) Date: Fri, 24 Mar 2000 17:40:54 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: Guido wrote: > OK, that's reasonable. I'll have to invent a different reason why I > don't want this -- because I really don't! I'm glad this organize-the-library-in-packages initiative seems to be moving towards concentrating on the organization, rather than just starting to put obvious things in the obvious places. 
Personally, i *crave* sensible, discoverable organization. The only thing i like less than complicated disorganization is complicated misorganization - and i think that just diving in and doing the "obvious" placements would have the terrible effect of making it harder, not easier, to move eventually to the right arrangement. Ken klm@digicool.com From akuchlin@mems-exchange.org Fri Mar 24 22:45:20 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Fri, 24 Mar 2000 17:45:20 -0500 (EST) Subject: [Python-Dev] Unicode charnames impl. In-Reply-To: <38DBE870.D88915B5@lemburg.com> References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> <38DBE870.D88915B5@lemburg.com> Message-ID: <14555.61440.613940.50492@amarok.cnri.reston.va.us> M.-A. Lemburg writes: >.encode() should translate Unicode to a string. Since the >named char thing is probably only useful on input, I'd say: >don't do anything, except maybe return input.encode('unicode-escape'). Wait... then you can't stack it on top of unicode-escape, because it would already be Unicode escaped. >> 4) What do you with the error \N{...... no closing right bracket. >I'd suggest to take the upper bound of all Unicode name >lengths as limit. Seems like a hack. >Note that .decode() must only return the decoded data. >The "bytes read" integer was removed in order to make >the Codec APIs compatible with the standard file object >APIs. Huh? Why does Misc/unicode.txt describe decode() as "Decodes the object input and returns a tuple (output object, length consumed)"? Or are you talking about a different .decode() method? -- A.M. Kuchling http://starship.python.net/crew/amk/ "Ruby's dead?" "Yes." "Ah me. That's the trouble with mortals. They do that. Not to worry, eh?" 
-- Dream and Pharamond, in SANDMAN #46: "Brief Lives:6" From gmcm@hypernet.com Fri Mar 24 22:50:12 2000 From: gmcm@hypernet.com (Gordon McMillan) Date: Fri, 24 Mar 2000 17:50:12 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <200003242103.QAA03288@eric.cnri.reston.va.us> Message-ID: <1258184279-6957124@hypernet.com> [Guido] > Someone noticed that socket.connect() and a few related functions > (connect_ex() and bind()) take either a single (host, port) tuple or > two separate arguments, but that only the tuple is documented. > > Similar to append(), I'd like to close this gap, and I've made the > necessary changes. This will probably break lots of code. This will indeed cause great wailing and gnashing of teeth. I've been criticized for using the tuple form in the Sockets HOWTO (in fact I foolishly changed it to demonstrate both forms). > Similar to append(), I'd like people to fix their code rather than > whine -- two-arg connect() has never been documented, although it's > found in much code (even the socket module test code :-( ). > > Similar to append(), I may revert the change if it is shown to cause > too much pain during beta testing... I say give 'em something to whine about. put-sand-in-the-vaseline-ly y'rs - Gordon From klm@digicool.com Fri Mar 24 22:55:43 2000 From: klm@digicool.com (Ken Manheimer) Date: Fri, 24 Mar 2000 17:55:43 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.58541.207868.496747@anthem.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Barry A. Warsaw wrote: > > >>>>> "GvR" == Guido van Rossum writes: > > GvR> OK, that's reasonable. I'll have to invent a different > GvR> reason why I don't want this -- because I really don't! 
> > Tim's Fifth Enlightenment is all the reason you'd need, /if/ you can't > be persuaded to change your mind :) Maybe i'm just a slave to my organization mania, but i'd suggest the following order change of 5 and 6, plus an addition; from: 5 now: Flat is better than nested. 6 now: Sparse is better than dense. to: 5 Sparse is better than dense. 6 Flat is better than nested 6.5 until it gets too dense. or-is-it-me-that-gets-too-dense'ly yrs, ken klm@digicool.com (And couldn't the humor page get hooked up a bit better? That was definitely a fun part of maintaining python.org...) From gstein@lyra.org Sat Mar 25 01:19:18 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 17:19:18 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.57858.824301.693390@anthem.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Barry A. Warsaw wrote: > One thing you can definitely do now which breaks no code: propose a > package hierarchy for the standard library. I already did! http://www.python.org/pipermail/python-dev/2000-February/003761.html *grumble* -g -- Greg Stein, http://www.lyra.org/ From tim_one@email.msn.com Sat Mar 25 04:19:33 2000 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 24 Mar 2000 23:19:33 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: <001001bf9611$52e960a0$752d153f@tim> [GregS proposes a partial packaging of std modules for 1.6, Guido objects on spurious grounds, GregS refutes that, Guido agrees] > I'll have to invent a different reason why I don't want this -- because > I really don't! This one's easy! It's why I left the 20th of the 20 Pythonic Theses for you to fill in . All you have to do now is come up with a pithy way to say "if it's something Guido is so interested in that he wants to be deeply involved in it himself, but it comes at a time when he's buried under prior commitments, then tough tulips, it waits". 
shades-of-the-great-renaming-ly y'rs - tim From tim_one@email.msn.com Sat Mar 25 04:19:36 2000 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 24 Mar 2000 23:19:36 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.58541.207868.496747@anthem.cnri.reston.va.us> Message-ID: <001101bf9611$544239e0$752d153f@tim> [Guido] > OK, that's reasonable. I'll have to invent a different > reason why I don't want this -- because I really don't! [Barry] > Tim's Fifth Enlightenment is all the reason you'd need, /if/ you can't > be persuaded to change your mind :) No no no no no: "namespaces are one honking great idea ..." is the controlling one here: Guido really *does* want this! It's a question of timing, in the sense of "never is often better than *right* now", but to be eventually modified by "now is better than never". These were carefully designed to support any position whatsoever, you know . although-in-any-particular-case-there's-only-one-true-interpretation-ly y'rs - tim From guido@python.org Sat Mar 25 04:19:41 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 23:19:41 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 17:19:18 PST." References: Message-ID: <200003250419.XAA25751@eric.cnri.reston.va.us> > > One thing you can definitely do now which breaks no code: propose a > > package hierarchy for the standard library. > > I already did! > > http://www.python.org/pipermail/python-dev/2000-February/003761.html > > *grumble* You've got to be kidding. That's not a package hierarchy proposal, it's just one package (network). Without a comprehensive proposal I'm against a partial reorganization: without a destination we can't start marching. Naming things is very contentious -- everybody has an opinion. To pick the right names you must see things in perspective. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From Moshe Zadka Sat Mar 25 08:45:28 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 25 Mar 2000 10:45:28 +0200 (IST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: On Thu, 23 Mar 2000 gvwilson@nevex.com wrote: > If None becomes a keyword, I would like to ask whether it could be used to > signal that a method is a class method, as opposed to an instance method: I'd like to know what you mean by "class" method. (I do know C++ and Java, so I have some idea...). Specifically, my question is: how does a class method access class variables? They can't be totally unqualified (because that's very unpythonic). If they are qualified by the class's name, I see it as a very mild improvement on the current situation. You could suggest, for example, to qualify class variables by "class" (so you'd do things like: class.x = 1), but I'm not sure I like it. On the whole, I think it is a much bigger issue on how be denote class methods. Also, one slight problem with your method of denoting class methods: currently, it is possible to add instance method at run time to a class by something like class C: pass def foo(self): pass C.foo = foo In your suggestion, how do you view the possiblity of adding class methods to a class? (Note that "foo", above, is also perfectly usable as a plain function). I want to note that Edward suggested denotation by a seperate namespace: C.foo = foo # foo is an instance method C.__methods__.foo = foo # foo is a class method The biggest problem with that suggestion is that it doesn't address the common case of defining it textually inside the class definition. > I'd also like to ask (separately) that assignment to None be defined as a > no-op, so that programmers can write: > > year, month, None, None, None, None, weekday, None, None = gmtime(time()) > > instead of having to create throw-away variables to fill in slots in > tuples that they don't care about. 
Currently, I use "_" for that purpose, after I heard the idea from Fredrik Lundh. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gstein@lyra.org Sat Mar 25 09:26:23 2000 From: gstein@lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 01:26:23 -0800 (PST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: <200003250419.XAA25751@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: > > > One thing you can definitely do now which breaks no code: propose a > > > package hierarchy for the standard library. > > > > I already did! > > > > http://www.python.org/pipermail/python-dev/2000-February/003761.html > > > > *grumble* > > You've got to be kidding. That's not a package hierarchy proposal, > it's just one package (network). > > Without a comprehensive proposal I'm against a partial reorganization: > without a destination we can't start marching. Not kidding at all. I said before that I don't think we can do everything all at once. I *do* think this is solvable with a greedy algorithm rather than waiting for some nebulous completion point. > Naming things is very contentious -- everybody has an opinion. To > pick the right names you must see things in perspective. Sure. And those diverse opinions are why I don't believe it is possible to do all at once. The task is simply too large to tackle in one shot. IMO, it must be solved incrementally. I'm not even going to attempt to try to define a hierarchy for all those modules. I count 137 on my local system. Let's say that I *do* try... some are going to end up "forced" rather than obeying some obvious grouping. If you do it a chunk at a time, then you get the obvious, intuitive groupings. Try for more, and you just bung it all up. For discussion's sake: can you provide a rationale for doing it all at once? In the current scenario, modules just appear at some point. 
After a partial reorg, some modules appear at a different point. "No big whoop." Just because module A is in a package doesn't imply that module B must also be in a package. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Sat Mar 25 09:35:39 2000 From: gstein@lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 01:35:39 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <001001bf9611$52e960a0$752d153f@tim> Message-ID: On Fri, 24 Mar 2000, Tim Peters wrote: > [GregS proposes a partial packaging of std modules for 1.6, Guido objects on > spurious grounds, GregS refutes that, Guido agrees] > > > I'll have to invent a different reason why I don't want this -- because > > I really don't! > > This one's easy! It's why I left the 20th of the 20 Pythonic Theses for you > to fill in . All you have to do now is come up with a pithy way to > say "if it's something Guido is so interested in that he wants to be deeply > involved in it himself, but it comes at a time when he's buried under prior > commitments, then tough tulips, it waits". No need for Pythonic Theses. I don't see anybody disagreeing with the end goal. The issue comes up with *how* to get there. I say "do it incrementally" while others say "do it all at once." Personally, I don't think it is possible to do all at once. As a corollary, if you can't do it all at once, but you *require* that it be done all at once, then you have effectively deferred the problem. To put it another way, Guido has already invented a reason to not do it: he just requires that it be done all at once. Result: it won't be done. [ not saying this was Guido's intent or desire... 
but this is how I read the result :-) ] Cheers, -g -- Greg Stein, http://www.lyra.org/ From Moshe Zadka Sat Mar 25 09:55:12 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 25 Mar 2000 11:55:12 +0200 (IST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.34371.749039.946891@beluga.mojam.com> Message-ID: On Fri, 24 Mar 2000, Skip Montanaro wrote: > Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules > of general usefulness (this is at least generally useful for anyone writing > web spiders ;-) shouldn't live in Tools, because it's not always available > and users need to do extra work to make them available. You're right, but I'd like this to be a 1.7 change. It's just that I plan to suggest a great-renaming-fest for 1.7 modules, and then namespace wouldn't be cluttered when you don't need it. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From Moshe Zadka Sat Mar 25 10:16:23 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 25 Mar 2000 12:16:23 +0200 (IST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: > OK, that's reasonable. I'll have to invent a different reason why I > don't want this -- because I really don't! Here's a reason: there shouldn't be changes we'll retract later -- we need to come up with the (more or less) right hierarchy the first time, or we'll do a lot of work for nothing. > Hm. Moving modules requires painful and arcane CVS manipulations that > can only be done by the few of us here at CNRI -- and I'm the only one > left who's full time on Python. Hmmmmm....this is a big problem. Maybe we need to have more people with access to the CVS? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal@lemburg.com Sat Mar 25 10:47:30 2000 From: mal@lemburg.com (M.-A. 
Lemburg)
Date: Sat, 25 Mar 2000 11:47:30 +0100
Subject: [Python-Dev] Unicode charnames impl.
References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50>
	<14555.57908.151946.182639@amarok.cnri.reston.va.us>
	<38DBE870.D88915B5@lemburg.com>
	<14555.61440.613940.50492@amarok.cnri.reston.va.us>
Message-ID: <38DC9942.3C4E4B92@lemburg.com>

"Andrew M. Kuchling" wrote:
> 
> M.-A. Lemburg writes:
> >.encode() should translate Unicode to a string. Since the
> >named char thing is probably only useful on input, I'd say:
> >don't do anything, except maybe return input.encode('unicode-escape').
> 
> Wait... then you can't stack it on top of unicode-escape, because it
> would already be Unicode escaped.

Sorry for the mixup (I guess yesterday wasn't my day...).

I had stream codecs in mind: these are stackable, meaning that
you can wrap one codec around another. And it's also their
interface API that was changed -- not the basic stateless
encoder/decoder ones.

Stacking of .encode()/.decode() must be done "by hand" in e.g.
the way I described above. Another approach would be subclassing
the unicode-escape Codec and then calling the base class method.

> >> 4) What do you with the error \N{...... no closing right bracket.
> >I'd suggest to take the upper bound of all Unicode name
> >lengths as limit.
> 
> Seems like a hack.

It is... but what other way would there be ?

> >Note that .decode() must only return the decoded data.
> >The "bytes read" integer was removed in order to make
> >the Codec APIs compatible with the standard file object
> >APIs.
> 
> Huh? Why does Misc/unicode.txt describe decode() as "Decodes the
> object input and returns a tuple (output object, length consumed)"?
> Or are you talking about a different .decode() method?

You're right... I was thinking about .read() and .write().
.decode() should indeed return a tuple, just as documented in
unicode.txt.
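The documented contract just reaffirmed -- decode() returning a tuple of (output object, length consumed) -- can be sketched with a minimal stateless decoder. This is written in modern Python syntax for brevity, and the three-entry namedict is a stand-in for the real name table:

```python
# Minimal sketch of a stateless \N{...} decoder that follows the
# (output, length consumed) convention from Misc/unicode.txt.
namedict = {'NULL': 0, 'START OF HEADING': 1, 'BACKSLASH': ord('\\')}

def name_decode(input, errors='strict'):
    output = []
    pos = 0
    while True:
        index = input.find('\\N{', pos)
        if index == -1:
            # No more escapes: everything was consumed.
            output.append(input[pos:])
            return ''.join(output), len(input)
        output.append(input[pos:index])
        r_bracket = input.find('}', index)
        if r_bracket == -1:
            # Unterminated \N{...: report how much was consumed so a
            # stream reader can retry once more data arrives.
            return ''.join(output), index
        name = input[index + 3:r_bracket]
        if name in namedict:
            output.append(chr(namedict[name]))
        elif errors == 'strict':
            raise UnicodeError('Unknown character name %r' % name)
        elif errors == 'replace':
            output.append('\uFFFD')
        # errors == 'ignore' drops the escape entirely
        pos = r_bracket + 1

assert name_decode(r'a \N{NULL} b') == ('a \x00 b', 12)
assert name_decode(r'x\N{NU') == ('x', 1)
```

Returning the consumed length is what lets a stream reader keep trailing, possibly incomplete input (like the unterminated \N{NU in the test case) in its buffer for the next read.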
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mhammond@skippinet.com.au Sat Mar 25 13:20:59 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Sun, 26 Mar 2000 00:20:59 +1100 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: [Greg writes] > I'm not even going to attempt to try to > define a hierarchy for all those modules. I count 137 on my local system. > Let's say that I *do* try... some are going to end up "forced" rather than > obeying some obvious grouping. If you do it a chunk at a time, then you > get the obvious, intuitive groupings. Try for more, and you just bung it > all up. ... > Just because module A is in a package doesn't imply that module B must > also be in a package. I agree with Greg - every module will not fit into a package. But I also agree with Guido - we _should_ attempt to go through the 137 modules and put the ones that fit into logical groupings. Greg is probably correct with his selection for "net", but a general evaluation is still a good thing. A view of the bigger picture will help to quell debates over the structure, and only leave us with the squabbles over the exact spelling :-) +2 on ... err .... -1 on ... errr - awww - screw that--ly, Mark. From tismer@tismer.com Sat Mar 25 13:35:50 2000 From: tismer@tismer.com (Christian Tismer) Date: Sat, 25 Mar 2000 14:35:50 +0100 Subject: [Python-Dev] Unicode charnames impl. References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> Message-ID: <38DCC0B6.2A7D0EF1@tismer.com> "Andrew M. Kuchling" wrote: ... > 3) How can we store all those names? The resulting dictionary makes a > 361K .py file; Python dumps core trying to parse it. (Another bug...) This is simply not the place to use a dictionary. 
You don't need fast lookup from names to codes, but something that supports incremental search. This would enable PythonWin to show a pop-up list after you typed the first letters. I'm working on a common substring analysis that makes each entry into 3 to 5 small integers. You then encode these in an order-preserving way. That means, the resulting code table is still lexically ordered, and access to the sentences is done via bisection. Takes me some more time to get that, but it will not be larger than 60k, or I drop it. Also note that all the names use uppercase letters and space only. An opportunity to use simple context encoding and use just 4 bits most of the time. ... > I've also added a script that parses the names out of the NameList.txt > file at ftp://ftp.unicode.org/Public/UNIDATA/. Is there any reason why you didn't use the UnicodeData.txt file, I mean do I cover everything if I continue to use that? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From Vladimir.Marangozov@inrialpes.fr Sat Mar 25 14:59:55 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Sat, 25 Mar 2000 15:59:55 +0100 (CET) Subject: [Python-Dev] Windows and PyObject_NEW Message-ID: <200003251459.PAA09181@python.inrialpes.fr> For MarkH, Guido and the Windows experienced: I've been reading Jeffrey Richter's "Advanced Windows" last night in order to try understanding better why PyObject_NEW is implemented differently for Windows. Again, I feel uncomfortable with this, especially now, when I'm dealing with the memory aspect of Python's object constructors/destructors. Some time ago, Guido elaborated on why PyObject_NEW uses malloc() on the user's side, before calling _PyObject_New (on Windows, cf.
objimpl.h): [Guido] > I can explain the MS_COREDLL business: > > This is defined on Windows because the core is in a DLL. Since the > caller may be in another DLL, and each DLL (potentially) has a > different default allocator, and (in pre-Vladimir times) the > type-specific deallocator typically calls free(), we (Mark & I) > decided that the allocation should be done in the type-specific > allocator. We changed the PyObject_NEW() macro to call malloc() and > pass that into _PyObject_New() as a second argument. While I agree with this, from reading chapters 5-9 of (a French copy of) the book (translated backwards here): 5. Win32 Memory Architecture 6. Exploring Virtual Memory 7. Using Virtual Memory in Your Applications 8. Memory Mapped Files 9. Heaps I can't find any radical Windows specificities for memory management. On Windows, like the rest of the OSes, the (virtual & physical) memory allocated for a process is common and seems to be accessible from all DLLs involved in an executable. Things like page sharing, copy-on-write, private process mem, etc. are conceptually all the same on Windows and Unix. Now, the backwards binary compatibility argument aside (assuming that extensions get recompiled when a new Python version comes out), my concern is that with the introduction of PyObject_NEW *and* PyObject_DEL, there's no point in having separate implementations for Windows and Unix any more (or I'm really missing something and I fail to see what it is). User objects would be allocated *and* freed by the core DLL (at least the object headers). Even if several DLLs use different allocators, this shouldn't be a problem if what's obtained via PyObject_NEW is freed via PyObject_DEL. This Python memory would be allocated from the Python's core DLL regions/pages/heaps. And I believe that the memory allocated by the core DLL is accessible from the other DLL's of the process.
(I haven't seen evidence on the opposite, but tell me if this is not true) I thought that maybe Windows malloc() uses different heaps for the different DLLs, but that's fine too, as long as the _NEW/_DEL symmetry is respected and all heaps are accessible from all DLLs (which seems to be the case...), but: In the beginning of Chapter 9, Heaps, I read the following: """ ...About Win32 heaps (compared to Win16 heaps)... * There is only one kind of heap (it doesn't have any particular name, like "local" or "global" on Win16, because it's unique) * Heaps are always local to a process. The contents of a process heap is not accessible from the threads of another process. A large number of Win16 applications use the global heap as a way of sharing data between processes; this change in the Win32 heaps is often a source of problems for porting Win16 applications to Win32. * One process can create several heaps in its addressing space and can manipulate them all. * A DLL does not have its own heap. It uses the heaps as part of the addressing space of the process. However, a DLL can create a heap in the addressing space of a process and reserve it for its own use. Since several 16-bit DLLs share data between processes by using the local heap of a DLL, this change is a source of problems when porting Win16 apps to Win32... """ This last paragraph confuses me. On one hand, it's stated that all heaps can be manipulated by the process, and OTOH, a DLL can reserve a heap for personal use within that process (implying the heap is r/w protected for the other DLLs ?!?). The rest of this chapter does not explain how this "private reservation" is or can be done, so some of you would probably want to chime in and explain this to me. Going back to PyObject_NEW, if it turns out that all heaps are accessible from all DLLs involved in the process, I would probably lobby for unifying the implementation of _PyObject_NEW/_New and _PyObject_DEL/_Del for Windows and Unix. 
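[Editorial aside: the symmetry argument -- whatever PyObject_NEW allocates, only PyObject_DEL frees, through the same allocator -- can be modelled in a few lines of Python. This is a toy stand-in for the C API, not the actual implementation; obj_new/obj_del and the free list are invented for illustration.]

```python
# Toy model of the _NEW/_DEL symmetry: one module owns both the
# allocation and the deallocation path, so callers (the "other DLLs")
# never have to know which allocator produced an object.
_pool = []  # the owning allocator's private free list

def obj_new():
    # Reuse a recycled block if one exists, else make a fresh one.
    return _pool.pop() if _pool else {}

def obj_del(obj):
    obj.clear()
    _pool.append(obj)  # always returned to the pool that owns it

o = obj_new()
obj_del(o)
assert obj_new() is o  # recycled by the owning allocator, not the caller's
```

As long as every obj_new() is paired with obj_del(), no caller ever mixes two allocators -- which is the property Vladimir argues makes the Windows-specific malloc()-in-the-macro unnecessary.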
Actually on Windows, object allocation does not depend on a central, Python core memory allocator. Therefore, with the patches I'm working on, changing the core allocator would work (would be changed for real) only for platforms other than Windows. Next, if it's possible to unify the implementation, it would also be possible to expose and officialize in the C API a new function set: PyObject_New() and PyObject_Del() (without leading underscores) For now, due to the implementation difference on Windows, we're forced to use the macro versions PyObject_NEW/DEL. Clearly, please tell me what would be wrong on Windows if a) & b) & c): a) we have PyObject_New(), PyObject_Del() b) their implementation is platform independent (no MS_COREDLL diffs, we retain the non-Windows variant) c) they're both used systematically for all object types -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gmcm@hypernet.com Sat Mar 25 15:46:01 2000 From: gmcm@hypernet.com (Gordon McMillan) Date: Sat, 25 Mar 2000 10:46:01 -0500 Subject: [Python-Dev] Windows and PyObject_NEW In-Reply-To: <200003251459.PAA09181@python.inrialpes.fr> Message-ID: <1258123323-10623548@hypernet.com> Vladimir Marangozov > ... And I believe that the memory allocated > by the core DLL is accessible from the other DLL's of the process. > (I haven't seen evidence on the opposite, but tell me if this is not true) This is true. Or, I should say, it all boils down to HeapAlloc( heap, flags, bytes) and malloc is going to use the _crtheap. > In the beginning of Chapter 9, Heaps, I read the following: > > """ > ...About Win32 heaps (compared to Win16 heaps)... > > * There is only one kind of heap (it doesn't have any particular name, > like "local" or "global" on Win16, because it's unique) > > * Heaps are always local to a process. The contents of a process heap is not accessible from the threads of another process.
A large number of > Win16 applications use the global heap as a way of sharing data between > processes; this change in the Win32 heaps is often a source of problems > for porting Win16 applications to Win32. > > * One process can create several heaps in its addressing space and can > manipulate them all. > > * A DLL does not have its own heap. It uses the heaps as part of the > addressing space of the process. However, a DLL can create a heap in > the addressing space of a process and reserve it for its own use. > Since several 16-bit DLLs share data between processes by using the > local heap of a DLL, this change is a source of problems when porting > Win16 apps to Win32... > """ > > This last paragraph confuses me. On one hand, it's stated that all heaps > can be manipulated by the process, and OTOH, a DLL can reserve a heap for > personal use within that process (implying the heap is r/w protected for > the other DLLs ?!?). At any time, you can create a new Heap handle HeapCreate(options, initsize, maxsize) Nothing special about the "dll" context here. On Win9x, only someone who knows about the handle can manipulate the heap. (On NT, you can enumerate the handles in the process.) I doubt very much that you would break anybody's code by removing the Windows-specific behavior. But it seems to me that unless Python always uses the default malloc, those of us who write C++ extensions will have to override operator new? I'm not sure. I've used placement new to allocate objects in a memory mapped file, but I've never tried to muck with the global memory policy of a C++ program. - Gordon From akuchlin@mems-exchange.org Sat Mar 25 17:58:56 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Sat, 25 Mar 2000 12:58:56 -0500 (EST) Subject: [Python-Dev] Unicode charnames impl.
In-Reply-To: <38DCC0B6.2A7D0EF1@tismer.com> References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> <38DCC0B6.2A7D0EF1@tismer.com> Message-ID: <14556.65120.22727.524616@newcnri.cnri.reston.va.us> Christian Tismer writes: >This is simply not the place to use a dictionary. >You don't need fast lookup from names to codes, >but something that supports incremental search. >This would enable PythonWin to sho a pop-up list after >you typed the first letters. Hmm... one could argue that PythonWin or IDLE should provide their own database for incremental searching; I was planning on following Bill Tutt's suggestion of generating a perfect minimal hash for the names. gperf isn't up to the job, but I found an algorithm that should be OK. Just got to implement it now... But, if your approach pays off it'll be superior to a perfect hash. >Is there any reason why you didn't use the UnicodeData.txt file, >I mean do I cover everything if I continue to use that? Oops; I saw the NameList file and just went for it; maybe it should use the full UnicodeData.txt. --amk From Moshe Zadka Sat Mar 25 18:10:44 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 25 Mar 2000 20:10:44 +0200 (IST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Mark Hammond wrote: > But I also agree with Guido - we _should_ attempt to go through the 137 Where did you come up with that number? I counted much more -- not quite sure, but certainly more. Well, here's a tentative suggestion I worked out today. This is just to have something to quibble about. In the interest of rushing it out of the door, there are a few modules (explicitly mentioned) which I have said nothing about. 
net
    httplib ftplib urllib cgi gopherlib imaplib poplib nntplib smtplib urlparse telnetlib
    server
        BaseHTTPServer CGIHTTPServer SimpleHTTPServer SocketServer asynchat asyncore
text
    sgmllib htmllib htmlentitydefs
    xml
        whatever the xml-sig puts here
    mail
        rfc822
        mime
            MimeWriter mimetools mimify mailcap mimetypes base64 quopri
        mailbox mhlib
    binhex
parse
    string re regex reconvert regex_syntax regsub shlex ConfigParser linecache multifile netrc
bin
    gzip zlib aifc chunk
    image
        imghdr colorsys imageop imgfile rgbimg yuvconvert
    sound
        sndhdr toaiff audiodev sunau sunaudio wave audioop sunaudiodev
db
    anydbm whichdb bsddb dbm dbhash dumbdbm gdbm
math
    bisect fpformat random whrandom cmath math crypt fpectl fpetest array md5 mpz rotor sha
time
    calendar time tzparse sched timing
interpreter
    new py_compile code codeop compileall keyword token tokenize parser dis bdb pdb profile pyclbr tabnanny symbol pstats traceback rlcompleter
security
    Bastion rexec ihooks
file
    dircache
    path -- a virtual module which would do a from path import *
        dospath posixpath macpath nturl2path ntpath macurl2path
    filecmp fileinput StringIO cStringIO glob fnmatch posixfile stat statcache statvfs tempfile shutil pipes popen2 commands dl fcntl
serialize
    pickle cPickle shelve xdrlib copy copy_reg
threads
    thread threading Queue mutex
ui
    curses Tkinter cmd getpass
internal
    _codecs _locale _tkinter pcre strop posix
users
    pwd grp nis
exceptions os types UserDict UserList user site locale
sgi
    al cd cl fl fm gl misc (what used to be sgimodule.c) sv
unicode
    codecs unicodedata unicodedatabase

========== Modules not handled ============
formatter getopt pprint pty repr tty errno operator pure readline resource select signal socket struct syslog termios

Well, if you got this far, you certainly deserve... congratulations-ly y'rs, Z. -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From DavidA@ActiveState.com Sat Mar 25 18:28:30 2000 From: DavidA@ActiveState.com (David Ascher) Date: Sat, 25 Mar 2000 10:28:30 -0800 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: > db > anydbm > whichdb > bsddb > dbm > dbhash > dumbdbm > gdbm This made me think of one issue which is worth considering -- is there a mechanism for third-party packages to hook into the standard naming hierarchy? It'd be weird not to have the oracle and sybase modules within the db toplevel package, for example. --david ascher From Moshe Zadka Sat Mar 25 18:30:26 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 25 Mar 2000 20:30:26 +0200 (IST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: On Sat, 25 Mar 2000, David Ascher wrote: > This made me think of one issue which is worth considering -- is there a > mechanism for third-party packages to hook into the standard naming > hierarchy? It'd be weird not to have the oracle and sybase modules within > the db toplevel package, for example. My position is that any 3rd party module decides for itself where it wants to live -- once we formalized the framework. Consider PyGTK/PyGnome, PyQT/PyKDE -- they should live in the UI package too... From DavidA@ActiveState.com Sat Mar 25 18:50:14 2000 From: DavidA@ActiveState.com (David Ascher) Date: Sat, 25 Mar 2000 10:50:14 -0800 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: > On Sat, 25 Mar 2000, David Ascher wrote: > > > This made me think of one issue which is worth considering -- is there a > > mechanism for third-party packages to hook into the standard naming > > hierarchy? It'd be weird not to have the oracle and sybase > modules within > > the db toplevel package, for example. 
> > My position is that any 3rd party module decides for itself where it wants > to live -- once we formalized the framework. Consider PyGTK/PyGnome, > PyQT/PyKDE -- they should live in the UI package too... That sounds good in theory, but I can see possible problems down the line: 1) The current mapping between package names and directory structure means that installing a third party package hierarchy in a different place on disk than the standard library requires some work on the import mechanisms (this may have been discussed already) and a significant amount of user education. 2) We either need a 'registration' mechanism whereby people can claim a name in the standard hierarchy or expect conflicts. As far as I can gather, in the Perl world registration occurs by submission to CPAN. Correct? One alternative is to go the Java route, which would then mean, I think, that some core modules are placed very high in the hierarchy (the equivalent of the java. subtree), and some others are deprecated to lower subtree (the equivalent of com.sun). Anyway, I agree with Guido on this one -- naming is a contentious issue wrought with long-term implications. Let's not rush into a decision just yet. --david From guido@python.org Sat Mar 25 18:56:20 2000 From: guido@python.org (Guido van Rossum) Date: Sat, 25 Mar 2000 13:56:20 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Sat, 25 Mar 2000 01:35:39 PST." References: Message-ID: <200003251856.NAA09636@eric.cnri.reston.va.us> > I say "do it incrementally" while others say "do it all at once." > Personally, I don't think it is possible to do all at once. As a > corollary, if you can't do it all at once, but you *require* that it be > done all at once, then you have effectively deferred the problem. To put > it another way, Guido has already invented a reason to not do it: he just > requires that it be done all at once. Result: it won't be done. Bullshit, Greg. 
(I don't normally like to use such strong words, but since you're being confrontational here...) I'm all for doing it incrementally -- but I want the plan for how to do it made up front. That doesn't require all the details to be worked out -- but it requires a general idea about what kind of things we will have in the namespace and what kinds of names they get. An organizing principle, if you like. If we were to decide later that we go for a Java-like deep hierarchy, the network package would have to be moved around again -- what a waste. --Guido van Rossum (home page: http://www.python.org/~guido/) From Moshe Zadka Sat Mar 25 19:35:37 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 25 Mar 2000 21:35:37 +0200 (IST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: On Sat, 25 Mar 2000, David Ascher wrote: > > My position is that any 3rd party module decides for itself where it wants > > to live -- once we formalized the framework. Consider PyGTK/PyGnome, > > PyQT/PyKDE -- they should live in the UI package too... > > That sounds good in theory, but I can see possible problems down the line: > > 1) The current mapping between package names and directory structure means > that installing a third party package hierarchy in a different place on disk > than the standard library requires some work on the import mechanisms (this > may have been discussed already) and a significant amount of user education. Ummmm.... 1.a) If the work of the import-sig produces something (which I suspect it will), it's more complicated -- you could have JAR-like files with hierarchies inside. 1.b) Installation is the domain of the distutils-sig. I seem to remember Greg Ward saying something about installing packages. > 2) We either need a 'registration' mechanism whereby people can claim a name > in the standard hierarchy or expect conflicts. As far as I can gather, in > the Perl world registration occurs by submission to CPAN. Correct? Yes. 
But this is no worse than the current situation, where people pick a toplevel name . I agree a registration mechanism would be helpful. > One alternative is to go the Java route, which would then mean, I think, > that some core modules are placed very high in the hierarchy (the equivalent > of the java. subtree), and some others are deprecated to lower subtree (the > equivalent of com.sun). Personally, I *hate* the Java mechanism -- see Stallman's position on why GNU Java packages use gnu.* rather than org.gnu.* for some of the reasons. I really, really, like the Perl mechanism, and I think we would do well to think if something like that wouldn't suit us, with minor modifications. (Remember that lwall copied the Pythonic module mechanism, so Perl and Python modules are quite similar) > Anyway, I agree with Guido on this one -- naming is a contentious issue > wrought with long-term implications. Let's not rush into a decision just > yet. I agree. That's why I pushed out the straw-man proposal. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From bwarsaw@cnri.reston.va.us Sat Mar 25 20:07:27 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Sat, 25 Mar 2000 15:07:27 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <14555.57858.824301.693390@anthem.cnri.reston.va.us> Message-ID: <14557.7295.451011.36533@anthem.cnri.reston.va.us> I guess I was making a request for a more comprehensive list. People are asking to packagize the entire directory, so I'd like to know what organization they'd propose for all the modules. -Barry From bwarsaw@cnri.reston.va.us Sat Mar 25 20:20:09 2000 From: bwarsaw@cnri.reston.va.us (Barry A.
Warsaw) Date: Sat, 25 Mar 2000 15:20:09 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: <14557.8057.896921.693908@anthem.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: MZ> Hmmmmm....this is a big problem. Maybe we need to have more MZ> people with access to the CVS? To make changes like this, you don't just need write access to CVS, you need physical access to the repository filesystem. It's not possible to provide this access to non-CNRI'ers. -Barry From gstein@lyra.org Sat Mar 25 20:40:59 2000 From: gstein@lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 12:40:59 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14557.8057.896921.693908@anthem.cnri.reston.va.us> Message-ID: On Sat, 25 Mar 2000, Barry A. Warsaw wrote: > >>>>> "MZ" == Moshe Zadka writes: > > MZ> Hmmmmm....this is a big problem. Maybe we need to have more > MZ> people with access to the CVS? > > To make changes like this, you don't just need write access to CVS, > you need physical access to the repository filesystem. It's not > possible to provide this access to non-CNRI'ers. Unless the CVS repository was moved to, say, SourceForge. :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From bwarsaw@cnri.reston.va.us Sat Mar 25 21:00:39 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Sat, 25 Mar 2000 16:00:39 -0500 (EST) Subject: [Python-Dev] module reorg (was: 1.6 job list) References: Message-ID: <14557.10487.736544.336550@anthem.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: MZ> Personally, I *hate* the Java mechanism -- see Stallman's MZ> position on why GNU Java packages use gnu.* rather than MZ> org.gnu.* for some of the reasons. Actually, it's Per Bothner's position: http://www.gnu.org/software/java/why-gnu-packages.txt and I agree with him. I kind of wished that JimH had chosen simply `python' as JPython's top level package hierarchy, but that's too late to change now.
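[Editorial aside, with modern hindsight: the question raised earlier in this thread -- how third-party packages hook into a shared hierarchy, e.g. oracle and sybase modules under a db package -- was eventually answered by namespace packages (PEP 420, many years after this discussion). A minimal runnable sketch; the db/oracle/sybase names are hypothetical stand-ins:]

```python
import os
import sys
import tempfile

# Two "vendors" install under the same top-level "db" namespace from
# different disk locations. With no __init__.py, "db" becomes a
# namespace package spanning both directories.
root1 = tempfile.mkdtemp()
root2 = tempfile.mkdtemp()
os.makedirs(os.path.join(root1, 'db'))
os.makedirs(os.path.join(root2, 'db'))
with open(os.path.join(root1, 'db', 'oracle.py'), 'w') as f:
    f.write('VENDOR = "oracle"\n')
with open(os.path.join(root2, 'db', 'sybase.py'), 'w') as f:
    f.write('VENDOR = "sybase"\n')

sys.path[:0] = [root1, root2]
from db import oracle, sybase  # both halves of "db" resolve

assert oracle.VENDOR == 'oracle'
assert sybase.VENDOR == 'sybase'
```

This sidesteps the registration problem David raises only partially: two vendors can share a package, but nothing stops them from colliding on a module name within it.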
-Barry From bwarsaw@cnri.reston.va.us Sat Mar 25 21:03:08 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Sat, 25 Mar 2000 16:03:08 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <14557.8057.896921.693908@anthem.cnri.reston.va.us> Message-ID: <14557.10636.504088.517078@anthem.cnri.reston.va.us> >>>>> "GS" == Greg Stein writes: GS> Unless the CVS repository was moved to, say, SourceForge. I didn't want to rehash that, but yes, you're absolutely right! -Barry From gstein@lyra.org Sat Mar 25 21:13:00 2000 From: gstein@lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 13:13:00 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14557.10636.504088.517078@anthem.cnri.reston.va.us> Message-ID: On Sat, 25 Mar 2000 bwarsaw@cnri.reston.va.us wrote: > >>>>> "GS" == Greg Stein writes: > > GS> Unless the CVS repository was moved to, say, SourceForge. > > I didn't want to rehash that, but yes, you're absolutely right! Me neither, ergo the smiley :-) Just felt inclined to mention it, and I think the conversation stopped last time at that point; not sure it ever was "hashed" :-). But it is only a discussion to raise if checkins-via-CNRI-guys becomes a true bottleneck. Which it hasn't and doesn't look to be. Constrained? Yes. Bottleneck? No. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jeremy-home@cnri.reston.va.us Sat Mar 25 21:22:09 2000 From: jeremy-home@cnri.reston.va.us (Jeremy Hylton) Date: Sat, 25 Mar 2000 16:22:09 -0500 (EST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: References: Message-ID: <14557.4689.858620.578102@walden> >>>>> "MH" == Mark Hammond writes: MH> [Greg writes] >> I'm not even going to attempt to try to define a hierarchy for >> all those modules. I count 137 on my local system. Let's say >> that I *do* try... some are going to end up "forced" rather than >> obeying some obvious grouping. If you do it a chunk at a time, >> then you get the obvious, intuitive groupings. 
Try for more, and >> you just bung it all up. MH> I agree with Greg - every module will not fit into a package. Sure. No one is arguing with that :-). Where I disagree with Greg is that we shouldn't approach this piecemeal. A greedy algorithm can lead to a locally optimal solution that isn't right for the whole library. A name or grouping might make sense on its own, but isn't sufficiently clear when taking all 137-odd modules into account. MH> But I also agree with Guido - we _should_ attempt to go through MH> the 137 modules and put the ones that fit into logical MH> groupings. Greg is probably correct with his selection for MH> "net", but a general evaluation is still a good thing. A view MH> of the bigger picture will help to quell debates over the MH> structure, and only leave us with the squabbles over the exact MH> spelling :-) x1.5 on this. I'm not sure which direction you ended up thinking this was (+ or -), but whichever direction it was, I like it. Jeremy From gstein@lyra.org Sat Mar 25 21:40:48 2000 From: gstein@lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 13:40:48 -0800 (PST) Subject: [Python-Dev] voting numbers Message-ID: Hey... just thought I'd drop off a description of the "formal" mechanism that the ASF uses for voting since it has been seen here and there on this group :-) +1 "I'm all for it. Do it!" +0 "Seems cool and acceptable, but I can also live without it" -0 "Not sure this is the best thing to do, but I'm not against it." -1 "Veto. And here is my reasoning." Strictly speaking, there is no vetoing here, other than by Guido. For changes to Apache (as opposed to bug fixes), it depends on where the development is. Early stages, it is reasonably open and people work straight against CVS (except for really big design changes). Late stage, it requires three +1 votes during discussion of a patch before it goes in. Here on python-dev, it would seem that the votes are a good way to quickly let Guido know people's feelings about topic X or Y.
On the patches mailing list, the voting could actually be quite a useful measure for the people with CVS commit access. If a patch gets -1, then its commit should wait until reason X has been resolved. Note that it can be resolved in two ways: the person lifts their veto (after some amount of persuasion or explanation), or the patch is updated to address the concerns (well, unless the veto is against the concept of the patch entirely :-). If a patch gets a few +1 votes, then it can probably go straight in. Note that the Apache guys sometimes say things like "+1 on concept" meaning they like the idea, but haven't reviewed the code. Do we formalize on using these? Not really suggesting that. But if myself (and others) drop these things into mail notes, then we may as well have a description of just what the heck is going on :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From Moshe Zadka Sat Mar 25 23:27:18 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 26 Mar 2000 01:27:18 +0200 (IST) Subject: [Python-Dev] Q: repr.py vs. pprint.py Message-ID: Is there any reason to keep two separate modules with simple-formatting functions? I think pprint is somewhat more sophisticated, but in the worst case, we can just dump them both in the same file (the only thing would be that pprint would export "repr", in addition to "saferepr" (among others). (Just bumped into this in my reorg suggestion) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From Moshe Zadka Sat Mar 25 23:32:38 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 26 Mar 2000 01:32:38 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 Message-ID: Here's a second version of the straw man proposal for the reorganization of modules in packages. Note that I'm treating it as a strictly 1.7 proposal, so I don't care a "lot" about backwards compatibility.
I'm down to 4 unhandled modules, which means that if no one objects (and I'm sure someone will ), this can be a plan of action. So get your objections ready guys!

net
    httplib ftplib urllib cgi gopherlib imaplib poplib nntplib smtplib urlparse telnetlib
    server
        BaseHTTPServer CGIHTTPServer SimpleHTTPServer SocketServer asynchat asyncore
text
    sgmllib htmllib htmlentitydefs
    xml
        whatever the xml-sig puts here
    mail
        rfc822
        mime
            MimeWriter mimetools mimify mailcap mimetypes base64 quopri
        mailbox mhlib
    binhex
parse
    string re regex reconvert regex_syntax regsub shlex ConfigParser linecache multifile netrc
bin
    gzip zlib aifc chunk
    image
        imghdr colorsys imageop imgfile rgbimg yuvconvert
    sound
        sndhdr toaiff audiodev sunau sunaudio wave audioop sunaudiodev
db
    anydbm whichdb bsddb dbm dbhash dumbdbm gdbm
math
    bisect fpformat random whrandom cmath math crypt fpectl fpetest array md5 mpz rotor sha
time
    calendar time tzparse sched timing
interpreter
    new py_compile code codeop compileall keyword token tokenize parser dis bdb pdb profile pyclbr tabnanny symbol pstats traceback rlcompleter
security
    Bastion rexec ihooks
file
    dircache
    path -- a virtual module which would do a from path import *
        dospath posixpath macpath nturl2path ntpath macurl2path
    filecmp fileinput StringIO cStringIO glob fnmatch posixfile stat statcache statvfs tempfile shutil pipes popen2 commands dl fcntl
    lowlevel
        socket select
    terminal
        termios pty tty readline
    syslog
serialize
    pickle cPickle shelve xdrlib copy copy_reg
threads
    thread threading Queue mutex
ui
    curses Tkinter cmd getpass
internal
    _codecs _locale _tkinter pcre strop posix
users
    pwd grp nis
sgi
    al cd cl fl fm gl misc (what used to be sgimodule.c) sv
unicode
    codecs unicodedata unicodedatabase
exceptions os types UserDict UserList user site locale pure formatter getopt signal pprint

========== Modules not handled ============
errno resource operator struct

-- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From DavidA@ActiveState.com Sat Mar 25 23:39:51 2000 From: DavidA@ActiveState.com (David Ascher) Date: Sat, 25 Mar 2000 15:39:51 -0800 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: > I really, really, like the Perl mechanism, and I think we would do well > to think if something like that wouldn't suit us, with minor > modifications. The biggest modification which I think is needed to a Perl-like organization is that IMO there is value in knowing what packages are 'blessed' by Guido. In other words, some sort of Q/A mechanism would be good, if it can be kept simple. [Alternatively, let's not put a Q/A mechanism in place and my employer can make money selling that information, the way they do for Perl! =)] > (Remember that lwall copied the Pythonic module mechanism, > so Perl and Python modules are quite similar) That's stretching things a bit (the part after the 'so' doesn't follow from the part before), as there is a lot more to the nature of module systems, but the point is well taken. --david From Moshe Zadka Sun Mar 26 04:44:02 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 26 Mar 2000 06:44:02 +0200 (IST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: On Sat, 25 Mar 2000, David Ascher wrote: > The biggest modification which I think is needed to a Perl-like organization > is that IMO there is value in knowing what packages are 'blessed' by Guido. > In other words, some sort of Q/A mechanism would be good, if it can be kept > simple. You got a point. Anyone knows how the perl-porters decide what modules to put in source.tar.gz? -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping@lfw.org Sun Mar 26 05:01:58 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 21:01:58 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Moshe Zadka wrote:

> Here's a second version of the straw man proposal for the reorganization
> of modules in packages. Note that I'm treating it as a strictly 1.7
> proposal, so I don't care a "lot" about backwards compatibility.

Hey, this looks pretty good. For the most part i agree with your layout. Here are a few notes...

> net [...]
> server [...]

Good.

> text [...]
> xml
>     whatever the xml-sig puts here
> mail
>     rfc822
>     mime
>         MimeWriter
>         mimetools
>         mimify
>         mailcap
>         mimetypes
>         base64
>         quopri
>     mailbox
>     mhlib
> binhex

I'm not convinced "mime" needs a separate branch here. (This is the deepest part of the tree, and at three levels small alarm bells went off in my head.) For example, why text.binhex but text.mail.mime.base64?

> parse
>     string
>     re
>     regex
>     reconvert
>     regex_syntax
>     regsub
>     shlex
>     ConfigParser
>     linecache
>     multifile
>     netrc

The "re" module, in particular, will get used a lot, and it's not clear why these all belong under "parse". I suggest dropping "parse" and moving these up. What's "multifile" doing here instead of with the rest of the mail/mime stuff?

> bin [...]

I like this. Good idea.

> gzip
> zlib
> aifc

Shouldn't "aifc" be under "sound"?

> image [...]
> sound [...]
> db [...]

Yup.

> math [...]
> time [...]

Looks good.

> interpreter [...]

How about just "interp"?

> security [...]
> file [...]
> lowlevel
>     socket
>     select

Why the separate "lowlevel" branch? Why doesn't "socket" go under "net"?

> terminal
>     termios
>     pty
>     tty
>     readline

Why does "terminal" belong under "file"? Maybe it could go under "ui"? Hmm... "pty" doesn't really belong.

> syslog

Hmm...
> serialize
> >
> >     pickle
> >     cPickle
> >     shelve
> >     xdrlib
> >     copy
> >     copy_reg

"copy" doesn't really fit here under "serialize", and "serialize" is kind of a long name. How about a "data types" package? We could then put "struct", "UserDict", "UserList", "pprint", and "repr" here.

    data
        copy
        copy_reg
        pickle
        cPickle
        shelve
        xdrlib
        struct
        UserDict
        UserList
        pprint
        repr

On second thought, maybe "struct" fits better under "bin".

> threads [...]
> ui [...]

Uh huh.

> internal
>     _codecs
>     _locale
>     _tkinter
>     pcre
>     strop
>     posix

Not sure this is a good idea. It means the Unicode work lives under both "unicode" and "internal._codecs", Tk is split between "ui" and "internal._tkinter", regular expressions are split between "text.re" and "internal.pcre". I can see your motivation for getting "posix" out of the way, but i suspect this is likely to confuse people.

> users
>     pwd
>     grp
>     nis

Hmm. Yes, i suppose so.

> sgi [...]
> unicode [...]

Indeed.

> os
> UserDict
> UserList
> exceptions
> types
> operator
> user
> site

Yeah, these are all top-level (except maybe UserDict and UserList, see above).

> locale

I think "locale" belongs under "math" with "fpformat" and the others. It's for numeric formatting.

> pure

What the heck is "pure"?

> formatter

This probably goes under "text".

> struct

See above under "data". I can't decide whether "struct" should be part of "data" or "bin". Hmm... probably "bin" -- since, unlike the serializers under "data", "struct" does not actually specify a serialization format, it only provides fairly low-level operations.

Well, this leaves a few system-like modules that didn't really fit elsewhere for me:

    pty
    tty
    termios
    syslog
    select
    getopt
    signal
    errno
    resource

They all seem to be Unix-related. How about putting these in a "unix" or "system" package?

-- ?!ng "I'm not trying not to answer the question; i'm just not answering it."
-- Lenore Snell From Moshe Zadka Sun Mar 26 05:58:34 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 26 Mar 2000 07:58:34 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Ka-Ping Yee wrote:

> I'm not convinced "mime" needs a separate branch here.
> (This is the deepest part of the tree, and at three levels
> small alarm bells went off in my head.)

I've had my problems with that too, but it seemed too many modules were mime-specific.

> For example, why text.binhex but text.mail.mime.base64?

Actually, I thought about this (this isn't random at all): base64 encoding is part of the mime standard, together with quoted-printable. Binhex isn't. I don't know if you find it reason enough, and it may be smarter just having a text.encode.{quopri,uu,base64,binhex}

> > parse
> >     string
> >     re
> >     regex
> >     reconvert
> >     regex_syntax
> >     regsub
> >     shlex
> >     ConfigParser
> >     linecache
> >     multifile
> >     netrc
>
> The "re" module, in particular, will get used a lot, and

    from parse import re

Doesn't seem too painful.

> and it's not clear why these all belong under "parse".

These are all used for parsing data (which does not have some pre-written parser). I had problems with the name too...

> What's "multifile" doing here instead of with the rest
> of the mail/mime stuff?

It's also useful generally.

> Shouldn't "aifc" be under "sound"?

You're right.

> > interpreter
> [...]
>
> How about just "interp"?

I've no *strong* feelings, just a vague "don't abbrev." hunch.

> Why the separate "lowlevel" branch?

Because it is -- most Python code will use one of the higher level modules.

> Why doesn't "socket" go under "net"?

What about UNIX domain sockets? Again, no *strong* opinion, though.

> > terminal
> >     termios
> >     pty
> >     tty
> >     readline
>
> Why does "terminal" belong under "file"?
Because it is (a special kind of file)

> > serialize
> >
> >     pickle
> >     cPickle
> >     shelve
> >     xdrlib
> >     copy
> >     copy_reg
>
> "copy" doesn't really fit here under "serialize", and
> "serialize" is kind of a long name.

I beg to disagree -- "copy" is frequently close to serialization, both in the model (serializing to a "data structure") and in real life (that's the way people copy stuff in Java, and UNIX too: think tar cvf - | tar xvf -). What's more, copy_reg is used both for copy and for pickle. I do like the idea of a "data-types" package, but it needs to be ironed out a bit.

> > internal
> >     _codecs
> >     _locale
> >     _tkinter
> >     pcre
> >     strop
> >     posix
>
> Not sure this is a good idea. It means the Unicode
> work lives under both "unicode" and "internal._codecs",
> Tk is split between "ui" and "internal._tkinter",
> regular expressions are split between "text.re" and
> "internal.pcre". I can see your motivation for getting
> "posix" out of the way, but i suspect this is likely to
> confuse people.

You mistook my motivation -- I just want unadvertised modules (AKA internal-use modules) to live in a carefully segregated section of the namespace. How would this confuse people? No one imports _tkinter or pcre, so no one would notice the change.

> > locale
>
> I think "locale" belongs under "math" with "fpformat" and
> the others. It's for numeric formatting.

Only? And anyway, I doubt many people will think like that.

> > pure
>
> What the heck is "pure"?

A module that helps work with Purify.

> > formatter
>
> This probably goes under "text".

You're right.

> Well, this leaves a few system-like modules that didn't
> really fit elsewhere for me:
>
>     pty
>     tty
>     termios
>     syslog
>     select
>     getopt
>     signal
>     errno
>     resource
>
> They all seem to be Unix-related. How about putting these
> in a "unix" or "system" package?

"select", "signal" aren't UNIX specific. "getopt" is used for generic argument processing, so it isn't really UNIX specific.
And I don't like the name "system" either. But I have no constructive proposals about those either. so-i'll-just-shut-up-now-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From dan@cgsoftware.com Sun Mar 26 06:05:44 2000 From: dan@cgsoftware.com (Daniel Berlin) Date: Sat, 25 Mar 2000 22:05:44 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID:

> "select", "signal" aren't UNIX specific.

Huh? How not? Can you name a non-UNIX that is providing them? (BeOS wouldn't count, select is broken, and nobody uses signals.) And if you can, is it providing them for something other than "UNIX/POSIX compatibility"?

> "getopt" is used for generic argument processing, so it isn't really UNIX specific.

It's a POSIX.2 function. I consider that UNIX.

> And I don't like the name "system" either. But I have no
> constructive proposals about those either.
>
> so-i'll-just-shut-up-now-ly y'rs, Z.

just-picking-nits-ly y'rs, Dan

From Moshe Zadka Sun Mar 26 06:32:33 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 26 Mar 2000 08:32:33 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Daniel Berlin wrote:

> > "select", "signal" aren't UNIX specific.
>
> Huh? How not? Can you name a non-UNIX that is providing them?

Win32. Both of them. I've even used select there.

> and if you can, is it providing them for something other than "UNIX/POSIX
> compatibility"

I don't know what it provides them for, but I've *used* *select* on *WinNT*. I don't see why Python should make me feel bad when I'm doing that.

> > "getopt" is used for generic argument processing, so it isn't really UNIX
> > specific.
>
> It's a POSIX.2 function.
> I consider that UNIX.

Well, the argument style it processes is not unheard of in other OSes, and it's nice to have command line apps that have a common ui. That's it!
"getopt" belongs in the ui package! -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping@lfw.org Sun Mar 26 07:23:45 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 23:23:45 -0800 (PST) Subject: [Python-Dev] cPickle and cStringIO Message-ID: Are there any objections to including

    try:
        from cPickle import *
    except:
        pass

in pickle and

    try:
        from cStringIO import *
    except:
        pass

in StringIO?

-- ?!ng "I'm not trying not to answer the question; i'm just not answering it." -- Lenore Snell

From Moshe Zadka Sun Mar 26 07:14:10 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 26 Mar 2000 09:14:10 +0200 (IST) Subject: [Python-Dev] cPickle and cStringIO In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Ka-Ping Yee wrote:

> Are there any objections to including
>
>     try:
>         from cPickle import *
>     except:
>         pass
>
> in pickle and
>
>     try:
>         from cStringIO import *
>     except:
>         pass
>
> in StringIO?

Yes, until Python types are subclassable. Currently, one can inherit from pickle.Pickler/Unpickler and StringIO.StringIO.

-- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From ping@lfw.org Sun Mar 26 07:37:11 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 23:37:11 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: Okay, here's another shot at it. Notice a few things:

- no text.mime package
- encoders moved to text.encode
- Unix stuff moved to unix package (no file.lowlevel, file.terminal)
- aifc moved to bin.sound package
- struct moved to bin package
- locale moved to math package
- linecache moved to interp package
- data-type stuff moved to data package
- modules in internal package moved to live with their friends

Modules that are deprecated or not really intended to be imported are listed in parentheses (to give a better idea of the "real" size of each package).
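The try/except fallback Ping proposes is usually better written with `except ImportError` than a bare `except`, so that genuine bugs inside the accelerator module still surface. A minimal sketch of the idiom (on an interpreter without cPickle it simply keeps the portable implementation):

```python
# Prefer the C accelerator when it exists; otherwise keep the pure-Python
# module.  Catching only ImportError lets real errors in the C module
# propagate instead of being silently swallowed.
try:
    from cPickle import dumps, loads
except ImportError:
    from pickle import dumps, loads

data = {"spam": [1, 2, 3], "eggs": (4, 5)}
assert loads(dumps(data)) == data
```

Note this sidesteps Moshe's objection below only for module-level functions; a wholesale `from cPickle import *` would still replace the subclassable pickle.Pickler class with the C type.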
cStringIO and cPickle are parenthesized in hopeful anticipation of agreement on my last message...

net
    urlparse
    urllib
    ftplib
    gopherlib
    imaplib
    poplib
    nntplib
    smtplib
    telnetlib
    httplib
    cgi
    server
        BaseHTTPServer
        CGIHTTPServer
        SimpleHTTPServer
        SocketServer
        asynchat
        asyncore
text
    re              # general-purpose parsing
    sgmllib
    htmllib
    htmlentitydefs
    xml
        whatever the xml-sig puts here
    mail
        rfc822
        mailbox
        mhlib
    encode          # i'm also ok with moving text.encode.* to text.*
        binhex
        uu
        base64
        quopri
        MimeWriter
        mimify
        mimetools
        mimetypes
    multifile
    mailcap         # special-purpose file parsing
    shlex
    ConfigParser
    netrc
    formatter
    (string, strop, pcre, reconvert, regex, regex_syntax, regsub)
bin
    gzip
    zlib
    chunk
    struct
    image
        imghdr
        colorsys    # a bit unsure, but doesn't go anywhere else
        imageop
        imgfile
        rgbimg
        yuvconvert
    sound
        aifc
        sndhdr
        toaiff
        audiodev
        sunau
        sunaudio
        wave
        audioop
        sunaudiodev
db
    anydbm
    whichdb
    bsddb
    dbm
    dbhash
    dumbdbm
    gdbm
math
    math            # library functions
    cmath
    fpectl          # type-related
    fpetest
    array
    mpz
    fpformat        # formatting
    locale
    bisect          # algorithm: also unsure, but doesn't go anywhere else
    random          # randomness
    whrandom
    crypt           # cryptography
    md5
    rotor
    sha
time
    calendar
    time
    tzparse
    sched
    timing
interp
    new
    linecache       # handling .py files
    py_compile
    code            # manipulating internal objects
    codeop
    dis
    traceback
    compileall
    keyword         # interpreter constants
    token
    symbol
    tokenize        # parsing
    parser
    bdb             # development
    pdb
    profile
    pyclbr
    tabnanny
    pstats
    rlcompleter     # this might go in "ui"...
security
    Bastion
    rexec
    ihooks
file
    dircache
    path -- a virtual module which would do a from path import *
    nturl2path
    macurl2path
    filecmp
    fileinput
    StringIO
    glob
    fnmatch
    stat
    statcache
    statvfs
    tempfile
    shutil
    pipes
    popen2
    commands
    dl
    (dospath, posixpath, macpath, ntpath, cStringIO)
data
    pickle
    shelve
    xdrlib
    copy
    copy_reg
    UserDict
    UserList
    pprint
    repr
    (cPickle)
threads
    thread
    threading
    Queue
    mutex
ui
    _tkinter
    curses
    Tkinter
    cmd
    getpass
    getopt
    readline
users
    pwd
    grp
    nis
sgi
    al
    cd
    cl
    fl
    fm
    gl
    misc (what used to be sgimodule.c)
    sv
unicode
    _codecs
    codecs
    unicodedata
    unicodedatabase
unix
    errno
    resource
    signal
    posix
    posixfile
    socket
    select
    syslog
    fcntl
    termios
    pty
    tty
    _locale
exceptions
sys
os
types
user
site
pure
operator

-- ?!ng "I'm not trying not to answer the question; i'm just not answering it." -- Lenore Snell

From ping@lfw.org Sun Mar 26 07:40:27 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 23:40:27 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: Hey, while we're at it... as long as we're renaming modules, what do you all think of getting rid of that "lib" suffix? As in:

> net
>     urlparse
>     url
>     ftp
>     gopher
>     imap
>     pop
>     nntp
>     smtp
>     telnet
>     http
>     cgi
>     server
[...]
> text
>     re    # general-purpose parsing
>     sgml
>     html
>     htmlentitydefs
[...]

"import net.ftp" seems nicer to me than "import ftplib". We could also just stick htmlentitydefs.entitydefs in html and deprecate htmlentitydefs.

-- ?!ng "I'm not trying not to answer the question; i'm just not answering it." -- Lenore Snell

From ping@lfw.org Sun Mar 26 07:53:06 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 23:53:06 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Moshe Zadka wrote:

> > For example, why text.binhex but text.mail.mime.base64?
> Actually, I thought about this (this isn't random at all): base64 encoding
> is part of the mime standard, together with quoted-printable. Binhex
> isn't. I don't know if you find it reason enough, and it may be smarter
> just having a text.encode.{quopri,uu,base64,binhex}

I think i'd like that better, yes.

> > and it's not clear why these all belong under "parse".
>
> These are all used for parsing data (which does not have some pre-written
> parser). I had problems with the name too...

And parsing is what the "text" package is about anyway. I say move them up. (See the layout in my other message. Notice most of the regular-expression stuff is deprecated anyway, so it's not like there are really that many.)

> > Why doesn't "socket" go under "net"?
>
> What about UNIX domain sockets? Again, no *strong* opinion, though.

Bleck, you're right. Well, i think we just have to pick one or the other here, and i think most people would guess "net" first. (You can think of it as IPC, and file IPC-related things under the "net" category...?)

> > Why does "terminal" belong under "file"?
>
> Because it is (a special kind of file)

Only in Unix. It's Unix that likes to think of all things, including terminals, as files.

> I do like the idea of "data-types" package, but it needs to be ironed
> out a bit.

See my other message for a possible suggested hierarchy...

> > > internal [...]
> You mistook my motivation -- I just want unadvertised modules (AKA
> internal-use modules) to live in a carefully segregated section of the
> namespace. How would this confuse people? No one imports _tkinter or pcre,
> so no one would notice the change.

I think it makes more sense to classify modules by their topic rather than their exposure. (For example, you wouldn't move deprecated modules to a "deprecated" package.) Keep in mind that (well, at least to me) the main point of any naming hierarchy is to avoid name collisions. "internal" doesn't really help that purpose.
You also want to be sure (or as sure as you can) that modules will be obvious to find in the hierarchy. An "internal" package creates a distinction orthogonal to the topic-matter distinction we're using for the rest of the packages, which *potentially* introduces the question "well... is this module internal or not?" for every other module. Yes, admittedly this is only "potentially", but i hope you see the abstract point i'm trying to make... > > > locale > > > > I think "locale" belongs under "math" with "fpformat" and > > the others. It's for numeric formatting. > > Only? And anyway, I doubt many people will think like that. Yeah, it is pretty much only for numeric formatting. The more generic locale stuff seems to be in _locale. > > They all seem to be Unix-related. How about putting these > > in a "unix" or "system" package? > > "select", "signal" aren't UNIX specific. Yes, but when they're available on other systems they're an attempt to emulate Unix or Posix functionality, aren't they? > Well, the argument style it processes is not unheard of in other OSes, and > it's nice to have command line apps that have a common ui. That's it! > "getopt" belongs in the ui package! I like ui.getopt. It's a pretty good idea. -- ?!ng "I'm not trying not to answer the question; i'm just not answering it." -- Lenore Snell From Moshe Zadka Sun Mar 26 08:05:49 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 26 Mar 2000 10:05:49 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: +1. I've had minor nits, but nothing is perfect, and this is definitely "good enough". Now we'll just have to wait until the BDFL says something... -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From Moshe Zadka Sun Mar 26 08:06:59 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 26 Mar 2000 10:06:59 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Ka-Ping Yee wrote: > Hey, while we're at it... as long as we're renaming modules, > what do you all think of getting rid of that "lib" suffix? +0 -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From Moshe Zadka Sun Mar 26 08:19:34 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 26 Mar 2000 10:19:34 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Ka-Ping Yee wrote: > > "select", "signal" aren't UNIX specific. > > Yes, but when they're available on other systems they're an > attempt to emulate Unix or Posix functionality, aren't they? I think "signal" is ANSI C, but I'm not sure. no-other-comments-ly y'rs, Z. From gstein@lyra.org Sun Mar 26 11:52:53 2000 From: gstein@lyra.org (Greg Stein) Date: Sun, 26 Mar 2000 03:52:53 -0800 (PST) Subject: [Python-Dev] Windows and PyObject_NEW In-Reply-To: <1258123323-10623548@hypernet.com> Message-ID: On Sat, 25 Mar 2000, Gordon McMillan wrote: >... > I doubt very much that you would break anybody's code by > removing the Windows specific behavior. > > But it seems to me that unless Python always uses the > default malloc, those of us who write C++ extensions will have > to override operator new? I'm not sure. I've used placement > new to allocate objects in a memory mapped file, but I've never > tried to muck with the global memory policy of a C++ program. Actually, the big problem arises when you have debug vs. non-debug DLLs. malloc() uses different heaps based on the debug setting. As a result, it is a bad idea to call malloc() from a debug DLL and free() it from a non-debug DLL.
If the allocation pattern is fixed, then things may be okay. IF. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Sun Mar 26 12:02:40 2000 From: gstein@lyra.org (Greg Stein) Date: Sun, 26 Mar 2000 04:02:40 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Moshe Zadka wrote: >... > [ tree ] This is a great start. I have two comments: 1) keep it *very* shallow. depth just makes it conceptually difficult. 2) you're pushing too hard. modules do not *have* to go into a package. there are some placements that you've made which are very questionable... it appears they are done for movement's sake rather than for being "right" I'm off to sleep, but will look into specific comments tomorrow or so. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Sun Mar 26 12:14:32 2000 From: gstein@lyra.org (Greg Stein) Date: Sun, 26 Mar 2000 04:14:32 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003251856.NAA09636@eric.cnri.reston.va.us> Message-ID: On Sat, 25 Mar 2000, Guido van Rossum wrote: > > I say "do it incrementally" while others say "do it all at once." > > Personally, I don't think it is possible to do all at once. As a > > corollary, if you can't do it all at once, but you *require* that it be > > done all at once, then you have effectively deferred the problem. To put > > it another way, Guido has already invented a reason to not do it: he just > > requires that it be done all at once. Result: it won't be done. > > Bullshit, Greg. (I don't normally like to use such strong words, but > since you're being confrontational here...) Fair enough, and point accepted. Sorry. I will say, tho, that you've taken this slightly out of context. The next paragraph explicitly stated that I don't believe you had this intent. I just felt that coming up with a complete plan before doing anything would be prone to failure. 
You asked to invent a new reason :-), so I said you had one already :-) Confrontational? Yes, guilty as charged. I was a bit frustrated. > I'm all for doing it incrementally -- but I want the plan for how to > do it made up front. That doesn't require all the details to be > worked out -- but it requires a general idea about what kind of things > we will have in the namespace and what kinds of names they get. An > organizing principle, if you like. If we were to decide later that we > go for a Java-like deep hierarchy, the network package would have to > be moved around again -- what a waste. All righty. So I think there is probably a single question that I have here: Moshe posted a large breakdown of how things could be packaged. He and Ping traded a number of comments, and more will be coming as soon as people wake up :-) However, if you are only looking for a "general idea", then should python-dev'ers nit pick the individual modules, or just examine the general breakdown and hierarchy? thx, -g -- Greg Stein, http://www.lyra.org/ From Moshe Zadka Sun Mar 26 12:09:02 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 26 Mar 2000 14:09:02 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Greg Stein wrote: > This is a great start. I have two comments: > > 1) keep it *very* shallow. depth just makes it conceptually difficult. I tried, and Ping shallowed it even more. BTW: Anyone who cares to comment, please comment on Ping's last suggestion. I pretty much agree with the changes he made. > 2) you're pushing too hard. modules do not *have* to go into a package. > there are some placements that you've made which are very > questionable... 
> it appears they are done for movement's sake rather
> than for being "right"

Well, I'm certainly sorry I gave that impression -- the reason I wasn't "right" wasn't that, it was more my desire to be "fast" -- I wanted to have some proposal out the door, since it is harder to argue about something concrete. The biggest proof of concept that we all agree is that no one seriously took objections to anything -- there were just some minor nits to pick. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From Moshe Zadka Sun Mar 26 12:11:10 2000 From: Moshe Zadka (Moshe Zadka) Date: Sun, 26 Mar 2000 14:11:10 +0200 (IST) Subject: [Python-Dev] 1.6 job list In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Greg Stein wrote: > Moshe posted a large breakdown of how things could be packaged. He and > Ping traded a number of comments, and more will be coming as soon as > people wake up :-) Just a general comment -- it's so much fun to live in a different zone than all of you guys. just-wasting-time-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gstein@lyra.org Sun Mar 26 12:23:57 2000 From: gstein@lyra.org (Greg Stein) Date: Sun, 26 Mar 2000 04:23:57 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Moshe Zadka wrote: > On Sun, 26 Mar 2000, Greg Stein wrote: >... > > 2) you're pushing too hard. modules do not *have* to go into a package. > > there are some placements that you've made which are very > > questionable... it appears they are done for movement's sake rather > > than for being "right" > > Well, I'm certainly sorry I gave that impression -- the reason I wasn't > "right" wasn't that, it was more my desire to be "fast" -- I wanted to > have some proposal out the door, since it is harder to argue about > something concrete.
> The biggest proof of concept that we all agree is that no one seriously
> took objections to anything -- there were just some minor nits to pick.

Not something to apologize for! :-) Well, the indicator was the line in your original post about "unhandled modules" and the conversation between you and Ping with statements along the lines of "wasn't sure where to put this." I say just leave it then :-) If a module does not make *obvious* sense to be in a package, then it should not be there. For example: locale. That is not about numbers or about text. It has general utility. If there was an i18n package, then it would go there. Otherwise, don't force it somewhere else. Other packages are similar, so don't single out my comment about locale. Cheers, -g -- Greg Stein, http://www.lyra.org/ From DavidA@ActiveState.com Sun Mar 26 18:09:15 2000 From: DavidA@ActiveState.com (David Ascher) Date: Sun, 26 Mar 2000 10:09:15 -0800 Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: > If a module does not make *obvious* sense to be in a package, then it > should not be there. For example: locale. That is not about numbers or > about text. It has general utility. If there was an i18n package, then it > would go there. Otherwise, don't force it somewhere else. Other packages > are similar, so don't single out my comment about locale. I maintain that a general principle re: what the aim of this reorg is is needed before the partitioning of the space can make sense. What Moshe and Ping have is a good stab at partitioning of a subspace of the total space of Python modules and packages, i.e., the standard library. If we limit the aim of the reorg to cover just that subspace, then that's fine and Ping's proposal seems grossly fine to me. If we want to have a Perl-like packaging, then we _need_ to take into account all known Python modules of general utility, such as the database modules, the various GUI packages, the mx* packages, Aaron's work, PIL, etc., etc.
Ignoring those means that the dataset used to decide the partitioning function is highly biased. Given the larger dataset, locale might very well fit in a not-toplevel location. I know that any organizational scheme is going to be optimal at best at its inception, and that as history happens, it will become suboptimal. However, it's important to know what the space being partitioned is supposed to look like. A final comment: there's a history and science to this kind of organization, which is part of library science. I suspect there is quite a bit of knowledge available as to organizing principles to do it right. It would be nice if someone could research it a bit and summarize the basic principles to the rest of us. I agree with Greg that we need high-level input from Guido on this. --david 'academic today' ascher From ping@lfw.org Sun Mar 26 20:34:11 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Sun, 26 Mar 2000 12:34:11 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Greg Stein wrote: > > If a module does not make *obvious* sense to be in a package, then it > should not be there. For example: locale. That is not about numbers or > about text. It has general utility. If there was an i18n package, then it > would go there. Otherwise, don't force it somewhere else. Other packages > are similar, so don't single out my comment about locale. I goofed. I apologize. Moshe and Greg are right: locale isn't just about numbers. I just read the comment at the top of locale.py: "Support for number formatting using the current locale settings" and didn't notice the from _locale import * a couple of lines down. "import locale; dir(locale)" didn't work for me because for some reason there's no _locale built-in on my system (Red Hat 6.1, python-1.5.1-10). So i looked for 'def's and they all looked like they had to do with numeric formatting. My mistake. "locale", at least, belongs at the top level. 
Other candidates for top-level:

    bisect         # algorithm
    struct         # more general than "bin" or "data"
    colorsys       # not really just for image file formats
    yuvconvert     # not really just for image file formats
    rlcompleter    # not really part of the interpreter
    dl             # not really just about files

Alternatively, we could have: ui.rlcompleter, unix.dl

(It would be nice, by the way, to replace "bisect" with an "algorithm" module containing some nice pedagogical implementations of things like bisect, quicksort, heapsort, Dijkstra's algorithm etc.)

The following also could be left at the top-level, since they seem like applications (i.e. they probably won't get imported by code, only interactively). No strong opinion on this.

    bdb
    pdb
    pyclbr
    tabnanny
    profile
    pstats

Also... i was avoiding calling the "unix" package "posix" because we already have a "posix" module. But wait... the proposed tree already contains "math" and "time" packages. If there is no conflict (is there a conflict?) then the "unix" package should probably be named "posix".

-- ?!ng "In the sciences, we are now uniquely privileged to sit side by side with the giants on whose shoulders we stand." -- Gerald Holton

From Moshe Zadka Mon Mar 27 05:35:23 2000 From: Moshe Zadka (Moshe Zadka) Date: Mon, 27 Mar 2000 07:35:23 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Ka-Ping Yee wrote:

> The following also could be left at the top-level, since
> they seem like applications (i.e. they probably won't
> get imported by code, only interactively). No strong
> opinion on this.
>
>     bdb
>     pdb
>     pyclbr
>     tabnanny
>     profile
>     pstats

Let me just state my feelings about the interpreter package: since Python programs are probably the most suited to reasoning about Python programs (among other things, thanks to the strong introspection capabilities of Python), many Python modules were written to supply a convenient interface to that introspection.
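Ping's aside about a pedagogical "algorithm" module suggests the flavor such implementations might have. As a sketch, here is a hand-rolled bisection search equivalent in spirit to what the stdlib bisect module provides (the function is illustrative only, not part of any proposal):

```python
def bisect_right(a, x):
    """Index where x should be inserted into sorted list a, keeping it
    sorted and placing x after any entries equal to it."""
    lo, hi = 0, len(a)
    while lo < hi:
        # Invariant: every a[i] for i < lo satisfies a[i] <= x,
        # and every a[i] for i >= hi satisfies x < a[i].
        mid = (lo + hi) // 2
        if x < a[mid]:
            hi = mid
        else:
            lo = mid + 1
    return lo

assert bisect_right([1, 2, 4, 4, 8], 4) == 4   # lands after the equal run
assert bisect_right([1, 2, 4, 4, 8], 3) == 2
assert bisect_right([], 7) == 0
```

The loop does O(log n) comparisons, which is exactly the pedagogical point such a module would make against a linear scan.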
These modules are *only* needed by programs dealing with Python programs, and hence should live in a well-defined part of the namespace. I regret calling it "interpreter" though: "Python" is a better name (something like the java.lang package).

> Also... i was avoiding calling the "unix" package "posix"
> because we already have a "posix" module. But wait... the
> proposed tree already contains "math" and "time" packages.

Yes. That was a hard decision I made, and I'm sort of waiting for Guido to veto it: it would negate the easy backwards-compatible path of providing a toplevel module, for each module which is moved somewhere else, that does "from ... import *".

> If there is no conflict (is there a conflict?) then the
> "unix" package should probably be named "posix".

I don't quite agree. "dl", for example, is a common function on unices, but it is not part of the POSIX standard. I think the "posix" module should have POSIX functions, and the "unix" package should deal with functionality available on real-life unices. standards-are-fun-aren't-they-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From pf@artcom-gmbh.de Mon Mar 27 06:52:25 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Mon, 27 Mar 2000 08:52:25 +0200 (MEST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: from Moshe Zadka at "Mar 27, 2000 7:35:23 am" Message-ID: Hi! Moshe Zadka wrote:
> Yes. That was a hard decision I made, and I'm sort of waiting for Guido to
> veto it: it would negate the easy backwards-compatible path of providing
> a toplevel module, for each module which is moved somewhere else, that does
> "from ... import *"
If the result of this renaming initiative will be that I can't use

    import sys, os, time, re, struct, cPickle, parser
    import Tkinter; Tk=Tkinter; del Tkinter

anymore in Python 1.x and instead I have to change this into (for example):

    from posix import time
    from text import re
    from bin import struct
    from Python import parser
    from ui import Tkinter; ...

... I would really really *HATE* this change! [side note: The 'from MODULE import ...' form is evil and I have abandoned its use in favor of the 'import MODULE' form in 1987 or so, as our Modula-2 programs got bigger and bigger. With 20+ software developers working on a ~1,000,000 LOC Modula-2 software system, this decision proved itself well. The situation with Python is comparable. Avoiding 'from ... import' rewards itself later, when your software has grown bigger and when it comes to maintenance by people not familiar with the modules used. ] Maybe I didn't understand what this new subdivision of the standard library should achieve. The library documentation provides an existing logical subdivision into chapters, which group the library into several kinds of services. IMO this subdivision could be discussed and possibly revised. But at the moment I got the impression that it was simply ignored. Why? What's so bad with it? Why is a subdivision on the documentation level not sufficient? Why should modules be moved into packages? I don't get it. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From Moshe Zadka Mon Mar 27 07:09:18 2000 From: Moshe Zadka (Moshe Zadka) Date: Mon, 27 Mar 2000 09:09:18 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal?
In-Reply-To: Message-ID: On Mon, 27 Mar 2000, Peter Funk wrote:
> If the result of this renaming initiative will be that I can't use
> import sys, os, time, re, struct, cPickle, parser
> import Tkinter; Tk=Tkinter; del Tkinter
> anymore in Python 1.x and instead I have to change this into (for example):
> from posix import time from time import time
> from text import re
> from bin import struct
> from Python import parser
> from ui import Tkinter; ...

Yes.

> I would really really *HATE* this change!

Well, I'm sorry to hear that -- I've been waiting for this change to happen for a long time.

> [side note:
> The 'from MODULE import ...' form is evil and I have abandoned its use
> in favor of the 'import MODULE' form in 1987 or so, as our Modula-2
> programs got bigger and bigger. With 20+ software developers working
> on a ~1,000,000 LOC Modula-2 software system, this decision
> proved itself well.

Well, yes. Though syntactically equivalent,

    from package import module

is the recommended way to use packages, unless there is a specific need.

> Maybe I didn't understand what this new subdivision of the standard
> library should achieve.

Namespace cleanup. Too many toplevel names seem evil to some of us.

> Why is a subdivision on the documentation level not sufficient?
> Why should modules be moved into packages? I don't get it.

To allow a greater number of modules to live without worrying about namespace collision. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping@lfw.org Mon Mar 27 08:08:57 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Mon, 27 Mar 2000 00:08:57 -0800 (PST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: Message-ID: Hi, Peter. Your question as to the purpose of module reorganization is well worth asking, and perhaps we should stand back for a while and try to really answer it well first. I think that my answers to your question would be: 1.
To alleviate potential namespace collision. 2. To permit talking about packages as a unit. I hereby solicit other reasons from the rest of the group... Reason #1 is not a serious problem yet, but i think i've seen a few cases where it might start to be an issue. Reason #2 has to do with things like assigning people responsibility for taking care of a particular package, or making commitments about which packages will be available with which distributions or platforms. Hence, for example, the idea of the "unix" package. Neither of these reasons necessitates a deep and holy hierarchy, so we certainly want to keep it shallow and simple if we're going to do this at all.

> If the result of this renaming initiative will be that I can't use
> import sys, os, time, re, struct, cPickle, parser
> import Tkinter; Tk=Tkinter; del Tkinter
> anymore in Python 1.x and instead I have to change this into (for example):
> from posix import time
> from text import re
> from bin import struct
> from Python import parser
> from ui import Tkinter; ...

Won't

    import sys, os, time.time, text.re, bin.struct, data.pickle, python.parser

also work? ...i hope?

> The library documentation provides an existing logical subdivision into
> chapters, which group the library into several kinds of services.
> IMO this subdivision could be discussed and possibly revised.
> But at the moment I got the impression that it was simply ignored.
> Why? What's so bad with it?

I did look at the documentation for some guidance in arranging the modules, though admittedly it didn't direct me much. -- ?!ng "In the sciences, we are now uniquely privileged to sit side by side with the giants on whose shoulders we stand." -- Gerald Holton From pf@artcom-gmbh.de Mon Mar 27 08:35:50 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Mon, 27 Mar 2000 10:35:50 +0200 (MEST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: from Ka-Ping Yee at "Mar 27, 2000 0: 8:57 am" Message-ID: Hi!
> > import sys, os, time, re, struct, cPickle, parser
[...]
Ka-Ping Yee:
> Won't
>
>     import sys, os, time.time, text.re, bin.struct, data.pickle, python.parser
>
> also work? ...i hope?

That is even worse. So not only the 'import' sections, which I usually keep at the top of my modules, have to be changed: this way for example 're.compile(...' has to be changed into 'text.re.compile(...' all over the place, possibly breaking the 'Maximum Line Length' styleguide rule. Regards, Peter From pf@artcom-gmbh.de Mon Mar 27 10:16:48 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Mon, 27 Mar 2000 12:16:48 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? Message-ID: String objects have grown methods since 1.5.2. So it makes sense to provide a class 'UserString' similar to 'UserList' and 'UserDict', so that there is a standard base class to inherit from, if someone has the desire to extend the string methods. What do you think? Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From fdrake@acm.org Mon Mar 27 15:12:55 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 27 Mar 2000 10:12:55 -0500 (EST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: References: Message-ID: <14559.31351.783771.472320@weyr.cnri.reston.va.us> Moshe Zadka writes:
> Well, I'm certainly sorry I gave that impression -- the reason I wasn't
> "right" wasn't that, it was more my desire to be "fast" -- I wanted to
> have some proposal out the door, since it is harder to argue about
> something concrete. The biggest proof of concept that we all agree is that
> no one raised serious objections to anything -- there were just some minor
> nits to pick.

It's *really easy* to argue about something concrete. ;) It's just harder to misunderstand the specifics of the proposal.
It's too early to say what people think; not enough people have had time to look at the proposals yet. On the other hand, I think it's great that we have a proposal to discuss. I'll make my comments after I've read through the last version posted, when I have time to read it. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake@acm.org Mon Mar 27 16:20:43 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 27 Mar 2000 11:20:43 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: Message-ID: <14559.35419.793906.868645@weyr.cnri.reston.va.us> Peter Funk said:
> The library documentation provides an existing logical subdivision into
> chapters, which group the library into several kinds of services.
> IMO this subdivision could be discussed and possibly revised.
> But at the moment I got the impression that it was simply ignored.
> Why? What's so bad with it?

Ka-Ping Yee writes:
> I did look at the documentation for some guidance in arranging
> the modules, though admittedly it didn't direct me much.

The library reference is pretty well disorganized at this point. I want to improve that for the 1.6 docs. I received a suggestion a few months back, but haven't had a chance to dig into it, or even respond to the email. ;( -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From jeremy@cnri.reston.va.us Mon Mar 27 17:14:46 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Mon, 27 Mar 2000 12:14:46 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: Message-ID: <14559.38662.835289.499610@goon.cnri.reston.va.us> >>>>> "PF" == Peter Funk writes: PF> That is even worse. So not only the 'import' sections, which I PF> usually keep at the top of my modules, have to be changed: This PF> way for example 're.compile(...' has to be changed into PF> 'text.re.compile(...'
all over the place possibly breaking the PF> 'Maximum Line Length' styleguide rule. There is nothing wrong with changing only the import statement:

    from text import re

The only problematic use of from ... import ... is

    from text.re import *

which adds an unspecified set of names to the current namespace. Jeremy From Moshe Zadka Mon Mar 27 17:59:34 2000 From: Moshe Zadka (Moshe Zadka) Date: Mon, 27 Mar 2000 19:59:34 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14559.35419.793906.868645@weyr.cnri.reston.va.us> Message-ID: Peter Funk said:
> The library documentation provides an existing logical subdivision into
> chapters, which group the library into several kinds of services.
> IMO this subdivision could be discussed and possibly revised.
> But at the moment I got the impression that it was simply ignored.
> Why? What's so bad with it?

Ka-Ping Yee writes:
> I did look at the documentation for some guidance in arranging
> the modules, though admittedly it didn't direct me much.

Fred L. Drake, Jr. writes:
> The library reference is pretty well disorganized at this point. I
> want to improve that for the 1.6 docs.

Let me just mention where my inspirations came from: shame of shames, it came from Perl. It's hard to use Perl's organization as is, because it doesn't view itself as a general-purpose language: so things like CGI.pm are toplevel, and regexes are part of the syntax. However, there are a lot of good hints there. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From klm@digicool.com Mon Mar 27 18:31:01 2000 From: klm@digicool.com (Ken Manheimer) Date: Mon, 27 Mar 2000 13:31:01 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14559.38662.835289.499610@goon.cnri.reston.va.us> Message-ID: On Mon, 27 Mar 2000, Jeremy Hylton wrote: > >>>>> "PF" == Peter Funk writes: > > PF> That is even worse.
So not only the 'import' sections, which I > PF> usually keep at the top of my modules, have to be changed: This > PF> way for example 're.compile(...' has to be changed into > PF> 'text.re.compile(...' all over the place possibly breaking the > PF> 'Maximum Line Length' styleguide rule. > > There is nothing wrong with changing only the import statement: > from text import re > > The only problematic use of from ... import ... is > from text.re import * > which adds an unspecified set of names to the current namespace. Actually, i think there's another important gotcha with from .. import which may be contributing to Peter's sense of concern, but which i don't think needs to in this case. I also thought we had discussed providing transparency in general, at least for the 1.x series, no? The other gotcha i mean applies when the thing you're importing is a terminal, i.e. a non-module. Then, changes to the assignments of the names in the original module aren't reflected in the names you've imported - they're decoupled from the namespace of the original module. When the thing you're importing is, itself, a module, the same kind of thing *can* happen, but you're more generally concerned with tracking revisions to the contents of those modules, which is tracked ok in the thing you "from .. import"ed. I thought the other problem Peter was objecting to, having to change the import sections in the first place, was going to be avoided in the 1.x series (if we do this kind of thing) by inherently extending the import path to include all the packages, so people need not change their code? Seems like most of this would be fairly transparent w.r.t. the operation of existing applications. Have i lost track of the discussion? Ken klm@digicool.com From Moshe Zadka Mon Mar 27 18:55:35 2000 From: Moshe Zadka (Moshe Zadka) Date: Mon, 27 Mar 2000 20:55:35 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal?
In-Reply-To: Message-ID: On Mon, 27 Mar 2000, Ken Manheimer wrote:
> I also thought we had discussed providing
> transparency in general, at least for the 1.x series?

Yes, but it would be clearly marked as deprecated in 1.7, print out error messages in 1.8 and won't work at all in 3000. (That's my view on the point, but I got the feeling this is where the wind is blowing). So the transparency mechanism is intended only to be "something backwards compatible" ... it's not supposed to be a reason why things are ugly (I don't think they are, though). BTW: the transparency mechanism I suggested was not pushing things into the import path, but rather having toplevel modules which "from ... import *" from the modules that were moved. E.g., re.py would contain

    # Deprecated: don't import re, it won't work in future releases
    from text.re import *

-- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From skip@mojam.com (Skip Montanaro) Mon Mar 27 19:34:39 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 27 Mar 2000 13:34:39 -0600 (CST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: Message-ID: <14559.47055.604042.381126@beluga.mojam.com> Peter> The library documentation provides an existing logical subdivision Peter> into chapters, which group the library into several kinds of Peter> services. Perhaps it makes sense to revise the library reference manual's documentation to reflect the proposed package hierarchy once it becomes concrete.
-- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From skip@mojam.com (Skip Montanaro) Mon Mar 27 19:52:08 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 27 Mar 2000 13:52:08 -0600 (CST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: References: Message-ID: <14559.48104.34263.680278@beluga.mojam.com> Responding to an early item in this thread and trying to adapt to later items... Ping wrote: I'm not convinced "mime" needs a separate branch here. (This is the deepest part of the tree, and at three levels small alarm bells went off in my head.) It's not clear that mime should be beneath text/mail. Moshe moved it up a level, but not the way I would have done it. I think the mime stuff still belongs in a separate mime package. I wouldn't just sprinkle the modules under text. I see two possibilities:

    text>mime
    net>mime

I prefer net>mime, because MIME and its artifacts are used heavily in networked applications where the content being transferred isn't text. -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From fdrake@acm.org Mon Mar 27 20:05:32 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 27 Mar 2000 15:05:32 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14559.47055.604042.381126@beluga.mojam.com> References: <14559.47055.604042.381126@beluga.mojam.com> Message-ID: <14559.48908.354425.313775@weyr.cnri.reston.va.us> Skip Montanaro writes:
> Perhaps it makes sense to revise the library reference manual's
> documentation to reflect the proposed package hierarchy once it becomes
> concrete.

I'd go for this. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido@python.org Mon Mar 27 20:43:06 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 27 Mar 2000 15:43:06 -0500 Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0?
Message-ID: <200003272043.PAA18445@eric.cnri.reston.va.us> The _tkinter.c source code is littered with #ifdefs that mostly center around distinguishing between Tcl/Tk 8.0 and older versions. The two pre-8.0 versions supported seem to be 7.5/4.1 and 7.6/4.2. Would it be reasonable to assume that everybody is using at least Tcl/Tk version 8.0? This would simplify the code somewhat. Or should I ask this in a larger forum? --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Mon Mar 27 20:59:04 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 27 Mar 2000 15:59:04 -0500 (EST) Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: <200003272043.PAA18445@eric.cnri.reston.va.us> References: <200003272043.PAA18445@eric.cnri.reston.va.us> Message-ID: <14559.52120.633384.651377@weyr.cnri.reston.va.us> Guido van Rossum writes:
> The _tkinter.c source code is littered with #ifdefs that mostly center
> around distinguishing between Tcl/Tk 8.0 and older versions. The
> two pre-8.0 versions supported seem to be 7.5/4.1 and 7.6/4.2.
>
> Would it be reasonable to assume that everybody is using at least
> Tcl/Tk version 8.0? This would simplify the code somewhat.

Simplify! It's more important that the latest versions are supported than pre-8.0 versions. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From gstein@lyra.org Mon Mar 27 21:31:30 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 27 Mar 2000 13:31:30 -0800 (PST) Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: <14559.52120.633384.651377@weyr.cnri.reston.va.us> Message-ID: On Mon, 27 Mar 2000, Fred L. Drake, Jr. wrote:
> Guido van Rossum writes:
> > The _tkinter.c source code is littered with #ifdefs that mostly center
> > around distinguishing between Tcl/Tk 8.0 and older versions. The
> > two pre-8.0 versions supported seem to be 7.5/4.1 and 7.6/4.2.
> >
> > Would it be reasonable to assume that everybody is using at least
> > Tcl/Tk version 8.0? This would simplify the code somewhat.
>
> Simplify! It's more important that the latest versions are
> supported than pre-8.0 versions.

I strongly agree. My motto is, "if the latest Python version doesn't work for you, then don't upgrade!" This is also Open Source -- they can easily get the source to the old _tkinter if they want new Python + 7.x support. If you ask in a larger forum, then you are certain to get somebody to say, "yes... I need that support." Then you have yourself a quandary :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From Fredrik Lundh Message-ID: <009801bf9835$f85b87e0$34aab5d4@hagrid> Guido van Rossum wrote:
> The _tkinter.c source code is littered with #ifdefs that mostly center
> around distinguishing between Tcl/Tk 8.0 and older versions. The
> two pre-8.0 versions supported seem to be 7.5/4.1 and 7.6/4.2.
>
> Would it be reasonable to assume that everybody is using at least
> Tcl/Tk version 8.0? This would simplify the code somewhat.

yes. if people are using older versions, they can always use the version shipped with 1.5.2. (has anyone actually tested that one with pre-8.0 versions, btw?)

> Or should I ask this in a larger forum?

maybe. maybe not. From jack@oratrix.nl Mon Mar 27 21:58:56 2000 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 27 Mar 2000 23:58:56 +0200 Subject: [Python-Dev] 1.6 job list In-Reply-To: Message by Moshe Zadka , Sat, 25 Mar 2000 12:16:23 +0200 (IST) , Message-ID: <20000327215901.ABA08F58C1@oratrix.oratrix.nl> Recently, Moshe Zadka said:
> Here's a reason: there shouldn't be changes we'll retract later -- we
> need to come up with the (more or less) right hierarchy the first time,
> or we'll do a lot of work for nothing.

I think I disagree here (hmm, it's probably better to say that I agree, but I agree on a tangent:-).
I think we can be 100% sure that we're wrong the first time around, and we should plan for that. One of the reasons why we're wrong is because the world is moving on. A module that at this point in time will reside at some level in the hierarchy may in a few years (or shorter) be one of a large family and be better off elsewhere in the hierarchy. It would be silly if it would have to stay where it was because of backward compatibility. If we plan for being wrong we can make the mistakes less painful. I think that a simple scheme where a module can say "I'm expecting the Python 1.6 namespace layout" would make transition to a completely different Python 1.7 namespace layout a lot less painful, because some agent could do the mapping. This can either happen at runtime (through a namespace, or through an import hook, or probably through other tricks as well) or optionally by a script that would do the translations. Of course this doesn't mean we should go off and hack in a couple of namespaces (hence my "agreeing on a tangent"), but it does mean that I think Greg's idea of not wanting to change everything at once has merit. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From pf@artcom-gmbh.de Mon Mar 27 22:11:39 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Tue, 28 Mar 2000 00:11:39 +0200 (MEST) Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: <200003272043.PAA18445@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 27, 2000 3:43: 6 pm" Message-ID: Guido van Rossum:
> Or should I ask this in a larger forum?

Don't ask. Simply tell the people on comp.lang.python that support for the ancient Tcl/Tk versions < 8.0 will be dropped in Python 1.6. Period.
;-) Regards, Peter From guido@python.org Mon Mar 27 22:17:33 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 27 Mar 2000 17:17:33 -0500 Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: Your message of "Tue, 28 Mar 2000 00:11:39 +0200." References: Message-ID: <200003272217.RAA28910@eric.cnri.reston.va.us>
> Don't ask. Simply tell the people on comp.lang.python that support
> for the ancient Tcl/Tk versions < 8.0 will be dropped in Python 1.6.
> Period. ;-)

OK, I'm convinced. We will drop pre-8.0 support. Could someone submit a set of patches? It would make sense to call #error if a pre-8.0 version is detected at compile-time! --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond@skippinet.com.au Mon Mar 27 23:02:21 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Tue, 28 Mar 2000 09:02:21 +1000 Subject: [Python-Dev] Windows and PyObject_NEW In-Reply-To: <200003251459.PAA09181@python.inrialpes.fr> Message-ID: Sorry for the delay, but Gordon's reply was accurate so should have kept you going ;-)

> I've been reading Jeffrey Richter's "Advanced Windows" last night in order
> to try understanding better why PyObject_NEW is implemented
> differently for
> Windows.

So that is where the heaps discussion came from :-) The problem is simply "too many heaps are available".

> Again, I feel uncomfortable with this, especially now, when
> I'm dealing with the memory aspect of Python's object
> constructors/destructors.

It is for this exact reason that it was added in the first place. I believe this code predates the "_d" convention on Windows. AFAIK, this code could be removed today and everything should work (but see below why it probably won't). MSVC allows you to choose from a number of CRT versions. Only in one of these versions is the CRTL completely shared between the .EXE and all the various .DLLs in the application.
What was happening is that this macro ended up causing the "malloc" for a new object to occur in Python15.dll, but the Python type system meant that tp_dealloc() (to clean up the object) was called in the DLL implementing the new type. Unless Python15.dll and our extension DLL shared the same CRTL (and hence the same malloc heap, fileno table etc) things would die. The DLL version of "free()" would complain, as it had never seen the pointer before. This change meant the malloc() and the free() were both implemented in the same DLL/EXE. This was particularly true with Debug builds. MSVC's debug CRTL implementations have some very nice debugging features (guard-blocks, block validity checks with debugger breakpoints when things go wrong, leak tracking, etc). However, this means they use yet another heap. Mixing debug builds with release builds in Python is a recipe for disaster. Theoretically, the problem has largely gone away now that a) we have separate "_d" versions and b) the "official" position is to use the same CRTL as Python15.dll. However, it is still a minor FAQ on comp.lang.python why PyRun_ExecFile (or whatever) fails with mysterious errors - the reason is exactly the same - they are using a different CRTL, so the CRTL can't map the file pointers correctly, and we get unexplained IO errors. But now that this macro hides the malloc problem, there may be plenty of "home grown" extensions out there that do use a different CRTL and don't see any problems - mainly 'cos they aren't throwing file handles around! Finally getting to the point of all this: We now also have the PyMem_* functions. This problem also doesn't exist if extension modules use these functions instead of malloc()/free(). We only ask them to change the PyObject allocations and deallocations, not the rest of their code, so it is no real burden. IMO, we should adopt these functions for most internal object allocations and the extension samples/docs.
Also, we should consider adding relevant PyFile_fopen(), PyFile_fclose() type functions, that simply are a thin layer over the fopen/fclose functions. If extension writers used these instead of fopen/fclose we would gain a few fairly intangible things - lose the minor FAQ, platforms that don't have fopen at all (e.g., CE) would love you, etc. Mark. From mhammond@skippinet.com.au Tue Mar 28 01:04:11 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Tue, 28 Mar 2000 11:04:11 +1000 Subject: [Python-Dev] Windows and PyObject_NEW In-Reply-To: Message-ID: [I wrote]
> Also, we should consider adding relevant PyFile_fopen(), PyFile_fclose()

Maybe I had something like PyFile_FromString in mind!! That-damn-time-machine-again-ly, Mark. From Moshe Zadka Tue Mar 28 05:36:59 2000 From: Moshe Zadka (Moshe Zadka) Date: Tue, 28 Mar 2000 07:36:59 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: <14559.48104.34263.680278@beluga.mojam.com> Message-ID: On Mon, 27 Mar 2000, Skip Montanaro wrote:
> Responding to an early item in this thread and trying to adapt to later
> items...
>
> Ping wrote:
>
> I'm not convinced "mime" needs a separate branch here. (This is the
> deepest part of the tree, and at three levels small alarm bells went off
> in my head.)
>
> It's not clear that mime should be beneath text/mail. Moshe moved it up a
> level,

Actually, Ping moved it up a level. I only decided to agree with him retroactively...

> I think the mime stuff still
> belongs in a separate mime package. I wouldn't just sprinkle the modules
> under text. I see two possibilities:
>
> text>mime
> net>mime
>
> I prefer net>mime,

I don't. MIME is not a "wire protocol" like all the other things in net -- it's used inside another wire protocol, like RFC822 or HTTP. If at all, I'd go for having a net/ mail/ mime/ package, but Ping would yell at me again for nesting 3 levels. I could live with text/mime, because the mime format basically *is* text. -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From Moshe Zadka Tue Mar 28 05:47:13 2000 From: Moshe Zadka (Moshe Zadka) Date: Tue, 28 Mar 2000 07:47:13 +0200 (IST) Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: <200003272043.PAA18445@eric.cnri.reston.va.us> Message-ID: On Mon, 27 Mar 2000, Guido van Rossum wrote: > The _tkinter.c source code is littered with #ifdefs that mostly center > around distinguishing between Tcl/Tk 8.0 and older versions. The > two pre-8.0 version supported seem to be 7.5/4.1 and 7.6/4.2. > > Would it be reasonable to assume that everybody is using at least > Tcl/Tk version 8.0? This would simplify the code somewhat. I want to ask a different question: when is Python going to officially support Tcl/Tk v8.2/8.3? I'd really like for this to happen, as I hate having several libraries of Tcl/Tk on my machine. (I assume you know the joke about Jews always answering a question with a question ) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From jack@oratrix.nl Tue Mar 28 08:55:56 2000 From: jack@oratrix.nl (Jack Jansen) Date: Tue, 28 Mar 2000 10:55:56 +0200 Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message by Ka-Ping Yee , Sat, 25 Mar 2000 23:37:11 -0800 (PST) , Message-ID: <20000328085556.CFEAC370CF2@snelboot.oratrix.nl> > Okay, here's another shot at it. Notice a few things: > ... > bin > ... > image ... > sound > ... These I don't like, I think image and sound should be either at toplevel, or otherwise in a separate package (mm?). I know images and sounds are customarily stored in binary files, but so are databases and other things. Hmm, the bin group in general seems to be a bit of a catch-all. gzip, zlib and chunk definitely belong together, but struct is a wholly different beast. 
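A quick way to see why struct makes an odd neighbor for the compression modules in a "bin" group: struct describes fixed binary record layouts, while zlib transforms opaque byte streams and knows nothing about their structure. A small sketch using both stdlib modules:

```python
import struct
import zlib

# struct: pack and unpack a fixed binary layout
# (big-endian short + unsigned int, 2 + 4 = 6 bytes)
record = struct.pack(">hI", 7, 1999)
print(len(record), struct.unpack(">hI", record))  # 6 (7, 1999)

# zlib: compress an opaque byte stream; the record structure is invisible to it
data = b"spam " * 100
print(zlib.decompress(zlib.compress(data)) == data)  # True
```

The two share nothing beyond "bytes go in, bytes come out", which is the sense in which "bin" is a catch-all.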
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jack@oratrix.nl Tue Mar 28 09:01:51 2000 From: jack@oratrix.nl (Jack Jansen) Date: Tue, 28 Mar 2000 11:01:51 +0200 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message by Moshe Zadka , Sat, 25 Mar 2000 20:30:26 +0200 (IST) , Message-ID: <20000328090151.86B59370CF2@snelboot.oratrix.nl> > On Sat, 25 Mar 2000, David Ascher wrote: > > > This made me think of one issue which is worth considering -- is there a > > mechanism for third-party packages to hook into the standard naming > > hierarchy? It'd be weird not to have the oracle and sybase modules within > > the db toplevel package, for example. > > My position is that any 3rd party module decides for itself where it wants > to live -- once we formalized the framework. Consider PyGTK/PyGnome, > PyQT/PyKDE -- they should live in the UI package too... For separate modules, yes. For packages this is different. As a case in point, think of MacPython: it could stuff all mac-specific packages under the toplevel "mac", but it would probably be nicer if it could extend the existing namespace. It is a bit silly if mac users have to do "from mac.text.encoding import macbinary" but "from text.encoding import binhex", just because BinHex support happens to live in the core (purely for historical reasons). But maybe this holds only for the platform distributions; then it shouldn't be as much of a problem as there aren't that many. 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From Moshe Zadka Tue Mar 28 09:24:14 2000 From: Moshe Zadka (Moshe Zadka) Date: Tue, 28 Mar 2000 11:24:14 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: <20000328085556.CFEAC370CF2@snelboot.oratrix.nl> Message-ID: On Tue, 28 Mar 2000, Jack Jansen wrote: > These I don't like, I think image and sound should be either at toplevel, or > otherwise in a separate package (mm?). I know images and sounds are > customarily stored in binary files, but so are databases and other things. Hmmm...I think of "bin" as "interface to binary files". Agreed that I don't have a good reason for separating gdbm from zlib. > Hmm, the bin group in general seems to be a bit of a catch-all. gzip, zlib and > chunk definitely belong together, but struct is a wholly different beast. I think Ping and I decided to move struct to toplevel. Ping, would you like to take your last proposal and fold into it the consensual changes, or should I? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From Fredrik Lundh" Message-ID: <02c101bf989a$2ee35860$34aab5d4@hagrid> Guido van Rossum wrote: > Similar to append(), I'd like to close this gap, and I've made the > necessary changes. This will probably break lots of code. > > Similar to append(), I'd like people to fix their code rather than > whine -- two-arg connect() has never been documented, although it's > found in much code (even the socket module test code :-( ). > > Similar to append(), I may revert the change if it is shown to cause > too much pain during beta testing... 
proposal: if anyone changes the API for a fundamental module, and fails to update the standard library, the change is automatically "minus one'd" for each major module that no longer works :-) (in this case, that would be -5 or so...) From Fredrik Lundh" Message-ID: <02c901bf989b$be203d80$34aab5d4@hagrid> Peter Funk wrote: > Why should modules be moved into packages? I don't get it. fwiw, neither do I... I'm not so sure that Python really needs a simple reorganization of the existing set of standard library modules. just moving the modules around won't solve the real problems with the 1.5.2 std library... > IMO this subdivision could be discussed and possibly revised. here's one proposal: http://www.pythonware.com/people/fredrik/librarybook-contents.htm From gstein@lyra.org Tue Mar 28 10:09:44 2000 From: gstein@lyra.org (Greg Stein) Date: Tue, 28 Mar 2000 02:09:44 -0800 (PST) Subject: [Python-Dev] 3rd parties in the hierarchy (was: module reorg) In-Reply-To: <20000328090151.86B59370CF2@snelboot.oratrix.nl> Message-ID: On Tue, 28 Mar 2000, Jack Jansen wrote: > > On Sat, 25 Mar 2000, David Ascher wrote: > > > This made me think of one issue which is worth considering -- is there a > > > mechanism for third-party packages to hook into the standard naming > > > hierarchy? It'd be weird not to have the oracle and sybase modules within > > > the db toplevel package, for example. > > > > My position is that any 3rd party module decides for itself where it wants > > to live -- once we formalized the framework. Consider PyGTK/PyGnome, > > PyQT/PyKDE -- they should live in the UI package too... > > For separate modules, yes. For packages this is different. As a case in point, > think of MacPython: it could stuff all mac-specific packages under the > toplevel "mac", but it would probably be nicer if it could extend the existing > namespace. 
It is a bit silly if mac users have to do "from mac.text.encoding > import macbinary" but "from text.encoding import binhex", just because BinHex > support happens to live in the core (purely for historical reasons). > > But maybe this holds only for the platform distributions, then it shouldn't be > as much of a problem as there aren't that many. Assuming that you use an archive like those found in my "small" distro or Gordon's distro, then this is no problem. The archive simply recognizes and maps "text.encoding.macbinary" to its own module. Another way to say it: stop thinking in terms of the filesystem as the sole mechanism for determining placement in the package hierarchy. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido@python.org Tue Mar 28 13:38:12 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 08:38:12 -0500 Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: Your message of "Tue, 28 Mar 2000 07:47:13 +0200." References: Message-ID: <200003281338.IAA29532@eric.cnri.reston.va.us> > I want to ask a different question: when is Python going to officially > support Tcl/Tk v8.2/8.3? I'd really like for this to happen, as I hate > having several libraries of Tcl/Tk on my machine. This is already in the CVS tree, except for the Windows installer. Python 1.6 will not install a separate complete Tcl installation; instead, it will install the needed Tcl/Tk files (Tcl/Tk 8.3 or newer) in the Python tree, so it won't affect existing Tcl/Tk installations. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Mar 28 13:57:02 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 08:57:02 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: Your message of "Tue, 28 Mar 2000 11:44:14 +0200." 
<02c101bf989a$2ee35860$34aab5d4@hagrid> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <02c101bf989a$2ee35860$34aab5d4@hagrid> Message-ID: <200003281357.IAA29621@eric.cnri.reston.va.us> > proposal: if anyone changes the API for a fundamental module, and > fails to update the standard library, the change is automatically "minus > one'd" for each major module that no longer works :-) > > (in this case, that would be -5 or so...) Oops. Sigh. While we're pretending that this change goes in, could you point me to those five modules? Also, we need to add test cases to the standard test suite that would have found these! --Guido van Rossum (home page: http://www.python.org/~guido/) From gward@cnri.reston.va.us Tue Mar 28 15:04:47 2000 From: gward@cnri.reston.va.us (Greg Ward) Date: Tue, 28 Mar 2000 10:04:47 -0500 Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: ; from ping@lfw.org on Sat, Mar 25, 2000 at 11:37:11PM -0800 References: Message-ID: <20000328100446.A2586@cnri.reston.va.us> On 25 March 2000, Ka-Ping Yee said: > Okay, here's another shot at it. Notice a few things: Damn, I started writing a response to Moshe's original proposal -- and *then* saw this massive thread. Oh well. Turns out I still have a few useful things to say: First, any organization scheme for the standard library (or anything else, for that matter) should have a few simple guidelines. Here are two: * "deep hierarchies considered harmful": ie. avoid sub-packages if at all possible * "everything should have a purpose": every top-level package should be describable with a single, clear sentence of plain language. Eg.: net - Internet protocols, data formats, and client/server infrastructure unix - Unix-specific system calls, protocols, and conventions And two somewhat open issues: * "as long as we're renaming...": maybe this would be a good time to standardize naming conventions, eg. 
"cgi" -> "cgilib" *or* "{http,ftp,url,...}lib" -> "{http,ftp,url}...", "MimeWriter" -> "mimewriter", etc. * "shared namespaces vs system namespaces": the Perl model of "nothing belongs to The System; anyone can add a module in Text:: or Net:: or whatever" works there because Perl doesn't have __init__ files or anything to distinguish module namespaces; they just are. Python's import mechanism would have to change to support this, and the fact that __init__ files may contain arbitrary code makes this feel like a very tricky change to make. Now specific comments... > net > urlparse > urllib > ftplib > gopherlib > imaplib > poplib > nntplib > smtplib > telnetlib > httplib > cgi Rename? Either cgi -> cgilib or foolib -> foo? > server > BaseHTTPServer > CGIHTTPServer > SimpleHTTPServer > SocketServer > asynchat > asyncore This is one good place for a sub-package. It's also a good place to rename: the convention for Python module names seems to be all-lowercase; and "Server" is redundant when you're in the net.server package. How about: net.server.base_http net.server.cgi_http net.server.simple_http net.server.socket Underscores negotiable. They don't seem to be popular in module names, although sometimes they would be real life-savers. > text I think "text" should mean "plain old unstructured, un-marked-up ASCII text", where "unstructured, un-marked-up" really means "not structured or marked up in a well-known standard way". Or maybe not. I'm just trying to come up with an excuse for moving xml to top-level, which I think is where it belongs. Maybe the excuse should just be, "XML is really important and visible, and anyways Paul Prescod will raise a stink if it isn't put at top-level in Python package-space". > re # general-purpose parsing Top-level: this is a fundamental module that should be treated on a par with 'string'. (Well, except for building RE methods into strings... hmmMMmm...maybe... 
[no, I'm kidding!]) > sgmllib > htmllib > htmlentitydefs Not sure what to do about these. Someone referred somewhere to a "web" top-level package, which seems to have disappeared. If it reappears, it would be a good place for the HTML modules (not to mention a big chunk of "net") -- this would mainly be for "important and visible" (ie. PR) reasons, rather than sound technical reasons. > xml > whatever the xml-sig puts here Should be top-level. > mail > rfc822 > mailbox > mhlib "mail" should either be top-level or under "net". (Yes, I *know* it's not a wire-level protocol: that's what net.smtplib is for. But last time I checked, email is pretty useless without a network. And vice-versa.) Or maybe these all belong in a top-level "data" package: I'm starting to warm to that. > bin > gzip > zlib > chunk > struct > image > imghdr > colorsys # a bit unsure, but doesn't go anywhere else > imageop > imgfile > rgbimg > yuvconvert > sound > aifc > sndhdr > toaiff > audiodev > sunau > sunaudio > wave > audioop > sunaudiodev I agree with Jack: image and sound (audio?) should be top-level. I don't think I like the idea of an intervening "mm" or "multimedia" or "media" or what-have-you package, though. The other stuff in "bin" is kind of a grab-bag: "chunk" and "struct" might belong in the mythical "data" package. > db > anydbm > whichdb > bsddb > dbm > dbhash > dumbdbm > gdbm Yup. > math > math # library functions > cmath > fpectl # type-related > fpetest > array > mpz > fpformat # formatting > locale > bisect # algorithm: also unsure, but doesn't go anywhere else > random # randomness > whrandom > crypt # cryptography > md5 > rotor > sha Hmmm. "locale" has already been dealt with; obviously it should be top-level. I think "array" should be top-level or under the mythical "data". Six crypto-related modules seems like enough to justify a top-level "crypt" package, though. > time > calendar > time > tzparse > sched > timing Yup. > interp > new > linecache # handling .py files [...] 
> tabnanny > pstats > rlcompleter # this might go in "ui"... I like "python" for this one. (But I'm not sure if tabnanny and rlcompleter belong there.) > security > Bastion > rexec > ihooks What does ihooks have to do with security? > file > dircache > path -- a virtual module which would do a from path import * > nturl2path > macurl2path > filecmp > fileinput > StringIO Lowercase for consistency? > glob > fnmatch > stat > statcache > statvfs > tempfile > shutil > pipes > popen2 > commands > dl No problem until these last two -- 'commands' is a Unix-specific thing that has very little to do with the filesystem per se, and 'dl' is (as I understand it) deep ju-ju with sharp edges that should probably be hidden away in the 'python' ('sys'?) package. Oh yeah, "dl" should be elsewhere -- "python" maybe? Top-level? Perhaps we need a "deepmagic" package for "dl" and "new"? ;-) > data > pickle > shelve > xdrlib > copy > copy_reg > UserDict > UserList > pprint > repr > (cPickle) Oh hey, it's *not* a mythical package! Guess I didn't read far enough ahead. I like it, but would add more stuff to it (obviously): 'struct', 'chunk', 'array' for starters. Should cPickle be renamed to fastpickle? > threads > thread > threading > Queue Lowercase? > ui > _tkinter > curses > Tkinter > cmd > getpass > getopt > readline > users > pwd > grp > nis These belong in "unix". Possibly "nis" belongs in "net" -- do any non-Unix OSes use NIS? > sgi > al > cd > cl > fl > fm > gl > misc (what used to be sgimodule.c) > sv Should this be "sgi" or "irix"? Ditto for "sun" vs "solaris" if there are a significant number of Sun/Solaris modules. Note that the respective trademark holders might get very antsy about who gets to put names in those namespaces -- that's exactly what happened with Sun, Solaris 8, and Perl. I believe the compromise they arrived at was that the "Solaris::" namespace remains open, but Sun gets the "Sun::" namespace. 
There should probably be a win32 package, for core registry access stuff if nothing else. There might someday be a "linux" package; it's highly unlikely there would be a "pc" or "alpha" package though. All of those argue over "irix" and "solaris" instead of "sgi" and "sun". Greg From gvwilson@nevex.com Tue Mar 28 15:45:10 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Tue, 28 Mar 2000 10:45:10 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: > > Greg Wilson > > If None becomes a keyword, I would like to ask whether it could be > > used to signal that a method is a class method, as opposed to an > > instance method: > I'd like to know what you mean by "class" method. (I do know C++ and > Java, so I have some idea...). Specifically, my question is: how does > a class method access class variables? They can't be totally > unqualified (because that's very unpythonic). If they are qualified by > the class's name, I see it as a very mild improvement on the current > situation. You could suggest, for example, to qualify class variables > by "class" (so you'd do things like: > > class.x = 1 > > ), but I'm not sure I like it. On the whole, I think it is a much > bigger issue on how we denote class methods. I don't like overloading the word 'class' this way, as it makes it difficult to distinguish a parent's 'foo' member and a child's 'foo' member:

    class Parent:
        foo = 3
        ...other stuff...

    class Child(Parent):
        foo = 9
        def test():
            print class.foo  # obviously 9, but how to get 3?
I think that using the class's name instead of 'self' will be easy to explain, will look like it belongs in the language, will be unlikely to lead to errors, and will handle multiple inheritance with ease:

    class Child(Parent):
        foo = 9
        def test():
            print Child.foo   # 9
            print Parent.foo  # 3

> Also, one slight problem with your method of denoting class methods:
> currently, it is possible to add an instance method at run time to a
> class by something like
>
>     class C:
>         pass
>
>     def foo(self):
>         pass
>
>     C.foo = foo
>
> In your suggestion, how do you view the possibility of adding class
> methods to a class? (Note that "foo", above, is also perfectly usable
> as a plain function).

Hm, I hadn't thought of this... :-( > > I'd also like to ask (separately) that assignment to None be defined as a > > no-op, so that programmers can write: > > > > year, month, None, None, None, None, weekday, None, None = gmtime(time()) > > > > instead of having to create throw-away variables to fill in slots in > > tuples that they don't care about. > > Currently, I use "_" for that purpose, after I heard the idea from > Fredrik Lundh. I do the same thing when I need to; I just thought that making assignment to "None" special would formalize this in a readable way. From jeremy@cnri.reston.va.us Tue Mar 28 17:31:48 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Tue, 28 Mar 2000 12:31:48 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: <14559.38662.835289.499610@goon.cnri.reston.va.us> Message-ID: <14560.60548.74378.613188@goon.cnri.reston.va.us> >>>>> "KLM" == Ken Manheimer writes: >> The only problematic use of from ... import ... is >> from text.re import * >> which adds an unspecified set of names to the current >> namespace. KLM> The other gotcha i mean applies when the thing you're importing KLM> is a terminal, ie a non-module. 
Then, changes to the KLM> assignments of the names in the original module aren't KLM> reflected in the names you've imported - they're decoupled from KLM> the namespace of the original module. This isn't an import issue. Some people simply don't understand that assignment (and import, as a form of assignment) is name binding. Import binds an imported object to a name in the current namespace. It does not affect bindings in other namespaces, nor should it. KLM> I thought the other problem peter was objecting to, having to KLM> change the import sections in the first place, was going to be KLM> avoided in the 1.x series (if we do this kind of thing) by KLM> inherently extending the import path to include all the KLM> packages, so people need not change their code? Seems like KLM> most of this would be fairly transparent w.r.t. the operation KLM> of existing applications. I'm not sure if there is consensus on backwards compatibility. I'm not in favor of creating a huge sys.path that includes every package's contents. It would be a big performance hit. Jeremy From Moshe Zadka Tue Mar 28 17:36:47 2000 From: Moshe Zadka (Moshe Zadka) Date: Tue, 28 Mar 2000 19:36:47 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: <20000328100446.A2586@cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Greg Ward wrote: > * "deep hierarchies considered harmful": ie. avoid sub-packages if at > all possible > > * "everything should have a purpose": every top-level package should > be describable with a single, clear sentence of plain language. Good guidelines, but they aren't enough. And anyway, rules were meant to be broken <0.9 wink> > * "as long as we're renaming...": maybe this would be a good time to > standardize naming conventions, eg. "cgi" -> "cgilib" *or* > "{http,ftp,url,...}lib" -> "{http,ftp,url}...", "MimeWriter" -> > "mimewriter", etc. 
+1 > * "shared namespaces vs system namespaces": the Perl model of "nothing > belongs to The System; anyone can add a module in Text:: or Net:: or > whatever" works there because Perl doesn't have __init__ files or > anything to distinguish module namespaces; they just are. Python's > import mechanism would have to change to support this, and the fact > that __init__ files may contain arbitrary code makes this feel > like a very tricky change to make. Indeed. But I still feel that "few things should belong to the system" is quite a useful rule... (That's what I referred to when I said Perl's module system is more suited to CPAN (now there's a surprise)) > Rename? Either cgi -> cgilib or foolib -> foo? Yes. But I wanted the first proposal to be just about placing stuff, because that airs out more disagreements. > This is one good place for a sub-package. It's also a good place to > rename: the convention for Python module names seems to be > all-lowercase; and "Server" is redundant when you're in the net.server > package. How about: > > net.server.base_http > net.server.cgi_http > net.server.simple_http > net.server.socket Hmmmmm......+0 > Underscores negotiable. They don't seem to be popular in module names, > although sometimes they would be real life-savers. Personally, I prefer underscores to CamelCase. > Or maybe not. I'm just trying to come up with an excuse for moving xml > to top-level, which I think is where it belongs. Maybe the excuse > should just be, "XML is really important and visible, and anyways Paul > Prescod will raise a stink if it isn't put at top-level in Python > package-space". I still think "xml" should be a brother to "html" and "sgml". Current political trends notwithstanding. > Not sure what to do about these. Someone referred somewhere to a "web" > top-level package, which seems to have disappeared. 
If it reappars, it > would be a good place for the HTML modules (not to mention a big chunk > of "net") -- this would mainly be for "important and visible" (ie. PR) > reasons, rather than sound technical reasons. I think the "web" package should be reinstated. But you won't like it: I'd put xml in web. > "mail" should either be top-level or under "net". (Yes, I *know* it's > not a wire-level protocol: that's what net.smtplib is for. But last > time I checked, email is pretty useless without a network. And > vice-versa.) Ummmm.....I'd disagree, but I lack the strength and the moral conviction. Put it under net and we'll call it a deal > Or maybe these all belong in a top-level "data" package: I'm starting to > warm to that. Ummmm...I don't like the "data" package personally. It seems to disobey your second guideline. > I agree with Jack: image and sound (audio?) should be top-level. I > don't think I like the idea of an intervening "mm" or "multimedia" or > "media" or what-have-you package, though. Definitely multimedia. Okay, I'm bought. > Six crypto-related modules seems like enough to justify a top-level > "crypt" package, though. It seemed obvious to me that "crypt" should be under "math". But maybe that's just the mathematician in me speaking. > I like "python" for this one. (But I'm not sure if tabnanny and > rlcompleter belong there.) I agree, and I'm not sure about rlcompleter, but am sure about tabnanny. > What does ihooks have to do with security? Well, it was more or less written to support rexec. A weak argument, admittedly > No problem until these last two -- 'commands' is a Unix-specific thing > that has very little to do with the filesystem per se Hmmmmm...it is on the same level with popen. Why not move popen too? >, and 'dl' is (as I > understand it) deep ju-ju with sharp edges that should probably be > hidden away Ummmmmm.....not in the "python" package: it doesn't have anything to do with the interpreter. > Should this be "sgi" or "irix"? 
Ditto for "sun" vs "solaris" if there > are a significant number of Sun/Solaris modules. Note that the > respective trademark holders might get very antsy about who gets to put > names in those namespaces -- that's exactly what happened with Sun, > Solaris 8, and Perl. I believe the compromise they arrived at was that > the "Solaris::" namespace remains open, but Sun gets the "Sun::" > namespace. Ummmmm.....I don't see how they have any legal standing. I for one refuse to care about what Sun Microsystems thinks about names for Python packages. > There should probably be a win32 package, for core registry access stuff > if nothing else. And for all the other extensions in win32all Yep! (Just goes to show what happens when you decide to package based on a UNIX system) > All of those > argue over "irix" and "solaris" instead of "sgi" and "sun". Fine with me -- just wanted to move them out of my face -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From andy@reportlab.com Tue Mar 28 18:13:02 2000 From: andy@reportlab.com (Andy Robinson) Date: Tue, 28 Mar 2000 18:13:02 GMT Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: <20000327170031.693531CDF6@dinsdale.python.org> References: <20000327170031.693531CDF6@dinsdale.python.org> Message-ID: <38e0f4cf.24247656@post.demon.co.uk> On Mon, 27 Mar 2000 12:00:31 -0500 (EST), Peter Funk wrote: > Do we need a UserString class? This will probably be useful on top of the i18n stuff in due course, so I'd like it. Something Mike Da Silva and I have discussed a lot is implementing a higher-level 'typed string' library on top of the Unicode stuff. A 'typed string' is like a string, but knows what encoding it is in - possibly Unicode, possibly a native encoding - and embodies some basic type safety and convenience notions, like not being able to add a Shift-JIS and an EUC string together. 
Iteration would always be per character, not per byte; and a certain amount of magic would say that if the string was (say) Japanese, it would acquire a few extra methods for doing some Japan-specific things like expanding half-width katakana. Of course, we can do this anyway, but I think defining the API clearly in UserString is a great idea. - Andy Robinson From guido@python.org Tue Mar 28 19:22:43 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 14:22:43 -0500 Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Your message of "Tue, 28 Mar 2000 18:13:02 GMT." <38e0f4cf.24247656@post.demon.co.uk> References: <20000327170031.693531CDF6@dinsdale.python.org> <38e0f4cf.24247656@post.demon.co.uk> Message-ID: <200003281922.OAA03113@eric.cnri.reston.va.us> > > Do we need a UserString class? > > This will probably be useful on top of the i18n stuff in due course, > so I'd like it. > > Something Mike Da Silva and I have discussed a lot is implementing a > higher-level 'typed string' library on top of the Unicode stuff. > A 'typed string' is like a string, but knows what encoding it is in - > possibly Unicode, possibly a native encoding and embodies some basic > type safety and convenience notions, like not being able to add a > Shift-JIS and an EUC string together. Iteration would always be per > character, not per byte; and a certain amount of magic would say that > if the string was (say) Japanese, it would acquire a few extra methods > for doing some Japan-specific things like expanding half-width > katakana. > > Of course, we can do this anyway, but I think defining the API clearly > in UserString is a great idea. Agreed. Please somebody send a patch! 
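A minimal sketch of the kind of encoding-aware string Andy describes. Every name here (TypedString, its data/encoding attributes) is made up for illustration and is not a proposed UserString API; modern Python syntax is used so the sketch runs as-is:

```python
class TypedString:
    """A string that knows its own encoding (hypothetical class)."""

    def __init__(self, data, encoding):
        self.data = data          # raw bytes in the named encoding
        self.encoding = encoding

    def __add__(self, other):
        # The basic type-safety notion: refuse to mix encodings,
        # e.g. adding a Shift-JIS string to an EUC string.
        if self.encoding != other.encoding:
            raise TypeError("cannot add %s and %s strings"
                            % (self.encoding, other.encoding))
        return TypedString(self.data + other.data, self.encoding)

    def __len__(self):
        # Length (and iteration) is per character, not per byte.
        return len(self.data.decode(self.encoding))
```

Encoding-specific extras -- say, half-width katakana expansion for Japanese -- could then hang off subclasses selected by the encoding tag.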
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Mar 28 19:25:39 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 14:25:39 -0500 Subject: [Python-Dev] First alpha release of Python 1.6 Message-ID: <200003281925.OAA03287@eric.cnri.reston.va.us> I'm hoping to release a first, rough alpha of Python 1.6 by April 1st (no joke!). Not everything needs to be finished by then, but I hope to have the current versions of distutil, expat, and sre in there. Anything else that needs to go into 1.6 and isn't ready yet? (Small stuff doesn't matter, everything currently in the patches queue can probably go in if it isn't rejected by then.) --Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA@ActiveState.com Tue Mar 28 19:40:24 2000 From: DavidA@ActiveState.com (David Ascher) Date: Tue, 28 Mar 2000 11:40:24 -0800 Subject: [Python-Dev] First alpha release of Python 1.6 In-Reply-To: <200003281925.OAA03287@eric.cnri.reston.va.us> Message-ID: > Anything else that needs to go into 1.6 and isn't ready yet? No one seems to have found time to figure out the mmap module support. --david From guido@python.org Tue Mar 28 19:33:29 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 14:33:29 -0500 Subject: [Python-Dev] First alpha release of Python 1.6 In-Reply-To: Your message of "Tue, 28 Mar 2000 11:40:24 PST." References: Message-ID: <200003281933.OAA04896@eric.cnri.reston.va.us> > > Anything else that needs to go into 1.6 and isn't ready yet? > > No one seems to have found time to figure out the mmap module support. I wasn't even aware that that was a priority. If someone submits it, it will go in -- alpha 1 is not a total feature freeze, just a "testing the waters". 
--Guido van Rossum (home page: http://www.python.org/~guido/) From tismer@tismer.com Tue Mar 28 19:49:17 2000 From: tismer@tismer.com (Christian Tismer) Date: Tue, 28 Mar 2000 21:49:17 +0200 Subject: [Python-Dev] First alpha release of Python 1.6 References: <200003281925.OAA03287@eric.cnri.reston.va.us> Message-ID: <38E10CBD.C6B71D50@tismer.com> Guido van Rossum wrote: ... > Anything else that needs to go into 1.6 and isn't ready yet? Stackless Python of course, but it *is* ready yet. Just kidding. I will provide a compressed unicode database in a few days. That will be a non-Python-specific module, and (Marc or I) will provide a Python specific wrapper. This will probably not get ready until April 1. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From akuchlin@mems-exchange.org Tue Mar 28 19:51:29 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Tue, 28 Mar 2000 14:51:29 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: References: <200003281925.OAA03287@eric.cnri.reston.va.us> Message-ID: <14561.3393.761177.776684@amarok.cnri.reston.va.us> David Ascher writes: >> Anything else that needs to go into 1.6 and isn't ready yet? >No one seems to have found time to figure out the mmap module support. The issue there is cross-platform compatibility; the Windows and Unix versions take completely different constructor arguments, so how should we paper over the differences? Unix arguments: (file descriptor, size, flags, protection) Win32 arguments:(filename, tagname, size) We could just say, "OK, the args are completely different between Win32 and Unix, despite it being the same function name". 
Maybe that's best, because there seems no way to reconcile those two different sets of arguments. -- A.M. Kuchling http://starship.python.net/crew/amk/ I'm here for the FBI, not the _Weekly World News_. -- Scully in X-FILES #1 From DavidA@ActiveState.com Tue Mar 28 20:06:09 2000 From: DavidA@ActiveState.com (David Ascher) Date: Tue, 28 Mar 2000 12:06:09 -0800 Subject: [Python-Dev] mmapfile module In-Reply-To: <14561.3393.761177.776684@amarok.cnri.reston.va.us> Message-ID: > The issue there is cross-platform compatibility; the Windows and Unix > versions take completely different constructor arguments, so how > should we paper over the differences? > > Unix arguments: (file descriptor, size, flags, protection) > Win32 arguments:(filename, tagname, size) > > We could just say, "OK, the args are completely different between > Win32 and Unix, despite it being the same function name". Maybe > that's best, because there seems no way to reconcile those two > different sets of arguments. I guess my approach would be to provide two platform-specific modules, and to figure out a high-level Python module which could provide a reasonable platform-independent interface on top of it. One problem with that approach is that I think that there is also great value in having a portable mmap interface in the C layer, where i see lots of possible uses in extension modules (much like the threads API). --david From guido@python.org Tue Mar 28 20:00:57 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 15:00:57 -0500 Subject: [Python-Dev] mmapfile module In-Reply-To: Your message of "Tue, 28 Mar 2000 12:06:09 PST." References: Message-ID: <200003282000.PAA11988@eric.cnri.reston.va.us> > > The issue there is cross-platform compatibility; the Windows and Unix > > versions take completely different constructor arguments, so how > > should we paper over the differences? 
> > > > Unix arguments: (file descriptor, size, flags, protection) > > Win32 arguments:(filename, tagname, size) > > > > We could just say, "OK, the args are completely different between > > Win32 and Unix, despite it being the same function name". Maybe > > that's best, because there seems no way to reconcile those two > > different sets of arguments. > > I guess my approach would be to provide two platform-specific modules, and > to figure out a high-level Python module which could provide a reasonable > platform-independent interface on top of it. One problem with that approach > is that I think that there is also great value in having a portable mmap > interface in the C layer, where i see lots of possible uses in extension > modules (much like the threads API). I don't know enough about this, but it seems that there might be two steps: *creating* a mmap object is necessarily platform-specific; but *using* a mmap object could be platform-neutral. What is the API for mmap objects? --Guido van Rossum (home page: http://www.python.org/~guido/) From klm@digicool.com Tue Mar 28 20:07:25 2000 From: klm@digicool.com (Ken Manheimer) Date: Tue, 28 Mar 2000 15:07:25 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14560.60548.74378.613188@goon.cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Jeremy Hylton wrote: > >>>>> "KLM" == Ken Manheimer writes: > > >> The only problematic use of from ... import ... is > >> from text.re import * > >> which adds an unspecified set of names to the current > >> namespace. > > KLM> The other gotcha i mean applies when the thing you're importing > KLM> is a terminal, ie a non-module. Then, changes to the > KLM> assignments of the names in the original module aren't > KLM> reflected in the names you've imported - they're decoupled from > KLM> the namespace of the original module. > > This isn't an import issue. 
Some people simply don't understand > that assignment (and import as form of assignment) is name binding. > Import binds an imported object to a name in the current namespace. > It does not affect bindings in other namespaces, nor should it. I know that - i was addressing the asserted evilness of from ... import ... and how it applied - and didn't - w.r.t. packages. > KLM> I thought the other problem peter was objecting to, having to > KLM> change the import sections in the first place, was going to be > KLM> avoided in the 1.x series (if we do this kind of thing) by > KLM> inherently extending the import path to include all the > KLM> packages, so people need not change their code? Seems like > KLM> most of this would be fairly transparent w.r.t. the operation > KLM> of existing applications. > > I'm not sure if there is consensus on backwards compatibility. I'm > not in favor of creating a huge sys.path that includes every package's > contents. It would be a big performance hit. Yes, someone reminded me that the other (better, i think) option is stub modules in the current places that do the "from ... import *" for the right values of "...". py3k finishes the migration by eliminating the stubs. Ken klm@digicool.com From gward@cnri.reston.va.us Tue Mar 28 20:29:55 2000 From: gward@cnri.reston.va.us (Greg Ward) Date: Tue, 28 Mar 2000 15:29:55 -0500 Subject: [Python-Dev] First alpha release of Python 1.6 In-Reply-To: <200003281925.OAA03287@eric.cnri.reston.va.us>; from guido@python.org on Tue, Mar 28, 2000 at 02:25:39PM -0500 References: <200003281925.OAA03287@eric.cnri.reston.va.us> Message-ID: <20000328152955.A3136@cnri.reston.va.us> On 28 March 2000, Guido van Rossum said: > I'm hoping to release a first, rough alpha of Python 1.6 by April 1st > (no joke!). > > Not everything needs to be finished by then, but I hope to have the > current versions of distutil, expat, and sre in there. We just need to do a bit of CVS trickery to put Distutils under the Python tree. 
I'd *like* for Distutils to have its own CVS existence at least until 1.6 is released, but it's not essential. Two of the big Distutils to-do items that I enumerated at IPC8 have been knocked off: the "dist" command has been completely redone (and renamed "sdist", for "source distribution"), as has the "install" command. The really major to-do items left for Distutils are: * implement the "bdist" command with enough marbles to generate RPMs and some sort of Windows installer (Wise?); Solaris packages, Debian packages, and something for the Mac would be nice too. * documentation (started, but only just) And there are some almost-as-important items: * Mac OS support; this has been started, at least for the unfashionable and clunky sounding MPW compiler; CodeWarrior support (via AppleEvents, I think) would be nice * test suite -- at least the fundamental Distutils marbles should get a good exercise; it would also be nice to put together a bunch of toy module distributions and make sure that "build" and "install" on them do the right things... all automatically, of course! * reduce number of tracebacks: right now, certain errors in the setup script or on the command line can result in a traceback, when they should just result in SystemExit with "error in setup script: ..." or "error on command line: ..." * fold in Finn Bock's JPython compat. patch * fold in Michael Muller's "pkginfo" patch * finish and fold in my Python 1.5.1 compat. patch (only necessary as long as Distutils has a life of its own, outside Python) Well, I'd better get cracking ... Guido, we can do the CVS thing any time; I guess I'll mosey on downstairs. Greg -- Greg Ward - software developer gward@cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From Fredrik Lundh" <14561.3393.761177.776684@amarok.cnri.reston.va.us> Message-ID: <003501bf98ee$50097a20$34aab5d4@hagrid> Andrew M. 
Kuchling wrote:
> The issue there is cross-platform compatibility; the Windows and Unix
> versions take completely different constructor arguments, so how
> should we paper over the differences?
>
> Unix arguments: (file descriptor, size, flags, protection)
> Win32 arguments: (filename, tagname, size)
>
> We could just say, "OK, the args are completely different between
> Win32 and Unix, despite it being the same function name". Maybe
> that's best, because there seems no way to reconcile those two
> different sets of arguments.

I don't get this. Why expose low-level implementation details to the user (flags, protection, tagname)? (And how come the Windows implementation doesn't support read-only vs. read/write flags?) Unless the current implementation uses something radically different from mmap/MapViewOfFile, wouldn't an interface like: (filename, mode="rb", size=entire file, offset=0) be sufficient? (where mode can be "wb" or "wb+" or "rb+", optionally without the "b") From Donald Beaudry Tue Mar 28 20:46:06 2000 From: Donald Beaudry (Donald Beaudry) Date: Tue, 28 Mar 2000 15:46:06 -0500 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <200003282046.PAA18822@zippy.init.com> ...sorry to jump into the middle of this one, but... A while back I put a lot of thought into how to support class methods and class attributes. I feel that I solved the problem in a fairly complete way, though the solution does have some warts. Here's an example:

>>> class foo(base):
...     value = 10   # this is an instance attribute called 'value'
...                  # as usual, it is shared between all instances
...                  # until explicitly set on a particular instance
...
...     def set_value(self, x):
...         print "instance method"
...         self.value = x
...
...     #
...     # here comes the weird part
...     #
...     class __class__:
...         value = 5   # this is a class attribute called value
...
...         def set_value(cl, x):
...             print "class method"
...             cl.value = x
...
...         def set_instance_default_value(cl, x):
...             cl._.value = x
...
>>> f = foo()
>>> f.value
10
>>> foo.value = 20
>>> f.value
10
>>> f.__class__.value
20
>>> foo._.value
10
>>> foo._.value = 1
>>> f.value
1
>>> foo.set_value(100)
class method
>>> foo.value
100
>>> f.value
1
>>> f.set_value(40)
instance method
>>> f.value
40
>>> foo._.value
1
>>> ff = foo()
>>> foo.set_instance_default_value(15)
>>> ff.value
15
>>> foo._.set_value(ff, 5)
instance method
>>> ff.value
5
>>>

Is anyone still with me? The crux of the problem is that in the current Python class/instance implementation, classes don't have attributes of their own. All of those things that look like class attributes are really there as defaults for the instances. To support true class attributes a new name space must be invented. Since I wanted class objects to look like any other object, I chose to move the "instance defaults" name space under the underscore attribute. This allows the class's unqualified namespace to refer to its own attributes. Clear as mud, right? In case you are wondering, yes, the code above is a working example. I released it a while back as the 'objectmodule' and just updated it to work with Python-1.5.2. The update has yet to be released. -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb@init.com Lexington, MA 02421 ...Will hack for sushi... From akuchlin@mems-exchange.org Tue Mar 28 20:50:18 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Tue, 28 Mar 2000 15:50:18 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <003501bf98ee$50097a20$34aab5d4@hagrid> References: <200003281925.OAA03287@eric.cnri.reston.va.us> <14561.3393.761177.776684@amarok.cnri.reston.va.us> <003501bf98ee$50097a20$34aab5d4@hagrid> Message-ID: <14561.6922.415063.279939@amarok.cnri.reston.va.us> Fredrik Lundh writes: >(And how come the Windows implementation doesn't support >read-only vs. read/write flags?) Good point; that should be fixed.
> (filename, mode="rb", size=entire file, offset=0) >be sufficient? (where mode can be "wb" or "wb+" or "rb+", >optionally without the "b") Hmm... maybe we can dispose of the PROT_* argument that way on Unix. But how would you specify MAP_SHARED vs. MAP_PRIVATE, or MAP_ANONYMOUS? (MAP_FIXED seems useless to a Python programmer.) Another character in the mode argument, or a flags argument? Worse, as you pointed out in the same thread, MAP_ANONYMOUS on OSF/1 doesn't want to take a file descriptor at all. Also, the tag name on Windows seems important, from Gordon McMillan's explanation of it: http://www.python.org/pipermail/python-dev/1999-November/002808.html -- A.M. Kuchling http://starship.python.net/crew/amk/ You mustn't kill me. You don't love me. You d-don't even know me. -- The Furies kill Abel, in SANDMAN #66: "The Kindly Ones:10" From guido@python.org Tue Mar 28 21:02:04 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 16:02:04 -0500 Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Your message of "Tue, 28 Mar 2000 15:46:06 EST." <200003282046.PAA18822@zippy.init.com> References: <200003282046.PAA18822@zippy.init.com> Message-ID: <200003282102.QAA13041@eric.cnri.reston.va.us> > A while back I put a lot of thought into how to support class methods > and class attributes. I feel that I solved the problem in a fairly > complete way though the solution does have some warts. Here's an > example: [...] > Is anyone still with me? > > The crux of the problem is that in the current python class/instance > implementation, classes dont have attributes of their own. All of > those things that look like class attributes are really there as > defaults for the instances. To support true class attributes a new > name space must be invented. Since I wanted class objects to look > like any other object, I chose to move the "instance defaults" name > space under the underscore attribute. 
This allows the class's > unqualified namespace to refer to its own attributes. Clear as mud, > right? > > In case you are wondering, yes, the code above is a working example. > I released it a while back as the 'objectmodule' and just updated it > to work with Python-1.5.2. The update has yet to be released. This looks like it would break a lot of code. How do you refer to a superclass method? It seems that ClassName.methodName would refer to the class method, not to the unbound instance method. Also, moving the default instance attributes to a different namespace seems to be a semantic change that could change lots of things. I am still in favor of saying "Python has no class methods -- use module-global functions for that". Between the module, the class and the instance, there are enough namespaces -- we don't need another one. --Guido van Rossum (home page: http://www.python.org/~guido/) From pf@artcom-gmbh.de Tue Mar 28 21:01:29 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Tue, 28 Mar 2000 23:01:29 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: <200003281922.OAA03113@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 28, 2000 2:22:43 pm" Message-ID: I wrote: > > > Do we need a UserString class? > > Andy Robinson: > > This will probably be useful on top of the i18n stuff in due course, > > so I'd like it. > > > > Something Mike Da Silva and I have discussed a lot is implementing a > > higher-level 'typed string' library on top of the Unicode stuff. > > A 'typed string' is like a string, but knows what encoding it is in - > > possibly Unicode, possibly a native encoding and embodies some basic > > type safety and convenience notions, like not being able to add a > > Shift-JIS and an EUC string together. 
Iteration would always be per > > character, not per byte; and a certain amount of magic would say that > > if the string was (say) Japanese, it would acquire a few extra methods > > for doing some Japan-specific things like expanding half-width > > katakana. > > > > Of course, we can do this anyway, but I think defining the API clearly > > in UserString is a great idea. > Guido van Rossum: > Agreed. Please somebody send a patch! I feel unable to do what Andy proposed. What I had in mind was a simple wrapper class around the builtin string type, similar to UserDict and UserList, which can be used to derive other classes from. I use UserList and UserDict quite often and find them very useful. They are simple and powerful and easy to extend. Maybe the things Andy Robinson proposed above belong in a subclass which inherits from a simple UserString class? Do we need an additional UserUnicode class for unicode string objects? Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From guido@python.org Tue Mar 28 21:56:49 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 16:56:49 -0500 Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Your message of "Tue, 28 Mar 2000 23:01:29 +0200." References: Message-ID: <200003282156.QAA13361@eric.cnri.reston.va.us> [Peter Funk] > > > > Do we need a UserString class? > > > > Andy Robinson: > > > This will probably be useful on top of the i18n stuff in due course, > > > so I'd like it. > > > > > > Something Mike Da Silva and I have discussed a lot is implementing a > > > higher-level 'typed string' library on top of the Unicode stuff.
> > > A 'typed string' is like a string, but knows what encoding it is in - > > > possibly Unicode, possibly a native encoding and embodies some basic > > > type safety and convenience notions, like not being able to add a > > > Shift-JIS and an EUC string together. Iteration would always be per > > > character, not per byte; and a certain amount of magic would say that > > > if the string was (say) Japanese, it would acquire a few extra methods > > > for doing some Japan-specific things like expanding half-width > > > katakana. > > > > > > Of course, we can do this anyway, but I think defining the API clearly > > > in UserString is a great idea. > > > Guido van Rossum: > > Agreed. Please somebody send a patch! [PF] > I feel unable to do, what Andy proposed. What I had in mind was a > simple wrapper class around the builtin string type similar to > UserDict and UserList which can be used to derive other classes from. Yes. I think Andy wanted his class to be a subclass of UserString. > I use UserList and UserDict quite often and find them very useful. > They are simple and powerful and easy to extend. Agreed. > May be the things Andy Robinson proposed above belong into a sub class > which inherits from a simple UserString class? Do we need > an additional UserUnicode class for unicode string objects? It would be great if there was a single UserString class which would work with either Unicode or 8-bit strings. I think that shouldn't be too hard, since it's just a wrapper. So why don't you give the UserString.py a try and leave Andy's wish alone? --Guido van Rossum (home page: http://www.python.org/~guido/) From python-dev@python.org Tue Mar 28 21:47:59 2000 From: python-dev@python.org (Peter Funk) Date: Tue, 28 Mar 2000 23:47:59 +0200 (MEST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <02c901bf989b$be203d80$34aab5d4@hagrid> from Fredrik Lundh at "Mar 28, 2000 11:55:19 am" Message-ID: Hi! 
> Peter Funk wrote: > > Why should modules be moved into packages? I don't get it. > Fredrik Lundh: > fwiw, neither do I... Pheeewww... And I thought I was the only one! ;-) > I'm not so sure that Python really needs a simple reorganization > of the existing set of standard library modules. just moving the > modules around won't solve the real problems with the 1.5.2 std > library... Right. I propose to leave the namespace flat. I like to argue with Brad J. Cox ---the author of the book "Object Oriented Programming - An Evolutionary Approach" Addison Wesley, 1987--- who proposes the idea of what he calls a "Software-IC": He looks closely at the design process of electronic engineers, who usually deal with large data books with prefabricated components. There are often hundreds of them in such a databook and most of them have terse and not very mnemonic names. But the engineers using them all day *know* after a short while that a 7400 chip is a TTL-chip containing 4 NAND gates. Nearly the same holds true for software engineers using Software-ICs like 're' or 'struct' as their daily building blocks. A software engineer who is already familiar with his/her building blocks has absolutely no advantage from a deeply nested namespace. Now for something completely different: Fredrik Lundh about the library documentation: > here's one proposal: > http://www.pythonware.com/people/fredrik/librarybook-contents.htm Whether 'md5', 'getpass' and 'traceback' fit into a category 'Commonly Used Modules' is ....ummmm.... at least a bit questionable. But we should really focus the discussion on the structure of the documentation. Since many standard library modules belong in several logical categories at once, a true tree-structured organization is simply not sufficient to describe everything. So it is important to set up pointers between related functionality.
For example 'string.replace' is somewhat related to 're.sub' or 'getpass' is related to 'crypt', however 'crypt' is related to 'md5' and so on. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From pf@artcom-gmbh.de Tue Mar 28 22:13:02 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 00:13:02 +0200 (MEST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules _tkinter.c,1.91,1.92 In-Reply-To: <200003282007.PAA12045@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 28, 2000 3: 7: 9 pm" Message-ID: Hi! Guido van Rossum: > Modified Files: > _tkinter.c [...] > *** 491,501 **** > > v->interp = Tcl_CreateInterp(); > - > - #if TKMAJORMINOR == 8001 > - TclpInitLibraryPath(baseName); > - #endif /* TKMAJORMINOR */ > > ! #if defined(macintosh) && TKMAJORMINOR >= 8000 > ! /* This seems to be needed since Tk 8.0 */ > ClearMenuBar(); > TkMacInitMenus(v->interp); > --- 475,481 ---- > > v->interp = Tcl_CreateInterp(); > > ! #if defined(macintosh) > ! /* This seems to be needed */ > ClearMenuBar(); > TkMacInitMenus(v->interp); > *************** Are you sure that the call to 'TclpInitLibraryPath(baseName);' is not required in Tcl/Tk 8.1, 8.2, 8.3 ? I would propose the following: +#if TKMAJORMINOR >= 8001 + TclpInitLibraryPath(baseName); +# endif /* TKMAJORMINOR */ Here I quote from the Tcl8.3 source distribution: /* *--------------------------------------------------------------------------- * * TclpInitLibraryPath -- * * Initialize the library path at startup. We have a minor * metacircular problem that we don't know the encoding of the * operating system but we may need to talk to operating system * to find the library directories so that we know how to talk to * the operating system. * * We do not know the encoding of the operating system. * We do know that the encoding is some multibyte encoding. 
* In that multibyte encoding, the characters 0..127 are equivalent * to ascii. * * So although we don't know the encoding, it's safe: * to look for the last slash character in a path in the encoding. * to append an ascii string to a path. * to pass those strings back to the operating system. * * But any strings that we remembered before we knew the encoding of * the operating system must be translated to UTF-8 once we know the * encoding so that the rest of Tcl can use those strings. * * This call sets the library path to strings in the unknown native * encoding. TclpSetInitialEncodings() will translate the library * path from the native encoding to UTF-8 as soon as it determines * what the native encoding actually is. * * Called at process initialization time. * * Results: * None. */ Sorry, but I don't know enough about this in connection with the unicode patches and if we should pay attention to this. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From akuchlin@mems-exchange.org Tue Mar 28 22:21:07 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Tue, 28 Mar 2000 17:21:07 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: <02c901bf989b$be203d80$34aab5d4@hagrid> Message-ID: <14561.12371.857178.550236@amarok.cnri.reston.va.us> Peter Funk quoted: >Fredrik Lundh: >> I'm not so sure that Python really needs a simple reorganization >> of the existing set of standard library modules. just moving the >> modules around won't solve the real problems with the 1.5.2 std >> library... >Right. I propose to leave the namespace flat. I third that comment. Arguments against reorganizing for 1.6: 1) I doubt that we have time to do a good job of it for 1.6. (1.7, maybe.) 2) Right now there's no way for third-party extensions to add themselves to a package in the standard library. 
Once Python finds foo/__init__.py, it won't look for site-packages/foo/__init__.py, so if you grab, say, "crypto" as a package name in the standard library, it's forever lost to third-party extensions. 3) Rearranging the modules is a good chance to break backward compatibility in other ways. If you want to rewrite, say, httplib in a non-compatible way to support HTTP/1.1, then the move from httplib.py to net.http.py is a great chance to do that, and leave httplib.py as-is for old programs. If you just copy httplib.py, rewriting net.http.py is now harder, since you have to either maintain compatibility or break things *again* in the next version of Python. 4) We wanted to get 1.6 out fairly quickly, and therefore limited the number of features that would get in. (Vide the "Python 1.6 timing" thread last ... November, was it?) Packagizing is feature creep that'll slow things down. Maybe we should start a separate list to discuss a package hierarchy for 1.7. But for 1.6, forget it. -- A.M. Kuchling http://starship.python.net/crew/amk/ Posting "Please send e-mail, since I don't read this group": Poster is rendered illiterate by a simple trepanation. -- Kibo, in the Happynet Manifesto From guido@python.org Tue Mar 28 22:24:46 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 17:24:46 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules _tkinter.c,1.91,1.92 In-Reply-To: Your message of "Wed, 29 Mar 2000 00:13:02 +0200." References: Message-ID: <200003282224.RAA13573@eric.cnri.reston.va.us> > Are you sure that the call to 'TclpInitLibraryPath(baseName);' > is not required in Tcl/Tk 8.1, 8.2, 8.3 ? > I would propose the following: > > +#if TKMAJORMINOR >= 8001 > + TclpInitLibraryPath(baseName); > +# endif /* TKMAJORMINOR */ It is an internal routine which shouldn't be called at all by the user. I believe it is called internally at the right time.
Note that we now call Tcl_FindExecutable(), which *is* intended to be called by the user (and exists in all 8.x versions) -- maybe this causes TclpInitLibraryPath() to be called. I tested it on Solaris, with Tcl/Tk versions 8.0.4, 8.1.1, 8.2.3 and 8.3.0, and it doesn't seem to make any difference, as long as that version of Tcl/Tk has actually been installed. (When it's not installed, TclpInitLibraryPath() doesn't help either.) I still have to check this on Windows -- maybe it'll have to go back in. [...] > Sorry, but I don't know enough about this in connection with the > unicode patches and if we should pay attention to this. It seems to be all right... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Mar 28 22:25:27 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 17:25:27 -0500 Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: Your message of "Tue, 28 Mar 2000 17:21:07 EST." <14561.12371.857178.550236@amarok.cnri.reston.va.us> References: <02c901bf989b$be203d80$34aab5d4@hagrid> <14561.12371.857178.550236@amarok.cnri.reston.va.us> Message-ID: <200003282225.RAA13586@eric.cnri.reston.va.us> > Maybe we should start a separate list to discuss a package hierarchy > for 1.7. But for 1.6, forget it. Yes! Please! --Guido van Rossum (home page: http://www.python.org/~guido/) From Donald Beaudry Tue Mar 28 22:56:03 2000 From: Donald Beaudry (Donald Beaudry) Date: Tue, 28 Mar 2000 17:56:03 -0500 Subject: [Python-Dev] None as a keyword / class methods References: <200003282046.PAA18822@zippy.init.com> <200003282102.QAA13041@eric.cnri.reston.va.us> Message-ID: <200003282256.RAA21080@zippy.init.com> Guido van Rossum wrote, > This looks like it would break a lot of code. Only if it were to replace the current implementation. Perhaps I inadvertently made that suggestion. It was not my intention.
Another way to look at my post is to say that it was intended to point out why we cant have class methods in the current implementation... it's a name space issue. > How do you refer to a superclass method? It seems that > ClassName.methodName would refer to the class method, not to the > unbound instance method. Right. To get at the unbound instance methods you must go through the 'unbound accessor' which is accessed via the underscore. If you wanted to chain to a superclass method it would look like this: class child(parent): def do_it(self, x): z = parent._.do_it(self, x) return z > Also, moving the default instance attributes to a different > namespace seems to be a semantic change that could change lots of > things. I agree... and that's why I wouldnt suggest doing it to the current class/instance implementation. However, for those who insist on having class attributes and methods I think it would be cool to settle on a standard "syntax". > I am still in favor of saying "Python has no class methods -- use > module-global functions for that". Or use a class/instance implementation provided via an extension module rather than the built-in one. The class named 'base' shown in my example is a class designed for that purpose. > Between the module, the class and the instance, there are enough > namespaces -- we don't need another one. The topic comes up often enough to make me think some might disagree. -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb@init.com Lexington, MA 02421 ...So much code, so little time... From Moshe Zadka Tue Mar 28 23:24:29 2000 From: Moshe Zadka (Moshe Zadka) Date: Wed, 29 Mar 2000 01:24:29 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14561.12371.857178.550236@amarok.cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Andrew M. 
Kuchling wrote: > Peter Funk quoted: > >Fredrik Lundh: > >> I'm not so sure that Python really needs a simple reorganization > >> of the existing set of standard library modules. just moving the > >> modules around won't solve the real problems with the 1.5.2 std > >> library... > >Right. I propose to leave the namespace flat. > > I third that comment. Arguments against reorganizing for 1.6: Let me just note that my original great renaming proposal was titled "1.7". I'm certain I don't want it to affect the 1.6 release -- my god, it's almost alpha time and we don't even know how to reorganize. Strictly 1.7. > 4) We wanted to get 1.6 out fairly quickly, and therefore limited > the number of features that would get in. (Vide the "Python 1.6 > timing" thread last ... November, was it?) Packagizing is feature > creep that'll slow things down Oh yes. I'm waiting for that 1.6....I wouldn't want to stall it for the world. But this is as good a chance as any to discuss reasons before strategies. Here's why I believe we should re-organize Python modules: -- modules fall quite naturally into subpackages. Reducing the number of toplevel modules will lessen the clutter -- it would be easier to synchronize documentation and code (think "automatically generated documentation") -- it would enable us to move toward a CPAN-like module repository, together with the dist-sig efforts. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gmcm@hypernet.com Tue Mar 28 23:44:27 2000 From: gmcm@hypernet.com (Gordon McMillan) Date: Tue, 28 Mar 2000 18:44:27 -0500 Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14561.12371.857178.550236@amarok.cnri.reston.va.us> References: Message-ID: <1257835425-27941123@hypernet.com> Andrew M. Kuchling wrote: [snip] > 2) Right now there's no way for third-party extensions to add > themselves to a package in the standard library. Once Python finds
Once Python finds > foo/__init__.py, it won't look for site-packages/foo/__init__.py, so > if you grab, say, "crypto" as a package name in the standard library, > it's forever lost to third-party extensions. That way lies madness. While I'm happy to carp at Java for requiring "com", "net" or whatever as a top level name, their intent is correct: the names grabbed by the Python standard packages belong to no one but the Python standard packages. If you *don't* do that, upgrades are an absolute nightmare. Marc-Andre grabbed "mx". If (as I rather suspect ) he wants to remake the entire standard lib in his image, he's welcome to - *under* mx. What would happen if he (and everyone else) installed themselves *into* my core packages, then I decided I didn't want his stuff? More than likely I'd have to scrub the damn installation and start all over again. - Gordon From DavidA@ActiveState.com Wed Mar 29 00:01:57 2000 From: DavidA@ActiveState.com (David Ascher) Date: Tue, 28 Mar 2000 16:01:57 -0800 Subject: [Python-Dev] yeah! for Jeremy and Greg Message-ID: I'm thrilled to see the extended call syntax patches go in! One less wart in the language! Jeremy ZitBlaster Hylton and Greg Noxzema Ewing! --david From pf@artcom-gmbh.de Tue Mar 28 23:53:50 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 01:53:50 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: <200003282156.QAA13361@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 28, 2000 4:56:49 pm" Message-ID: Hi! > [Peter Funk] > > > > > Do we need a UserString class? [...] Guido van Rossum: > So why don't you give the UserString.py a try and leave Andy's wish alone? Okay. Here we go. Could someone please have a close eye on this? I've haccked it up in hurry. 
---- 8< ---- 8< ---- cut here ---- 8< ---- schnipp ---- 8< ---- schnapp ----
#!/usr/bin/env python
"""A user-defined wrapper around string objects

Note: string objects have grown methods in Python 1.6
This module requires Python 1.6 or later.
"""
import sys

# XXX Totally untested and hacked up until 2:00 am with too little sleep ;-)

class UserString:
    def __init__(self, string=""):
        self.data = string
    def __repr__(self): return repr(self.data)
    def __cmp__(self, string):
        if isinstance(string, UserString):
            return cmp(self.data, string.data)
        else:
            return cmp(self.data, string)
    def __len__(self): return len(self.data)

    # methods defined in alphabetical order
    def capitalize(self): return self.__class__(self.data.capitalize())
    def center(self, width): return self.__class__(self.data.center(width))
    def count(self, sub, start=0, end=sys.maxint):
        return self.data.count(sub, start, end)
    def encode(self, encoding=None, errors=None): # XXX improve this?
        if encoding:
            if errors:
                return self.__class__(self.data.encode(encoding, errors))
            else:
                return self.__class__(self.data.encode(encoding))
        else:
            return self.__class__(self.data.encode())
    def endswith(self): raise NotImplementedError
    def find(self, sub, start=0, end=sys.maxint):
        return self.data.find(sub, start, end)
    def index(self, sub, start=0, end=sys.maxint):
        return self.data.index(sub, start, end)
    def isdecimal(self): return self.data.isdecimal()
    def isdigit(self): return self.data.isdigit()
    def islower(self): return self.data.islower()
    def isnumeric(self): return self.data.isnumeric()
    def isspace(self): return self.data.isspace()
    def istitle(self): return self.data.istitle()
    def isupper(self): return self.data.isupper()
    def join(self, seq): return self.data.join(seq)
    def ljust(self, width): return self.__class__(self.data.ljust(width))
    def lower(self): return self.__class__(self.data.lower())
    def lstrip(self): return self.__class__(self.data.lstrip())
    def replace(self, old, new, maxsplit=-1):
        return self.__class__(self.data.replace(old, new, maxsplit))
    def rfind(self, sub, start=0, end=sys.maxint):
        return self.data.rfind(sub, start, end)
    def rindex(self, sub, start=0, end=sys.maxint):
        return self.data.rindex(sub, start, end)
    def rjust(self, width): return self.__class__(self.data.rjust(width))
    def rstrip(self): return self.__class__(self.data.rstrip())
    def split(self, sep=None, maxsplit=-1):
        return self.data.split(sep, maxsplit)
    def splitlines(self, maxsplit=-1):
        return self.data.splitlines(maxsplit)
    def startswith(self, prefix, start=0, end=sys.maxint):
        return self.data.startswith(prefix, start, end)
    def strip(self): return self.__class__(self.data.strip())
    def swapcase(self): return self.__class__(self.data.swapcase())
    def title(self): return self.__class__(self.data.title())
    def translate(self, table, deletechars=""):
        return self.__class__(self.data.translate(table, deletechars))
    def upper(self): return self.__class__(self.data.upper())
    def __add__(self, other):
        if isinstance(other, UserString):
            return self.__class__(self.data + other.data)
        elif isinstance(other, type(self.data)):
            return self.__class__(self.data + other)
        else:
            return self.__class__(self.data + str(other))
    def __radd__(self, other):
        if isinstance(other, type(self.data)):
            return self.__class__(other + self.data)
        else:
            return self.__class__(str(other) + self.data)
    def __mul__(self, n):
        return self.__class__(self.data*n)
    __rmul__ = __mul__

def _test():
    s = UserString("abc")
    u = UserString(u"efg")
    # XXX add some real tests here?
    return [0]

if __name__ == "__main__":
    import sys
    sys.exit(_test()[0])
From Fredrik Lundh Message-ID: <012301bf990b$2a494c80$34aab5d4@hagrid> > I'm thrilled to see the extended call syntax patches go in! One less wart > in the language! but did he compile before checking in? ..\Python\compile.c(1225) : error C2065: 'CALL_FUNCTION_STAR' : undeclared identifier (compile.c and opcode.h both mention this identifier, but nobody defines it... should it be CALL_FUNCTION_VAR, perhaps?)
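For readers following the thread: the extended call syntax in question unpacks an arbitrary sequence and dict at the call site. A small sketch, with all names (`connect`, `endpoint`, `options`) invented for illustration:

```python
def connect(host, port=80, proto="tcp"):
    # Throwaway function so the two unpacking forms can be observed.
    return (host, proto, port)

endpoint = ("example.org", 8080)
options = {"proto": "udp"}

# *endpoint supplies positional arguments, **options supplies keywords.
result = connect(*endpoint, **options)
print(result)  # -> ('example.org', 'udp', 8080)
```

The same call spelled with the old builtin would be apply(connect, endpoint, options).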
From guido@python.org Wed Mar 29 00:07:34 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 19:07:34 -0500 Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Your message of "Wed, 29 Mar 2000 01:53:50 +0200." References: Message-ID: <200003290007.TAA16081@eric.cnri.reston.va.us> > > [Peter Funk] > > > > > > Do we need a UserString class? > [...] > Guido van Rossum: > > So why don't you give the UserString.py a try and leave Andy's wish alone? [Peter] > Okay. Here we go. Could someone please keep a close eye on this? > I've hacked it up in a hurry. Good job! Go get some sleep, and tomorrow morning when you're fresh, compare it to UserList. From visual inspection, you seem to be missing __getitem__ and __getslice__, and maybe more (of course not __set*__). --Guido van Rossum (home page: http://www.python.org/~guido/) From ping@lfw.org Wed Mar 29 00:13:24 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 28 Mar 2000 18:13:24 -0600 (CST) Subject: [Python-Dev] yeah! for Jeremy and Greg In-Reply-To: <012301bf990b$2a494c80$34aab5d4@hagrid> Message-ID: On Wed, 29 Mar 2000, Fredrik Lundh wrote: > > I'm thrilled to see the extended call syntax patches go in! One less wart > > in the language! > > but did he compile before checking in? You beat me to it. I read David's message and got so excited i just had to try it right away. So i updated my CVS tree, did "make", and got the same error:

make[1]: Entering directory `/home/ping/dev/python/dist/src/Python'
gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c compile.c -o compile.o
compile.c: In function `com_call_function':
compile.c:1225: `CALL_FUNCTION_STAR' undeclared (first use in this function)
compile.c:1225: (Each undeclared identifier is reported only once
compile.c:1225: for each function it appears in.)
make[1]: *** [compile.o] Error 1

> (compile.c and opcode.h both mention this identifier, but > nobody defines it... should it be CALL_FUNCTION_VAR, > perhaps?)
But CALL_FUNCTION_STAR is mentioned in the comments...

#define CALL_FUNCTION           131     /* #args + (#kwargs<<8) */
#define MAKE_FUNCTION           132     /* #defaults */
#define BUILD_SLICE             133     /* Number of items */

/* The next 3 opcodes must be contiguous and satisfy
   (CALL_FUNCTION_STAR - CALL_FUNCTION) & 3 == 1  */
#define CALL_FUNCTION_VAR       140     /* #args + (#kwargs<<8) */
#define CALL_FUNCTION_KW        141     /* #args + (#kwargs<<8) */
#define CALL_FUNCTION_VAR_KW    142     /* #args + (#kwargs<<8) */

The condition (CALL_FUNCTION_STAR - CALL_FUNCTION) & 3 == 1 doesn't make much sense, though... -- ?!ng From jeremy@cnri.reston.va.us Wed Mar 29 00:18:54 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Tue, 28 Mar 2000 19:18:54 -0500 (EST) Subject: [Python-Dev] yeah! for Jeremy and Greg In-Reply-To: <012301bf990b$2a494c80$34aab5d4@hagrid> References: <012301bf990b$2a494c80$34aab5d4@hagrid> Message-ID: <14561.19438.157799.810802@goon.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: >> I'm thrilled to see the extended call syntax patches go in! One >> less wart in the language! FL> but did he compile before checking in? Indeed, but not often enough :-). FL> ..\Python\compile.c(1225) : error C2065: 'CALL_FUNCTION_STAR' : FL> undeclared identifier FL> (compile.c and opcode.h both mention this identifier, but nobody FL> defines it... should it be CALL_FUNCTION_VAR, perhaps?) This was a last minute change of names. I had previously compiled under the old names. The Makefile doesn't describe the dependency between opcode.h and compile.c. And the compile.o file I had worked, because the only change was to the name of a macro. It's too bad the Makefile doesn't have all the dependencies. It seems that it's necessary to do a make clean before checking in a change that affects many files. Jeremy From klm@digicool.com Wed Mar 29 00:30:05 2000 From: klm@digicool.com (Ken Manheimer) Date: Tue, 28 Mar 2000 19:30:05 -0500 (EST) Subject: [Python-Dev] yeah!
for Jeremy and Greg In-Reply-To: Message-ID: On Tue, 28 Mar 2000, David Ascher wrote: > I'm thrilled to see the extended call syntax patches go in! One less wart > in the language! Me too! Even the lisps i used to know (albeit ancient, according to eric) couldn't get it as tidy as this. (Silly me, now i'm imagining we're going to see operator assignments just around the bend. "Give them a tasty morsel, they ask for your dinner..."-) Ken klm@digicool.com From ping@lfw.org Wed Mar 29 00:35:54 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 28 Mar 2000 18:35:54 -0600 (CST) Subject: [Python-Dev] yeah! for Jeremy and Greg In-Reply-To: <14561.19438.157799.810802@goon.cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Jeremy Hylton wrote: > > It's too bad the Makefile doesn't have all the dependencies. It seems > that it's necessary to do a make clean before checking in a change > that affects many files. I updated again and rebuilt.

>>> def sum(*args):
...     s = 0
...     for x in args: s = s + x
...     return s
...
>>> sum(2,3,4)
9
>>> sum(*[2,3,4])
9
>>> x = (2,3,4)
>>> sum(*x)
9
>>> def func(a, b, c):
...     print a, b, c
...
>>> func(**{'a':2, 'b':1, 'c':6})
2 1 6
>>> func(**{'c':8, 'a':1, 'b':9})
1 9 8
>>>

*cool*. So does this completely obviate the need for "apply", then? apply(x, y, z) <==> x(*y, **z) -- ?!ng From guido@python.org Wed Mar 29 00:35:17 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 19:35:17 -0500 Subject: [Python-Dev] yeah! for Jeremy and Greg In-Reply-To: Your message of "Tue, 28 Mar 2000 18:35:54 CST." References: Message-ID: <200003290035.TAA16278@eric.cnri.reston.va.us> > *cool*. > > So does this completely obviate the need for "apply", then? > > apply(x, y, z) <==> x(*y, **z) I think so (except for backwards compatibility). The 1.6 docs for apply should point this out!
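The equivalence stated above can be checked with a throwaway function; `f`, `args` and `kw` here are invented for illustration:

```python
def f(a, b, c):
    # Trivial function so both call styles can be compared.
    return (a, b, c)

args = (1, 2)
kw = {'c': 3}

# f(*args, **kw) spells out exactly what apply(f, args, kw) did.
result = f(*args, **kw)
print(result)  # -> (1, 2, 3)
```

On interpreters that still provide the builtin, apply(f, args, kw) returns the same tuple.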
--Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA@ActiveState.com Wed Mar 29 00:42:20 2000 From: DavidA@ActiveState.com (David Ascher) Date: Tue, 28 Mar 2000 16:42:20 -0800 Subject: [Python-Dev] yeah! for Jeremy and Greg In-Reply-To: Message-ID:

> I updated again and rebuilt.
>
> >>> def sum(*args):
> ...     s = 0
> ...     for x in args: s = s + x
> ...     return s
> ...
> >>> sum(2,3,4)
> 9
> >>> sum(*[2,3,4])
> 9
> >>> x = (2,3,4)
> >>> sum(*x)
> 9
> >>> def func(a, b, c):
> ...     print a, b, c
> ...
> >>> func(**{'a':2, 'b':1, 'c':6})
> 2 1 6
> >>> func(**{'c':8, 'a':1, 'b':9})
> 1 9 8
> >>>
>
> *cool*.

But most importantly, IMO:

class SubClass(Class):
    def __init__(self, a, *args, **kw):
        self.a = a
        Class.__init__(self, *args, **kw)

Much neater. From bwarsaw@cnri.reston.va.us Wed Mar 29 00:46:11 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 28 Mar 2000 19:46:11 -0500 (EST) Subject: [Python-Dev] yeah! for Jeremy and Greg References: <14561.19438.157799.810802@goon.cnri.reston.va.us> Message-ID: <14561.21075.637108.322536@anthem.cnri.reston.va.us> Uh oh. Fresh CVS update and make clean, make:

-------------------- snip snip --------------------
Python 1.5.2+ (#20, Mar 28 2000, 19:37:38) [GCC 2.8.1] on sunos5
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> def sum(*args):
...     s = 0
...     for x in args: s = s + x
...     return s
...
>>> class Nums:
...     def __getitem__(self, i):
...         if i >= 10 or i < 0: raise IndexError
...         return i
...
>>> n = Nums()
>>> for i in n: print i
...
0
1
2
3
4
5
6
7
8
9
>>> sum(*n)
Traceback (innermost last):
  File "", line 1, in ?
SystemError: bad argument to internal function
-------------------- snip snip --------------------

-Barry From bwarsaw@cnri.reston.va.us Wed Mar 29 01:02:16 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 28 Mar 2000 20:02:16 -0500 (EST) Subject: [Python-Dev] yeah!
for Jeremy and Greg References: <14561.19438.157799.810802@goon.cnri.reston.va.us> <14561.21075.637108.322536@anthem.cnri.reston.va.us> Message-ID: <14561.22040.383370.283163@anthem.cnri.reston.va.us> Changing the definition of class Nums to

class Nums:
    def __getitem__(self, i):
        if 0 <= i < 10: return i
        raise IndexError
    def __len__(self):
        return 10

I.e. adding the __len__() method avoids the SystemError. Either the *arg call should not depend on the sequence being length-able, or it should error check that the length calculation doesn't return -1 or raise an exception. Looking at PySequence_Length() though, it seems that m->sq_length(s) can return -1 without setting a type_error. So the fix is either to include a check for return -1 in PySequence_Length() when calling sq_length, or instance_length() should set a TypeError when it has no __len__() method and returns -1. I gotta run so I can't follow this through -- I'm sure I'll see the right solution from someone in tomorrow morning's email :) -Barry From ping@lfw.org Wed Mar 29 01:17:27 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 28 Mar 2000 19:17:27 -0600 (CST) Subject: [Python-Dev] yeah! for Jeremy and Greg In-Reply-To: <14561.22040.383370.283163@anthem.cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Barry A. Warsaw wrote: > > Changing the definition of class Nums to > > class Nums: > def __getitem__(self, i): > if 0 <= i < 10: return i > raise IndexError > def __len__(self): > return 10 > > I.e. adding the __len__() method avoids the SystemError. It should be noted that "apply" has the same problem, with a different counterintuitive error message:

>>> n = Nums()
>>> apply(sum, n)
Traceback (innermost last):
  File "", line 1, in ?
AttributeError: __len__

-- ?!ng From jeremy@cnri.reston.va.us Wed Mar 29 02:59:26 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Tue, 28 Mar 2000 21:59:26 -0500 (EST) Subject: [Python-Dev] yeah!
for Jeremy and Greg In-Reply-To: References: Message-ID: <14561.29070.940238.542509@bitdiddle.cnri.reston.va.us> >>>>> "DA" == David Ascher writes:

DA> But most importantly, IMO:
DA> class SubClass(Class):
DA>     def __init__(self, a, *args, **kw):
DA>         self.a = a
DA>         Class.__init__(self, *args, **kw)
DA> Much neater.

This version of method overloading was what I liked most about Greg's patch. Note that I also prefer:

class SubClass(Class):
    super_init = Class.__init__

    def __init__(self, a, *args, **kw):
        self.a = a
        self.super_init(*args, **kw)

I've been happy to have all the overridden methods explicitly labelled at the top of a class lately. It is much easier to change the class hierarchy later. Jeremy From gward@cnri.reston.va.us Wed Mar 29 03:15:00 2000 From: gward@cnri.reston.va.us (Greg Ward) Date: Tue, 28 Mar 2000 22:15:00 -0500 Subject: [Python-Dev] __debug__ and py_compile Message-ID: <20000328221500.A3290@cnri.reston.va.us> Hi all -- a particularly active member of the Distutils-SIG brought the global '__debug__' flag to my attention, since I (and thus my code) didn't know if calling 'py_compile.compile()' would result in a ".pyc" or a ".pyo" file. It appears that, using __debug__, you can determine what you're going to get. Cool! However, it doesn't look like you can *choose* what you're going to get. Is this correct? Ie. does the presence/absence of -O when the interpreter starts up *completely* decide how code is compiled? Also, can I rely on __debug__ being there in the future? How about in the past? I still occasionally ponder making Distutils compatible with Python 1.5.1. Thanks -- Greg From guido@python.org Wed Mar 29 04:08:12 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 23:08:12 -0500 Subject: [Python-Dev] __debug__ and py_compile In-Reply-To: Your message of "Tue, 28 Mar 2000 22:15:00 EST."
<20000328221500.A3290@cnri.reston.va.us> References: <20000328221500.A3290@cnri.reston.va.us> Message-ID: <200003290408.XAA17991@eric.cnri.reston.va.us> > a particularly active member of the Distutils-SIG brought the > global '__debug__' flag to my attention, since I (and thus my code) > didn't know if calling 'py_compile.compile()' would result in a ".pyc" > or a ".pyo" file. It appears that, using __debug__, you can determine > what you're going to get. Cool! > > However, it doesn't look like you can *choose* what you're going to > get. Is this correct? Ie. does the presence/absence of -O when the > interpreter starts up *completely* decide how code is compiled? Correct. You (currently) can't change the opt setting of the compiler. (It was part of the compiler restructuring to give more freedom here; this has been pushed back to 1.7.) > Also, can I rely on __debug__ being there in the future? How about in > the past? I still occasionally ponder making Distutils compatible with > Python 1.5.1. __debug__ is as old as the assert statement, going back to at least 1.5.0. --Guido van Rossum (home page: http://www.python.org/~guido/) From Moshe Zadka Wed Mar 29 05:35:51 2000 From: Moshe Zadka (Moshe Zadka) Date: Wed, 29 Mar 2000 07:35:51 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <1257835425-27941123@hypernet.com> Message-ID: On Tue, 28 Mar 2000, Gordon McMillan wrote: > What would happen if he (and everyone else) installed > themselves *into* my core packages, then I decided I didn't > want his stuff? More than likely I'd have to scrub the damn > installation and start all over again. I think Greg Stein answered that objection, by reminding us that the filesystem isn't the only way to set up a package hierarchy. 
In particular, even with Python's current module system, there is no need to scrub installations: Python core modules go (under UNIX) in /usr/local/lib/python1.5, and 3rd party modules go in /usr/local/lib/python1.5/site-packages. Need to remove stuff? Remove whatever is in /usr/local/lib/python1.5/site-packages. Need to upgrade? Just backup /usr/local/lib/python1.5/site-packages, remove /usr/local/lib/python1.5/, install, and move 3rd party modules back from backup. This becomes even easier if the standard installation is in a JAR-like file, and 3rd party modules are also in a JAR-like file, but specified to be in their natural place. Wow! That was a long rant! Anyway, I already expressed my preference of the Perl way, over the Java way. For one thing, I don't want to have to register a domain just so I could distribute Python code -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From bwarsaw@cnri.reston.va.us Wed Mar 29 05:42:34 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 29 Mar 2000 00:42:34 -0500 (EST) Subject: [Python-Dev] yeah! for Jeremy and Greg References: <14561.19438.157799.810802@goon.cnri.reston.va.us> <14561.21075.637108.322536@anthem.cnri.reston.va.us> Message-ID: <14561.38858.41246.28460@anthem.cnri.reston.va.us> >>>>> "BAW" == Barry A Warsaw writes: BAW> Uh oh. Fresh CVS update and make clean, make: >>> sum(*n) | Traceback (innermost last): | File "", line 1, in ? | SystemError: bad argument to internal function Here's a proposed patch that will cause a TypeError to be raised instead. 
-Barry

-------------------- snip snip --------------------
Index: abstract.c
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Objects/abstract.c,v
retrieving revision 2.33
diff -c -r2.33 abstract.c
*** abstract.c  2000/03/10 22:55:18  2.33
--- abstract.c  2000/03/29 05:36:21
***************
*** 860,866 ****
      PyObject *s;
  {
      PySequenceMethods *m;
!
      if (s == NULL) {
          null_error();
          return -1;
--- 860,867 ----
      PyObject *s;
  {
      PySequenceMethods *m;
!     int size = -1;
!
      if (s == NULL) {
          null_error();
          return -1;
***************
*** 868,877 ****
      m = s->ob_type->tp_as_sequence;
      if (m && m->sq_length)
!         return m->sq_length(s);
!
!     type_error("len() of unsized object");
!     return -1;
  }

  PyObject *
--- 869,879 ----
      m = s->ob_type->tp_as_sequence;
      if (m && m->sq_length)
!         size = m->sq_length(s);
!
!     if (size < 0)
!         type_error("len() of unsized object");
!     return size;
  }

  PyObject *
Index: ceval.c
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Python/ceval.c,v
retrieving revision 2.169
diff -c -r2.169 ceval.c
*** ceval.c  2000/03/28 23:49:16  2.169
--- ceval.c  2000/03/29 05:39:00
***************
*** 1636,1641 ****
--- 1636,1649 ----
              break;
          }
          nstar = PySequence_Length(stararg);
+         if (nstar < 0) {
+             if (!PyErr_Occurred())
+                 PyErr_SetString(
+                     PyExc_TypeError,
+                     "len() of unsized object");
+             x = NULL;
+             break;
+         }
      }
      if (nk > 0) {
          if (kwdict == NULL) {

From bwarsaw@cnri.reston.va.us Wed Mar 29 05:46:19 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Wed, 29 Mar 2000 00:46:19 -0500 (EST) Subject: [Python-Dev] yeah!
for Jeremy and Greg References: <14561.22040.383370.283163@anthem.cnri.reston.va.us> Message-ID: <14561.39083.748093.694726@anthem.cnri.reston.va.us> >>>>> "KY" == Ka-Ping Yee writes:

| It should be noted that "apply" has the same problem, with a
| different counterintuitive error message:

>> n = Nums()
>> apply(sum, n)
| Traceback (innermost last):
|   File "", line 1, in ?
| AttributeError: __len__

The patch I just posted fixes this too. The error message ain't great, but at least it's consistent with the direct call. -Barry

-------------------- snip snip --------------------
Traceback (innermost last):
  File "/tmp/doit.py", line 15, in ?
    print apply(sum, n)
TypeError: len() of unsized object

From pf@artcom-gmbh.de Wed Mar 29 06:30:22 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 08:30:22 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: from Moshe Zadka at "Mar 29, 2000 7:44:42 am" Message-ID: Hi! > On Wed, 29 Mar 2000, Peter Funk wrote: > > > class UserString: > > def __init__(self, string=""): > > self.data = string > ^^^^^^^ Moshe Zadka wrote: > Why do you feel there is a need to default? Strings are immutable I had something like this in my mind:

class MutableString(UserString):
    """Python strings are immutable objects.  But of course this can
    be changed in a derived class implementing the missing methods.

    >>> s = MutableString()
    >>> s[0:5] = "HUH?"
    """
    def __setitem__(self, char):
        ....
    def __setslice__(self, i, j, substring):
        ....

> What about __int__, __long__, __float__, __str__, __hash__? > And what about __getitem__ and __contains__? > And __complex__? I was obviously too tired and too eager to get this out! Thanks for reviewing and responding so quickly. I will add them.
Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From Moshe Zadka Wed Mar 29 06:51:30 2000 From: Moshe Zadka (Moshe Zadka) Date: Wed, 29 Mar 2000 08:51:30 +0200 (IST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Message-ID: On Wed, 29 Mar 2000, Peter Funk wrote: > Moshe Zadka wrote: > > Why do you feel there is a need to default? Strings are immutable > > I had something like this in my mind: > > class MutableString(UserString): > """Python strings are immutable objects. But of course this can > be changed in a derived class implementing the missing methods. Then add the default in the constructor for MutableString.... eagerly-waiting-for-UserString.py-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From Moshe Zadka Wed Mar 29 07:03:53 2000 From: Moshe Zadka (Moshe Zadka) Date: Wed, 29 Mar 2000 09:03:53 +0200 (IST) Subject: [Python-Dev] 1.5.2->1.6 Changes Message-ID: I'm starting to compile a list of changes from 1.5.2 to 1.6. Here's what I came up with so far

-- string objects now have methods (though they are still immutable)
-- unicode support: Unicode strings are marked with u"string", and there
   is support for arbitrary encoders/decoders
-- "in" operator can now be overridden in user-defined classes to mean
   anything: it calls the magic method __contains__
-- SRE is the new regular expression engine. re.py became an interface
   to the same engine. The new engine fully supports unicode regular
   expressions.
-- Some methods which would take multiple arguments and treat them as
   a tuple were fixed: list.{append, insert, remove, count},
   socket.connect
-- Some modules were made obsolete
-- filecmp.py (supersedes the old cmp.py and dircmp.py modules),
-- tabnanny.py (make sure the source file doesn't assume a specific
   tab-width)
-- win32reg (win32 registry editor)
-- unicode module, and codecs package
-- New calling syntax: f(*args, **kw) equivalent to apply(f, args, kw)
-- _tkinter now uses the object, rather than string, interface to Tcl.

Please e-mail me personally if you think of any other changes, and I'll try to integrate them into a complete "changes" document. Thanks in advance -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From esr@thyrsus.com Wed Mar 29 07:21:29 2000 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 29 Mar 2000 02:21:29 -0500 Subject: [Python-Dev] 1.5.2->1.6 Changes In-Reply-To: ; from Moshe Zadka on Wed, Mar 29, 2000 at 09:03:53AM +0200 References: Message-ID: <20000329022129.A15539@thyrsus.com> Moshe Zadka : > -- _tkinter now uses the object, rather than string, interface to Tcl. Hm, does this mean that the annoying requirement to do explicit gets and sets to move data between the Python world and the Tcl/Tk world is gone? -- Eric S. Raymond "A system of licensing and registration is the perfect device to deny gun ownership to the bourgeoisie." -- Vladimir Ilyich Lenin From Moshe Zadka Wed Mar 29 07:22:54 2000 From: Moshe Zadka (Moshe Zadka) Date: Wed, 29 Mar 2000 09:22:54 +0200 (IST) Subject: [Python-Dev] 1.5.2->1.6 Changes In-Reply-To: <20000329022129.A15539@thyrsus.com> Message-ID: On Wed, 29 Mar 2000, Eric S. Raymond wrote: > Moshe Zadka : > > -- _tkinter now uses the object, rather than string, interface to Tcl. > > Hm, does this mean that the annoying requirement to do explicit gets and > sets to move data between the Python world and the Tcl/Tk world is gone?
I doubt it. It's just that Python and Tcl have such a different outlook about variables that I don't think it can be glossed over. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From pf@artcom-gmbh.de Wed Mar 29 09:16:17 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 11:16:17 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: from Moshe Zadka at "Mar 29, 2000 8:51:30 am" Message-ID: Hi! Moshe Zadka: > eagerly-waiting-for-UserString.py-ly y'rs, Z. Well, I've added the missing methods. Unfortunately I ran out of time now and a 'test_userstring.py' derived from 'src/Lib/test/test_string.py' is still missing. Regards, Peter
---- 8< ---- 8< ---- cut here ---- 8< ---- schnipp ---- 8< ---- schnapp ----
#!/usr/bin/env python
"""A user-defined wrapper around string objects

Note: string objects have grown methods in Python 1.6
This module requires Python 1.6 or later.
"""
from types import StringType, UnicodeType
import sys

class UserString:
    def __init__(self, string):
        self.data = string
    def __str__(self): return str(self.data)
    def __repr__(self): return repr(self.data)
    def __int__(self): return int(self.data)
    def __long__(self): return long(self.data)
    def __float__(self): return float(self.data)
    def __hash__(self): return hash(self.data)
    def __cmp__(self, string):
        if isinstance(string, UserString):
            return cmp(self.data, string.data)
        else:
            return cmp(self.data, string)
    def __contains__(self, char):
        return char in self.data
    def __len__(self): return len(self.data)
    def __getitem__(self, index): return self.__class__(self.data[index])
    def __getslice__(self, start, end):
        start = max(start, 0); end = max(end, 0)
        return self.__class__(self.data[start:end])
    def __add__(self, other):
        if isinstance(other, UserString):
            return self.__class__(self.data + other.data)
        elif isinstance(other, StringType) or isinstance(other, UnicodeType):
            return self.__class__(self.data + other)
        else:
            return self.__class__(self.data + str(other))
    def __radd__(self, other):
        if isinstance(other, StringType) or isinstance(other, UnicodeType):
            return self.__class__(other + self.data)
        else:
            return self.__class__(str(other) + self.data)
    def __mul__(self, n):
        return self.__class__(self.data*n)
    __rmul__ = __mul__

    # the following methods are defined in alphabetical order:
    def capitalize(self): return self.__class__(self.data.capitalize())
    def center(self, width): return self.__class__(self.data.center(width))
    def count(self, sub, start=0, end=sys.maxint):
        return self.data.count(sub, start, end)
    def encode(self, encoding=None, errors=None): # XXX improve this?
        if encoding:
            if errors:
                return self.__class__(self.data.encode(encoding, errors))
            else:
                return self.__class__(self.data.encode(encoding))
        else:
            return self.__class__(self.data.encode())
    def endswith(self, suffix, start=0, end=sys.maxint):
        return self.data.endswith(suffix, start, end)
    def find(self, sub, start=0, end=sys.maxint):
        return self.data.find(sub, start, end)
    def index(self, sub, start=0, end=sys.maxint):
        return self.data.index(sub, start, end)
    def isdecimal(self): return self.data.isdecimal()
    def isdigit(self): return self.data.isdigit()
    def islower(self): return self.data.islower()
    def isnumeric(self): return self.data.isnumeric()
    def isspace(self): return self.data.isspace()
    def istitle(self): return self.data.istitle()
    def isupper(self): return self.data.isupper()
    def join(self, seq): return self.data.join(seq)
    def ljust(self, width): return self.__class__(self.data.ljust(width))
    def lower(self): return self.__class__(self.data.lower())
    def lstrip(self): return self.__class__(self.data.lstrip())
    def replace(self, old, new, maxsplit=-1):
        return self.__class__(self.data.replace(old, new, maxsplit))
    def rfind(self, sub, start=0, end=sys.maxint):
        return self.data.rfind(sub, start, end)
    def rindex(self, sub, start=0, end=sys.maxint):
        return self.data.rindex(sub, start, end)
    def rjust(self, width): return self.__class__(self.data.rjust(width))
    def rstrip(self): return self.__class__(self.data.rstrip())
    def split(self, sep=None, maxsplit=-1):
        return self.data.split(sep, maxsplit)
    def splitlines(self, maxsplit=-1):
        return self.data.splitlines(maxsplit)
    def startswith(self, prefix, start=0, end=sys.maxint):
        return self.data.startswith(prefix, start, end)
    def strip(self): return self.__class__(self.data.strip())
    def swapcase(self): return self.__class__(self.data.swapcase())
    def title(self): return self.__class__(self.data.title())
    def translate(self, table, deletechars=""):
        return self.__class__(self.data.translate(table, deletechars))
    def upper(self): return self.__class__(self.data.upper())

class MutableString(UserString):
    """mutable string objects

    Python strings are immutable objects.  This has the advantage that
    strings may be used as dictionary keys.  If this property isn't needed
    and you insist on changing string values in place instead, you may cheat
    and use MutableString.

    But the purpose of this class is an educational one: to prevent people
    from inventing their own mutable string class derived from UserString
    and then forgetting to remove (override) the __hash__ method inherited
    from UserString.  This would lead to errors that would be very hard to
    track down.

    A faster and better solution is to rewrite the program using lists."""
    def __init__(self, string=""):
        self.data = string
    def __hash__(self):
        raise TypeError, "unhashable type (it is mutable)"
    def __setitem__(self, index, sub):
        if index < 0 or index >= len(self.data): raise IndexError
        self.data = self.data[:index] + sub + self.data[index+1:]
    def __delitem__(self, index):
        if index < 0 or index >= len(self.data): raise IndexError
        self.data = self.data[:index] + self.data[index+1:]
    def __setslice__(self, start, end, sub):
        start = max(start, 0); end = max(end, 0)
        if isinstance(sub, UserString):
            self.data = self.data[:start]+sub.data+self.data[end:]
        elif isinstance(sub, StringType) or isinstance(sub, UnicodeType):
            self.data = self.data[:start]+sub+self.data[end:]
        else:
            self.data = self.data[:start]+str(sub)+self.data[end:]
    def __delslice__(self, start, end):
        start = max(start, 0); end = max(end, 0)
        self.data = self.data[:start] + self.data[end:]
    def immutable(self):
        return UserString(self.data)

def _test():
    s = UserString("abc")
    u = UserString(u"efg")
    # XXX add some real tests here?
    return 0

if __name__ == "__main__":
    sys.exit(_test())
From mal@lemburg.com Wed Mar 29 09:34:21 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 29 Mar 2000 11:34:21 +0200 Subject: [Python-Dev] Great Renaming? What is the goal?
References: <1257835425-27941123@hypernet.com>
Message-ID: <38E1CE1D.7899B1BC@lemburg.com>

Gordon McMillan wrote:
>
> Andrew M. Kuchling wrote:
> [snip]
> > 2) Right now there's no way for third-party extensions to add
> > themselves to a package in the standard library.  Once Python finds
> > foo/__init__.py, it won't look for site-packages/foo/__init__.py, so
> > if you grab, say, "crypto" as a package name in the standard library,
> > it's forever lost to third-party extensions.
>
> That way lies madness.  While I'm happy to carp at Java for
> requiring "com", "net" or whatever as a top level name, their
> intent is correct: the names grabbed by the Python standard
> packages belong to no one but the Python standard
> packages.  If you *don't* do that, upgrades are an absolute
> nightmare.
>
> Marc-Andre grabbed "mx".  If (as I rather suspect) he
> wants to remake the entire standard lib in his image, he's
> welcome to - *under* mx.

Right, that's the way I see it too.  BTW, where can I register the "mx" top-level package name?  Should these be registered in the NIST registry?  Will the names registered there be honored?

> What would happen if he (and everyone else) installed
> themselves *into* my core packages, then I decided I didn't
> want his stuff?  More than likely I'd have to scrub the damn
> installation and start all over again.

That's a no-no, IMHO.  Unless explicitly allowed, packages should *not* install themselves as subpackages of other existing top-level packages.  If they do, it's their problem if the hierarchy changes...

--
Marc-Andre Lemburg
______________________________________________________________________
Business:      http://www.lemburg.com/
Python Pages:  http://www.lemburg.com/python/

From Moshe Zadka Wed Mar 29 09:59:47 2000
From: Moshe Zadka (Moshe Zadka)
Date: Wed, 29 Mar 2000 11:59:47 +0200 (IST)
Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class?
In-Reply-To: 
Message-ID: 

On Wed, 29 Mar 2000, Peter Funk wrote:
> Hi!
> > Moshe Zadka:
> > eagerly-waiting-for-UserString.py-ly y'rs, Z.
>
> Well, I've added the missing methods.  Unfortunately I ran out of time now and
> a 'test_userstring.py' derived from 'src/Lib/test/test_string.py' is still
> missing.

Great work, Peter!  I really like UserString.  However, I have two issues with MutableString:

1. It shouldn't share implementation with UserString, otherwise your algorithms won't have the correct big-O properties.  It should probably use a char array (from the array module) as the internal representation.

2. It shouldn't share interface with UserString, since it doesn't have a proper implementation of __hash__.

All in all, I probably disagree with making MutableString a subclass of UserString.  If I have time later today, I'm hoping to be able to make my own MutableString.

From pf@artcom-gmbh.de Wed Mar 29 10:35:32 2000
From: pf@artcom-gmbh.de (Peter Funk)
Date: Wed, 29 Mar 2000 12:35:32 +0200 (MEST)
Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class?
In-Reply-To: from Moshe Zadka at "Mar 29, 2000 11:59:47 am"
Message-ID: 

Hi!

> > Moshe Zadka:
> > > eagerly-waiting-for-UserString.py-ly y'rs, Z.
>
> On Wed, 29 Mar 2000, Peter Funk wrote:
> > Well, I've added the missing methods.  Unfortunately I ran out of time now and
> > a 'test_userstring.py' derived from 'src/Lib/test/test_string.py' is still
> > missing.
>
> Moshe Zadka schrieb:
> Great work, Peter!  I really like UserString.  However, I have two issues
> with MutableString:
>
> 1. It shouldn't share implementation with UserString, otherwise your
> algorithms won't have the correct big-O properties.  It should
> probably use a char array (from the array module) as the internal
> representation.

Hmm.... I don't understand what you mean by 'big-O properties'.  The internal representation of any object should be considered ... umm ... internal.

> 2.
It shouldn't share interface with UserString, since it doesn't have a
> proper implementation of __hash__.

What's wrong with my implementation of __hash__ raising a TypeError with the message 'unhashable object'?  This is the same behaviour you get if you try to use some other mutable object as a key in a dictionary:

>>> l = []
>>> d = { l : 'foo' }
Traceback (innermost last):
  File "", line 1, in ?
TypeError: unhashable type

> All in all, I probably disagree with making MutableString a subclass of
> UserString.  If I have time later today, I'm hoping to be able to make my
> own MutableString

As I tried to point out in the docstring of 'MutableString', I don't want people to actually start using the 'MutableString' class.  My intention was to prevent people from inventing their own, and then probably wrong, MutableString class derived from UserString.  Only Newbies will really ever need mutable strings in Python (see FAQ).  Maybe my 'MutableString' idea belongs somewhere in the to-be-written src/Doc/libuserstring.tex.  But since Newbies tend to ignore docs ... Sigh.

Regards, Peter
--
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen)

From gmcm@hypernet.com Wed Mar 29 11:07:20 2000
From: gmcm@hypernet.com (Gordon McMillan)
Date: Wed, 29 Mar 2000 06:07:20 -0500
Subject: [Python-Dev] Great Renaming? What is the goal?
In-Reply-To: 
References: <1257835425-27941123@hypernet.com>
Message-ID: <1257794452-30405909@hypernet.com>

Moshe Zadka wrote:
> On Tue, 28 Mar 2000, Gordon McMillan wrote:
> > What would happen if he (and everyone else) installed
> > themselves *into* my core packages, then I decided I didn't
> > want his stuff?  More than likely I'd have to scrub the damn
> > installation and start all over again.
>
> I think Greg Stein answered that objection, by reminding us that the
> filesystem isn't the only way to set up a package hierarchy.
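[As an aside from a present-day vantage point: the point that a package hierarchy need not mirror the filesystem can be made concrete with an import hook. The sketch below uses the modern importlib meta-path API, which did not exist in 2000; the in-memory "archive" contents and the names `demo_archive` and `ArchiveFinder` are invented for the example.]

```python
import importlib.abc
import importlib.util
import sys

# In-memory "archive": module name -> source code.  Both entries are
# made up for this sketch; no files exist on disk for them.
ARCHIVE = {
    "demo_archive": "",
    "demo_archive.greeting": "def hello():\n    return 'hi'\n",
}

class ArchiveFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve modules from the ARCHIVE dict instead of the filesystem."""

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in ARCHIVE:
            return None  # let the normal filesystem machinery handle it
        return importlib.util.spec_from_loader(
            fullname, self, is_package=(fullname == "demo_archive"))

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        # Run the archived source in the fresh module's namespace.
        exec(ARCHIVE[module.__name__], module.__dict__)

sys.meta_path.insert(0, ArchiveFinder())

from demo_archive import greeting
print(greeting.hello())  # prints "hi" -- no demo_archive/ directory anywhere
```

The same idea underlies the archive-based distributions mentioned in this thread: the importer, not the directory layout, decides what `text.encoding.macbinary` maps to.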
You mean when Greg said:
> Assuming that you use an archive like those found in my "small" distro or
> Gordon's distro, then this is no problem.  The archive simply recognizes
> and maps "text.encoding.macbinary" to its own module.

I don't know what this has to do with it.  When we get around to the 'macbinary' part, we have already established that 'text.encoding' is the parent which should supply 'macbinary'.

> In particular, even with Python's current module system, there is no need to
> scrub installations: Python core modules go (under UNIX) in
> /usr/local/lib/python1.5, and 3rd party modules go in
> /usr/local/lib/python1.5/site-packages.

And if there's a /usr/local/lib/python1.5/text/encoding, there's no way that /usr/local/lib/python1.5/site-packages/text/encoding will get searched.

I believe you could hack up an importer that did allow this, and I think you'd be 100% certifiable if you did.  Just look at the surprise factor.  Hacking stuff into another package is just as evil as math.pi = 42.

> Anyway, I already expressed my preference of the Perl way, over the Java
> way.  For one thing, I don't want to have to register a domain just so I
> could distribute Python code

I haven't the foggiest what the "Perl way" is; I wouldn't be surprised if it relied on un-Pythonic sociological factors.  I already said the Java mechanics are silly; uniqueness is what matters.  When Python packages start selling in the four and five figure range, then a registry mechanism will likely be necessary.

- Gordon

From Moshe Zadka Wed Mar 29 11:21:09 2000
From: Moshe Zadka (Moshe Zadka)
Date: Wed, 29 Mar 2000 13:21:09 +0200 (IST)
Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class?
In-Reply-To: 
Message-ID: 

On Wed, 29 Mar 2000, Peter Funk wrote:
> > 1. It shouldn't share implementation with UserString, otherwise your
> > algorithms won't have the correct big-O properties.
It should
> > probably use a char array (from the array module) as the internal
> > representation.
>
> Hmm.... I don't understand what you mean by 'big-O properties'.
> The internal representation of any object should be considered ...
> umm ... internal.

Yes, but

    s[0] = 'a'

should take O(1) time, not O(len(s)).

> > 2. It shouldn't share interface with UserString, since it doesn't have a
> > proper implementation of __hash__.
>
> What's wrong with my implementation of __hash__ raising a TypeError with
> the message 'unhashable object'?

A subtype shouldn't change contracts of its supertypes.  hash() was implicitly contracted as "raising no exceptions".

-- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com

From guido@python.org Wed Mar 29 12:26:56 2000
From: guido@python.org (Guido van Rossum)
Date: Wed, 29 Mar 2000 07:26:56 -0500
Subject: [Python-Dev] 1.5.2->1.6 Changes
In-Reply-To: Your message of "Wed, 29 Mar 2000 02:21:29 EST." <20000329022129.A15539@thyrsus.com>
References: <20000329022129.A15539@thyrsus.com>
Message-ID: <200003291226.HAA18216@eric.cnri.reston.va.us>

> Moshe Zadka :
> > -- _tkinter now uses the object, rather than string, interface to Tcl.

Eric Raymond:
> Hm, does this mean that the annoying requirement to do explicit gets and
> sets to move data between the Python world and the Tcl/Tk world is gone?

Not sure what you are referring to -- this should be completely transparent to Python/Tkinter users.  If you are thinking of the way Tcl variables are created and manipulated in Python, no, this doesn't change, alas (Tcl variables aren't objects -- they are manipulated through get and set commands. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Wed Mar 29 12:32:16 2000
From: guido@python.org (Guido van Rossum)
Date: Wed, 29 Mar 2000 07:32:16 -0500
Subject: [Python-Dev] Great Renaming? What is the goal?
In-Reply-To: Your message of "Wed, 29 Mar 2000 11:34:21 +0200." <38E1CE1D.7899B1BC@lemburg.com>
References: <1257835425-27941123@hypernet.com> <38E1CE1D.7899B1BC@lemburg.com>
Message-ID: <200003291232.HAA18234@eric.cnri.reston.va.us>

> > Marc-Andre grabbed "mx".  If (as I rather suspect) he
> > wants to remake the entire standard lib in his image, he's
> > welcome to - *under* mx.
>
> Right, that's the way I see it too.  BTW, where can I register
> the "mx" top-level package name?  Should these be registered
> in the NIST registry?  Will the names registered there be
> honored?

I think the NIST registry is a failed experiment -- too cumbersome to maintain or consult.
We can do this the same way as common law handles trade marks: if you have used it as your brand name long enough, even if you didn't register, someone else cannot grab it away from you. > > What would happen if he (and everyone else) installed > > themselves *into* my core packages, then I decided I didn't > > want his stuff? More than likely I'd have to scrub the damn > > installation and start all over again. > > That's a no-no, IMHO. Unless explicitly allowed, packages > should *not* install themselves as subpackages to other > existing top-level packages. If they do, its their problem > if the hierarchy changes... Agreed. Although some people seem to *want* this. Probably because it's okay to do that in Java and (apparently?) in Perl. And C++, probably. It all probably stems back to Lisp. I admit that I didn't see this subtlety when I designed Python's package architecture. It's too late to change (e.g. because of __init__.py). Is it a problem though? Let's be open-minded about this and think about whether we want to allow this or not, and why... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Mar 29 12:35:33 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 29 Mar 2000 07:35:33 -0500 Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Your message of "Wed, 29 Mar 2000 13:21:09 +0200." References: Message-ID: <200003291235.HAA18249@eric.cnri.reston.va.us> > > What's wrong with my implementation of __hash__ raising a TypeError with > > the attribution 'unhashable object'. > > A subtype shouldn't change contracts of its supertypes. hash() was > implicitly contracted as "raising no exceptions". Let's not confuse subtypes and subclasses. One of the things implicit in the discussion on types-sig is that not every subclass is a subtype! Yes, this violates something we all learned from C++ -- but it's a great insight. 
No time to explain it more, but for me, Peter's subclassing UserString for MutableString to borrow implementation is fine.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From pf@artcom-gmbh.de Wed Mar 29 13:49:24 2000
From: pf@artcom-gmbh.de (Peter Funk)
Date: Wed, 29 Mar 2000 15:49:24 +0200 (MEST)
Subject: [Python-Dev] NIST Registry (was Great Renaming? What is the goal?)
In-Reply-To: <200003291232.HAA18234@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 29, 2000 7:32:16 am"
Message-ID: 

Hi!

Guido van Rossum:
> I think the NIST registry is a failed experiment -- too cumbersome to
> maintain or consult.

The WEB frontend of the NIST registry is not that bad --- if you are even aware of the fact that such a beast exists!  I have used Python since 1994 and discovered the NIST registry incidentally a few weeks ago, when I was really looking for something about the Win32 registry and used the search engine on www.python.org.  My first thought was: What a neat, clever idea!  I think this is an example of how the Python community suffers from poor advertising of good ideas.

> We can do this the same way as common law
> handles trade marks: if you have used it as your brand name long
> enough, even if you didn't register, someone else cannot grab it away
> from you.

Okay.  But a more formal registry wouldn't hurt.  Something like the global module index from the current docs, supplemented with all contributed modules which can currently be found at www.vex.net, would be a useful resource.

Regards, Peter

From Moshe Zadka Wed Mar 29 14:15:36 2000
From: Moshe Zadka (Moshe Zadka)
Date: Wed, 29 Mar 2000 16:15:36 +0200 (IST)
Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class?
In-Reply-To: <200003291235.HAA18249@eric.cnri.reston.va.us>
Message-ID: 

On Wed, 29 Mar 2000, Guido van Rossum wrote:
> Let's not confuse subtypes and subclasses.  One of the things implicit
> in the discussion on types-sig is that not every subclass is a
> subtype!
Yes, this violates something we all learned from C++ -- but > it's a great insight. No time to explain it more, but for me, Peter's > subclassing UserString for MutableString to borrow implementation is > fine. Oh, I agree with this. An earlier argument which got snipped in the discussion is why it's a bad idea to borrow implementation (a totally different argument) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From fdrake@acm.org Wed Mar 29 16:02:13 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 29 Mar 2000 11:02:13 -0500 (EST) Subject: [Python-Dev] 1.5.2->1.6 Changes In-Reply-To: References: Message-ID: <14562.10501.726637.335088@seahag.cnri.reston.va.us> Moshe Zadka writes: > -- filecmp.py (supersedes the old cmp.py and dircmp.py modules), > -- tabnanny.py (make sure the source file doesn't assume a specific tab-width) Weren't these in 1.5.2? I think filecmp is documented in the released docs... ah, no, I'm safe. ;) > Please e-mail me personally if you think of any other changes, and I'll > try to integrate them into a complete "changes" document. The documentation is updated. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From skip@mojam.com (Skip Montanaro) Wed Mar 29 16:57:51 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 29 Mar 2000 10:57:51 -0600 Subject: [Python-Dev] CVS woes... Message-ID: <200003291657.KAA22177@beluga.mojam.com> Does anyone else besides me have trouble getting their Python tree to sync with the CVS repository? I've tried all manner of flags to "cvs update", most recently "cvs update -d -A ." with no success. There are still some files I know Fred Drake has patched that show up as different and it refuses to pick up Lib/robotparser.py. I'm going to blast my current tree and start anew after saving one or two necessary files. Any thoughts you might have would be much appreciated. 
(Private emails please, unless for some reason you think this should be a python-dev topic.  I only post here because I suspect most of the readers use CVS to keep in frequent sync and may have some insight.)

Thx,
--
Skip Montanaro | http://www.mojam.com/
skip@mojam.com | http://www.musi-cal.com/

From Moshe Zadka Wed Mar 29 17:06:59 2000
From: Moshe Zadka (Moshe Zadka)
Date: Wed, 29 Mar 2000 19:06:59 +0200 (IST)
Subject: [Python-Dev] 1.5.2->1.6 Changes
In-Reply-To: <14562.10501.726637.335088@seahag.cnri.reston.va.us>
Message-ID: 

On Wed, 29 Mar 2000, Fred L. Drake, Jr. wrote:
> > Moshe Zadka writes:
> > -- filecmp.py (supersedes the old cmp.py and dircmp.py modules),
> > -- tabnanny.py (make sure the source file doesn't assume a specific tab-width)
>
> Weren't these in 1.5.2?  I think filecmp is documented in the
> released docs... ah, no, I'm safe. ;)

Tabnanny wasn't a module, and filecmp wasn't there at all.

> The documentation is updated. ;)

Yes, but it was released as a late part of 1.5.2.

-- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com

From Fredrik Lundh" Message-ID: <01b701bf999d$267b6740$34aab5d4@hagrid>

Skip wrote:
> Does anyone else besides me have trouble getting their Python tree to sync
> with the CVS repository?  I've tried all manner of flags to "cvs update",
> most recently "cvs update -d -A ." with no success.  There are still some
> files I know Fred Drake has patched that show up as different and it refuses
> to pick up Lib/robotparser.py.

note that robotparser doesn't show up on cvs.python.org either.  maybe cnri's cvs admins should look into this...

From fdrake@acm.org Wed Mar 29 18:20:14 2000
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 29 Mar 2000 13:20:14 -0500 (EST)
Subject: [Python-Dev] CVS woes...
In-Reply-To: <200003291657.KAA22177@beluga.mojam.com> References: <200003291657.KAA22177@beluga.mojam.com> Message-ID: <14562.18782.465814.696099@seahag.cnri.reston.va.us> Skip Montanaro writes: > most recently "cvs update -d -A ." with no success. There are still some > files I know Fred Drake has patched that show up as different and it refuses You should be aware that many of the more recent documentation patches have been in the 1.5.2p2 branch (release-1.5.2p1-patches, I think), rather than the development head. I'm hoping to begin the merge in the next week. I also have a few patches that I haven't had time to look at yet, and I'm not inclined to make any changes until I've merged the 1.5.2p2 docs with the 1.6 tree, mostly to keep the merge from being any more painful than I already expect it to be. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From bwarsaw@cnri.reston.va.us Wed Mar 29 18:22:57 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 29 Mar 2000 13:22:57 -0500 (EST) Subject: [Python-Dev] CVS woes... References: <200003291657.KAA22177@beluga.mojam.com> <01b701bf999d$267b6740$34aab5d4@hagrid> Message-ID: <14562.18945.407398.812930@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> note that robotparser doesn't show up on cvs.python.org FL> either. maybe cnri's cvs admins should look into this... I've just resync'd python/dist and am doing a fresh checkout now. Looks like Lib/robotparser.py is there now. -Barry From guido@python.org Wed Mar 29 18:23:38 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 29 Mar 2000 13:23:38 -0500 Subject: [Python-Dev] CVS woes... In-Reply-To: Your message of "Wed, 29 Mar 2000 10:57:51 CST." <200003291657.KAA22177@beluga.mojam.com> References: <200003291657.KAA22177@beluga.mojam.com> Message-ID: <200003291823.NAA20134@eric.cnri.reston.va.us> > Does anyone else besides me have trouble getting their Python tree to sync > with the CVS repository? 
I've tried all manner of flags to "cvs update", > most recently "cvs update -d -A ." with no success. There are still some > files I know Fred Drake has patched that show up as different and it refuses > to pick up Lib/robotparser.py. My bad. When I move or copy a file around in the CVS repository directly instead of using cvs commit, I have to manually call a script that updates the mirror. I've done that now, and robotparser.py should now be in the mirror. --Guido van Rossum (home page: http://www.python.org/~guido/) From gward@cnri.reston.va.us Wed Mar 29 19:06:14 2000 From: gward@cnri.reston.va.us (Greg Ward) Date: Wed, 29 Mar 2000 14:06:14 -0500 Subject: [Python-Dev] Distutils now in Python CVS tree Message-ID: <20000329140613.A5850@cnri.reston.va.us> Hi all -- Distutils is now available through the Python CVS tree *in addition to its own CVS tree*. That is, if you keep on top of developments in the Python CVS tree, then you will be tracking the latest Distutils code in Lib/distutils. Or, you can keep following the Distutils through its own CVS tree. (This is all done through one itty-bitty little symlink in the CNRI CVS repository, and It Just Works. Cool.) Note that only the 'distutils' subdirectory of the distutils distribution is tracked by Python: that is, changes to the documentation, test suites, and example setup scripts are *not* reflected in the Python CVS tree. If you follow neither Python nor Distutils CVS updates, this doesn't affect you. If you've been following Distutils CVS updates, you can continue to do so as you've always done (and as is documented on the Distutils "Anonymous CVS" web page). If you've been following Python CVS updates, then you are now following most Distutils CVS updates too -- as long as you do "cvs update -d", of course. If you're interested in following updates in the Distutils documentation, tests, examples, etc. then you should follow the Distutils CVS tree directly. 
If you've been following *both* Python and Distutils CVS updates, and hacking on the Distutils, then you should pick one or the other as your working directory. If you submit patches, it doesn't really matter if they're relative to the top of the Python tree, the top of the Distutils tree, or what -- I'll probably figure it out. However, it's probably best to continue sending Distutils patches to distutils-sig@python.org, *or* direct to me (gward@python.net) for trivial patches. Unless Guido says otherwise, I don't see a compelling reason to send Distutils patches to patches@python.org. In related news, the distutils-checkins list is probably going to go away, and all Distutils checkin messages will go python-checkins instead. Let me know if you avidly follow distutils-checkins, but do *not* want to follow python-checkins -- if lots of people respond (doubtful, as distutils-checkins only had 3 subscribers last I checked!), we'll reconsider. Greg From fdrake@acm.org Wed Mar 29 19:28:19 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 29 Mar 2000 14:28:19 -0500 (EST) Subject: [Python-Dev] Re: [Distutils] Distutils now in Python CVS tree In-Reply-To: <20000329140525.A5842@cnri.reston.va.us> References: <20000329140525.A5842@cnri.reston.va.us> Message-ID: <14562.22867.998809.897214@seahag.cnri.reston.va.us> Greg Ward writes: > Distutils is now available through the Python CVS tree *in addition to > its own CVS tree*. That is, if you keep on top of developments in the > Python CVS tree, then you will be tracking the latest Distutils code in > Lib/distutils. Or, you can keep following the Distutils through its own > CVS tree. (This is all done through one itty-bitty little symlink in > the CNRI CVS repository, and It Just Works. Cool.) Greg, You may want to point out the legalese requirements for patches to the Python tree. 
;( That means the patches should probably go to patches@python.org or you should ensure an archive of all the legal statements is maintained at CNRI. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From ping@lfw.org Wed Mar 29 21:44:31 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Wed, 29 Mar 2000 15:44:31 -0600 (CST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <02c901bf989b$be203d80$34aab5d4@hagrid> Message-ID: On Tue, 28 Mar 2000, Fredrik Lundh wrote: > > > IMO this subdivision could be discussed and possibly revised. > > here's one proposal: > http://www.pythonware.com/people/fredrik/librarybook-contents.htm Wow. I don't think i hardly ever use any of the modules in your "Commonly Used Modules" category. Except traceback, from time to time, but that's really the only one! Hmm. I'd arrange things a little differently, though i do like the category for Data Representation (it should probably go next to Data Storage though). I would prefer a separate group for interpreter-and-development-related things. The "File Formats" group seems weak... to me, its contents would better belong in a "parsing" or "text processing" classification. urlparse definitely goes with urllib. These comments are kind of random, i know... maybe i'll try putting together another grouping if i have any time. -- ?!ng From adustman@comstar.net Thu Mar 30 00:57:06 2000 From: adustman@comstar.net (Andy Dustman) Date: Wed, 29 Mar 2000 19:57:06 -0500 (EST) Subject: [Python-Dev] socketmodule with SSL enabled In-Reply-To: <200003290150.UAA17819@eric.cnri.reston.va.us> Message-ID: I had to make the following one-line change to socketmodule.c so that it would link properly with openssl-0.9.4. In studying the openssl include files, I found: #define SSLeay_add_ssl_algorithms() SSL_library_init() SSL_library_init() seems to be the "correct" call nowadays. I don't know why this isn't being picked up. 
I also don't know how well the module works, other than it imports, but I sure would like to try it with Zope/ZServer/Medusa...

--
andy dustman | programmer/analyst | comstar.net, inc.
telephone: 770.485.6025 / 706.549.7689 | icq: 32922760 | pgp: 0xc72f3f1d
"Therefore, sweet knights, if you may doubt your strength or courage, come no further, for death awaits you all, with nasty, big, pointy teeth!"

Index: socketmodule.c
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Modules/socketmodule.c,v
retrieving revision 1.98
diff -c -r1.98 socketmodule.c
*** socketmodule.c	2000/03/24 20:56:56	1.98
--- socketmodule.c	2000/03/30 00:49:09
***************
*** 2384,2390 ****
  	return;
  #ifdef USE_SSL
  	SSL_load_error_strings();
! 	SSLeay_add_ssl_algorithms();
  	SSLErrorObject = PyErr_NewException("socket.sslerror", NULL, NULL);
  	if (SSLErrorObject == NULL)
  		return;
--- 2384,2390 ----
  	return;
  #ifdef USE_SSL
  	SSL_load_error_strings();
! 	SSL_library_init();
  	SSLErrorObject = PyErr_NewException("socket.sslerror", NULL, NULL);
  	if (SSLErrorObject == NULL)
  		return;

From gstein@lyra.org Thu Mar 30 02:54:27 2000
From: gstein@lyra.org (Greg Stein)
Date: Wed, 29 Mar 2000 18:54:27 -0800 (PST)
Subject: [Python-Dev] installation points (was: Great Renaming? What is the goal?)
In-Reply-To: <1257794452-30405909@hypernet.com>
Message-ID: 

On Wed, 29 Mar 2000, Gordon McMillan wrote:
> Moshe Zadka wrote:
> > On Tue, 28 Mar 2000, Gordon McMillan wrote:
> > > What would happen if he (and everyone else) installed
> > > themselves *into* my core packages, then I decided I didn't
> > > want his stuff?  More than likely I'd have to scrub the damn
> > > installation and start all over again.
> >
> > I think Greg Stein answered that objection, by reminding us that the
> > filesystem isn't the only way to set up a package hierarchy.
> > You mean when Greg said:
> > > Assuming that you use an archive like those found in my "small" distro or
> > > Gordon's distro, then this is no problem.  The archive simply recognizes
> > > and maps "text.encoding.macbinary" to its own module.
> >
> > I don't know what this has to do with it.  When we get around
> > to the 'macbinary' part, we have already established that
> > 'text.encoding' is the parent which should supply 'macbinary'.

good point...

> > In particular, even with Python's current module system, there is no need to
> > scrub installations: Python core modules go (under UNIX) in
> > /usr/local/lib/python1.5, and 3rd party modules go in
> > /usr/local/lib/python1.5/site-packages.
>
> And if there's a /usr/local/lib/python1.5/text/encoding, there's
> no way that /usr/local/lib/python1.5/site-packages/text/encoding
> will get searched.
>
> I believe you could hack up an importer that did allow this, and
> I think you'd be 100% certifiable if you did.  Just look at the
> surprise factor.
>
> Hacking stuff into another package is just as evil as math.pi = 42.

Not if the package was designed for it.  For a "package" like "net", it would be perfectly acceptable to allow third parties to define that as their installation point.  And yes, assume there is an importer that looks into the installed archives for modules.  In the example, the harder part is determining where the "text.encoding" package is loaded from.  And yah: it may be difficult to arrange for the text.encoding importer to allow for archive searching.

Cheers,
-g
--
Greg Stein, http://www.lyra.org/

From thomas.heller@ion-tof.com Thu Mar 30 19:30:25 2000
From: thomas.heller@ion-tof.com (Thomas Heller)
Date: Thu, 30 Mar 2000 21:30:25 +0200
Subject: [Python-Dev] Metaclasses, customizing attribute access for classes
Message-ID: <021c01bf9a7e$662327c0$4500a8c0@thomasnotebook>

Dear Python-developers,

Recently I played with metaclasses from within python, also with Jim Fulton's ExtensionClass.
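[The attribute-delegation pattern this message goes on to describe can be sketched in present-day Python. `Delegate` below is an invented pure-Python stand-in for the C-implemented object; only the method names `dosomething`/`doanotherthing`/`doevenmore` come from Thomas's example.]

```python
class Delegate:
    """Hypothetical stand-in for 'anObjectImplementedInC' in the message."""
    def __init__(self):
        self._storage = {}

    def dosomething(self, key):
        try:
            return self._storage[key]
        except KeyError:
            raise AttributeError(key)

    def doanotherthing(self, key, value):
        self._storage[key] = value

    def doevenmore(self, key):
        del self._storage[key]

class X:
    def __init__(self):
        # Bypass our own __setattr__ while installing the delegate,
        # otherwise the very first assignment would recurse.
        object.__setattr__(self, 'delegate', Delegate())

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails.
        return self.delegate.dosomething(key)

    def __setattr__(self, key, value):
        self.delegate.doanotherthing(key, value)

    def __delattr__(self, key):
        self.delegate.doevenmore(key)

x = X()
x.colour = 'red'       # routed through __setattr__ into the delegate
print(x.colour)        # prints "red"; 'colour' never enters x.__dict__
```

Every attribute read, write, and delete is trampolined through three Python-level calls, which is exactly the per-access overhead the message complains about and why replacing `__dict__` wholesale looked attractive.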
I even tried to write my own metaclass in a C-extension, using the famous Don Beaudry hook.  It seems that ExtensionClass does not do quite what I want.  Metaclasses implemented in python are somewhat slow, and writing them is a lot of work.  Writing a metaclass in C is even more work...

Well, what do I want?  Often, I use the following pattern:

class X:
    def __init__(self):
        self.delegate = anObjectImplementedInC(...)
    def __getattr__(self, key):
        return self.delegate.dosomething(key)
    def __setattr__(self, key, value):
        self.delegate.doanotherthing(key, value)
    def __delattr__(self, key):
        self.delegate.doevenmore(key)

This is too slow (for me).  So what I would like to do is:

class X:
    def __init__(self):
        self.__dict__ = aMappingObject(...)

and now aMappingObject will automatically receive all the setattr, getattr, and delattr calls.  The *only* thing which is required for this is to remove the restriction that the __dict__ attribute must be a dictionary.  This is only a small change to classobject.c (which unfortunately I have only implemented for 1.5.2, not for the CVS version).  The performance impact of this change is unnoticeable in pystone.

What do you think?  Should I prepare a patch?  Any chance that this can be included in a future python version?

Thomas Heller

From petrilli@amber.org Thu Mar 30 19:52:02 2000
From: petrilli@amber.org (Christopher Petrilli)
Date: Thu, 30 Mar 2000 14:52:02 -0500
Subject: [Python-Dev] Unicode compile
Message-ID: <20000330145202.B9078@trump.amber.org>

I don't know how much memory other people have in their machines, but on this machine (128Mb), I get the following trying to compile a CVS checkout of Python:

gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c ./unicodedatabase.c
./unicodedatabase.c:53482: virtual memory exhausted

I hope that this is a temporary thing, or that we ship the database in some other manner, but I would argue that you should be able to compile Python on a machine with 32Mb of RAM at MOST....
for an idea of how much VM this machine has, i have 256Mb of SWAP on top of it. Chris -- | Christopher Petrilli | petrilli@amber.org From guido@python.org Thu Mar 30 20:12:22 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 15:12:22 -0500 Subject: [Python-Dev] Unicode compile In-Reply-To: Your message of "Thu, 30 Mar 2000 14:52:02 EST." <20000330145202.B9078@trump.amber.org> References: <20000330145202.B9078@trump.amber.org> Message-ID: <200003302012.PAA22062@eric.cnri.reston.va.us> > I don't know how much memory other people have in their machiens, but > in this machine (128Mb), I get the following trying to compile a CVS > checkout of Python: > > gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c ./unicodedatabase.c > ./unicodedatabase.c:53482: virtual memory exhausted > > I hope that this is a temporary thing, or we ship the database some > other manner, but I would argue that you should be able to compile > Python on a machine with 32Mb of RAM at MOST.... for an idea of how > much VM this machine has, i have 256Mb of SWAP on top of it. I'm not sure how to fix this, short of reading the main database from a file. Marc-Andre? --Guido van Rossum (home page: http://www.python.org/~guido/) From tismer@tismer.com Thu Mar 30 20:14:55 2000 From: tismer@tismer.com (Christian Tismer) Date: Thu, 30 Mar 2000 22:14:55 +0200 Subject: [Python-Dev] Unicode compile References: <20000330145202.B9078@trump.amber.org> Message-ID: <38E3B5BF.2D00F930@tismer.com> Christopher Petrilli wrote: > > I don't know how much memory other people have in their machiens, but > in this machine (128Mb), I get the following trying to compile a CVS > checkout of Python: > > gcc -g -O2 -I./../Include -I.. 
-DHAVE_CONFIG_H -c ./unicodedatabase.c > ./unicodedatabase.c:53482: virtual memory exhausted > > I hope that this is a temporary thing, or we ship the database some > other manner, but I would argue that you should be able to compile > Python on a machine with 32Mb of RAM at MOST.... for an idea of how > much VM this machine has, i have 256Mb of SWAP on top of it.

I had similar effects, which made me work on a compressed database (see older messages). Due to time limits, I will not get ready before 1.6.a1 is out. And then quite a lot of other changes will be needed from Marc, since the API changes quite a bit. But it will definitely be a module of less than 20 KB, proven.

ciao - chris(2) -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home

From akuchlin@mems-exchange.org Thu Mar 30 20:14:27 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Thu, 30 Mar 2000 15:14:27 -0500 (EST) Subject: [Python-Dev] Unicode compile In-Reply-To: <200003302012.PAA22062@eric.cnri.reston.va.us> References: <20000330145202.B9078@trump.amber.org> <200003302012.PAA22062@eric.cnri.reston.va.us> Message-ID: <14563.46499.555853.413690@amarok.cnri.reston.va.us>

Guido van Rossum writes: >I'm not sure how to fix this, short of reading the main database from >a file. Marc-Andre?

Turning off optimization may help. (Or it may not -- it might be creating the data structures for a large static table that's the problem.) --amk

From akuchlin@mems-exchange.org Thu Mar 30 20:22:02 2000 From: akuchlin@mems-exchange.org (Andrew M.
Kuchling) Date: Thu, 30 Mar 2000 15:22:02 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <200003282000.PAA11988@eric.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> Message-ID: <14563.46954.70800.706245@amarok.cnri.reston.va.us> Guido van Rossum writes: >I don't know enough about this, but it seems that there might be two >steps: *creating* a mmap object is necessarily platform-specific; but >*using* a mmap object could be platform-neutral. > >What is the API for mmap objects? You create them; Unix wants a file descriptor, and Windows wants a filename. Then they behave like buffer objects, like mutable strings. I like Fredrik's suggestion of an 'open(filename, mode, ...)' type of interface. If someone can suggest a way to handle the extra flags such as MAP_SHARED and the Windows tag argument, I'll happily implement it. Maybe just keyword arguments that differ across platforms? open(filename, mode, [tag = 'foo',] [flags = mmapfile.MAP_SHARED]). We could preserve the ability to mmap() only a file descriptor on Unix through a separate openfd() function. I'm also strongly tempted to rename the module from mmapfile to just 'mmap'. I'd suggest waiting until the interface is finalized before adding the module to the CVS tree -- which means after 1.6a1 -- but I can add the module as it stands if you like. Guido, let me know if you want me to do that. -- A.M. Kuchling http://starship.python.net/crew/amk/ A Puck is harder by far to hurt than some little lord of malice from the lands of ice and snow. We Pucks are old and hard and wild... -- Robin Goodfellow, in SANDMAN #66: "The Kindly Ones:10" From guido@python.org Thu Mar 30 20:23:42 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 15:23:42 -0500 Subject: [Python-Dev] socketmodule with SSL enabled In-Reply-To: Your message of "Wed, 29 Mar 2000 19:57:06 EST." 
References: Message-ID: <200003302023.PAA22350@eric.cnri.reston.va.us> > I had to make the following one-line change to socketmodule.c so that it > would link properly with openssl-0.9.4. In studying the openssl include > files, I found: > > #define SSLeay_add_ssl_algorithms() SSL_library_init() > > SSL_library_init() seems to be the "correct" call nowadays. I don't know > why this isn't being picked up. I also don't know how well the module > works, other than it imports, but I sure would like to try it with > Zope/ZServer/Medusa... Strange -- the version of OpenSSL I have also calls itself 0.9.4 ("OpenSSL 0.9.4 09 Aug 1999" to be precise) and doesn't have SSL_library_init(). I wonder what gives... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Mar 30 20:25:58 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 15:25:58 -0500 Subject: [Python-Dev] mmapfile module In-Reply-To: Your message of "Thu, 30 Mar 2000 15:22:02 EST." <14563.46954.70800.706245@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> Message-ID: <200003302025.PAA22367@eric.cnri.reston.va.us> > Guido van Rossum writes: > >I don't know enough about this, but it seems that there might be two > >steps: *creating* a mmap object is necessarily platform-specific; but > >*using* a mmap object could be platform-neutral. > > > >What is the API for mmap objects? [AMK] > You create them; Unix wants a file descriptor, and Windows wants a > filename. Then they behave like buffer objects, like mutable strings. > > I like Fredrik's suggestion of an 'open(filename, mode, ...)' type of > interface. If someone can suggest a way to handle the extra flags > such as MAP_SHARED and the Windows tag argument, I'll happily > implement it. Maybe just keyword arguments that differ across > platforms? open(filename, mode, [tag = 'foo',] [flags = > mmapfile.MAP_SHARED]). 
We could preserve the ability to mmap() only a > file descriptor on Unix through a separate openfd() function. Yes, keyword args seem to be the way to go. To avoid an extra function you could add a fileno=... kwarg, in which case the filename is ignored or required to be "". > I'm > also strongly tempted to rename the module from mmapfile to just > 'mmap'. Sure. > I'd suggest waiting until the interface is finalized before adding the > module to the CVS tree -- which means after 1.6a1 -- but I can add the > module as it stands if you like. Guido, let me know if you want me to > do that. Might as well check it in -- the alpha is going to be rough and I expect another alpha to come out shortly to correct the biggest problems. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Thu Mar 30 20:22:08 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 30 Mar 2000 22:22:08 +0200 Subject: [Python-Dev] Unicode compile References: <20000330145202.B9078@trump.amber.org> <200003302012.PAA22062@eric.cnri.reston.va.us> Message-ID: <38E3B770.6CD61C37@lemburg.com> Guido van Rossum wrote: > > > I don't know how much memory other people have in their machiens, but > > in this machine (128Mb), I get the following trying to compile a CVS > > checkout of Python: > > > > gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c ./unicodedatabase.c > > ./unicodedatabase.c:53482: virtual memory exhausted > > > > I hope that this is a temporary thing, or we ship the database some > > other manner, but I would argue that you should be able to compile > > Python on a machine with 32Mb of RAM at MOST.... for an idea of how > > much VM this machine has, i have 256Mb of SWAP on top of it. > > I'm not sure how to fix this, short of reading the main database from > a file. Marc-Andre? Hmm, the file compiles fine on my 64MB Linux machine with about 100MB of swap. What gcc version do you use ? 
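The compile failure in this thread comes from a single enormous static initializer in unicodedatabase.c. A generic way around a compiler choking on one huge table is to split the data into several smaller tables reached through a small redirection function; the index arithmetic is the same whether the tables live in C or in generated Python. A purely illustrative sketch with invented names and sizes (not the actual unicodedatabase layout):

```python
# Hypothetical chunked table: instead of one flat 64K-entry sequence,
# keep several small chunks and dispatch through a lookup function.
CHUNK = 4  # invented chunk size for the sketch

_chunks = [
    [10, 11, 12, 13],  # entries 0..3
    [14, 15, 16, 17],  # entries 4..7
]

def lookup(i):
    # Redirection function: map a flat index to (chunk, offset).
    return _chunks[i // CHUNK][i % CHUNK]

assert lookup(2) == 12
assert lookup(5) == 15
```

Many small initializers cost a compiler far less peak memory than one giant one, at the price of an extra divide/modulo per lookup.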
Anyway, once Christian is ready with his compact replacement I think we no longer have to worry about that chunk of static data :-) Reading in the data from a file is not a very good solution, because it would override the OS optimizations for static data in object files (like e.g. swapping in only those pages which are really needed, etc.). An alternative solution would be breaking the large table into several smaller ones and accessing it via a redirection function. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From adustman@comstar.net Thu Mar 30 21:12:51 2000 From: adustman@comstar.net (Andy Dustman) Date: Thu, 30 Mar 2000 16:12:51 -0500 (EST) Subject: [Python-Dev] socketmodule with SSL enabled In-Reply-To: <200003302023.PAA22350@eric.cnri.reston.va.us> Message-ID: On Thu, 30 Mar 2000, Guido van Rossum wrote: > > I had to make the following one-line change to socketmodule.c so that it > > would link properly with openssl-0.9.4. In studying the openssl include > > files, I found: > > > > #define SSLeay_add_ssl_algorithms() SSL_library_init() > > > > SSL_library_init() seems to be the "correct" call nowadays. I don't know > > why this isn't being picked up. I also don't know how well the module > > works, other than it imports, but I sure would like to try it with > > Zope/ZServer/Medusa... > > Strange -- the version of OpenSSL I have also calls itself 0.9.4 > ("OpenSSL 0.9.4 09 Aug 1999" to be precise) and doesn't have > SSL_library_init(). > > I wonder what gives... I don't know. Right after I made the patch, I found that 0.9.5 is available, and I was able to successfully compile against that version (with the patch). -- andy dustman | programmer/analyst | comstar.net, inc. 
telephone: 770.485.6025 / 706.549.7689 | icq: 32922760 | pgp: 0xc72f3f1d "Therefore, sweet knights, if you may doubt your strength or courage, come no further, for death awaits you all, with nasty, big, pointy teeth!" From akuchlin@mems-exchange.org Thu Mar 30 21:19:45 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Thu, 30 Mar 2000 16:19:45 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <200003302025.PAA22367@eric.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> Message-ID: <14563.50417.909045.81868@amarok.cnri.reston.va.us> Guido van Rossum writes: >Might as well check it in -- the alpha is going to be rough and I >expect another alpha to come out shortly to correct the biggest >problems. Done -- just doing my bit to ensure the first alpha is rough! :) My next task is to add the Expat module. My understanding is that it's OK to add Expat itself, too; where should I put all that code? Modules/expat/* ? -- A.M. Kuchling http://starship.python.net/crew/amk/ I'll bring the Kindly Ones down on his blasted head. -- Desire, in SANDMAN #31: "Three Septembers and a January" From fdrake@acm.org Thu Mar 30 21:29:58 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 30 Mar 2000 16:29:58 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <14563.50417.909045.81868@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> Message-ID: <14563.51030.24773.587972@seahag.cnri.reston.va.us> Andrew M. Kuchling writes: > Done -- just doing my bit to ensure the first alpha is rough! :) > > My next task is to add the Expat module. My understanding is that > it's OK to add Expat itself, too; where should I put all that code? 
> Modules/expat/* ? Do you have documentation for this? -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From akuchlin@mems-exchange.org Thu Mar 30 21:30:35 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Thu, 30 Mar 2000 16:30:35 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <14563.51030.24773.587972@seahag.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <14563.51030.24773.587972@seahag.cnri.reston.va.us> Message-ID: <14563.51067.560938.367690@amarok.cnri.reston.va.us> Fred L. Drake, Jr. writes: > Do you have documentation for this? Somewhere at home, I think, but not here at work. I'll try to get it checked in before 1.6alpha1, but don't hold me to that. --amk From guido@python.org Thu Mar 30 21:31:58 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 16:31:58 -0500 Subject: [Python-Dev] mmapfile module In-Reply-To: Your message of "Thu, 30 Mar 2000 16:19:45 EST." <14563.50417.909045.81868@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> Message-ID: <200003302131.QAA22897@eric.cnri.reston.va.us> > Done -- just doing my bit to ensure the first alpha is rough! :) When the going gets rough, the rough get going :-) > My next task is to add the Expat module. My understanding is that > it's OK to add Expat itself, too; where should I put all that code? > Modules/expat/* ? Whoa... Not sure. This will give issues with Patrice, at least (even if it is pure Open Source -- given the size). I'd prefer to add instructions to Setup.in about where to get it. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Thu Mar 30 21:34:55 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 30 Mar 2000 16:34:55 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <14563.51067.560938.367690@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <14563.51030.24773.587972@seahag.cnri.reston.va.us> <14563.51067.560938.367690@amarok.cnri.reston.va.us> Message-ID: <14563.51327.190466.477566@seahag.cnri.reston.va.us> Andrew M. Kuchling writes: > Somewhere at home, I think, but not here at work. I'll try to get it > checked in before 1.6alpha1, but don't hold me to that. The date isn't important; I'm not planning to match alpha/beta releases with Doc releases. I just want to be sure it gets in soon so that the debugging process can kick in for that as well. ;) Thanks! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido@python.org Thu Mar 30 21:34:02 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 16:34:02 -0500 Subject: [Python-Dev] mmapfile module In-Reply-To: Your message of "Thu, 30 Mar 2000 16:31:58 EST." <200003302131.QAA22897@eric.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> Message-ID: <200003302134.QAA22939@eric.cnri.reston.va.us> > Whoa... Not sure. This will give issues with Patrice, at least (even > if it is pure Open Source -- given the size). For those outside CNRI -- Patrice is CNRI's tough IP lawyer. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin@mems-exchange.org Thu Mar 30 21:48:13 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Thu, 30 Mar 2000 16:48:13 -0500 (EST) Subject: [Python-Dev] Expat module In-Reply-To: <200003302131.QAA22897@eric.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> Message-ID: <14563.52125.401817.986919@amarok.cnri.reston.va.us> Guido van Rossum writes: >> My next task is to add the Expat module. My understanding is that >> it's OK to add Expat itself, too; where should I put all that code? >> Modules/expat/* ? > >Whoa... Not sure. This will give issues with Patrice, at least (even >if it is pure Open Source -- given the size). I'd prefer to add >instructions to Setup.in about where to get it. Fair enough; I'll just add the module itself, then, and we can always change it later. Should we consider replacing the makesetup/Setup.in mechanism with a setup.py script that uses the Distutils? You'd have to compile a minipython with just enough critical modules -- strop and posixmodule are probably the most important ones -- in order to run setup.py. It's something I'd like to look at for 1.6, because then you could be much smarter in automatically enabling modules. -- A.M. Kuchling http://starship.python.net/crew/amk/ This is the way of Haskell or Design by Contract of Eiffel. This one is like wearing a XV century armor, you walk very safely but in a very tiring way. -- Manuel Gutierrez Algaba, 26 Jan 2000 From guido@python.org Thu Mar 30 22:41:45 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 17:41:45 -0500 Subject: [Python-Dev] Expat module In-Reply-To: Your message of "Thu, 30 Mar 2000 16:48:13 EST." 
<14563.52125.401817.986919@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> <14563.52125.401817.986919@amarok.cnri.reston.va.us> Message-ID: <200003302241.RAA23050@eric.cnri.reston.va.us> > Fair enough; I'll just add the module itself, then, and we can always > change it later. OK. > Should we consider replacing the makesetup/Setup.in mechanism with a > setup.py script that uses the Distutils? You'd have to compile a > minipython with just enough critical modules -- strop and posixmodule > are probably the most important ones -- in order to run setup.py. > It's something I'd like to look at for 1.6, because then you could be > much smarter in automatically enabling modules. If you can come up with something that works well enough, that would be great. (Although I'm not sure where the distutils come in.) We still need to use configure/autoconf though. Hardcoding a small complement of modules is no problem. (Why do you think you need strop though? Remember we have string methods!) --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond@skippinet.com.au Thu Mar 30 23:03:39 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Fri, 31 Mar 2000 09:03:39 +1000 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/PC python_nt.rc,1.8,1.9 In-Reply-To: <200003302259.RAA23266@eric.cnri.reston.va.us> Message-ID: This is the version number as displayed by Windows Explorer in the "properties" dialog. Mark. > Modified Files: > python_nt.rc > Log Message: > Seems there was a version string here that still looked > like 1.5.2. 
> > > Index: python_nt.rc
> ===================================================================
> RCS file: /projects/cvsroot/python/dist/src/PC/python_nt.rc,v
> retrieving revision 1.8
> retrieving revision 1.9
> diff -C2 -r1.8 -r1.9
> *** python_nt.rc 2000/03/29 01:50:50 1.8
> --- python_nt.rc 2000/03/30 22:59:09 1.9
> ***************
> *** 29,34 ****
>
> VS_VERSION_INFO VERSIONINFO
> ! FILEVERSION 1,5,2,3
> ! PRODUCTVERSION 1,5,2,3
> FILEFLAGSMASK 0x3fL
> #ifdef _DEBUG
> --- 29,34 ----
>
> VS_VERSION_INFO VERSIONINFO
> ! FILEVERSION 1,6,0,0
> ! PRODUCTVERSION 1,6,0,0
> FILEFLAGSMASK 0x3fL
> #ifdef _DEBUG
>
>
> _______________________________________________
> Python-checkins mailing list
> Python-checkins@python.org
> http://www.python.org/mailman/listinfo/python-checkins
>

From Fredrik Lundh Date: Fri, 31 Mar 2000 00:40:51 +0200 Subject: [Python-Dev] SRE: what to do with undocumented attributes? Message-ID: <00b701bf9a99$022339c0$34aab5d4@hagrid>

at this time, SRE uses types instead of classes for compiled patterns and matches. these classes provide a documented interface, and a bunch of internal attributes, for example:

RegexObjects:

    code -- a PCRE code object
    pattern -- the source pattern
    groupindex -- maps group names to group indices

MatchObjects:

    regs -- same as match.span()?
    groupindex -- as above
    re -- the pattern object used for this match
    string -- the target string used for this match

the problem is that some other modules use these attributes directly. for example, xmllib.py uses the pattern attribute, and other code I've seen uses regs to speed things up.

in SRE, I would like to get rid of all these (except possibly for the match.string attribute). opinions?

From guido@python.org Thu Mar 30 23:31:43 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 18:31:43 -0500 Subject: [Python-Dev] SRE: what to do with undocumented attributes? In-Reply-To: Your message of "Fri, 31 Mar 2000 00:40:51 +0200."
<00b701bf9a99$022339c0$34aab5d4@hagrid> References: <00b701bf9a99$022339c0$34aab5d4@hagrid> Message-ID: <200003302331.SAA24895@eric.cnri.reston.va.us> > at this time, SRE uses types instead of classes for compiled > patterns and matches. these classes provide a documented > interface, and a bunch of internal attributes, for example: > > RegexObjects: > > code -- a PCRE code object > pattern -- the source pattern > groupindex -- maps group names to group indices > > MatchObjects: > > regs -- same as match.span()? > groupindex -- as above > re -- the pattern object used for this match > string -- the target string used for this match > > the problem is that some other modules use these attributes > directly. for example, xmllib.py uses the pattern attribute, and > other code I've seen uses regs to speed things up. > > in SRE, I would like to get rid of all these (except possibly for > the match.string attribute). > > opinions? Sounds reasonable. All std lib modules that violate this will need to be fixed once sre.py replaces re.py. (Checkin of sre is next.) --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin@mems-exchange.org Thu Mar 30 23:40:16 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Thu, 30 Mar 2000 18:40:16 -0500 (EST) Subject: [Python-Dev] SRE: what to do with undocumented attributes? In-Reply-To: <00b701bf9a99$022339c0$34aab5d4@hagrid> References: <00b701bf9a99$022339c0$34aab5d4@hagrid> Message-ID: <14563.58848.109072.339060@amarok.cnri.reston.va.us> Fredrik Lundh writes: >RegexObjects: > code -- a PCRE code object > pattern -- the source pattern > groupindex -- maps group names to group indices pattern and groupindex are documented in the Library Reference, and they're part of the public interface. .code is not, so you can drop it. >MatchObjects: > regs -- same as match.span()? 
> groupindex -- as above
> re -- the pattern object used for this match
> string -- the target string used for this match

.re and .string are documented. I don't see a reference to MatchObject.groupindex anywhere, and .regs isn't documented, so those two can be ignored; xmllib or whatever external modules use them are being very naughty, so go ahead and break them.

-- A.M. Kuchling http://starship.python.net/crew/amk/ Imagine a thousand thousand fireflies of every shape and color; Oh, that was Baghdad at night in those days. -- From SANDMAN #50: "Ramadan"

From Fredrik Lundh In-Reply-To: <14563.58848.109072.339060@amarok.cnri.reston.va.us> Message-ID: <00e901bf9a9c$6c036240$34aab5d4@hagrid>

Andrew wrote:
> >RegexObjects:
> > code -- a PCRE code object
> > pattern -- the source pattern
> > groupindex -- maps group names to group indices
>
> pattern and groupindex are documented in the Library Reference, and
> they're part of the public interface.

hmm. I could have sworn... guess I didn't look carefully enough (or someone's used his time machine again :-). oh well, more bloat...

btw, "pattern" doesn't make much sense in SRE -- who says the pattern object was created by re.compile? guess I'll just set it to None in other cases (e.g. sregex, sreverb, sgema...)

From bwarsaw@cnri.reston.va.us Fri Mar 31 00:35:16 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 30 Mar 2000 19:35:16 -0500 (EST) Subject: [Python-Dev] SRE: what to do with undocumented attributes? References: <00b701bf9a99$022339c0$34aab5d4@hagrid> <14563.58848.109072.339060@amarok.cnri.reston.va.us> <00e901bf9a9c$6c036240$34aab5d4@hagrid> Message-ID: <14563.62148.860971.360871@anthem.cnri.reston.va.us>

>>>>> "FL" == Fredrik Lundh writes:

    FL> hmm. I could have sworn... guess I didn't look carefully
    FL> enough (or someone's used his time machine again :-).

Yep, sorry. If it's documented as in the public interface, it should be kept.
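The documented attributes being debated here survived: in the modern `re` module, `pattern`, `groupindex`, `re`, and `string` are all still public, and the documented `span()` method covers what the undocumented `regs` was used for. A quick illustrative sketch (modern CPython behavior, not the 1.6-era modules):

```python
import re

# Documented pattern-object attributes from the thread above:
pat = re.compile(r"(?P<word>\w+)")
assert pat.pattern == r"(?P<word>\w+)"   # the source pattern
assert pat.groupindex == {"word": 1}     # maps group names to indices

m = pat.match("hello world")

# Documented match-object attributes:
assert m.re is pat                # the pattern object used for this match
assert m.string == "hello world"  # the target string used for this match

# The undocumented `regs` attribute is covered by the documented span():
assert m.span() == (0, 5)
```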
Anything else can go (he says without yet grep'ing through his various code bases). -Barry From bwarsaw@cnri.reston.va.us Fri Mar 31 04:34:15 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 30 Mar 2000 23:34:15 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> Message-ID: <14564.10951.90258.729547@anthem.cnri.reston.va.us> >>>>> "Guido" == Guido van Rossum writes: Guido> Modified Files: mmapmodule.c Log Message: Hacked for Win32 Guido> by Mark Hammond. Reformatted for 8-space tabs and fitted Guido> into 80-char lines by GvR. Can we change the 8-space-tab rule for all new C code that goes in? I know that we can't practically change existing code right now, but for new C code, I propose we use no tab characters, and we use a 4-space block indentation. -Barry From DavidA@ActiveState.com Fri Mar 31 05:07:02 2000 From: DavidA@ActiveState.com (David Ascher) Date: Thu, 30 Mar 2000 21:07:02 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 In-Reply-To: <14564.10951.90258.729547@anthem.cnri.reston.va.us> Message-ID: > Can we change the 8-space-tab rule for all new C code that goes in? I > know that we can't practically change existing code right now, but for > new C code, I propose we use no tab characters, and we use a 4-space > block indentation. Heretic! +1, FWIW =) From bwarsaw@cnri.reston.va.us Fri Mar 31 05:16:48 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Fri, 31 Mar 2000 00:16:48 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <14564.10951.90258.729547@anthem.cnri.reston.va.us> Message-ID: <14564.13504.310866.835201@anthem.cnri.reston.va.us> >>>>> "DA" == David Ascher writes: DA> Heretic! 
DA> +1, FWIW =) I hereby offer to so untabify and reformat any C code in the standard distribution that Guido will approve of. -Barry

From mhammond@skippinet.com.au Fri Mar 31 05:16:26 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Fri, 31 Mar 2000 15:16:26 +1000 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 In-Reply-To: Message-ID:

+1 for me too. It also brings all source files under the same guidelines (rather than separate ones for .py and .c) Mark.

From bwarsaw@cnri.reston.va.us Fri Mar 31 05:40:16 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Fri, 31 Mar 2000 00:40:16 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: Message-ID: <14564.14912.629414.970309@anthem.cnri.reston.va.us>

>>>>> "MH" == Mark Hammond writes: MH> +1 for me too. It also brings all source files under the same MH> guidelines (rather than separate ones for .py and .c)

BTW, I further propose that if Guido lets me reformat the C code, that we freeze other checkins for the duration and I temporarily turn off the python-checkins email. That is, unless you guys /want/ to be bombarded with boatloads of useless diffs. :) -Barry

From pf@artcom-gmbh.de Fri Mar 31 06:45:45 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Fri, 31 Mar 2000 08:45:45 +0200 (MEST) Subject: Indentation of Python interpreter C source (was Re: [Python-Dev] Re: [Python-chec....) In-Reply-To: <14564.14912.629414.970309@anthem.cnri.reston.va.us> from "bwarsaw@cnri.reston.va.us" at "Mar 31, 2000 0:40:16 am" Message-ID:

Hi! sigh :-( > >>>>> "MH" == Mark Hammond writes: > > MH> +1 for me too.
It also brings all source files under the same > MH> guidelines (rather than separate ones for .py and .c) bwarsaw@cnri.reston.va.us: > BTW, I further propose that if Guido lets me reformat the C code, that > we freeze other checkins for the duration and I temporarily turn off > the python-checkins email. That is, unless you guys /want/ to be > bombarded with boatloads of useless diffs. :)

-1 for C reformatting. The 4-space indentation seems reasonable for Python sources, but I disagree for C code. C is not Python. Let me cite a very prominent member of the open source community (pasted from /usr/src/linux/Documentation/CodingStyle):

    Chapter 1: Indentation

    Tabs are 8 characters, and thus indentations are also 8 characters. There are heretic movements that try to make indentations 4 (or even 2!) characters deep, and that is akin to trying to define the value of PI to be 3.

    Rationale: The whole idea behind indentation is to clearly define where a block of control starts and ends. Especially when you've been looking at your screen for 20 straight hours, you'll find it a lot easier to see how the indentation works if you have large indentations.

    Now, some people will claim that having 8-character indentations makes the code move too far to the right, and makes it hard to read on a 80-character terminal screen. The answer to that is that if you need more than 3 levels of indentation, you're screwed anyway, and should fix your program.

    In short, 8-char indents make things easier to read, and have the added benefit of warning you when you're nesting your functions too deep. Heed that warning.

Although the Python interpreter has no strong relationship with the Linux kernel, I agree with Linus on this topic. Python source code is another thing: Python identifiers are usually longer due to qualifying, and Python operands are often lists, tuples or the like, so lines contain more stuff.
disliking-yet-another-white-space-discussion-ly y'rs - peter From mhammond@skippinet.com.au Fri Mar 31 07:11:50 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Fri, 31 Mar 2000 17:11:50 +1000 Subject: Indentation of Python interpreter C source (was Re: [Python-Dev] Re: [Python-chec....) In-Reply-To: Message-ID: > Rationale: The whole idea behind indentation is to > clearly define where > a block of control starts and ends. Especially when Ironically, this statement is a strong argument for insisting on Python using real tab characters! "Clearly define" is upgraded to "used to define". > 80-character terminal screen. The answer to that is > that if you need > more than 3 levels of indentation, you're screwed > anyway, and should fix > your program. Yeah, right! int foo() { // one level for the privilege of being here. switch (bar) { // uh oh - running out of room... case WTF: // Oh no - if I use an "if" statement, // my code is "screwed"?? } } > disliking-yet-another-white-space-discussion-ly y'rs - peter Like-death-and-taxes-ly y'rs - Mark. From Moshe Zadka Fri Mar 31 08:04:32 2000 From: Moshe Zadka (Moshe Zadka) Date: Fri, 31 Mar 2000 10:04:32 +0200 (IST) Subject: [Python-Dev] mmapfile module In-Reply-To: <200003302134.QAA22939@eric.cnri.reston.va.us> Message-ID: On Thu, 30 Mar 2000, Guido van Rossum wrote: > > Whoa... Not sure. This will give issues with Patrice, at least (even > > if it is pure Open Source -- given the size). > > For those outside CNRI -- Patrice is CNRI's tough IP lawyer. It was understandable from the context... Personally, I'd rather if it was folded in by value, and not by reference: one reason is versioning problems, and another is pure laziness on my part. what-do-you-have-when-you-got-a-lawyer-up-to-his-neck-in-the-sand-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal@lemburg.com Fri Mar 31 07:42:04 2000 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Fri, 31 Mar 2000 09:42:04 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> Message-ID: <38E456CC.1A49334A@lemburg.com> "Barry A. Warsaw" wrote: > > >>>>> "Guido" == Guido van Rossum writes: > > Guido> Modified Files: mmapmodule.c Log Message: Hacked for Win32 > Guido> by Mark Hammond. Reformatted for 8-space tabs and fitted > Guido> into 80-char lines by GvR. > > Can we change the 8-space-tab rule for all new C code that goes in? I > know that we can't practically change existing code right now, but for > new C code, I propose we use no tab characters, and we use a 4-space > block indentation. Why not just leave new code formatted as it is (except maybe to bring the used TAB width to the standard 8 spaces used throughout the Python C source code)? BTW, most of the new unicode stuff uses 4-space indents. Unfortunately, it mixes whitespace and tabs since Emacs c-mode doesn't do the python-mode magic yet (is there a way to turn it on?). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Fredrik Lundh Message-ID: <01ae01bf9af1$927b1940$34aab5d4@hagrid> Peter Funk wrote: > Also the Python interpreter has no strong relationship with the Linux kernel, > but I agree with Linus on this topic. Python source code is another thing: > Python identifiers are usually longer due to qualifying and Python > operands are often lists, tuples or the like, so lines contain more stuff. you're just guessing, right? (if you check, you'll find that the actual difference is very small. iirc, that's true for c, c++, java, python, tcl, and probably a few more languages. dunno about perl, though...
:-) From Fredrik Lundh <14564.10951.90258.729547@anthem.cnri.reston.va.us> <38E456CC.1A49334A@lemburg.com> Message-ID: <01b501bf9af1$f9b44500$34aab5d4@hagrid> M.-A. Lemburg wrote: > Why not just leave new code formatted as it is (except maybe > to bring the used TAB width to the standard 8 spaces used throughout > the Python C source code) ? > > BTW, most of the new unicode stuff uses 4-space indents. > Unfortunately, it mixes whitespace and tabs since Emacs > c-mode doesn't do the python-mode magic yet (is there a > way to turn it on ?). http://www.jwz.org/doc/tabs-vs-spaces.html contains some hints. From Moshe Zadka Fri Mar 31 11:24:05 2000 From: Moshe Zadka (Moshe Zadka) Date: Fri, 31 Mar 2000 13:24:05 +0200 (IST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes Message-ID: Here is a new list of things that will change in the next release. Thanks to all the people who gave me hints and information! If you have anything you think I missed, or mistreated, please e-mail me personally -- I'll post an updated version soon. Obligatory ========== A lot of bug-fixes, some optimizations, many improvements in the documentation Core changes ============ Deleting objects is safe even for deeply nested data structures. Long/int unifications: long integers can be used in seek() calls, as slice indexes.
str(1L) --> '1', not '1L' (repr() is still the same) Builds on NT Alpha UnboundLocalError is raised when a local variable is undefined long, int take optional "base" parameter string objects now have methods (though they are still immutable) unicode support: Unicode strings are marked with u"string", and there is support for arbitrary encoders/decoders "in" operator can now be overridden in user-defined classes to mean anything: it calls the magic method __contains__ New calling syntax: f(*args, **kw) equivalent to apply(f, args, kw) Some methods which would take multiple arguments and treat them as a tuple were fixed: list.{append, insert, remove, count}, socket.connect New modules =========== winreg - Windows registry interface. Distutils - tools for distributing Python modules robotparser - parse a robots.txt file (for writing web spiders) linuxaudio - audio for Linux mmap - treat a file as a memory buffer sre - regular expressions (fast, supports unicode) filecmp - supersedes the old cmp.py and dircmp.py modules tabnanny - check Python sources for tab-width dependence unicode - support for unicode codecs - support for Unicode encoders/decoders Module changes ============== re - changed to be a frontend to sre readline, ConfigParser, cgi, calendar, posix, xmllib, aifc, chunk, wave, random, shelve, nntplib - minor enhancements socket, httplib, urllib - optional OpenSSL support _tkinter - support for 8.1,8.2,8.3 (no support for versions older than 8.0) Tool changes ============ IDLE -- complete overhaul (Andrew, I'm still waiting for the expat support and integration to add to this list -- other than that, please contact me if you want something less telegraphic ) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping@lfw.org Fri Mar 31 12:01:21 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 31 Mar 2000 04:01:21 -0800 (PST) Subject: [Python-Dev] Roundup et al.
Message-ID: Hi -- there was some talk on this list earlier about nosy lists, managing patches, and such things, so i just wanted to mention, for anybody interested, that i threw together Roundup very quickly for you to try out. http://www.lfw.org/python/ There's a tar file there -- it's very messy code, and i apologize (it was hastily hacked out of the running prototype implementation), but it should be workable enough to play with. There's a test installation to play with at http://www.lfw.org/ping/roundup/roundup.cgi Dummy user:password pairs are test:test, spam:spam, eggs:eggs. A fancier design, still in the last stages of coming together (which will be my submission to the Software Carpentry contest) is up at http://crit.org/http://www.lfw.org/ping/sctrack.html and i welcome your thoughts and comments on that if you have the spare time (ha!) and generous inclination to contribute them. Thank you and apologies for the interruption. -- ?!ng "To be human is to continually change. Your desire to remain as you are is what ultimately limits you." -- The Puppet Master, Ghost in the Shell From guido@python.org Fri Mar 31 12:10:45 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 31 Mar 2000 07:10:45 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 In-Reply-To: Your message of "Thu, 30 Mar 2000 23:34:15 EST." <14564.10951.90258.729547@anthem.cnri.reston.va.us> References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> Message-ID: <200003311210.HAA29010@eric.cnri.reston.va.us> > Can we change the 8-space-tab rule for all new C code that goes in? I > know that we can't practically change existing code right now, but for > new C code, I propose we use no tab characters, and we use a 4-space > block indentation. Actually, this one was formatted for 8-space indents but using 4-space tabs, so in my editor it looked like 16-space indents! 
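The mismatch Guido describes is easy to reproduce: the same two leading tab characters span 8 columns in an editor with 4-column tab stops, but 16 columns with the standard 8-column stops. A small illustration (the C line used here is hypothetical):

```python
# Two tab characters of indentation, as written in an editor configured
# with 4-column tab stops (where they rendered as 8 columns).
line = "\t\treturn NULL;"
code = "return NULL;"

# Columns of indentation in the author's editor (tab stop = 4):
print(len(line.expandtabs(4)) - len(code))   # 8
# Columns in an editor using the standard 8-column tab stops:
print(len(line.expandtabs(8)) - len(code))   # 16
```

The same bytes, two very different-looking files -- which is exactly why mixing tab widths in one code base causes trouble.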
Given that we don't want to change existing code, I'd prefer to stick with 1-tab 8-space indents. --Guido van Rossum (home page: http://www.python.org/~guido/) From Moshe Zadka Fri Mar 31 13:10:06 2000 From: Moshe Zadka (Moshe Zadka) Date: Fri, 31 Mar 2000 15:10:06 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc ACKS,1.51,1.52 In-Reply-To: <200003311301.IAA29221@eric.cnri.reston.va.us> Message-ID: On Fri, 31 Mar 2000, Guido van Rossum wrote: > + Christian Tismer > + Christian Tismer Ummmmm....I smell something fishy here. Are there two Christian Tismers? That would explain how Christian has so much time to work on Stackless. Well, between the both of them, Guido will have no choice but to put Stackless in the standard distribution. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From fredrik@pythonware.com Fri Mar 31 13:16:16 2000 From: fredrik@pythonware.com (Fredrik Lundh) Date: Fri, 31 Mar 2000 15:16:16 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc ACKS,1.51,1.52 References: <200003311301.IAA29221@eric.cnri.reston.va.us> Message-ID: <000d01bf9b13$4be1db00$0500a8c0@secret.pythonware.com> > Tracy Tims > + Christian Tismer > + Christian Tismer > R Lindsay Todd two christians? From bwarsaw@cnri.reston.va.us Fri Mar 31 13:55:13 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Fri, 31 Mar 2000 08:55:13 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> <38E456CC.1A49334A@lemburg.com> Message-ID: <14564.44609.221250.471147@anthem.cnri.reston.va.us> >>>>> "M" == M writes: M> BTW, most of the new unicode stuff uses 4-space indents.
M> Unfortunately, it mixes whitespace and tabs since Emacs M> c-mode doesn't do the python-mode magic yet (is there a M> way to turn it on ?). (setq indent-tabs-mode nil) I could add that to the "python" style. And to zap all your existing tab characters: C-M-h M-x untabify RET -Barry From skip@mojam.com (Skip Montanaro) Fri Mar 31 14:04:46 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 31 Mar 2000 08:04:46 -0600 (CST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: References: Message-ID: <14564.45182.460160.589244@beluga.mojam.com> Moshe, I would highlight those bits that are likely to warrant a little closer scrutiny. The list.{append,insert,...} and socket.connect change certainly qualify. Perhaps split the Core Changes section into two subsections, one set of changes likely to require some adaptation and one set that should be backwards-compatible. -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From guido@python.org Fri Mar 31 14:47:31 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 31 Mar 2000 09:47:31 -0500 Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: Your message of "Fri, 31 Mar 2000 08:04:46 CST." 
<14564.45182.460160.589244@beluga.mojam.com> References: <14564.45182.460160.589244@beluga.mojam.com> Message-ID: <200003311447.JAA29633@eric.cnri.reston.va.us> See what I've done to Moshe's list: http://www.python.org/1.6/ --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@mojam.com (Skip Montanaro) Fri Mar 31 15:28:56 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 31 Mar 2000 09:28:56 -0600 (CST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: <200003311447.JAA29633@eric.cnri.reston.va.us> References: <14564.45182.460160.589244@beluga.mojam.com> <200003311447.JAA29633@eric.cnri.reston.va.us> Message-ID: <14564.50232.734778.152933@beluga.mojam.com> Guido> See what I've done to Moshe's list: http://www.python.org/1.6/ Looks good. Attached are a couple nitpicky diffs. Skip [base64-encoded attachment "1.6.diff" omitted] From guido@python.org Fri Mar 31 15:47:56 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 31 Mar 2000 10:47:56 -0500 Subject: [Python-Dev] Windows installer pre-prelease Message-ID: <200003311547.KAA15538@eric.cnri.reston.va.us> The Windows installer is always hard to get just right. If you have a moment, go to http://www.python.org/1.6/ and download the Windows Installer prerelease. Let me know what works, what doesn't! I've successfully installed it on Windows NT 4.0 and on Windows 98, both with default install target and with a modified install target. I'd love to hear that it also installs cleanly on Windows 95. Please test IDLE from the start menu! --Guido van Rossum (home page: http://www.python.org/~guido/) From gward@cnri.reston.va.us Fri Mar 31 16:18:43 2000 From: gward@cnri.reston.va.us (Greg Ward) Date: Fri, 31 Mar 2000 11:18:43 -0500 Subject: [Python-Dev] Distutils for the std.
library (was: Expat module) In-Reply-To: <14563.52125.401817.986919@amarok.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Thu, Mar 30, 2000 at 04:48:13PM -0500 References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> <14563.52125.401817.986919@amarok.cnri.reston.va.us> Message-ID: <20000331111842.A8060@cnri.reston.va.us> On 30 March 2000, Andrew M. Kuchling said: > Should we consider replacing the makesetup/Setup.in mechanism with a > setup.py script that uses the Distutils? You'd have to compile a > minipython with just enough critical modules -- strop and posixmodule > are probably the most important ones -- in order to run setup.py. > It's something I'd like to look at for 1.6, because then you could be > much smarter in automatically enabling modules. Gee, I didn't think anyone was gonna open *that* can of worms for 1.6. Obviously, I'd love to see the Distutils used to build parts of the Python library. Some possible problems: * Distutils relies heavily on the sys, os, string, and re modules, so those would have to be built and included in the mythical mini-python (as would everything they rely on -- strop, pcre, ... ?) * Distutils currently assumes that it's working with an installed Python -- it doesn't know anything about working in the Python source tree. I think this could be fixed just by tweaking the distutils.sysconfig module, but there might be subtle assumptions elsewhere in the code. * I haven't written the mythical Autoconf-in-Python yet, so we'd still have to rely on either the configure script or user intervention to find out whether library X is installed, and where its header and library files live (for X in zlib, tcl, tk, ...).
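The first bullet -- that a mini-python must bundle not just sys, os, string, and re but everything they pull in -- can be estimated mechanically. A rough sketch using the standard library's modulefinder; the module list is only an assumption taken from the bullet above, and a modern standard library's closure will of course differ from 1.6's:

```python
import os
import tempfile
from modulefinder import ModuleFinder

# A stand-in script that imports only what Distutils itself is said to need.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("import os, re, string\n")
    script = f.name

finder = ModuleFinder()
finder.run_script(script)   # follow the imports transitively
os.unlink(script)

# finder.modules maps module name -> Module for the whole closure;
# this is (roughly) the set a mini-python would have to include.
print(sorted(finder.modules))
```

The point of the sketch is simply that the closure is much larger than the four names you start from.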
Of course, the configure script would still be needed to build the mini-python, so it's not going away any time soon. Greg From skip@mojam.com (Skip Montanaro) Fri Mar 31 16:26:55 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 31 Mar 2000 10:26:55 -0600 (CST) Subject: [Python-Dev] Distutils for the std. library (was: Expat module) In-Reply-To: <20000331111842.A8060@cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> <14563.52125.401817.986919@amarok.cnri.reston.va.us> <20000331111842.A8060@cnri.reston.va.us> Message-ID: <14564.53711.803509.962248@beluga.mojam.com> Greg> * Distutils relies heavily on the sys, os, string, and re Greg> modules, so those would have to be built and included in the Greg> mythical mini-python (as would everything they rely on -- Greg> strop, pcre, ... ?) With string methods in 1.6, reliance on the string and strop modules should be lessened or eliminated, right? re and os may need a tweak or two to use string methods themselves. The sys module is always available. Perhaps it would make sense to put sre(module)?.c into the Python directory where sysmodule.c lives. That way, a Distutils-capable mini-python could be built without messing around in the Modules directory at all... -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From Moshe Zadka Fri Mar 31 16:25:11 2000 From: Moshe Zadka (Moshe Zadka) Date: Fri, 31 Mar 2000 18:25:11 +0200 (IST) Subject: [Python-Dev] Distutils for the std. library (was: Expat module) In-Reply-To: <20000331111842.A8060@cnri.reston.va.us> Message-ID: On Fri, 31 Mar 2000, Greg Ward wrote: > Gee, I didn't think anyone was gonna open *that* can of worms for 1.6. 
Well, it's not like it's not a lot of work, but it could be done, with liberal interpretation of "mini": include in "mini" Python *all* modules which do not rely on libraries not distributed with the Python core -- zlib, expat and Tkinter go right out the window, but most everything else can stay. That way, Distutils can use all modules it currently uses . The other problem, file-location, is a problem I have talked about earlier: it *cannot* be assumed that the default place for putting new libraries is the same place the Python interpreter resides, for many reasons. Why not ask the user explicitly? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gward@cnri.reston.va.us Fri Mar 31 16:29:33 2000 From: gward@cnri.reston.va.us (Greg Ward) Date: Fri, 31 Mar 2000 11:29:33 -0500 Subject: [Python-Dev] Distutils for the std. library (was: Expat module) In-Reply-To: <14564.53711.803509.962248@beluga.mojam.com>; from skip@mojam.com on Fri, Mar 31, 2000 at 10:26:55AM -0600 References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> <14563.52125.401817.986919@amarok.cnri.reston.va.us> <20000331111842.A8060@cnri.reston.va.us> <14564.53711.803509.962248@beluga.mojam.com> Message-ID: <20000331112933.B8060@cnri.reston.va.us> On 31 March 2000, Skip Montanaro said: > With string methods in 1.6, reliance on the string and strop modules should > be lessened or eliminated, right? re and os may need a tweak or two to use > string methods themselves. The sys module is always available. Perhaps it > would make sense to put sre(module)?.c into the Python directory where > sysmodule.c lives. That way, a Distutils-capable mini-python could be built > without messing around in the Modules directory at all... 
But I'm striving to maintain compatibility with (at least) Python 1.5.2 in Distutils. That need will fade with time, but it's not going to disappear the moment Python 1.6 is released. (Guess I'll have to find somewhere else to play with string methods and extended call syntax). Greg From thomas.heller@ion-tof.com Fri Mar 31 17:09:41 2000 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Fri, 31 Mar 2000 19:09:41 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: distutils/distutils msvccompiler.py References: <200003311653.LAA08175@thrak.cnri.reston.va.us> Message-ID: <038701bf9b33$e7c49240$4500a8c0@thomasnotebook> > Simplified Thomas Heller's registry patch: just assign all those > HKEY_* and Reg* names once, rather than having near-duplicate code > in the two import attempts. Your change won't work, the function names in win32api and winreg are not the same: Example: win32api.RegEnumValue <-> winreg.EnumValue > > Also dropped the leading underscore on all the imported symbols, > as it's not appropriate (they're not local to this module). Are they used anywhere else? Or do you think they *could* be used somewhere else? Thomas Heller From mal@lemburg.com Fri Mar 31 10:19:58 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 31 Mar 2000 12:19:58 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> <38E456CC.1A49334A@lemburg.com> <01b501bf9af1$f9b44500$34aab5d4@hagrid> Message-ID: <38E47BCE.94E4E012@lemburg.com> Fredrik Lundh wrote: > > M.-A. Lemburg wrote: > > Why not just leave new code formatted as it is (except maybe > > to bring the used TAB width to the standard 8 spaces used throughout > > the Python C source code) ? > > > > BTW, most of the new unicode stuff uses 4-space indents.
> > Unfortunately, it mixes whitespace and tabs since Emacs > > c-mode doesn't do the python-mode magic yet (is there a > > way to turn it on ?). > > http://www.jwz.org/doc/tabs-vs-spaces.html > contains some hints. Ah, cool. Thanks :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From pf@artcom-gmbh.de Fri Mar 31 18:56:40 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Fri, 31 Mar 2000 20:56:40 +0200 (MEST) Subject: [Python-Dev] 'make install' should create lib/site-packages IMO In-Reply-To: <200003311513.KAA00790@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 31, 2000 10:13:20 am" Message-ID: Hi! Guido van Rossum: [...] > Modified Files: > Makefile.in > Log Message: > Added distutils and distutils/command to LIBSUBDIRS. Noted by Andrew > Kuchling. [...] > ! LIBSUBDIRS= lib-old lib-tk test test/output encodings \ > ! distutils distutils/command $(MACHDEPS) [...] What about 'site-packages'? SuSE added this to their Python packaging and I think it is a good idea to have an empty 'site-packages' directory installed by default. Regards, Peter From akuchlin@mems-exchange.org Fri Mar 31 20:16:53 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Fri, 31 Mar 2000 15:16:53 -0500 (EST) Subject: [Python-Dev] SRE: what to do with undocumented attributes? In-Reply-To: <00e901bf9a9c$6c036240$34aab5d4@hagrid> References: <00b701bf9a99$022339c0$34aab5d4@hagrid> <14563.58848.109072.339060@amarok.cnri.reston.va.us> <00e901bf9a9c$6c036240$34aab5d4@hagrid> Message-ID: <14565.1973.361549.291817@amarok.cnri.reston.va.us> Fredrik Lundh writes: >btw, "pattern" doesn't make much sense in SRE -- who says >the pattern object was created by re.compile? guess I'll just >set it to None in other cases (e.g. sregex, sreverb, sgema...) 
Good point; I can imagine fabulously complex patterns assembled programmatically, for which no summary could be made. I guess there could be another attribute that also gives the class (module? function?) used to compile the pattern, but more likely, the pattern attribute should be deprecated and eventually dropped. -- A.M. Kuchling http://starship.python.net/crew/amk/ You know how she is when she gets an idea into her head. I mean, when one finally penetrates. -- Desire describes Delirium, in SANDMAN #41: "Brief Lives:1" From pf@artcom-gmbh.de Fri Mar 31 20:14:41 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Fri, 31 Mar 2000 22:14:41 +0200 (MEST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: <200003311447.JAA29633@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 31, 2000 9:47:31 am" Message-ID: Hi! Guido van Rossum : > See what I've done to Moshe's list: http://www.python.org/1.6/ Very fine, but I have a few small annotations: 1. 'linuxaudio' has been renamed to 'linuxaudiodev' 2. The following text: "_tkinter - support for 8.1,8.2,8.3 (no support for versions older than 8.0)." looks a bit misleading, since it is not explicit about Version 8.0.x. I suggest the following wording: "_tkinter - supports Tcl/Tk from version 8.0 up to the current 8.3. Support for versions older than 8.0 has been dropped." 3. 'src/Tools/i18n/pygettext.py' by Barry should be mentioned. This is a very useful utility. I suggest to append the following text: "New utility pygettext.py -- Python equivalent of xgettext(1). A message text extraction tool used for internationalizing applications written in Python" Regards, Peter From fdrake@acm.org Fri Mar 31 20:30:00 2000 From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 31 Mar 2000 15:30:00 -0500 (EST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: References: <200003311447.JAA29633@eric.cnri.reston.va.us> Message-ID: <14565.2760.665022.206361@seahag.cnri.reston.va.us> Peter Funk writes: > I suggest the following wording: ... > a very useful utility. I suggest to append the following text: Peter, I'm beginning to figure this out -- you really just want to get published! ;) You forgot the legalese. ;( -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido@python.org Fri Mar 31 21:30:42 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 31 Mar 2000 16:30:42 -0500 Subject: [Python-Dev] Python 1.6 alpha 1 released Message-ID: <200003312130.QAA04361@eric.cnri.reston.va.us> I've just released a source tarball and a Windows installer for Python 1.6 alpha 1 to the Python website: http://www.python.org/1.6/ Probably the biggest news (if you hadn't heard the rumors) is Unicode support. More news on the above webpage. Note: this is an alpha release. Some of the code is very rough! Please give it a try with your favorite Python application, but don't trust it for production use yet. I plan to release several more alpha and beta releases over the next two months, culminating in a 1.6 final release around June first. We need your help to make the final 1.6 release as robust as possible -- please test this alpha release!!!
--Guido van Rossum (home page: http://www.python.org/~guido/) From bjorn@roguewave.com Fri Mar 31 22:02:07 2000 From: bjorn@roguewave.com (Bjorn Pettersen) Date: Fri, 31 Mar 2000 15:02:07 -0700 Subject: [Python-Dev] Re: Python 1.6 alpha 1 released References: <200003312130.QAA04361@eric.cnri.reston.va.us> Message-ID: <38E5205F.DE811F61@roguewave.com> Guido van Rossum wrote: > > I've just released a source tarball and a Windows installer for Python > 1.6 alpha 1 to the Python website: > > http://www.python.org/1.6/ > > Probably the biggest news (if you hadn't heard the rumors) is Unicode > support. More news on the above webpage. > > Note: this is an alpha release. Some of the code is very rough! > Please give it a try with your favorite Python application, but don't > trust it for production use yet. I plan to release several more alpha > and beta releases over the next two months, culminating in a 1.6 > final release around June first. > > We need your help to make the final 1.6 release as robust as possible > -- please test this alpha release!!! > > --Guido van Rossum (home page: http://www.python.org/~guido/) Just read the announcement page, and found that socket.connect() no longer takes two arguments as was previously documented. If this change is staying I'm assuming the examples in the manual that use a two-argument socket.connect() will be changed? A quick look shows that this breaks all the network scripts I have installed (at least the ones that I found, undoubtedly there are many more). Because of this I will put any upgrade plans on hold.
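For code that must straddle the change Bjorn describes, one option is a small compatibility shim that always passes the 1.6-style address tuple (a tuple was accepted by older releases as well). The fake socket class below is only a stand-in to show the call shape:

```python
def connect_compat(sock, host, port):
    # 1.5.2 allowed sock.connect(host, port); 1.6 requires a single
    # (host, port) address tuple.
    sock.connect((host, port))

class FakeSocket:
    """Records the address passed to connect(); a real program
    would pass a socket.socket instance instead."""
    def __init__(self):
        self.address = None
    def connect(self, address):
        self.address = address

s = FakeSocket()
connect_compat(s, "www.python.org", 80)
print(s.address)   # ('www.python.org', 80)
```

Routing all connects through one helper like this confines the signature change to a single place.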
-- bjorn From gandalf@starship.python.net Fri Mar 31 21:56:16 2000 From: gandalf@starship.python.net (Vladimir Ulogov) Date: Fri, 31 Mar 2000 16:56:16 -0500 (EST) Subject: [Python-Dev] Re: Python 1.6 alpha 1 released In-Reply-To: <200003312130.QAA04361@eric.cnri.reston.va.us> Message-ID: Guido, """where you used to write sock.connect(host, port) you must now write sock.connect((host, port))""" Is it possible to keep the old notation? I understand (according to your past mail about the parameters of connect) this may not be what you have in mind, but we do use this notation "a lot", and for us it will mean creating a workaround for the socket.connect function. It's inconvenient. In general, I think socket.connect(Host, Port) looks prettier :)) than socket.connect((Host, Port)) Vladimir From gstein at lyra.org Wed Mar 1 00:47:55 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 29 Feb 2000 15:47:55 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <38BC2375.5C832488@tismer.com> Message-ID: On Tue, 29 Feb 2000, Christian Tismer wrote: > Greg Stein wrote: > > +1 on breaking it now, rather than deferring it Yet Again. > > > > IMO, there has been plenty of warning, and there is plenty of time to > > correct the software. > > > > I'm +0 on adding a warning architecture to Python to support issuing a > > warning/error when .append is called with multiple arguments. > > Well, the (bad) effect of this patch is that you cannot run > PythonWin any longer unless Mark either supplies an updated > distribution, or one corrects the two barfing Scintilla > support scripts by hand. Yes, but there is no reason to assume this won't happen. Why don't we simply move forward with the assumption that PythonWin and Scintilla will be updated? If we stand around pointing at all the uses of append that are incorrect and claim that is why we can't move forward, then we won't get anywhere. Instead, let's just *MOVE* and see that software authors update accordingly.
It isn't like it is a difficult change to make. Heck, PythonWin and Scintilla could be updated within the week and re-released. *WAY* ahead of the 1.6 release. > Bad for me, since I'm building Stackless Python against 1.5.2+, > and that means the users will see PythonWin barf when installing SLP. If you're building a system using an interim release of Python, then I think you need to take responsibility for that. If you don't want those people to have problems, then you can back out the list.append change. Or you can release patches to PythonWin. I don't think the Python world at large should be hampered because somebody is using an unstable/interim version of Python. Again: we couldn't move forward. > Adding a warning instead of raising an exception would be nice IMHO, > since the warning could probably contain the file name and line > number to change, and I would leave my users with this easy task. Yes, this would be nice. But somebody has to take the time to code it up. The warning won't appear out of nowhere... Cheers, -g -- Greg Stein, http://www.lyra.org/ From mhammond at skippinet.com.au Wed Mar 1 00:57:38 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed, 1 Mar 2000 10:57:38 +1100 Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: > Why don't we simply move forward with the assumption that PythonWin and > Scintilla will be updated? Done :-) However, I think dropping it now _is_ a little heavy-handed. I decided to do a wider search and found a few in, e.g., Sam Rushing's calldll-based ODBC package. Personally, I would much prefer a warning now, and drop it later. _Then_ we can say we have made enough noise about it. It was only 2 years ago that I became aware that this "feature" of append was not a feature at all - up until then I used it purposely, and habits are sometimes hard to change :-)
From gstein at lyra.org Wed Mar 1 01:12:29 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 29 Feb 2000 16:12:29 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Wed, 1 Mar 2000, Mark Hammond wrote: > > Why don't we simply move forward with the assumption that PythonWin and > > Scintilla will be updated? > > Done :-) hehe... > However, I think dropping it now _is_ a little heavy handed. I decided to > do a wider search and found a few in, eg, Sam Rushings calldll based ODBC > package. > > Personally, I would much prefer a warning now, and drop it later. _Then_ we > can say we have made enough noise about it. It would only be 2 years ago > that I became aware that this "feature" of append was not a feature at all - > up until then I used it purposely, and habits are sometimes hard to change > :-) What's the difference between a warning and an error? If you're running a program and it suddenly spits out a warning about a misuse of list.append, I'd certainly see that as "the program did something unexpected; that is an error." But this is all moot. Guido has already said that we would be amenable to a warning/error infrastructure which list.append could use. His description used some awkward sentences, so I'm not sure (without spending some brain cycles to parse the email) exactly what his desired defaults and behavior are. But hey... the possibility is there, and is just waiting for somebody to code it. IMO, Guido has left an out for people that are upset with the current hard-line approach. One of those people just needs to spend a bit of time coming up with a patch :-) And yes, Guido is also the Benevolent Dictator and can certainly have his mind changed, so people can definitely continue pestering him to back away from the hard-line approach... 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From ping at lfw.org Wed Mar 1 01:20:07 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 29 Feb 2000 18:20:07 -0600 (CST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Greg Stein wrote: > > What's the difference between a warning and an error? If you're running a > program and it suddenly spits out a warning about a misuse of list.append, > I'd certainly see that as "the program did something unexpected; that is > an error." A big, big difference. Perhaps to one of us, it's the minor inconvenience of reading the error message and inserting a couple of parentheses in the appropriate file -- but to the end user, it's the difference between the program working (albeit noisily) and *not* working. When the program throws an exception and stops, it is safe to say most users will declare it broken and give up. We can't assume that they're going to be able to figure out what to edit (or be brave enough to try) just by reading the error message... or even what interpreter flag to give, if errors (rather than warnings) are the default behaviour. -- ?!ng From klm at digicool.com Wed Mar 1 01:37:09 2000 From: klm at digicool.com (Ken Manheimer) Date: Tue, 29 Feb 2000 19:37:09 -0500 (EST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Wed, 1 Mar 2000, Mark Hammond wrote: > > Why don't we simply move forward with the assumption that PythonWin and > > Scintilla will be updated? > > Done :-) > > However, I think dropping it now _is_ a little heavy handed. I decided to > do a wider search and found a few in, eg, Sam Rushings calldll based ODBC > package. > > Personally, I would much prefer a warning now, and drop it later. _Then_ we > can say we have made enough noise about it. 
It would only be 2 years ago > that I became aware that this "feature" of append was not a feature at all - > up until then I used it purposely, and habits are sometimes hard to change > :-) I agree with Mark. Why the sudden rush?? It seems to me to be unfair to make such a change - one that will break people's code - without advance warning, which typically is handled by a deprecation period. There *are* going to be people who won't be informed of the change in the short span of less than a single release. Just because it won't cause you pain isn't a good reason to disregard the pain of those that will suffer, particularly when you can do something relatively low-cost to avoid it. Ken klm at digicool.com From gstein at lyra.org Wed Mar 1 01:57:56 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 29 Feb 2000 16:57:56 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Ken Manheimer wrote: >... > I agree with Mark. Why the sudden rush?? It seems to me to be unfair to > make such a change - one that will break people's code - without advance > warning, which typically is handled by a deprecation period. There *are* > going to be people who won't be informed of the change in the short span > of less than a single release. Just because it won't cause you pain isn't > a good reason to disregard the pain of those that will suffer, > particularly when you can do something relatively low-cost to avoid it. Sudden rush?!? Mark said he knew about it for a couple years. Same here. It was a long while ago that .append()'s semantics were specified to "no longer" accept multiple arguments. I see in the HISTORY file that changes were made to Python 1.4 (October, 1996) to avoid calling append() with multiple arguments. So, that is over three years that append() has had multiple-args deprecated. There was probably discussion even before that, but I can't seem to find something to quote.
Seems like plenty of time -- far from rushed. Cheers, -g -- Greg Stein, http://www.lyra.org/ From klm at digicool.com Wed Mar 1 02:02:02 2000 From: klm at digicool.com (Ken Manheimer) Date: Tue, 29 Feb 2000 20:02:02 -0500 (EST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Greg Stein wrote: > On Tue, 29 Feb 2000, Ken Manheimer wrote: > >... > > I agree with Mark. Why the sudden rush?? It seems to me to be unfair to > > make such a change - one that will break people's code - without advance > > warning, which typically is handled by a deprecation period. There *are* > > going to be people who won't be informed of the change in the short span > > of less than a single release. Just because it won't cause you pain isn't > > a good reason to disregard the pain of those that will suffer, > > particularly when you can do something relatively low-cost to avoid it. > > Sudden rush?!? > > Mark said he knew about it for a couple years. Same here. It was a long > while ago that .append()'s semantics were specified to "no longer" accept > multiple arguments. > > I see in the HISTORY file that changes were made to Python 1.4 (October, > 1996) to avoid calling append() with multiple arguments. > > So, that is over three years that append() has had multiple-args > deprecated. There was probably discussion even before that, but I can't > seem to find something to quote. Nonetheless, for those practicing it, the incorrectness of it will be fresh news. I would be less sympathetic with them if there had been recent warning, e.g., if the schedule for changing it in the next release had been part of the current release. But if you tell somebody you're going to change something, and then don't for a few years, you probably need to renew the warning before you make the change. Don't you think so? Why not?
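The warn-now, break-later behaviour Mark and Ken are asking for can be sketched with today's warnings machinery. The shim name append_compat is invented for illustration; the warning infrastructure Guido alluded to did not exist yet.

```python
import warnings

def append_compat(lst, *args):
    # Hypothetical shim: keep the old multi-arg meaning for one more
    # release, but complain loudly so authors can fix their code.
    if len(args) > 1:
        warnings.warn(
            "list.append() with more than one argument is deprecated; "
            "pass a tuple instead",
            DeprecationWarning,
            stacklevel=2,
        )
        lst.append(args)       # old semantics: append the args as a tuple
    else:
        lst.append(args[0])

items = []
append_compat(items, 1, 2)     # warns, but still works
append_compat(items, 3)
print(items)                   # [(1, 2), 3]
```

This is the middle ground between an immediate TypeError and silently keeping the old behaviour: the program keeps running, but every offending call site is reported with a file name and line number.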
Ken klm at digicool.com From paul at prescod.net Wed Mar 1 03:56:33 2000 From: paul at prescod.net (Paul Prescod) Date: Tue, 29 Feb 2000 18:56:33 -0800 Subject: [Python-Dev] breaking list.append() References: Message-ID: <38BC86E1.53F69776@prescod.net> Software configuration management is HARD. Every sudden backwards incompatible change (warranted or not) makes it harder. Multi-arg append is not hurting anyone as much as a sudden change to it would. It would be better to leave append() alone and publicize its near-term removal rather than cause random, part-time supported modules to stop working because their programmers may be too busy to update them right now. So no, I'm not stepping up to do it. But I'm also saying that the better "lazy" option is to put something in a prominent place in the documentation and otherwise leave it alone. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made possible the modern world." - from "Advent of the Algorithm" David Berlinski http://www.opengroup.com/mabooks/015/0151003386.shtml From guido at python.org Wed Mar 1 05:11:02 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Feb 2000 23:11:02 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: Your message of "Tue, 29 Feb 2000 18:56:33 PST." <38BC86E1.53F69776@prescod.net> References: <38BC86E1.53F69776@prescod.net> Message-ID: <200003010411.XAA12988@eric.cnri.reston.va.us> > Software configuration management is HARD. Every sudden backwards > incompatible change (warranted or not) makes it harder. Multi-arg append > is not hurting anyone as much as a sudden change to it would. It would > be better to leave append() alone and publicize its near-term removal > rather than cause random, part-time supported modules to stop working > because their programmers may be too busy to update them right now.
I'm tired of this rhetoric. It's not like I'm changing existing Python installations retroactively. I'm planning to release a new version of Python which no longer supports certain long-obsolete and undocumented behavior. If you maintain a non-core Python module, you should test it against the new release and fix anything that comes up. This is why we have an alpha and beta test cycle and even before that the CVS version. If you are a Python user who depends on a 3rd party module, you need to find out whether the new version is compatible with the 3rd party code you are using, or whether there's a newer version available that solves the incompatibility. There are people who still run Python 1.4 (really!) because they haven't upgraded. I don't have a problem with that -- they don't get much support, but it's their choice, and they may not need the new features introduced since then. I expect that lots of people won't upgrade their Python 1.5.2 to 1.6 right away -- they'll wait until the other modules/packages they need are compatible with 1.6. Multi-arg append probably won't be the only reason why e.g. Digital Creations may need to release an update to Zope for Python 1.6. Zope comes with its own version of Python anyway, so they have control over when they make the switch. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Wed Mar 1 06:04:35 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 00:04:35 -0500 Subject: [Python-Dev] Size of int across machines (was RE: Blowfish in Python?) In-Reply-To: Message-ID: <000201bf833b$a3b01bc0$412d153f@tim> [Markus Stenberg] > ... > speed was horrendous. > > I think the main reason was the fact that I had to use _long ints_ for > calculations, as the normal ints are signed, and apparently the bitwise > operators do not work as advertised when bit32 is set (=number is > negative). 
[Tim, takes "bitwise operators" to mean & | ^ ~, and expresses surprise] [Markus, takes umbrage, and expresses umbrage ] > Hmm.. As far as I'm concerned, shifts for example do screw up. Do you mean "for example" as in "there are so many let's just pick one at random", or as in "this is the only one I've stumbled into" <0.9 wink>? > i.e. > > 0xffffffff >> 30 > > [64bit Python: 3] > [32bit Python: -1] > > As far as I'm concerned, that should _not_ happen. Or maybe it's just me. I could not have guessed that your complaint was about 64-bit Python from your "when bit32 is set (=number is negative)" description . The behavior shown in a Python compiled under a C in which sizeof(long)==4 matches the Reference Manual (see the "Integer and long integer literals" and "shifting operations" sections). So that can't be considered broken (you may not *like* it, but it's functioning as designed & as documented). The behavior under a sizeof(long)==8 C seems more of an ill-documented (and debatable to me too) feature. The possibility is mentioned in the "The standard type hierarchy" section (under Numbers -> Integers -> Plain integers) but really not fleshed out, and the "Integer and long integer literals" section plainly contradicts it. Python's going to have to clean up its act here -- 64-bit machines are getting more common. There's a move afoot to erase the distinction between Python ints and longs (in the sense of auto-converting from one to the other under the covers, as needed). In that world, your example would work like the "64bit Python" one. There are certainly compatibility issues, though, in that int left shifts are end-off now, and on a 32-bit machine any int for which i & 0x80000000 is true "is negative" (and so sign-extends on a right shift; note that Python guarantees sign-extending right shifts *regardless* of what the platform C does (C doesn't define what happens here -- Python does)).
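The 32-bit versus 64-bit behaviour Tim and Markus are arguing about can be reproduced in today's Python, where ints are arbitrary precision (the world the "erase the distinction" move produced). The helper names below are made up for illustration; they emulate the old sizeof(long)==4 builds.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Reinterpret the low 32 bits of x as a C-style signed 32-bit long."""
    x &= MASK32
    return x - 0x100000000 if x & 0x80000000 else x

# On a sizeof(long)==8 build, 0xffffffff is the positive value 2**32-1:
print(0xFFFFFFFF >> 30)                  # 3

# On the old 32-bit builds, the same literal was the signed value -1,
# and Python's right shift sign-extends:
print(to_signed32(0xFFFFFFFF) >> 30)     # -1

# Emulating C's "mod 2**32" addition, the part Markus found painful:
def add32(a, b):
    return (a + b) & MASK32
```

Masking with `& 0xFFFFFFFF` after every operation is the standard trick for C-style unsigned arithmetic in arbitrary-precision Python.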
[description of pain getting a fast C-like "mod 2**32 int +" to work too] Python really wasn't designed for high-performance bit-fiddling, so you're (as you've discovered ) swimming upstream with every stroke. Given that you can't write a C module here, there's nothing better than to do the ^ & | ~ parts with ints, and fake the rest slowly & painfully. Note that you can at least determine the size of a Python int via inspecting sys.maxint. sympathetically-unhelpfully y'rs - tim From guido at python.org Wed Mar 1 06:44:10 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 00:44:10 -0500 Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python In-Reply-To: Your message of "Tue, 29 Feb 2000 15:34:21 MST." <20000229153421.A16502@acs.ucalgary.ca> References: <20000229153421.A16502@acs.ucalgary.ca> Message-ID: <200003010544.AAA13155@eric.cnri.reston.va.us> [I don't like to cross-post to patches and python-dev, but I think this belongs in patches because it's a followup to Neil's post there and also in -dev because of its longer-term importance.] Thanks for the new patches, Neil! We had a visitor here at CNRI today, Eric Tiedemann , who had a look at your patches before. Eric knows his way around the Scheme, Lisp and GC literature, and presented a variant on your approach which takes the bite out of the recursive passes. Eric had commented earlier on Neil's previous code, and I had used the morning to make myself familiar with Neil's code. This was relatively easy because Neil's code is very clear. Today, Eric proposed to do away with Neil's hash table altogether -- as long as we're wasting memory, we might as well add 3 fields to each container object rather than allocating the same amount in a separate hash table. Eric expects that this will run faster, although this obviously needs to be tried. Container types are: dict, list, tuple, class, instance; plus potentially user-defined container types such as kjbuckets. 
I have a feeling that function objects should also be considered container types, because of the cycle involving globals. Eric's algorithm, then, consists of the following parts. Each container object has three new fields: gc_next, gc_prev, and gc_refs. (Eric calls the gc_refs "refcount-zero".) We color objects white (initial), gray (root), black (scanned root). (The terms are explained later; we believe we don't actually need bits in the objects to store the color; see later.) All container objects are chained together in a doubly-linked list -- this is the same as Neil's code except Neil does it only for dicts. (Eric postulates that you need a list header.) When GC is activated, all objects are colored white; we make a pass over the entire list and set gc_refs equal to the refcount for each object. Next, we make another pass over the list to collect the internal references. Internal references are (just like in Neil's version) references from other container types. In Neil's version, this was recursive; in Eric's version, we don't need recursion, since the list already contains all containers. So we simply visit the containers in the list in turn, and for each one we go over all the objects it references and subtract one from *its* gc_refs field. (Eric left out the little detail that we need to be able to distinguish between container and non-container objects amongst those references; this can be a flag bit in the type field.) Now, similar to Neil's version, all objects for which gc_refs == 0 have only internal references, and are potential garbage; all objects for which gc_refs > 0 are "roots". These have references to them from other places, e.g. from globals or stack frames in the Python virtual machine. We now start a second list, to which we will move all roots. The way to do this is to go over the first list again and to move each object that has gc_refs > 0 to the second list.
Objects placed on the second list in this phase are considered colored gray (roots). Of course, some roots will reference some non-roots, which keeps those non-roots alive. We now make a pass over the second list, where for each object on the second list, we look at every object it references. If a referenced object is a container and is still in the first list (colored white) we *append* it to the second list (colored gray). Because we append, objects thus added to the second list will eventually be considered by this same pass; when we stop finding objects that are still white, we stop appending to the second list, and we will eventually terminate this pass. Conceptually, objects on the second list that have been scanned in this pass are colored black (scanned root); but there is no need to actually make the distinction. (How do we know whether an object pointed to is white (in the first list) or gray or black (in the second)? We could use an extra bitfield, but that's a waste of space. Better: we could set gc_refs to a magic value (e.g. 0xffffffff) when we move the object to the second list. During the meeting, I proposed to set the back pointer to NULL; that might work too but I think the gc_refs field is more elegant. We could even just test for a non-zero gc_refs field; the roots moved to the second list initially all have a non-zero gc_refs field already, and for the objects with a zero gc_refs field we could indeed set it to something arbitrary.) Once we reach the end of the second list, all objects still left in the first list are garbage. We can destroy them similarly to the way Neil does this in his code. Neil calls PyDict_Clear on the dictionaries, and ignores the rest. Under Neil's assumption that all cycles (that he detects) involve dictionaries, that is sufficient. In our case, we may need a type-specific "clear" function for containers in the type object. We discussed more things, but not as thoroughly.
Eric & Eric stressed the importance of making excellent statistics available about the rate of garbage collection -- probably as data structures that Python code can read rather than debugging print statements. Eric T also sketched an incremental version of the algorithm, usable for real-time applications. This involved keeping the gc_refs field ("external" reference counts) up-to-date at all times, which would require two different versions of the INCREF/DECREF macros: one for adding/deleting a reference from a container, and another for adding/deleting a root reference. Also, a 4th color (red) was added, to distinguish between scanned roots and scanned non-roots. We decided not to work this out in more detail because the overhead cost appeared to be much higher than for the previous algorithm; instead, we recommend that for real-time requirements the whole GC is disabled (there should be run-time controls for this, not just compile-time). We also briefly discussed possibilities for generational schemes. The general opinion was that we should first implement and test the algorithm as sketched above, and then changes or extensions could be made. I was pleasantly surprised to find Neil's code in my inbox when we came out of the meeting; I think it would be worthwhile to compare and contrast the two approaches. (Hm, maybe there's a paper in it?) The rest of the afternoon was spent discussing continuations, coroutines and generators, and the fundamental reason why continuations are so hard (the C stack getting in the way everywhere). But that's a topic for another mail, maybe.
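The two-pass, two-list scheme Guido describes can be modeled in a few lines of Python. The Node class and its fields below are stand-ins invented for illustration; the real version would live in C inside the container objects themselves, with gc_refs stored next to ob_refcnt.

```python
class Node:
    """Toy container: refcount stands in for the real ob_refcnt."""
    def __init__(self, name, refcount):
        self.name = name
        self.refcount = refcount
        self.children = []        # container-to-container references

def collect(nodes):
    # Pass 1: copy each refcount into gc_refs.
    for n in nodes:
        n.gc_refs = n.refcount
    # Pass 2: subtract one for every internal reference.
    for n in nodes:
        for c in n.children:
            c.gc_refs -= 1
    # Anything with gc_refs > 0 is referenced from outside: a gray root.
    white = set(nodes)
    gray = [n for n in nodes if n.gc_refs > 0]
    white.difference_update(gray)
    # Scan the gray list; appending replaces the recursion of Neil's
    # version.  The list grows while we walk it, and the walk ends when
    # no referenced object is still white.
    for n in gray:
        for c in n.children:
            if c in white:
                white.discard(c)
                gray.append(c)
    return white                  # still white == unreachable garbage

# A cycle a <-> b with no outside references is detected as garbage...
a, b = Node("a", 1), Node("b", 1)
a.children.append(b); b.children.append(a)
print(sorted(n.name for n in collect([a, b])))   # ['a', 'b']

# ...but not when a root r (held from outside) still reaches it.
r = Node("r", 1); r.children.append(a)
a.refcount = 2                                    # r's reference to a
print(collect([r, a, b]))                         # set()
```

The key property the sketch shows: after pass 2, gc_refs counts only the references coming from *outside* the tracked set, so a pure cycle drops to zero everywhere while anything externally held stays positive.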
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Wed Mar 1 06:57:49 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 00:57:49 -0500 Subject: need .append patch (was RE: [Python-Dev] Re: Python-checkins digest, Vol 1 #370 - 8 msgs) In-Reply-To: <200002291302.IAA04581@eric.cnri.reston.va.us> Message-ID: <000601bf8343$13575040$412d153f@tim> [Tim, runs checkappend.py over the entire CVS tree, comes up with surprisingly many remaining problems, and surprisingly few false hits] [Guido fixes mailerdaemon.py, and argues for nuking Demo\tkinter\www\ (the whole directory) Demo\sgi\video\VcrIndex.py (unclear whether the dir or just the file) Demo\sgi\gl\glstdwin\glstdwin.py (stdwin-related) Demo\ibrowse\ibrowse.py (stdwin-related) > All these are stdwin-related. Stdwin will also go out of service per > 1.6. ] Then the sooner someone nukes them from the CVS tree, the sooner my automated hourly checkappend complaint generator will stop pestering Python-Dev about them . > (Conclusion: most multi-arg append() calls are *very* old, But part of that is because we went thru this exercise a couple years ago too, and you repaired all the ones in the less obscure parts of the distribution then. > or contributed by others. Sigh. I must've given bad examples long > ago...) Na, I doubt that. Most people will not read a language defn, at least not until "something doesn't work". If the compiler accepts a thing, they simply *assume* it's correct. It's pretty easy (at least for me!) to make this particular mistake as a careless typo, so I assume that's the "source origin" for many of these too. As soon as you *notice* you've done it, and that nothing bad happened, the natural tendencies are to (a) believe it's OK, and (b) save 4 keystrokes (incl. the SHIFTs) over & over again in the glorious indefinite future .
Reminds me of a c.l.py thread a while back, wherein someone did stuff like None, x, y, None = function_returning_a_4_tuple to mean that they didn't care what the 1st & 4th values were. It happened to work, so they did it more & more. Eventually a function containing this mistake needed to reference None after that line, and "suddenly for no reason at all Python stopped working". To the extent that you're serious about CP4E, you're begging for more of this, not less . newbies-even-keep-on-doing-things-that-*don't*-work!-ly y'rs - tim From tim_one at email.msn.com Wed Mar 1 07:50:44 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 01:50:44 -0500 Subject: [Python-Dev] Unicode mapping tables In-Reply-To: <38BBD1A2.CD29AADD@lemburg.com> Message-ID: <000701bf834a$77acdfe0$412d153f@tim> [M.-A. Lemburg] > ... > Currently, mapping tables map characters to Unicode characters > and vice-versa. Now the .translate method will use a different > kind of table: mapping integer ordinals to integer ordinals. You mean that if I want to map u"a" to u"A", I have to set up some sort of dict mapping ord(u"a") to ord(u"A")? I simply couldn't follow this. > Question: What is more efficient: having lots of integers > in a dictionary or lots of characters ? My bet is "lots of integers", to reduce both space use and comparison time. > ... > Something else that changed is the way .capitalize() works. The > Unicode version uses the Unicode algorithm for it (see TechRep. 13 > on the www.unicode.org site). #13 is "Unicode Newline Guidelines". I assume you meant #21 ("Case Mappings"). > Here's the new doc string: > > S.capitalize() -> unicode > > Return a capitalized version of S, i.e. words start with title case > characters, all remaining cased characters have lower case. > > Note that *all* characters are touched, not just the first one. > The change was needed to get it in sync with the .iscapitalized() > method which is based on the Unicode algorithm too.
> > Should this change be propagated to the string implementation ? Unicode makes distinctions among "upper case", "lower case" and "title case", and you're trying to get away with a single "capitalize" function. Java has separate toLowerCase, toUpperCase and toTitleCase methods, and that's the way to do it. Whatever you do, leave .capitalize alone for 8-bit strings -- there's no reason to break code that currently works. "capitalize" seems a terrible choice of name for a titlecase method anyway, because of its baggage connotations from 8-bit strings. Since this stuff is complicated, I say it would be much better to use the same names for these things as the Unicode and Java folk do: there's excellent documentation elsewhere for all this stuff, and it's Bad to make users mentally translate unique Python terminology to make sense of the official docs. So my vote is: leave capitalize the hell alone . Do not implement capitalize for Unicode strings. Introduce a new titlecase method for Unicode strings. Add a new titlecase method to 8-bit strings too. Unicode strings should also have methods to get at uppercase and lowercase (as Unicode defines those). From tim_one at email.msn.com Wed Mar 1 08:36:03 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 02:36:03 -0500 Subject: [Python-Dev] Re: Python / Haskell (fwd) In-Reply-To: Message-ID: <000801bf8350$cc4ec580$412d153f@tim> [Greg Wilson, quoting Philip Wadler] > Well, what I most want is typing. But you already know that. So invite him to contribute to the Types-SIG <0.5 wink>. > Next after typing? Full lexical scoping for closures.
I want to write: > > fun x: fun y: x+y > > Not: > > fun x: fun y, x=x: x+y > > Lexically scoped closures would be a big help for the embedding technique > I described [GVW: in a posting to the Software Carpentry discussion list, > archived at > > http://software-carpentry.codesourcery.com/lists/sc-discuss/msg00068.html > > which discussed how to build a flexible 'make' alternative in Python]. So long as we're not deathly concerned over saving a few lines of easy boilerplate code, Python already supports this approach wonderfully well -- but via using classes with __call__ methods instead of lexical closures. I can't make time to debate this now, but suffice it to say dozens on c.l.py would be delighted to . Philip is understandably attached to the "functional way of spelling things", but Python's way is at least as usable for this (and many-- including me --would say more so). > Next after closures? Disjoint sums. E.g., > > fun area(shape) : > switch shape: > case Circle(r): > return pi*r*r > case Rectangle(h,w): > return h*w > > (I'm making up a Python-like syntax.) This is an alternative to the OO > approach. With the OO approach, it is hard to add area, unless you modify > the Circle and Rectangle class definitions. Python allows adding new methods to classes dynamically "from the outside" -- the original definitions don't need to be touched (although it's certainly preferable to add new methods directly!). Take this complaint to the extreme, and I expect you end up reinventing multimethods (suppose you need to add an intersection(shape1, shape2) method: N**2 nesting of "disjoint sums" starts to appear ludicrous ). 
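The class-with-__call__ spelling of Wadler's closure that Tim mentions above, as a short sketch (the names Adder and make_adder are illustrative only):

```python
# Wadler's "fun x: fun y: x+y", spelled with a callable object.
class Adder:
    def __init__(self, x):
        self.x = x                 # the captured "environment"
    def __call__(self, y):
        return self.x + y

add3 = Adder(3)
print(add3(4))                     # 7

# The default-argument workaround Wadler objects to, for comparison:
def make_adder(x):
    return lambda y, x=x: x + y

print(make_adder(3)(4))            # 7
```

The object version carries its captured state explicitly in instance attributes, which is exactly the trade Tim is defending: a little boilerplate in exchange for the environment being visible and inspectable.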
In any case, the Types-SIG already seems to have decided that some form of "typecase" stmt will be needed; see the archives for that; I expect the use above would be considered abuse, though; Python has no "switch" stmt of any kind today, and the use above can already be spelled via if isinstance(shape, Circle): etc elif isinstance(shape, Rectangle): etc else: raise TypeError(etc) From gstein at lyra.org Wed Mar 1 08:51:29 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 29 Feb 2000 23:51:29 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Ken Manheimer wrote: >... > None the less, for those practicing it, the incorrectness of it will be > fresh news. I would be less sympathetic with them if there was recent > warning, eg, the schedule for changing it in the next release was part of > the current release. But if you tell somebody you're going to change > something, and then don't for a few years, you probably need to renew the > warning before you make the change. Don't you think so? Why not? I agree. Note that Guido posted a note to c.l.py on Monday. I believe that meets your notification criteria. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Wed Mar 1 09:10:28 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 1 Mar 2000 00:10:28 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <200003010411.XAA12988@eric.cnri.reston.va.us> Message-ID: On Tue, 29 Feb 2000, Guido van Rossum wrote: > I'm tired of this rhetoric. It's not like I'm changing existing > Python installations retroactively. I'm planning to release a new > version of Python which no longer supports certain long-obsolete and > undocumented behavior. If you maintain a non-core Python module, you > should test it against the new release and fix anything that comes up. > This is why we have an alpha and beta test cycle and even before that > the CVS version.
If you are a Python user who depends on a 3rd party > module, you need to find out whether the new version is compatible > with the 3rd party code you are using, or whether there's a newer > version available that solves the incompatibility. > > There are people who still run Python 1.4 (really!) because they > haven't upgraded. I don't have a problem with that -- they don't get > much support, but it's their choice, and they may not need the new > features introduced since then. I expect that lots of people won't > upgrade their Python 1.5.2 to 1.6 right away -- they'll wait until the > other modules/packages they need are compatible with 1.6. Multi-arg > append probably won't be the only reason why e.g. Digital Creations > may need to release an update to Zope for Python 1.6. Zope comes with > its own version of Python anyway, so they have control over when they > make the switch. I wholeheartedly support his approach. Just ask Mark Hammond :-) how many times I've said "let's change the code to make it Right; people aren't required to upgrade [and break their code]." Of course, his counter is that people need to upgrade to fix other, unrelated problems. So I relax and try again later :-). But I still maintain that they can independently grab the specific fixes and leave the other changes we make. Maybe it is grey, but I think this change is quite fine. Especially given Tim's tool. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one at email.msn.com Wed Mar 1 09:22:06 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 03:22:06 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: <000b01bf8357$3af08d60$412d153f@tim> [Greg Stein] > ... > Maybe it is grey, but I think this change is quite fine. Especially given > Tim's tool. What the heck does Tim's one-eyed trouser snake have to do with this? 
I know *it* likes to think it's the measure of all things, but, frankly, my tool barely affects the world at all a mere two feet beyond its base . tim-and-his-tool-think-the-change-is-a-mixed-thing-but-on-balance- the-best-thing-ly y'rs - tim From effbot at telia.com Wed Mar 1 09:40:01 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 1 Mar 2000 09:40:01 +0100 Subject: [Python-Dev] breaking list.append() References: Message-ID: <00fb01bf8359$c8196a20$34aab5d4@hagrid> Greg Stein wrote: > Note that Guido posted a note to c.l.py on Monday. I believe that meets > your notification criteria. ahem. do you seriously believe that everyone in the Python universe reads comp.lang.python? afaik, most Python programmers don't. ... so as far as I'm concerned, this was officially deprecated with Guido's post. afaik, no official python documentation has explicitly mentioned this (and the fact that it doesn't explicitly allow it doesn't really matter, since the docs don't explicitly allow the x[a, b, c] syntax either. both work in 1.5.2). has anyone checked the recent crop of Python books, btw? the eff-bot guide uses old syntax in two examples out of 320. how about the others? ... sigh. running checkappend over a 50k LOC application, I just realized that it doesn't catch a very common append pydiom. how fun. even though 99% of all append calls are "legal", this "minor" change will break every single application and library we have :-( oh, wait. xmlrpclib isn't affected. always something! From gstein at lyra.org Wed Mar 1 09:43:02 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 1 Mar 2000 00:43:02 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <00fb01bf8359$c8196a20$34aab5d4@hagrid> Message-ID: On Wed, 1 Mar 2000, Fredrik Lundh wrote: > Greg Stein wrote: > > Note that Guido posted a note to c.l.py on Monday. I believe that meets > > your notification criteria. > > ahem. 
do you seriously believe that everyone in the > Python universe reads comp.lang.python? > > afaik, most Python programmers don't. Now you're simply taking my comments out of context. Not a proper thing to do. Ken said that he wanted notification along certain guidelines. I said that I believed Guido's post did just that. Period. Personally, I think it is fine. I also think that a CHANGES file that arrives with 1.6 that points out the incompatibility is also fine. >... > sigh. running checkappend over a 50k LOC application, I > just realized that it doesn't catch a very common append > pydiom. And which is that? Care to help out? Maybe just a little bit? Or do you just want to talk about how bad this change is? :-( Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Wed Mar 1 10:01:52 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 1 Mar 2000 01:01:52 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <000b01bf8357$3af08d60$412d153f@tim> Message-ID: On Wed, 1 Mar 2000, Tim Peters wrote: > [Greg Stein] > > ... > > Maybe it is grey, but I think this change is quite fine. Especially given > > Tim's tool. > > What the heck does Tim's one-eyed trouser snake have to do with this? I > know *it* likes to think it's the measure of all things, but, frankly, my > tool barely affects the world at all a mere two feet beyond its base . > > tim-and-his-tool-think-the-change-is-a-mixed-thing-but-on-balance- > the-best-thing-ly y'rs - tim Heh. Now how is one supposed to respond to *that* ??! All right. Fine. +3 cool points go to Tim. :-) -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Wed Mar 1 10:03:32 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 1 Mar 2000 01:03:32 -0800 (PST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src Makefile.in,1.82,1.83 In-Reply-To: <14523.56638.286603.340358@weyr.cnri.reston.va.us> Message-ID: On Tue, 29 Feb 2000, Fred L. Drake, Jr. 
wrote: > Guido van Rossum writes: > > You can already extract this from the updated documentation on the > > website (which has a list of obsolete modules). > > > > But you're right, it would be good to be open about this. I'll think > > about it. > > Note that the updated documentation isn't yet "published"; there are > no links to it and it hasn't been checked as much as I need it to be > before announcing it. Isn't the documentation better than what has been released? In other words, if you release now, how could you make things worse? If something does turn up during a check, you can always release again... Cheers, -g -- Greg Stein, http://www.lyra.org/ From effbot at telia.com Wed Mar 1 10:13:13 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 1 Mar 2000 10:13:13 +0100 Subject: [Python-Dev] breaking list.append() References: Message-ID: <011001bf835e$600d1da0$34aab5d4@hagrid> Greg Stein wrote: > On Wed, 1 Mar 2000, Fredrik Lundh wrote: > > Greg Stein wrote: > > > Note that Guido posted a note to c.l.py on Monday. I believe that meets > > > your notification criteria. > > > > ahem. do you seriously believe that everyone in the > > Python universe reads comp.lang.python? > > > > afaik, most Python programmers don't. > > Now you're simply taking my comments out of context. Not a proper thing to > do. Ken said that he wanted notification along certain guidelines. I said > that I believed Guido's post did just that. Period. my point was that most Python programmers won't see that notification. when these people download 1.6 final and find that all their apps just broke, they probably won't be happy with a pointer to dejanews. > And which is that? Care to help out? Maybe just a little bit? this rather common pydiom: append = list.append for x in something: append(...) it's used a lot where performance matters. > Or do you just want to talk about how bad this change is? :-( yes, I think it's bad.
I've been using Python since 1.2, and no other change has had the same consequences (wrt. time/money required to fix it). call me a crappy programmer if you want, but I'm sure there are others out there who are nearly as bad. and lots of them won't be aware of this change until someone upgrades the python interpreter on their server. From mal at lemburg.com Wed Mar 1 09:38:52 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 01 Mar 2000 09:38:52 +0100 Subject: [Python-Dev] Unicode mapping tables References: <000701bf834a$77acdfe0$412d153f@tim> Message-ID: <38BCD71C.3592E6A@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > ... > > Currently, mapping tables map characters to Unicode characters > > and vice-versa. Now the .translate method will use a different > > kind of table: mapping integer ordinals to integer ordinals. > > You mean that if I want to map u"a" to u"A", I have to set up some sort of > dict mapping ord(u"a") to ord(u"A")? I simply couldn't follow this. I meant: 'a': u'A' vs. ord('a'): ord(u'A') The latter wins ;-) Reasoning for the first was that it allows character sequences to be handled by the same mapping algorithm. I decided to leave those techniques to some future implementation, since mapping integers has the nice side-effect of also allowing sequences to be used as mapping tables... resulting in some speedup at the cost of memory consumption. BTW, there are now three different ways to do char translations: 1. char -> unicode (char mapping codec's decode) 2. unicode -> char (char mapping codec's encode) 3. unicode -> unicode (unicode's .translate() method) > > Question: What is more efficient: having lots of integers > > in a dictionary or lots of characters ? > > My bet is "lots of integers", to reduce both space use and comparison time. Right. That's what I found too... it's "lots of integers" now :-) > > ... > > Something else that changed is the way .capitalize() works.
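The ordinal-to-ordinal table MAL settled on is the design that survived: in today's Python, `str.translate` takes a mapping from integer code points to replacement ordinals, strings, or None. A small illustration (modern syntax, so the `u''` literals are dropped):

```python
# Keys are code points (integers), not one-character strings.
table = {ord("a"): ord("A"), ord("b"): None}   # None deletes the character
assert "abcabc".translate(table) == "AcAc"

# Because the keys are integers, any sequence indexed by code point also
# works as a table; str.maketrans builds exactly this kind of int-keyed dict.
table2 = str.maketrans("abc", "xyz")
assert "cab".translate(table2) == "zxy"
```

The "sequences as mapping tables" speedup MAL mentions falls out of this for free: indexing a sequence by an integer ordinal is the same operation as looking the ordinal up in a dict.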
The > > Unicode version uses the Unicode algorithm for it (see TechRep. 13 > > on the www.unicode.org site). > > #13 is "Unicode Newline Guidelines". I assume you meant #21 ("Case > Mappings"). Dang. You're right. Here's the URL in case someone wants to join in: http://www.unicode.org/unicode/reports/tr21/tr21-2.html > > Here's the new doc string: > > > > S.capitalize() -> unicode > > > > Return a capitalized version of S, i.e. words start with title case > > characters, all remaining cased characters have lower case. > > > > Note that *all* characters are touched, not just the first one. > > The change was needed to get it in sync with the .iscapitalized() > > method which is based on the Unicode algorithm too. > > > > Should this change be propagated to the string implementation ? > > Unicode makes distinctions among "upper case", "lower case" and "title > case", and you're trying to get away with a single "capitalize" function. > Java has separate toLowerCase, toUpperCase and toTitleCase methods, and > that's the way to do it. The Unicode implementation has the corresponding: .upper(), .lower() and .capitalize() They work just like .toUpperCase, .toLowerCase, .toTitleCase resp. (well at least they should ;). > Whatever you do, leave .capitalize alone for 8-bit > strings -- there's no reason to break code that currently works. > "capitalize" seems a terrible choice of name for a titlecase method anyway, > because of its baggage connotations from 8-bit strings. Since this stuff is > complicated, I say it would be much better to use the same names for these > things as the Unicode and Java folk do: there's excellent documentation > elsewhere for all this stuff, and it's Bad to make users mentally translate > unique Python terminology to make sense of the official docs. Hmm, that's an argument but it breaks the current method naming scheme of all lowercase letters. Perhaps I should simply provide a new method for .toTitleCase(), e.g.
.title(), and leave the previous definition of .capitalize() intact... > So my vote is: leave capitalize the hell alone . Do not implement > capitalize for Unicode strings. Introduce a new titlecase method for > Unicode strings. Add a new titlecase method to 8-bit strings too. Unicode > strings should also have methods to get at uppercase and lowercase (as > Unicode defines those). ...looks like you're more or less on the same wavelength here ;-) Here's what I'll do: * implement .capitalize() in the traditional way for Unicode objects (simply convert the first char to uppercase) * implement u.title() to mean the same as Java's toTitleCase() * don't implement s.title(): the reasoning here is that it would confuse the user when she gets different return values for the same string (titlecase chars usually live in higher Unicode code ranges not reachable in Latin-1) Thanks for the feedback, -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim_one at email.msn.com Wed Mar 1 11:06:58 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 05:06:58 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <00fb01bf8359$c8196a20$34aab5d4@hagrid> Message-ID: <000e01bf8365$e1e0b9c0$412d153f@tim> [/F] > ... > so as far as I'm concerned, this was officially deprecated > with Guido's post. afaik, no official python documentation > has explicitly mentioned this (and the fact that it doesn't > explicitly allow it doesn't really matter, since the docs don't > explicitly allow the x[a, b, c] syntax either. both work in > 1.5.2).
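The division of labor sketched in the thread above is how the methods ended up behaving in Python: `.capitalize()` uppercases the first character and lowercases the rest, `.title()` title-cases each word, and title case really is a third case distinct from upper case for some characters. A quick check in modern Python:

```python
s = "hello wOrld"
assert s.capitalize() == "Hello world"   # first char up, *all* the rest lowered
assert s.title() == "Hello World"        # each word title-cased
assert s.upper() == "HELLO WORLD"
assert s.lower() == "hello world"

# Title case can differ from upper case: the 'dz' digraph U+01F3
# uppercases to 'DZ' (U+01F1) but title-cases to 'Dz' (U+01F2).
assert "\u01f3".upper() == "\u01f1"
assert "\u01f3".title() == "\u01f2"
```

The digraph example is the concrete reason Tim's "three distinct cases" argument holds: a single `capitalize` cannot express both mappings.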
The "Subscriptions" section of the Reference Manual explicitly allows for dict[a, b, c] and explicitly does not allow for sequence[a, b, c] The "Mapping Types" section of the Library Ref does not explicitly allow for it, though, and if you read it as implicitly allowing for it (based on the Reference Manual's clarification of "key" syntax), you would also have to read the Library Ref as allowing for dict.has_key(a, b, c) Which 1.5.2 does allow, but which Guido very recently patched to treat as a syntax error. > ... > sigh. running checkappend over a 50k LOC application, I > just realized that it doesn't catch a very common append > pydiom. [And, later, after prodding by GregS] > this rather common pydiom: > > append = list.append > for x in something: > append(...) This limitation was pointed out in checkappend's module docstring. Doesn't make it any easier for you to swallow, but I needed to point out that you didn't *have* to stumble into this the hard way . > how fun. even though 99% of all append calls are "legal", > this "minor" change will break every single application and > library we have :-( > > oh, wait. xmlrpclib isn't affected. always something! What would you like to do, then? The code will be at least as broken a year from now, and probably more so -- unless you fix it. So this sounds like an indirect argument for never changing Python's behavior here. Frankly, I expect you could fix the 50K LOC in less time than it took me to write this naggy response <0.50K wink>. embrace-change-ly y'rs - tim From tim_one at email.msn.com Wed Mar 1 11:31:12 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 05:31:12 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <000e01bf8365$e1e0b9c0$412d153f@tim> Message-ID: <001001bf8369$453e9fc0$412d153f@tim> [Tim. needing sleep] > dict.has_key(a, b, c) > > Which 1.5.2 does allow, but which Guido very recently patched to > treat as a syntax error. No, a runtime error. haskeynanny.py, anyone? 
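Tim's Reference Manual point is that `d[a, b, c]` was never a three-argument subscript to begin with: the comma expression builds one tuple, and the dict is indexed by that single tuple key -- which is exactly why `has_key(a, b, c)` had no analogous reading. A sketch:

```python
d = {}
d[1, 2, 3] = "x"            # sugar for d[(1, 2, 3)] = "x": a single tuple key
assert d[(1, 2, 3)] == "x"
assert list(d) == [(1, 2, 3)]

# Plain sequences have no such reading; a tuple subscript is simply an error.
try:
    [10, 20, 30][1, 2]
except TypeError:
    pass
else:
    raise AssertionError("expected TypeError")
```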
not-me-ly y'rs - tim From fredrik at pythonware.com Wed Mar 1 12:14:18 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 1 Mar 2000 12:14:18 +0100 Subject: [Python-Dev] breaking list.append() References: <000e01bf8365$e1e0b9c0$412d153f@tim> Message-ID: <002101bf836f$4a012220$f29b12c2@secret.pythonware.com> Tim Peters wrote: > The "Subscriptions" section of the Reference Manual explicitly allows for > > dict[a, b, c] > > and explicitly does not allow for > > sequence[a, b, c] I'd thought we'd agreed that nobody reads the reference manual ;-) > What would you like to do, then? more time to fix it, perhaps? it's surely a minor code change, but fixing it can be harder than you think (just witness Gerrit's bogus patches). after all, python might be free, but more and more people are investing lots of money in using it [1]. > The code will be at least as broken a year > from now, and probably more so -- unless you fix it. sure. we've already started. but it's a lot of work, and it's quite likely that it will take a while until we can be 100% confident that all the changes are properly done. (not all software has a 100% complete test suite that simply says "yes, this works" or "no, it doesn't") 1) fwiw, some poor soul over here posted a short note to the pythonworks mailing list, mentioning that we've now fixed the price. a major flamewar erupted, and my mailbox is now full of mail from unknowns telling me that I must be a complete moron that doesn't understand that Python is just a toy system, which everyone uses just because they cannot afford anything better...
> Today, Eric proposed to do away with Neil's hash table altogether -- > as long as we're wasting memory, we might as well add 3 fields to each > container object rather than allocating the same amount in a separate > hash table. Eric expects that this will run faster, although this > obviously needs to be tried. No, it doesn't : it will run faster. > Container types are: dict, list, tuple, class, instance; plus > potentially user-defined container types such as kjbuckets. I > have a feeling that function objects should also be considered > container types, because of the cycle involving globals. Note that the list-migrating steps you sketch later are basically the same as (but hairier than) the ones JimF and I worked out for M&S-on-RC a few years ago, right down to using appending to effect a breadth-first traversal without requiring recursion -- except M&S doesn't have to bother accounting for sources of refcounts. Since *this* scheme does more work per item per scan, to be as fast in the end it has to touch less stuff than M&S. But the more kinds of types you track, the more stuff this scheme will have to chase. The tradeoffs are complicated & unclear, so I'll just raise an uncomfortable meta-point : you balked at M&S the last time around because of the apparent need for two link fields + a bit or two per object of a "chaseable type". If that's no longer perceived as being a showstopper, M&S should be reconsidered too. I happen to be a fan of both approaches . The worst part of M&S-on-RC (== the one I never had a good answer for) is that a non-cooperating extension type E can't be chased, hence objects reachable only from objects of type E never get marked, so are vulnerable to bogus collection. In the Neil/Toby scheme, objects of type E merely act as sources of "external" references, so the scheme fails safe (in the sense of never doing a bogus collection due to non-cooperating types). Hmm ... 
if both approaches converge on keeping a list of all chaseable objects, and being careful of uncooperating types, maybe the only real difference in the end is whether the root set is given explicitly (as in traditional M&S) or inferred indirectly (but where "root set" has a different meaning in the scheme you sketched). > ... > In our case, we may need a type-specific "clear" function for containers > in the type object. I think definitely, yes. full-speed-sideways-ly y'rs - tim From mal at lemburg.com Wed Mar 1 11:40:36 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 01 Mar 2000 11:40:36 +0100 Subject: [Python-Dev] breaking list.append() References: <011001bf835e$600d1da0$34aab5d4@hagrid> Message-ID: <38BCF3A4.1CCADFCE@lemburg.com> Fredrik Lundh wrote: > > Greg Stein wrote: > > On Wed, 1 Mar 2000, Fredrik Lundh wrote: > > > Greg Stein wrote: > > > > Note that Guido posted a note to c.l.py on Monday. I believe that meets > > > > your notification criteria. > > > > > > ahem. do you seriously believe that everyone in the > > > Python universe reads comp.lang.python? > > > > > > afaik, most Python programmers don't. > > > > Now you're simply taking my comments out of context. Not a proper thing to > > do. Ken said that he wanted notification along certain guidelines. I said > > that I believed Guido's post did just that. Period. > > my point was that most Python programmers won't > see that notification. when these people download > 1.6 final and find that all their apps just broke, they > probably won't be happy with a pointer to dejanews. Ditto. Anyone remember the str(2L) == '2' change, BTW ? That one will cost lots of money in case someone implemented an eShop using the common str(2L)[:-1] idiom... There will need to be a big warning sign somewhere that people see *before* finding the download link. (IMHO, anyways.) > > And which is that? Care to help out? Maybe just a little bit?
> > this rather common pydiom: > > append = list.append > for x in something: > append(...) > > it's used a lot where performance matters. Same here. checkappend.py doesn't find these (a great tool BTW, thanks Tim; I noticed that it leaks memory badly though). > > Or do you just want to talk about how bad this change is? :-( > > yes, I think it's bad. I've been using Python since 1.2, > and no other change has had the same consequences > (wrt. time/money required to fix it) > > call me a crappy programmer if you want, but I'm sure > there are others out there who are nearly as bad. and > lots of them won't be aware of this change until some- > one upgrades the python interpreter on their server. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Wed Mar 1 13:07:42 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 07:07:42 -0500 Subject: need .append patch (was RE: [Python-Dev] Re: Python-checkins digest, Vol 1 #370 - 8 msgs) In-Reply-To: Your message of "Wed, 01 Mar 2000 00:57:49 EST." <000601bf8343$13575040$412d153f@tim> References: <000601bf8343$13575040$412d153f@tim> Message-ID: <200003011207.HAA13342@eric.cnri.reston.va.us> > To the extent that you're serious about CP4E, you're begging for more of > this, not less . Which is exactly why I am breaking multi-arg append now -- this is my last chance. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Mar 1 13:27:10 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 07:27:10 -0500 Subject: [Python-Dev] Unicode mapping tables In-Reply-To: Your message of "Wed, 01 Mar 2000 09:38:52 +0100." 
<38BCD71C.3592E6A@lemburg.com> References: <000701bf834a$77acdfe0$412d153f@tim> <38BCD71C.3592E6A@lemburg.com> Message-ID: <200003011227.HAA13396@eric.cnri.reston.va.us> > Here's what I'll do: > > * implement .capitalize() in the traditional way for Unicode > objects (simply convert the first char to uppercase) > * implement u.title() to mean the same as Java's toTitleCase() > * don't implement s.title(): the reasoning here is that it would > confuse the user when she gets different return values for > the same string (titlecase chars usually live in higher Unicode > code ranges not reachable in Latin-1) Huh? For ASCII at least, titlecase seems to map to ASCII; in your current implementation, only two Latin-1 characters (u'\265' and u'\377', I have no easy way to show them in Latin-1) map outside the Latin-1 range. Anyway, I would suggest to add a title() call to 8-bit strings as well; then we can do away with string.capwords(), which does something similar but different, mostly by accident. --Guido van Rossum (home page: http://www.python.org/~guido/) From jack at oratrix.nl Wed Mar 1 13:34:42 2000 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 01 Mar 2000 13:34:42 +0100 Subject: [Python-Dev] Re: A warning switch? In-Reply-To: Message by Guido van Rossum , Mon, 28 Feb 2000 12:35:12 -0500 , <200002281735.MAA27771@eric.cnri.reston.va.us> Message-ID: <20000301123442.7DEF8371868@snelboot.oratrix.nl> > > What about adding a command-line switch for enabling warnings, as has > > been suggested long ago? The .append() change could then print a > > warning in 1.6alphas (and betas?), but still run, and be turned into > > an error later. > > That's better. I propose that the warnings are normally on, and that > there are flags to turn them off or turn them into errors. Can we then please have an interface to the "give warning" call (instead of a simple fprintf)?
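An interface like the one Jack asks for here is essentially what later became Python's `warnings` module: warnings are routed through a replaceable `warnings.showwarning` hook, so a GUI port can substitute a dialog for the default write to stderr. A sketch using the module as it exists today (the dialog is faked by recording the warning):

```python
import warnings

captured = []

def gui_showwarning(message, category, filename, lineno,
                    file=None, line=None):
    # A Mac or PythonWin port could pop up a dialog here;
    # for the sketch we just record what would have been shown.
    captured.append((category.__name__, str(message)))

warnings.simplefilter("always")       # make sure the warning isn't filtered out
warnings.showwarning = gui_showwarning
warnings.warn("list.append with more than one argument is deprecated",
              DeprecationWarning)

assert captured == [("DeprecationWarning",
                     "list.append with more than one argument is deprecated")]
```

The filter machinery (`simplefilter`, the `-W` command-line switch) is also where Guido's "normally on, with flags to turn them off or turn them into errors" proposal ended up.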
On the mac (and possibly also in PythonWin) it's probably better to pop up a dialog (possibly with a "don't show again" button) than do a printf which may get lost. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido at python.org Wed Mar 1 13:55:42 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 07:55:42 -0500 Subject: [Python-Dev] Re: A warning switch? In-Reply-To: Your message of "Wed, 01 Mar 2000 13:34:42 +0100." <20000301123442.7DEF8371868@snelboot.oratrix.nl> References: <20000301123442.7DEF8371868@snelboot.oratrix.nl> Message-ID: <200003011255.HAA13489@eric.cnri.reston.va.us> > Can we then please have an interface to the "give warning" call (instead > of a simple fprintf)? On the mac (and possibly also in > PythonWin) it's probably better to pop up a dialog (possibly with a > "don't show again" button) than do a printf which may get lost. Sure. All you have to do is code it (or get someone else to code it). <0.9 wink> --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed Mar 1 14:32:02 2000 From: mal at lemburg.com (M.-A.
Lemburg) Date: Wed, 01 Mar 2000 14:32:02 +0100 Subject: [Python-Dev] Unicode mapping tables References: <000701bf834a$77acdfe0$412d153f@tim> <38BCD71C.3592E6A@lemburg.com> <200003011227.HAA13396@eric.cnri.reston.va.us> Message-ID: <38BD1BD2.792E9B73@lemburg.com> Guido van Rossum wrote: > > > Here's what I'll do: > > > > * implement .capitalize() in the traditional way for Unicode > > objects (simply convert the first char to uppercase) > > * implement u.title() to mean the same as Java's toTitleCase() > > * don't implement s.title(): the reasoning here is that it would > > confuse the user when she gets different return values for > > the same string (titlecase chars usually live in higher Unicode > > code ranges not reachable in Latin-1) > > Huh? For ASCII at least, titlecase seems to map to ASCII; in your > current implementation, only two Latin-1 characters (u'\265' and > u'\377', I have no easy way to show them in Latin-1) map outside the > Latin-1 range. You're right, sorry for the confusion. I was thinking of other encodings like e.g. cp437 which have corresponding characters in the higher Unicode ranges. > Anyway, I would suggest to add a title() call to 8-bit strings as > well; then we can do away with string.capwords(), which does something > similar but different, mostly by accident. Ok, I'll do it this way then: s.title() will use C's toupper() and tolower() for case mapping and u.title() the Unicode routines. This will be in sync with the rest of the 8-bit string world (which is locale aware on many platforms AFAIK), even though it might not return the same string as the corresponding u.title() call. u.capwords() will be disabled in the Unicode implementation...
it wasn't even implemented for the string implementation, so there's no breakage ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From akuchlin at mems-exchange.org Wed Mar 1 15:59:07 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Wed, 1 Mar 2000 09:59:07 -0500 (EST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <011001bf835e$600d1da0$34aab5d4@hagrid> References: <011001bf835e$600d1da0$34aab5d4@hagrid> Message-ID: <14525.12347.120543.804804@amarok.cnri.reston.va.us> Fredrik Lundh writes: >yes, I think it's bad. I've been using Python since 1.2, >and no other change has had the same consequences >(wrt. time/money required to fix it) There are more things in 1.6 that might require fixing existing code: str(2L) returning '2', the int/long changes, the Unicode changes, and if it gets added, garbage collection -- and bugs caused by those changes might not be catchable by a nanny. IMHO it's too early to point at the .append() change as breaking too much existing code; there may be changes that break a lot more. I'd wait and see what happens once the 1.6 alphas become available; if c.l.p is filled with shrieks and groans, GvR might decide to back the offending change out. (Or he might not...) -- A.M. Kuchling http://starship.python.net/crew/amk/ I have no skills with machines. I fear them, and because I cannot help attributing human qualities to them, I suspect that they hate me and will kill me if they can. -- Robertson Davies, "Reading" From klm at digicool.com Wed Mar 1 16:37:49 2000 From: klm at digicool.com (Ken Manheimer) Date: Wed, 1 Mar 2000 10:37:49 -0500 (EST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Greg Stein wrote: > On Tue, 29 Feb 2000, Ken Manheimer wrote: > >...
> > None the less, for those practicing it, the incorrectness of it will be > > fresh news. I would be less sympathetic with them if there was recent > > warning, eg, the schedule for changing it in the next release was part of > > the current release. But if you tell somebody you're going to change > > something, and then don't for a few years, you probably need to renew the > > warning before you make the change. Don't you think so? Why not? > > I agree. > > Note that Guido posted a note to c.l.py on Monday. I believe that meets > your notification criteria. Actually, by "part of the current release", i meant having the deprecation/impending-deletion warning in the release notes for the release before the one where the deletion happens - saying it's being deprecated now, will be deleted next time around. Ken klm at digicool.com I mean, you tell one guy it's blue. He tells his guy it's brown, and it lands on the page sorta purple. Wavy Gravy/Hugh Romney From marangoz at python.inrialpes.fr Wed Mar 1 18:07:07 2000 From: marangoz at python.inrialpes.fr (Vladimir Marangozov) Date: Wed, 1 Mar 2000 18:07:07 +0100 (CET) Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python In-Reply-To: <200003010544.AAA13155@eric.cnri.reston.va.us> from "Guido van Rossum" at Mar 01, 2000 12:44:10 AM Message-ID: <200003011707.SAA01310@python.inrialpes.fr> Guido van Rossum wrote: > > Thanks for the new patches, Neil! Thanks from me too! I notice, however, that hash_resize() still uses a malloc call instead of PyMem_NEW. Neil, please correct this in your version immediately ;-) > > We had a visitor here at CNRI today, Eric Tiedemann > , who had a look at your patches before. Eric > knows his way around the Scheme, Lisp and GC literature, and presented > a variant on your approach which takes the bite out of the recursive > passes. Avoiding the recursion is valuable, as long as we're optimizing the implementation of one particular scheme.
It doesn't bother me that Neil's scheme is recursive, because I still perceive his code as a proof of concept. You're presenting here another scheme based on refcount arithmetic, generalized for all container types. The linked list implementation of this generalized scheme is not directly related to the logic. I have some suspicions about the logic, so you'll probably want to elaborate a bit more on it, and convince me that this scheme would actually work. > Today, Eric proposed to do away with Neil's hash table altogether -- > as long as we're wasting memory, we might as well add 3 fields to each > container object rather than allocating the same amount in a separate > hash table. I cannot agree so easily with this statement, but you should have expected this from me :-) If we're about to optimize storage, I have good reasons to believe that we don't need 3 additional slots per container (but 1 for gc_refs, yes). We could certainly envision allocating the containers within memory pools of 4K (just as it is done in pymalloc, and close to what we have for ints & floats). These pools would be labeled as "container's memory", they would obviously be under our control, and we'd have additional slots per pool, not per object. As long as we isolate the containers from the rest, we can enumerate them easily by walking through the pools. But I'm willing to defer this question for now, as it involves the object allocators (the builtin allocators + PyObject_NEW for extension types E -- user objects of type E would be automatically taken into account for GC if there's a flag in the type struct which identifies them as containers). > Eric expects that this will run faster, although this obviously needs > to be tried. Definitely, although I trust Eric & Tim :-) > > Container types are: dict, list, tuple, class, instance; plus > potentially user-defined container types such as kjbuckets.
I have a > feeling that function objects should also be considered container > types, because of the cycle involving globals. + other extension container types. And I insist. Don't forget that we're planning to merge types and classes... > > Eric's algorithm, then, consists of the following parts. > > Each container object has three new fields: gc_next, gc_prev, and > gc_refs. (Eric calls the gc_refs "refcount-zero".) > > We color objects white (initial), gray (root), black (scanned root). > (The terms are explained later; we believe we don't actually need bits > in the objects to store the color; see later.) > > All container objects are chained together in a doubly-linked list -- > this is the same as Neil's code except Neil does it only for dicts. > (Eric postulates that you need a list header.) > > When GC is activated, all objects are colored white; we make a pass > over the entire list and set gc_refs equal to the refcount for each > object. Step 1: for all containers, c->gc_refs = c->ob_refcnt > > Next, we make another pass over the list to collect the internal > references. Internal references are (just like in Neil's version) > references from other container types. In Neil's version, this was > recursive; in Eric's version, we don't need recursion, since the list > already contains all containers. So we simply visit the containers in > the list in turn, and for each one we go over all the objects it > references and subtract one from *its* gc_refs field. (Eric left out > the little detail that we need to be able to distinguish between > container and non-container objects amongst those references; this can > be a flag bit in the type field.) Step 2: c->gc_refs = c->gc_refs - Nb_referenced_containers_from_c I guess that you realize that after this step, gc_refs can be zero or negative. I'm not sure that you collect "internal" references here (references from other container types).
A list referencing 20 containers, being itself referenced by one container + one static variable + two times from the runtime stack, has an initial refcount == 4, so we'll end up with gc_refs == -16. A tuple referencing 1 list, referenced once by the stack, will end up with gc_refs == 0. Neil's scheme doesn't seem to have this "property". > > Now, similar to Neil's version, all objects for which gc_refs == 0 > have only internal references, and are potential garbage; all objects > for which gc_refs > 0 are "roots". These have references to them from > other places, e.g. from globals or stack frames in the Python virtual > machine. > Agreed, some roots have gc_refs > 0 I'm not sure that all of them have it, though... Do they? > We now start a second list, to which we will move all roots. The way > to do this is to go over the first list again and to move each object > that has gc_refs > 0 to the second list. Objects placed on the second > list in this phase are considered colored gray (roots). > Step 3: Roots with gc_refs > 0 go to the 2nd list. All c->gc_refs <= 0 stay in the 1st list. > Of course, some roots will reference some non-roots, which keeps those > non-roots alive. We now make a pass over the second list, where for > each object on the second list, we look at every object it references. > If a referenced object is a container and is still in the first list > (colored white) we *append* it to the second list (colored gray). > Because we append, objects thus added to the second list will > eventually be considered by this same pass; when we stop finding > objects that are still white, we stop appending to the second list, > and we will eventually terminate this pass. Conceptually, objects on > the second list that have been scanned in this pass are colored black > (scanned root); but there is no need to actually make the > distinction. > Step 4: Closure on reachable containers which are all moved to the 2nd list.
(Assuming that the objects are checked only via their type, without
involving gc_refs)

> (How do we know whether an object pointed to is white (in the first
> list) or gray or black (in the second)? Good question? :-)
> We could use an extra bitfield, but that's a waste of space.
> Better: we could set gc_refs to a magic value (e.g. 0xffffffff) when
> we move the object to the second list.

I doubt that this would work for the reasons mentioned above.

> During the meeting, I proposed to set the back pointer to NULL; that
> might work too but I think the gc_refs field is more elegant. We could
> even just test for a non-zero gc_refs field; the roots moved to the
> second list initially all have a non-zero gc_refs field already, and
> for the objects with a zero gc_refs field we could indeed set it to
> something arbitrary.)

Not sure that "arbitrary" is a good choice if the differentiation is
based solely on gc_refs.

> Once we reach the end of the second list, all objects still left in
> the first list are garbage. We can destroy them in a way similar to
> the way Neil does this in his code. Neil calls PyDict_Clear on the
> dictionaries, and ignores the rest. Under Neil's assumption that all
> cycles (that he detects) involve dictionaries, that is sufficient. In
> our case, we may need a type-specific "clear" function for containers
> in the type object.

Couldn't this be done in the object's dealloc function?

Note that both Neil's and this scheme assume that garbage _detection_
and garbage _collection_ is an atomic operation. I must say that I
don't mind having some living garbage if it doesn't hurt my work. IOW,
the criterion used for triggering the detection phase _may_ eventually
differ from the one used for the collection phase. But this is where we
reach the incremental approaches, implying different reasoning as a
whole.
My point is that the introduction of a "clear" function depends on the
adopted scheme, whose logic depends on pertinent statistics on memory
consumption of the cyclic garbage. To make it simple, we first need
stats on memory consumption, then we can discuss objectively how to
implement some particular GC scheme. I second Eric on the need for
excellent statistics.

> The general opinion was that we should first implement and test the
> algorithm as sketched above, and then changes or extensions could be
> made.

I'd like to see it discussed first in conjunction with (1) the
possibility of having a proprietary malloc, (2) the envisioned
type/class unification. Perhaps I'm getting too deep, but once
something gets in, it's difficult to take it out, even when a better
solution is found subsequently. Although I'm enthusiastic about this
work on GC, I'm not in a position to evaluate the true benefits of the
proposed schemes, as I still don't have a basis for evaluating how much
garbage my program generates and whether it hurts the interpreter
compared to its overall memory consumption.

> I was pleasantly surprised to find Neil's code in my inbox when we
> came out of the meeting; I think it would be worthwhile to compare and
> contrast the two approaches. (Hm, maybe there's a paper in it?)

I'm all for it!
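[As a present-day aside: the work discussed in this thread eventually
shipped as CPython's gc module, where detection and collection are
indeed one atomic call. On any modern interpreter the behaviour under
discussion can be observed directly -- this demo assumes a current
CPython with the gc module, which did not exist at the time of the
thread:]

```python
import gc

gc.collect()                 # start from a clean slate

# Build a reference cycle of two dicts -- exactly the kind of garbage
# that pure reference counting can never reclaim.
a, b = {}, {}
a["other"] = b
b["other"] = a
del a, b                     # refcounts stay at 1: the cycle keeps itself alive

found = gc.collect()         # detection + collection in one atomic call
print(found >= 2)            # the two dicts were found unreachable -> True
```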
--
Vladimir MARANGOZOV          | Vladimir.Marangozov at inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

From jeremy at cnri.reston.va.us  Wed Mar  1 18:53:13 2000
From: jeremy at cnri.reston.va.us (Jeremy Hylton)
Date: Wed, 1 Mar 2000 12:53:13 -0500 (EST)
Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python
In-Reply-To: <200003011707.SAA01310@python.inrialpes.fr>
References: <200003010544.AAA13155@eric.cnri.reston.va.us>
	<200003011707.SAA01310@python.inrialpes.fr>
Message-ID: <14525.22793.963077.707198@goon.cnri.reston.va.us>

>>>>> "VM" == Vladimir Marangozov writes:

[">>" == Guido explaining Eric Tiedemann's GC design]

>> Next, we make another pass over the list to collect the internal
>> references. Internal references are (just like in Neil's
>> version) references from other container types. In Neil's
>> version, this was recursive; in Eric's version, we don't need
>> recursion, since the list already contains all containers. So we
>> simply visit the containers in the list in turn, and for each one
>> we go over all the objects it references and subtract one from
>> *its* gc_refs field. (Eric left out the little detail that we
>> need to be able to distinguish between container and
>> non-container objects amongst those references; this can be a
>> flag bit in the type field.)

VM> Step 2: c->gc_refs = c->gc_refs -
VM> Nb_referenced_containers_from_c

VM> I guess that you realize that after this step, gc_refs can be
VM> zero or negative.

I think Guido's explanation is slightly ambiguous. When he says,
"subtract one from *its* gc_refs field" he means subtract one from the
_contained_ object's gc_refs field.

VM> I'm not sure that you collect "internal" references here
VM> (references from other container types).
VM> A list referencing 20 containers, being itself referenced by one
VM> container + one static variable + two times from the runtime
VM> stack, has an initial refcount == 4, so we'll end up with
VM> gc_refs == -16.

The strategy is not that the container's gc_refs is decremented once
for each object it contains. Rather, the container decrements each
contained object's gc_refs by one. So you should never end up with
gc_refs < 0.

>> During the meeting, I proposed to set the back pointer to NULL;
>> that might work too but I think the gc_refs field is more
>> elegant. We could even just test for a non-zero gc_refs field;
>> the roots moved to the second list initially all have a non-zero
>> gc_refs field already, and for the objects with a zero gc_refs
>> field we could indeed set it to something arbitrary.)

I believe we discussed this further and concluded that setting the
back pointer to NULL would not work. If we make the second list
doubly-linked (like the first one), it is trivial to end GC by
swapping the first and second lists. If we've zapped the NULL pointer,
then we have to go back and re-set them all.

Jeremy

From mal at lemburg.com  Wed Mar  1 19:44:58 2000
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 01 Mar 2000 19:44:58 +0100
Subject: [Python-Dev] Unicode Snapshot 2000-03-01
Message-ID: <38BD652A.EA2EB0A3@lemburg.com>

There is a new Unicode implementation snapshot available at the secret
URL. It contains quite a few small changes to the internal APIs, doc
strings for all methods and some new methods (e.g. .title()) on the
Unicode and the string objects. The code page mappings are now
integer->integer which should make them more performant. Some of the C
codec APIs have changed, so you may need to adapt code that already
uses these (Fredrik ?!).

Still missing is a MSVC project file... haven't gotten around yet to
build one. The code does compile on WinXX though, as Finn Bock told me
in private mail.

Please try out the new stuff...
Most interesting should be the code in Lib/codecs.py as it provides a
very high level interface to all those builtin codecs.

BTW: I would like to implement a .readline() method using only the
.read() method as basis. Does anyone have a good idea on how this
could be done without buffering? (Unicode has a slightly larger choice
of line break chars than C; the .splitlines() method will deal with
these)

Gotta run...

--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/

From effbot at telia.com  Wed Mar  1 20:20:12 2000
From: effbot at telia.com (Fredrik Lundh)
Date: Wed, 1 Mar 2000 20:20:12 +0100
Subject: [Python-Dev] breaking list.append()
References: <011001bf835e$600d1da0$34aab5d4@hagrid>
	<14525.12347.120543.804804@amarok.cnri.reston.va.us>
Message-ID: <034a01bf83b3$e97c8620$34aab5d4@hagrid>

Andrew M. Kuchling wrote:
> There are more things in 1.6 that might require fixing existing code:
> str(2L) returning '2', the int/long changes, the Unicode changes, and
> if it gets added, garbage collection -- and bugs caused by those
> changes might not be catchable by a nanny.

hey, you make it sound like "1.6" should really be "2.0" ;-)

From nascheme at enme.ucalgary.ca  Wed Mar  1 20:29:02 2000
From: nascheme at enme.ucalgary.ca (nascheme at enme.ucalgary.ca)
Date: Wed, 1 Mar 2000 12:29:02 -0700
Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python
In-Reply-To: <200003011707.SAA01310@python.inrialpes.fr>; from
	marangoz@python.inrialpes.fr on Wed, Mar 01, 2000 at 06:07:07PM +0100
References: <200003010544.AAA13155@eric.cnri.reston.va.us>
	<200003011707.SAA01310@python.inrialpes.fr>
Message-ID: <20000301122902.B7773@acs.ucalgary.ca>

On Wed, Mar 01, 2000 at 06:07:07PM +0100, Vladimir Marangozov wrote:
> Guido van Rossum wrote:
> > Once we reach the end of the second list, all objects still left in
> > the first list are garbage.
> > We can destroy them in a way similar to the way Neil does this in
> > his code. Neil calls PyDict_Clear on the dictionaries, and ignores
> > the rest. Under Neil's assumption that all cycles (that he detects)
> > involve dictionaries, that is sufficient. In our case, we may need
> > a type-specific "clear" function for containers in the type object.
>
> Couldn't this be done in the object's dealloc function?

No, I don't think so. The object still has references to it. You have
to be careful about how you break cycles so that memory is not
accessed after it is freed.

Neil

--
"If elected mayor, my first act will be to kill the whole lot of you,
and burn your town to cinders!" -- Groundskeeper Willie

From gvwilson at nevex.com  Wed Mar  1 21:19:30 2000
From: gvwilson at nevex.com (gvwilson at nevex.com)
Date: Wed, 1 Mar 2000 15:19:30 -0500 (EST)
Subject: [Python-Dev] DDJ article on Python GC
Message-ID: 

Jon Erickson (editor-in-chief) of "Doctor Dobb's Journal" would like
an article on what's involved in adding garbage collection to Python.
Please email me if you're interested in tackling it...

Thanks,
Greg

From fdrake at acm.org  Wed Mar  1 21:37:49 2000
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Wed, 1 Mar 2000 15:37:49 -0500 (EST)
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src
	Makefile.in,1.82,1.83
In-Reply-To: 
References: <14523.56638.286603.340358@weyr.cnri.reston.va.us>
Message-ID: <14525.32669.909212.716484@weyr.cnri.reston.va.us>

Greg Stein writes:
> Isn't the documentation better than what has been released? In other
> words, if you release now, how could you make things worse? If
> something does turn up during a check, you can always release
> again...

Releasing is still somewhat tedious, and I don't want to ask people to
do several substantial downloads & installs. So far, a major
navigation bug has been found in the test version I posted (just now
fixed online); *that's* why I don't like to release too hastily!
I don't think waiting two more weeks is a problem.

  -Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From guido at python.org  Wed Mar  1 23:53:26 2000
From: guido at python.org (Guido van Rossum)
Date: Wed, 01 Mar 2000 17:53:26 -0500
Subject: [Python-Dev] DDJ article on Python GC
In-Reply-To: Your message of "Wed, 01 Mar 2000 15:19:30 EST."
References: 
Message-ID: <200003012253.RAA16056@eric.cnri.reston.va.us>

> Jon Erickson (editor-in-chief) of "Doctor Dobb's Journal" would like
> an article on what's involved in adding garbage collection to
> Python. Please email me if you're interested in tackling it...

I might -- although I should get Neil, Eric and Tim as co-authors. I'm
halfway implementing the scheme that Eric showed yesterday. It's very
elegant, but I don't have an idea about its impact on performance yet.

Say hi to Jon -- we've met a few times. I liked his March editorial,
having just read the same book and had the same feeling of "wow, an
open source project in the 19th century!"

--Guido van Rossum (home page: http://www.python.org/~guido/)

From mhammond at skippinet.com.au  Thu Mar  2 00:09:23 2000
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Thu, 2 Mar 2000 10:09:23 +1100
Subject: [Python-Dev] Re: A warning switch?
In-Reply-To: <200003011255.HAA13489@eric.cnri.reston.va.us>
Message-ID: 

> > Can we then please have an interface to the "give warning" call (in
> > stead of a simple fprintf)? On the mac (and possibly also in
> > PythonWin) it's probably better to pop up a dialog (possibly with a
> > "don't show again" button) than do a printf which may get lost.
>
> Sure. All you have to do is code it (or get someone else to code it).

How about just having either a "sys.warning" function, or maybe even a
sys.stdwarn stream?
Then a simple C API to call this, and we are done :-)

sys.stdwarn sounds OK - it just defaults to sys.stdout, so the Mac and
Pythonwin etc should "just work" by sending the output wherever
sys.stdout goes today...

Mark.

From tim_one at email.msn.com  Thu Mar  2 06:08:39 2000
From: tim_one at email.msn.com (Tim Peters)
Date: Thu, 2 Mar 2000 00:08:39 -0500
Subject: [Python-Dev] breaking list.append()
In-Reply-To: <38BCF3A4.1CCADFCE@lemburg.com>
Message-ID: <001001bf8405$5f9582c0$732d153f@tim>

[/F]
> append = list.append
> for x in something:
>     append(...)

[M.-A. Lemburg]
> Same here. checkappend.py doesn't find these

As detailed in a c.l.py posting, I have yet to find a single instance
of this actually called with multiple arguments. Pointing out that
it's *possible* isn't the same as demonstrating it's an actual
problem. I'm quite willing to believe that it is, but haven't yet seen
evidence of it. For whatever reason, people seem much (and, in my
experience so far, infinitely) more prone to make the

    list.append(1, 2, 3)

error than the

    maybethisisanappend(1, 2, 3)

error.

> (a great tool BTW, thanks Tim; I noticed that it leaks memory badly
> though).

Which Python? Which OS? How do you know? What were you running it
over?

Using 1.5.2 under Win95, according to wintop, & over the whole CVS
tree, the total (code + data) virtual memory allocated to it peaked at
about 2Mb a few seconds into the run, and actually decreased as time
went on. So, akin to the bound method multi-argument append problem,
the "checkappend leak problem" is something I simply have no reason to
believe. Check your claim again? checkappend.py itself obviously
creates no cycles or holds on to any state across files, so if you're
seeing a leak it must be a bug in some other part of the version of
Python + std libraries you're using. Maybe a new 1.6 bug? Something
you did while adding Unicode? Etc. Tell us what you were running.

Has anyone else seen a leak?
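[As a present-day footnote: the bound-method idiom Tim and /F are
discussing is easy to reproduce, and the hard-line behaviour being
debated here is what shipped -- on any interpreter since, the
multi-argument form raises TypeError, and the fix is an explicit
tuple:]

```python
items = []
append = items.append        # the speed idiom: hoist the attribute lookup

append(1)                    # fine: one argument
try:
    append(1, 2, 3)          # the old "feature": silently appended (1, 2, 3)
except TypeError:            # ...now an error, as discussed in this thread
    append((1, 2, 3))        # the surviving spelling: pass the tuple

print(items)                 # -> [1, (1, 2, 3)]
```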
From tim_one at email.msn.com  Thu Mar  2 06:50:19 2000
From: tim_one at email.msn.com (Tim Peters)
Date: Thu, 2 Mar 2000 00:50:19 -0500
Subject: [Python-Dev] str vs repr at prompt again (FW: String printing
	behavior?)
Message-ID: <001401bf840b$3177ba60$732d153f@tim>

Another unsolicited testimonial that countless users are oppressed by
auto-repr (as opposed to auto-str) at the interpreter prompt. Just
trying to keep a once-hot topic from going stone cold forever.

-----Original Message-----
From: python-list-admin at python.org
[mailto:python-list-admin at python.org] On Behalf Of Ted Drain
Sent: Wednesday, March 01, 2000 5:42 PM
To: python-list at python.org
Subject: String printing behavior?

Hi all,

I've got a question about the string printing behavior. If I define a
function as:

>>> def foo():
...     return "line1\nline2"
...
>>> foo()
'line1\012line2'
>>> print foo()
line1
line2
>>>

It seems to me that the default printing behavior for strings should
match the behavior of the print routine. I realize that some people
may want to see embedded control codes, but I would advocate a
separate method for printing raw byte sequences. We are using the
python interactive prompt as a pseudo-matlab like user interface and
the current printing behavior is very confusing to users. It also
means that functions that return text (like help routines) must print
the string rather than returning it. Returning the string is much more
flexible because it allows the string to be captured easily and
redirected.

Any thoughts?

Ted

--
Ted Drain
Jet Propulsion Laboratory
Ted.Drain at jpl.nasa.gov
--
http://www.python.org/mailman/listinfo/python-list

From mal at lemburg.com  Thu Mar  2 08:42:33 2000
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 02 Mar 2000 08:42:33 +0100
Subject: [Python-Dev] breaking list.append()
References: <001001bf8405$5f9582c0$732d153f@tim>
Message-ID: <38BE1B69.E0B88B41@lemburg.com>

Tim Peters wrote:
>
> [/F]
> > append = list.append
> > for x in something:
> >     append(...)
>
> [M.-A. Lemburg]
> > Same here. checkappend.py doesn't find these
>
> As detailed in a c.l.py posting, I have yet to find a single instance
> of this actually called with multiple arguments. Pointing out that
> it's *possible* isn't the same as demonstrating it's an actual
> problem. I'm quite willing to believe that it is, but haven't yet
> seen evidence of it.

Haven't had time to check this yet, but I'm pretty sure there are some
instances of this idiom in my code. Note that I did in fact code like
this on purpose: it saves a tuple construction for every append, which
can make a difference in tight loops...

> For whatever reason, people seem much (and, in my experience so far,
> infinitely) more prone to make the
>
>     list.append(1, 2, 3)
>
> error than the
>
>     maybethisisanappend(1, 2, 3)
>
> error.

Of course... still there are hidden instances of the problem which are
yet to be revealed. For my own code the situation is even worse, since
I sometimes did:

    add = list.append
    for x in y:
        add(x,1,2)

> > (a great tool BTW, thanks Tim; I noticed that it leaks memory badly
> > though).
>
> Which Python? Which OS? How do you know? What were you running it
> over?

That's Python 1.5 on Linux2. I let the script run over a large lib
directory and my projects directory. In the projects directory the
script consumed as much as 240MB of process size.

> Using 1.5.2 under Win95, according to wintop, & over the whole CVS
> tree, the total (code + data) virtual memory allocated to it peaked
> at about 2Mb a few seconds into the run, and actually decreased as
> time went on.
So, akin to > the bound method multi-argument append problem, the "checkappend leak > problem" is something I simply have no reason to believe . Check your > claim again? checkappend.py itself obviously creates no cycles or holds on > to any state across files, so if you're seeing a leak it must be a bug in > some other part of the version of Python + std libraries you're using. > Maybe a new 1.6 bug? Something you did while adding Unicode? Etc. Tell us > what you were running. I'll try the same thing again using Python1.5.2 and the CVS version. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Thu Mar 2 08:46:49 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 02 Mar 2000 08:46:49 +0100 Subject: [Python-Dev] breaking list.append() References: <001001bf8405$5f9582c0$732d153f@tim> <38BE1B69.E0B88B41@lemburg.com> Message-ID: <38BE1C69.C8A9E6B0@lemburg.com> "M.-A. Lemburg" wrote: > > > > (a great tool BTW, thanks Tim; I noticed that it leaks memory badly > > > though). > > > > Which Python? Which OS? How do you know? What were you running it over? > > That's Python 1.5 on Linux2. I let the script run over > a large lib directory and my projects directory. In the > projects directory the script consumed as much as 240MB > of process size. > > > Using 1.5.2 under Win95, according to wintop, & over the whole CVS tree, the > > total (code + data) virtual memory allocated to it peaked at about 2Mb a few > > seconds into the run, and actually decreased as time went on. So, akin to > > the bound method multi-argument append problem, the "checkappend leak > > problem" is something I simply have no reason to believe . Check your > > claim again? 
> > checkappend.py itself obviously creates no cycles or holds on to
> > any state across files, so if you're seeing a leak it must be a bug
> > in some other part of the version of Python + std libraries you're
> > using. Maybe a new 1.6 bug? Something you did while adding Unicode?
> > Etc. Tell us what you were running.
>
> I'll try the same thing again using Python1.5.2 and the CVS version.

Using the Unicode patched CVS version there's no leak anymore.
Couldn't find a 1.5.2 version on my machine... I'll build one later.

--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/

From guido at python.org  Thu Mar  2 16:32:32 2000
From: guido at python.org (Guido van Rossum)
Date: Thu, 02 Mar 2000 10:32:32 -0500
Subject: [Python-Dev] Design question: call __del__ only after
	successful __init__?
Message-ID: <200003021532.KAA17088@eric.cnri.reston.va.us>

I was looking at the code that invokes __del__, with the intent to
implement a feature from Java: in Java, a finalizer is only called
once per object, even if calling it makes the object live longer.

To implement this, we need a flag in each instance that means "__del__
was called". I opened the creation code for instances, looking for the
right place to set the flag. I then realized that it might be smart,
now that we have this flag anyway, to set it to "true" during
initialization. There are a number of exits from the initialization
where the object is created but not fully initialized, where the new
object is DECREF'ed and NULL is returned. When such an exit is taken,
__del__ is called on an incompletely initialized object! Example:

>>> class C:
...     def __del__(self): print "deleting", self
...
>>> x = C(1)
!--> deleting <__main__.C instance at 1686d8>
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: this constructor takes no arguments
>>>

Now I have a choice to make.
If the class has an __init__, should I clear the flag only after __init__ succeeds? This means that if __init__ raises an exception, __del__ is never called. This is an incompatibility. It's possible that someone has written code that relies on __del__ being called even when __init__ fails halfway, and then their code would break. But it is just as likely that calling __del__ on a partially uninitialized object is a bad mistake, and I am doing all these cases a favor by not calling __del__ when __init__ failed! Any opinions? If nobody speaks up, I'll make the change. --Guido van Rossum (home page: http://www.python.org/~guido/) From bwarsaw at cnri.reston.va.us Thu Mar 2 17:44:00 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 2 Mar 2000 11:44:00 -0500 (EST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__? References: <200003021532.KAA17088@eric.cnri.reston.va.us> Message-ID: <14526.39504.36065.657527@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> Now I have a choice to make. If the class has an __init__, GvR> should I clear the flag only after __init__ succeeds? This GvR> means that if __init__ raises an exception, __del__ is never GvR> called. This is an incompatibility. It's possible that GvR> someone has written code that relies on __del__ being called GvR> even when __init__ fails halfway, and then their code would GvR> break. It reminds me of the separation between object allocation and initialization in ObjC. GvR> But it is just as likely that calling __del__ on a partially GvR> uninitialized object is a bad mistake, and I am doing all GvR> these cases a favor by not calling __del__ when __init__ GvR> failed! GvR> Any opinions? If nobody speaks up, I'll make the change. I think you should set the flag right before you call __init__(), i.e. after (nearly all) the C level initialization has occurred. 
Here's why: your "favor" can easily be accomplished by Python
constructs in the __init__():

class MyBogo:
    def __init__(self):
        self.get_delified = 0
        do_sumtin_exceptional()
        self.get_delified = 1

    def __del__(self):
        if self.get_delified:
            ah_sweet_release()

-Barry

From gstein at lyra.org  Thu Mar  2 18:14:35 2000
From: gstein at lyra.org (Greg Stein)
Date: Thu, 2 Mar 2000 09:14:35 -0800 (PST)
Subject: [Python-Dev] Design question: call __del__ only after
	successful __init__?
In-Reply-To: <200003021532.KAA17088@eric.cnri.reston.va.us>
Message-ID: 

On Thu, 2 Mar 2000, Guido van Rossum wrote:
>...
> But it is just as likely that calling __del__ on a partially
> uninitialized object is a bad mistake, and I am doing all these cases
> a favor by not calling __del__ when __init__ failed!
>
> Any opinions? If nobody speaks up, I'll make the change.

+1 on calling __del__ IFF __init__ completes successfully.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From jeremy at cnri.reston.va.us  Thu Mar  2 18:15:14 2000
From: jeremy at cnri.reston.va.us (Jeremy Hylton)
Date: Thu, 2 Mar 2000 12:15:14 -0500 (EST)
Subject: [Python-Dev] str vs repr at prompt again (FW: String printing
	behavior?)
In-Reply-To: <001401bf840b$3177ba60$732d153f@tim>
References: <001401bf840b$3177ba60$732d153f@tim>
Message-ID: <14526.41378.374653.497993@goon.cnri.reston.va.us>

>>>>> "TP" == Tim Peters writes:

TP> Another unsolicited testimonial that countless users are
TP> oppressed by auto-repr (as opposed to auto-str) at the
TP> interpreter prompt. Just trying to keep a once-hot topic from
TP> going stone cold forever.

[Signature from the included message:]
>> --
>> Ted Drain
>> Jet Propulsion Laboratory
>> Ted.Drain at jpl.nasa.gov

This guy is probably a rocket scientist. We want the language to be
useful for everybody, not just rocket scientists.
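[A present-day footnote: what Ted is asking for can be had by
replacing the hook the interactive prompt calls to display an
expression's value. sys.displayhook is a real mechanism, but it
postdates this thread -- the sketch below assumes a modern
interpreter, and the hook name and behaviour details are illustrative,
not what was available in 1.5.2:]

```python
import builtins
import sys

# The default display hook shows repr(value); this one prints strings
# raw (like 'print foo()') and falls back to repr() for everything else.

def str_displayhook(value):
    if value is None:
        return                    # the prompt stays quiet for None
    if isinstance(value, str):
        print(value)              # raw: embedded \n become real newlines
    else:
        print(repr(value))        # non-strings keep the repr behaviour
    builtins._ = value            # preserve the prompt's "_" convention

sys.displayhook = str_displayhook
```

With this installed, typing `foo()` at the prompt shows the two lines
rather than 'line1\nline2', which is exactly the matlab-like behaviour
Ted's users expect.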
Jeremy

From guido at python.org  Thu Mar  2 23:45:37 2000
From: guido at python.org (Guido van Rossum)
Date: Thu, 02 Mar 2000 17:45:37 -0500
Subject: [Python-Dev] Design question: call __del__ only after
	successful __init__?
In-Reply-To: Your message of "Thu, 02 Mar 2000 11:44:00 EST."
	<14526.39504.36065.657527@anthem.cnri.reston.va.us>
References: <200003021532.KAA17088@eric.cnri.reston.va.us>
	<14526.39504.36065.657527@anthem.cnri.reston.va.us>
Message-ID: <200003022245.RAA20265@eric.cnri.reston.va.us>

> >>>>> "GvR" == Guido van Rossum writes:
>
> GvR> Now I have a choice to make. If the class has an __init__,
> GvR> should I clear the flag only after __init__ succeeds? This
> GvR> means that if __init__ raises an exception, __del__ is never
> GvR> called. This is an incompatibility. It's possible that
> GvR> someone has written code that relies on __del__ being called
> GvR> even when __init__ fails halfway, and then their code would
> GvR> break.

[Barry]
> It reminds me of the separation between object allocation and
> initialization in ObjC.

Is that good or bad?

> GvR> But it is just as likely that calling __del__ on a partially
> GvR> uninitialized object is a bad mistake, and I am doing all
> GvR> these cases a favor by not calling __del__ when __init__
> GvR> failed!
>
> GvR> Any opinions? If nobody speaks up, I'll make the change.
>
> I think you should set the flag right before you call __init__(),
> i.e. after (nearly all) the C level initialization has occurred.
> Here's why: your "favor" can easily be accomplished by Python
> constructs in the __init__():
>
> class MyBogo:
>     def __init__(self):
>         self.get_delified = 0
>         do_sumtin_exceptional()
>         self.get_delified = 1
>
>     def __del__(self):
>         if self.get_delified:
>             ah_sweet_release()

But the other behavior (call __del__ even when __init__ fails) can
also easily be accomplished in Python:

class C:
    def __init__(self):
        try:
            ...stuff that may fail...
        except:
            self.__del__()
            raise

    def __del__(self):
        ...cleanup...

I believe that in almost all cases the programmer would be happier if
__del__ wasn't called when their __init__ fails. This makes it easier
to write a __del__ that can assume that all the object's fields have
been properly initialized. In my code, typically when __init__ fails,
this is a symptom of a really bad bug (e.g. I just renamed one of
__init__'s arguments and forgot to fix all references), and I don't
care much about cleanup behavior.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From bwarsaw at cnri.reston.va.us  Thu Mar  2 23:52:31 2000
From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us)
Date: Thu, 2 Mar 2000 17:52:31 -0500 (EST)
Subject: [Python-Dev] Design question: call __del__ only after
	successful __init__?
References: <200003021532.KAA17088@eric.cnri.reston.va.us>
	<14526.39504.36065.657527@anthem.cnri.reston.va.us>
	<200003022245.RAA20265@eric.cnri.reston.va.us>
Message-ID: <14526.61615.362973.624022@anthem.cnri.reston.va.us>

>>>>> "GvR" == Guido van Rossum writes:

GvR> But the other behavior (call __del__ even when __init__
GvR> fails) can also easily be accomplished in Python:

It's a fair cop.

GvR> I believe that in almost all cases the programmer would be
GvR> happier if __del__ wasn't called when their __init__ fails.
GvR> This makes it easier to write a __del__ that can assume that
GvR> all the object's fields have been properly initialized.

That's probably fine; I don't have strong feelings either way.

-Barry

P.S. Interesting what X-Oblique-Strategy was randomly inserted in this
message (but I'm not sure which approach is more "explicit" :).

-Barry

From tim_one at email.msn.com  Fri Mar  3 06:38:59 2000
From: tim_one at email.msn.com (Tim Peters)
Date: Fri, 3 Mar 2000 00:38:59 -0500
Subject: [Python-Dev] Design question: call __del__ only after
	successful __init__?
In-Reply-To: <200003021532.KAA17088@eric.cnri.reston.va.us> Message-ID: <000001bf84d2$c711e2e0$092d153f@tim> [Guido] > I was looking at the code that invokes __del__, with the intent to > implement a feature from Java: in Java, a finalizer is only called > once per object, even if calling it makes the object live longer. Why? That is, in what way is this an improvement over current behavior? Note that Java is a bit subtle: a finalizer is only called once by magic; explicit calls "don't count". The Java rules add up to quite a confusing mish-mash. Python's rules are *currently* clearer. I deal with possible exceptions in Python constructors the same way I do in C++ and Java: if there's a destructor, don't put anything in __init__ that may raise an uncaught exception. Anything dangerous is moved into a separate .reset() (or .clear() or ...) method. This works well in practice. > To implement this, we need a flag in each instance that means "__del__ > was called". At least . > I opened the creation code for instances, looking for the right place > to set the flag. I then realized that it might be smart, now that we > have this flag anyway, to set it to "true" during initialization. There > are a number of exits from the initialization where the object is created > but not fully initialized, where the new object is DECREF'ed and NULL is > returned. When such an exit is taken, __del__ is called on an > incompletely initialized object! I agree *that* isn't good. Taken on its own, though, it argues for adding an "instance construction completed" flag that __del__ later checks, as if its body were: if self.__instance_construction_completed: body That is, the problem you've identified here could be addressed directly. > Now I have a choice to make. If the class has an __init__, should I > clear the flag only after __init__ succeeds? This means that if > __init__ raises an exception, __del__ is never called. This is an > incompatibility. 
It's possible that someone has written code that > relies on __del__ being called even when __init__ fails halfway, and > then their code would break. > > But it is just as likely that calling __del__ on a partially > uninitialized object is a bad mistake, and I am doing all these cases > a favor by not calling __del__ when __init__ failed! > > Any opinions? If nobody speaks up, I'll make the change. I'd be in favor of fixing the actual problem; I don't understand the point to the rest of it, especially as it has the potential to break existing code and I don't see a compensating advantage (surely not compatibility w/ JPython -- JPython doesn't invoke __del__ methods at all by magic, right? or is that changing, and that's what's driving this?). too-much-magic-is-dizzying-ly y'rs - tim From bwarsaw at cnri.reston.va.us Fri Mar 3 06:50:16 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 3 Mar 2000 00:50:16 -0500 (EST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__? References: <200003021532.KAA17088@eric.cnri.reston.va.us> <000001bf84d2$c711e2e0$092d153f@tim> Message-ID: <14527.21144.9421.958311@anthem.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: TP> (surely not compatibility w/ JPython -- JPython doesn't invoke TP> __del__ methods at all by magic, right? or is that changing, TP> and that's what's driving this?). No, JPython doesn't invoke __del__ methods by magic, and I don't have any plans to change that. -Barry From ping at lfw.org Fri Mar 3 10:00:21 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 3 Mar 2000 01:00:21 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: Message-ID: On Thu, 2 Mar 2000, Greg Stein wrote: > On Thu, 2 Mar 2000, Guido van Rossum wrote: > >... 
> > But it is just as likely that calling __del__ on a partially > > uninitialized object is a bad mistake, and I am doing all these cases > > a favor by not calling __del__ when __init__ failed! > > > > Any opinions? If nobody speaks up, I'll make the change. > > +1 on calling __del__ IFF __init__ completes successfully. That would be my vote as well. What convinced me of this is the following: If it's up to the implementation of __del__ to deal with a problem that happened during initialization, you only know about the problem with very coarse granularity. It's a pain (or even impossible) to then rediscover the information you need to recover adequately. If on the other hand you deal with the problem in __init__, then you have much better control over what is happening, because you can position try/except blocks precisely where you need them to deal with specific potential problems. Each block can take care of its case appropriately, and re-raise if necessary. In general, it seems to me that what you want to do when __init__ runs afoul is going to be different from what you want to do to take care of object cleanup in __del__. So it doesn't belong there -- it belongs in an except: clause in __init__. Even though it's an incompatibility, i really think this is the right behaviour. -- ?!ng "To be human is to continually change. Your desire to remain as you are is what ultimately limits you." -- The Puppet Master, Ghost in the Shell From guido at python.org Fri Mar 3 17:13:16 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 03 Mar 2000 11:13:16 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: Your message of "Fri, 03 Mar 2000 00:38:59 EST." 
<000001bf84d2$c711e2e0$092d153f@tim> References: <000001bf84d2$c711e2e0$092d153f@tim> Message-ID: <200003031613.LAA21571@eric.cnri.reston.va.us> > [Guido] > > I was looking at the code that invokes __del__, with the intent to > > implement a feature from Java: in Java, a finalizer is only called > > once per object, even if calling it makes the object live longer. [Tim] > Why? That is, in what way is this an improvement over current behavior? > > Note that Java is a bit subtle: a finalizer is only called once by magic; > explicit calls "don't count". Of course. Same in my proposal. But I wouldn't call it "by magic" -- just "on behalf of the garbage collector". > The Java rules add up to quite a confusing mish-mash. Python's rules are > *currently* clearer. I don't find the Java rules confusing. It seems quite useful that the GC promises to call the finalizer at most once -- this can simplify the finalizer logic. (Otherwise it may have to ask itself, "did I clean this already?" and leave notes for itself.) Explicit finalizer calls are always a mistake and thus "don't count" -- the response to that should in general be "don't do that" (unless you have particularly stupid callers -- or very fearful lawyers :-). > I deal with possible exceptions in Python constructors the same way I do in > C++ and Java: if there's a destructor, don't put anything in __init__ that > may raise an uncaught exception. Anything dangerous is moved into a > separate .reset() (or .clear() or ...) method. This works well in practice. Sure, but the rule "if __init__ fails, __del__ won't be called" means that we don't have to program our __init__ or __del__ quite so defensively. Most people who design a __del__ probably assume that __init__ has run to completion. The typical scenario (which has happened to me! And I *implemented* the damn thing!) is this: __init__ opens a file and assigns it to an instance variable; __del__ closes the file. This is tested a few times and it works great. 
Now in production the file somehow unexpectedly fails to be openable. Sure, the programmer should've expected that, but she didn't. Now, at best, the failed __del__ creates an additional confusing error message on top of the traceback generated by IOError. At worst, the failed __del__ could wreck the original traceback. Note that I'm not proposing to change the C level behavior; when a Py_New() function is halfway through its initialization and decides to bail out, it does a DECREF(self) and you bet that at this point the _dealloc() function gets called (via self->ob_type->tp_dealloc). Occasionally I need to initialize certain fields to NULL so that the dealloc() function doesn't try to free memory that wasn't allocated. Often it's as simple as using XDECREF instead of DECREF in the dealloc() function (XDECREF is safe when the argument is NULL, DECREF dumps core, saving a load-and-test if you are sure its arg is a valid object). > > To implement this, we need a flag in each instance that means "__del__ > > was called". > > At least <wink>. > > > I opened the creation code for instances, looking for the right place > > to set the flag. I then realized that it might be smart, now that we > > have this flag anyway, to set it to "true" during initialization. There > > are a number of exits from the initialization where the object is created > > but not fully initialized, where the new object is DECREF'ed and NULL is > > returned. When such an exit is taken, __del__ is called on an > > incompletely initialized object! > > I agree *that* isn't good. Taken on its own, though, it argues for adding > an "instance construction completed" flag that __del__ later checks, as if
> its body were:
>
>     if self.__instance_construction_completed:
>         body
>
> That is, the problem you've identified here could be addressed directly.
Sure -- but I would argue that when __del__ returns, __instance_construction_completed should be reset to false, because the destruction (conceptually, at least) cancels out the construction! > > Now I have a choice to make. If the class has an __init__, should I > > clear the flag only after __init__ succeeds? This means that if > > __init__ raises an exception, __del__ is never called. This is an > > incompatibility. It's possible that someone has written code that > > relies on __del__ being called even when __init__ fails halfway, and > > then their code would break. > > > > But it is just as likely that calling __del__ on a partially > > uninitialized object is a bad mistake, and I am doing all these cases > > a favor by not calling __del__ when __init__ failed! > > > > Any opinions? If nobody speaks up, I'll make the change. > > I'd be in favor of fixing the actual problem; I don't understand the point > to the rest of it, especially as it has the potential to break existing code > and I don't see a compensating advantage (surely not compatibility w/ > JPython -- JPython doesn't invoke __del__ methods at all by magic, right? > or is that changing, and that's what's driving this?). JPython's a red herring here. I think that the proposed change probably *fixes* much more code that is subtly wrong than it breaks code that is relying on __del__ being called after a partial __init__. All the rules relating to __del__ are confusing (e.g. what __del__ can expect to survive in its globals). Also note Ping's observation: | If it's up to the implementation of __del__ to deal with a problem | that happened during initialization, you only know about the problem | with very coarse granularity. It's a pain (or even impossible) to | then rediscover the information you need to recover adequately.
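Ping's point can be restated as code. In this hedged sketch (all names invented for illustration), the constructor knows exactly which resources were already acquired when a later step fails, so it can undo precisely that much and re-raise:

```python
class FakeSocket:
    """Stand-in for a resource acquired in an early __init__ step."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

class Connection:
    def __init__(self, make_socket, make_logfile):
        self.sock = make_socket()          # step 1: acquired
        try:
            self.log = make_logfile()      # step 2: may fail
        except OSError:
            # We know precisely what exists at this point: just the
            # socket.  Undo exactly that and propagate the error.
            self.sock.close()
            raise
```

Had this cleanup been deferred to __del__, the destructor would have to rediscover how far construction got (does self.log exist? does self.sock?) -- exactly the coarse-granularity problem described above.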
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Fri Mar 3 17:49:52 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 3 Mar 2000 11:49:52 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: <200003031613.LAA21571@eric.cnri.reston.va.us> Message-ID: <000501bf8530$7f8c78a0$b0a0143f@tim> [Tim] >> Note that Java is a bit subtle: a finalizer is only called >> once by magic; explicit calls "don't count". [Guido] > Of course. Same in my proposal. OK -- that wasn't clear. > But I wouldn't call it "by magic" -- just "on behalf of the garbage > collector". Yup, magically called <wink>. >> The Java rules add up to quite a confusing mish-mash. Python's >> rules are *currently* clearer. > I don't find the Java rules confusing. "add up" == "taken as a whole"; include the Java spec's complex state machine for cleanup semantics, and the later complications added by three (four?) distinct flavors of weak reference, and I doubt 1 Java programmer in 1,000 actually understands the rules. This is why I'm wary of moving in the Java *direction* here. Note that Java programmers in past c.l.py threads have generally claimed Java's finalizers are so confusing & unpredictable they don't use them at all! Which, in the end, is probably a good idea in Python too <0.5 wink>. > It seems quite useful that the GC promises to call the finalizer at > most once -- this can simplify the finalizer logic. Granting that explicit calls are "use at your own risk", the only user-visible effect of "called only once" is in the presence of resurrection. Now in my Python experience, on the few occasions I've resurrected an object in __del__, *of course* I expected __del__ to get called again if the object is about to die again!
Typical:

    def __del__(self):
        if oops_i_still_need_to_stay_alive:
            resurrect(self)
        else:
            # really going away
            release(self.critical_resource)

Call __del__ only once, and code like this is busted bigtime. OTOH, had I written __del__ logic that relied on being called only once, switching the implementation to call it more than once would break *that* bigtime. Neither behavior is an obvious all-cases win to me, or even a plausibly most-cases win. But Python already took a stand on this & so I think you need a *good* reason to change semantics now. > ... > Sure, but the rule "if __init__ fails, __del__ won't be called" means > that we don't have to program our __init__ or __del__ quite so > defensively. Most people who design a __del__ probably assume that > __init__ has run to completion. ... This is (or can easily be made) a separate issue, & I agreed the first time this seems worth fixing (although if nobody has griped about it in a decade of use, it's hard to call it a major bug <wink>). > ... > Sure -- but I would argue that when __del__ returns, > __instance_construction_completed should be reset to false, because > the destruction (conceptually, at least) cancels out the construction! In the __del__ above (which is typical of the cases of resurrection I've seen), there is no such implication. Perhaps this is philosophical abuse of Python's intent, but if so it relied only on trusting its advertised semantics. > I think that the proposed change probably *fixes* much more code that > is subtly wrong than it breaks code that is relying on __del__ being > called after a partial __init__. Yes, again, I have no argument against refusing to call __del__ unless __init__ succeeded. Going beyond that to a new "called at most once" rule is indeed going beyond that, *will* break reasonable old code, and holds no particular attraction that I can see (it trades making one kind of resurrection scenario easier at the cost of making other kinds harder).
If there needs to be incompatible change here, curiously enough I'd be more in favor of making resurrection illegal period (which could *really* simplify gc's headaches). > All the rules relating to __del__ are confusing (e.g. what __del__ can > expect to survive in its globals). Problems unique to final shutdown don't seem relevant here. > Also note Ping's observation: ... I can't agree with that yet another time without being quadruply redundant <wink>. From guido at python.org Fri Mar 3 17:50:08 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 03 Mar 2000 11:50:08 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Your message of "Wed, 01 Mar 2000 00:44:10 EST." <200003010544.AAA13155@eric.cnri.reston.va.us> References: <20000229153421.A16502@acs.ucalgary.ca> <200003010544.AAA13155@eric.cnri.reston.va.us> Message-ID: <200003031650.LAA21647@eric.cnri.reston.va.us> We now have two implementations of Eric Tiedemann's idea: Neil and I both implemented it. It's too soon to post the patch sets (both are pretty rough) but I've got another design question. Once we've identified a bunch of objects that are only referring to each other (i.e., one or more cycles) we have to dispose of them. The question is, how? We can't just call free on each of the objects; some may not be allocated with malloc, and some may contain pointers to other malloc'ed memory that also needs to be freed. So we have to get their destructors involved. But how? Calling ob->ob_type->tp_dealloc(ob) for an object whose reference count is nonzero is unsafe -- this will destroy the object while there are still references to it! Those references are all coming from other objects that are part of the same cycle; those objects will also be deallocated and they will reference the deallocated objects (if only to DECREF them). Neil uses the same solution that I use when finalizing the Python interpreter -- find the dictionaries and call PyDict_Clear() on them.
(In his unpublished patch, he also clears the lists using PyList_SetSlice(list, 0, list->ob_size, NULL). He's also generalized this so that *every* object can define a tp_clear function in its type object.) As long as every cycle contains at least one dictionary or list object, this will break cycles reliably and get rid of all the garbage. (If you wonder why: clearing the dict DECREFs the next object(s) in the cycle; if the last dict referencing a particular object is cleared, the last DECREF will deallocate that object, which will in turn DECREF the objects it references, and so forth. Since none of the objects in the cycle has incoming references from outside the cycle, we can prove that this will delete all objects as long as there's a dict or list in each cycle.) However, there's a snag. It's the same snag that finalizing the Python interpreter runs into -- it has to do with __del__ methods and the undefined order in which the dictionaries are cleared. For example, it's quite possible that the first dictionary we clear is the __dict__ of an instance, so this zaps all its instance variables. Suppose this breaks the cycle, so then the instance itself gets DECREFed to zero. Its deallocator will be called. If it's got a __del__, this __del__ will be called -- but all the instance variables have already been zapped, so it will fail miserably! It's also possible that the __dict__ of a class involved in a cycle gets cleared first, in which case the __del__ no longer "exists", and again the cleanup is skipped. So the question is: What to *do*? My solution is to make an extra pass over all the garbage objects *before* we clear dicts and lists, and for those that are instances and have __del__ methods, call their __del__ ("by magic", as Tim calls it in another post). The code in instance_dealloc() already does the right thing here: it calls __del__, then discovers that the reference count is > 0 ("I'm not dead yet" :-), and returns without freeing the object.
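The "fail miserably" case can be imitated in pure Python by clearing an instance's __dict__ by hand, which is what the dict-clearing pass looks like from __del__'s point of view (a toy illustration, not the collector itself):

```python
log = []

class Holder:
    def __init__(self):
        self.resource = "temp-file-handle"

    def __del__(self):
        # If the instance dict was zapped before we got here, the
        # attribute lookup fails -- the "fail miserably" case.
        try:
            log.append(("released", self.resource))
        except AttributeError:
            log.append(("lost", None))

h = Holder()
h.__dict__.clear()   # simulate the collector clearing the dict first
del h                # __del__ now runs on a stripped instance
```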
(This is also why I want to introduce a flag ensuring that __del__ gets called by instance_dealloc at most once: later when the instance gets DECREFed to 0, instance_dealloc is called again and will correctly free the object; but we don't want __del__ called again.) [Note for Neil: somehow I forgot to add this logic to the code; in_del_called isn't used! The change is obvious though.] This still leaves a problem for the user: if two class instances reference each other and both have a __del__, we can't predict whose __del__ is called first when they are called as part of cycle collection. The solution is to write each __del__ so that it doesn't depend on the other __del__. Someone (Tim?) in the past suggested a different solution (probably found in another language): for objects that are collected as part of a cycle, the destructor isn't called at all. The memory is freed (since it's no longer reachable), but the destructor is not called -- it is as if the object lives on forever. This is theoretically superior, but not practical: when I have an object that creates a temp file, I want to be able to reliably delete the temp file in my destructor, even when I'm part of a cycle! --Guido van Rossum (home page: http://www.python.org/~guido/) From jack at oratrix.nl Fri Mar 3 17:57:54 2000 From: jack at oratrix.nl (Jack Jansen) Date: Fri, 03 Mar 2000 17:57:54 +0100 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message by Guido van Rossum , Fri, 03 Mar 2000 11:50:08 -0500 , <200003031650.LAA21647@eric.cnri.reston.va.us> Message-ID: <20000303165755.490EA371868@snelboot.oratrix.nl> The __init__ rule for calling __del__ has me confused. Is this per-class or per-object? I.e. what will happen in the following case:

    class Purse:
        def __init__(self):
            self.balance = WithdrawCashFromBank(1000)

        def __del__(self):
            PutCashBackOnBank(self.balance)
            self.balance = 0

    class LossyPurse(Purse):
        def __init__(self):
            Purse.__init__(self)
            raise 'kaboo!
kaboo!'

If the new scheme means that the __del__ method of Purse isn't called I think I don't like it. In the current scheme I can always program defensively:

    def __del__(self):
        try:
            b = self.balance
            self.balance = 0
        except AttributeError:
            pass
        else:
            PutCashBackOnBank(b)

but in a new scheme with a per-object "__del__ must be called" flag I can't... -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido at python.org Fri Mar 3 18:05:00 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 03 Mar 2000 12:05:00 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: Your message of "Fri, 03 Mar 2000 11:49:52 EST." <000501bf8530$7f8c78a0$b0a0143f@tim> References: <000501bf8530$7f8c78a0$b0a0143f@tim> Message-ID: <200003031705.MAA21700@eric.cnri.reston.va.us> OK, so we're down to this one point: if __del__ resurrects the object, should __del__ be called again later? Additionally, should resurrection be made illegal? I can easily see how __del__ could *accidentally* resurrect the object as part of its normal cleanup -- e.g. you make a call to some other routine that helps with the cleanup, passing self as an argument, and this other routine keeps a helpful cache of the last argument for some reason. I don't see how we could forbid this type of resurrection. (What are you going to do? You can't raise an exception from instance_dealloc, since it is called from DECREF. You can't track down the reference and replace it with a None easily.) In this example, the helper routine will eventually delete the object from its cache, at which point it is truly deleted. It would be harmful, not helpful, if __del__ was called again at this point. Now, it is true that the current docs for __del__ imply that resurrection is possible.
The intention of that note was to warn __del__ writers that in the case of accidental resurrection __del__ might be called again. The intention certainly wasn't to allow or encourage intentional resurrection. Would there really be someone out there who uses *intentional* resurrection? I severely doubt it. I've never heard of this. [Jack just finds a snag] > The __init__ rule for calling __del__ has me confused. Is this per-class or > per-object? > > I.e. what will happen in the following case:
>
>     class Purse:
>         def __init__(self):
>             self.balance = WithdrawCashFromBank(1000)
>
>         def __del__(self):
>             PutCashBackOnBank(self.balance)
>             self.balance = 0
>
>     class LossyPurse(Purse):
>         def __init__(self):
>             Purse.__init__(self)
>             raise 'kaboo! kaboo!'
>
> If the new scheme means that the __del__ method of Purse isn't called I think
> I don't like it. In the current scheme I can always program defensively:
>
>     def __del__(self):
>         try:
>             b = self.balance
>             self.balance = 0
>         except AttributeError:
>             pass
>         else:
>             PutCashBackOnBank(b)
>
> but in a new scheme with a per-object "__del__ must be called" flag I can't...

Yes, that's a problem. But there are other ways for the subclass to break the base class's invariant (e.g. it could override __del__ without calling the base class' __del__). So I think it's a red herring. In Python 3000, typechecked classes may declare invariants that are enforced by the inheritance mechanism; then we may need to keep track of which base class constructors succeeded and only call corresponding destructors. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Fri Mar 3 19:17:11 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 03 Mar 2000 19:17:11 +0100 Subject: [Python-Dev] Design question: call __del__ only after successful __init__?
References: <000501bf8530$7f8c78a0$b0a0143f@tim> <200003031705.MAA21700@eric.cnri.reston.va.us> Message-ID: <38C001A7.6CF8F365@lemburg.com> Guido van Rossum wrote: > > OK, so we're down to this one point: if __del__ resurrects the object, > should __del__ be called again later? Additionally, should > resurrection be made illegal? Yes and no :-) One example comes to mind: implementations of weak references, which manage weak object references themselves (as soon as __del__ is called the weak reference implementation takes over the object). Another example is that of free-list-like implementations which reduce object creation times by implementing smart object recycling, e.g. objects could keep allocated dictionaries alive or connections to databases open, etc. As for the second point: Calling __del__ again is certainly needed to keep application logic sane... after all, __del__ should be called whenever the refcount reaches 0 -- and that can happen more than once in the object's life-time if reanimation occurs. > I can easily see how __del__ could *accidentally* resurrect the object > as part of its normal cleanup -- e.g. you make a call to some other > routine that helps with the cleanup, passing self as an argument, and > this other routine keeps a helpful cache of the last argument for some > reason. I don't see how we could forbid this type of resurrection. > (What are you going to do? You can't raise an exception from > instance_dealloc, since it is called from DECREF. You can't track > down the reference and replace it with a None easily.) > In this example, the helper routine will eventually delete the object > from its cache, at which point it is truly deleted. It would be > harmful, not helpful, if __del__ was called again at this point. I'd say this is an application logic error -- nothing that the mechanism itself can help with automagically. OTOH, turning multiple calls to __del__ off would make certain techniques impossible.
> Now, it is true that the current docs for __del__ imply that > resurrection is possible. The intention of that note was to warn > __del__ writers that in the case of accidental resurrection __del__ > might be called again. The intention certainly wasn't to allow or > encourage intentional resurrection. I don't think that docs are the right argument here ;-) It is simply the reference counting logic that plays its role: __del__ is called when refcount reaches 0, which usually means that the object is about to be garbage collected... unless the object is rereferenced by some other object and thus gets reanimated. > Would there really be someone out there who uses *intentional* > resurrection? I severely doubt it. I've never heard of this. BTW, I can't see what the original question has to do with this discussion ... calling __del__ only after successful __init__ is ok, IMHO, but what does this have to do with the way __del__ itself is implemented? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Fri Mar 3 19:30:36 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 03 Mar 2000 19:30:36 +0100 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? References: <20000229153421.A16502@acs.ucalgary.ca> <200003010544.AAA13155@eric.cnri.reston.va.us> <200003031650.LAA21647@eric.cnri.reston.va.us> Message-ID: <38C004CC.1FE0A501@lemburg.com> [Guido about ways to clean up cyclic garbage] FYI, I'm using a special protocol for disposing of cyclic garbage: the __cleanup__ protocol. The purpose of this call is probably similar to Neil's tp_clear: it is intended to let objects break possible cycles in their own storage scope, e.g. instances can delete instance variables which they know can cause cyclic garbage.
The idea is simple: give all power to the objects rather than try to solve everything with one magical master plan. The mxProxy package has details on the protocol. The __cleanup__ method is called by the Proxy when the Proxy is about to be deleted. If all references to an object go through the Proxy, the __cleanup__ method call can easily break cycles to have the refcount reach zero in which case __del__ is called. Since the object knows about this scheme it can take precautions to make sure that __del__ still works after __cleanup__ was called. Anyway, just a thought... there are probably many ways to do all this. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tismer at tismer.com Fri Mar 3 19:51:55 2000 From: tismer at tismer.com (Christian Tismer) Date: Fri, 03 Mar 2000 19:51:55 +0100 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? References: <000501bf8530$7f8c78a0$b0a0143f@tim> <200003031705.MAA21700@eric.cnri.reston.va.us> Message-ID: <38C009CB.72BD49CA@tismer.com> Guido van Rossum wrote: > > OK, so we're down to this one point: if __del__ resurrects the object, > should __del__ be called again later? Additionally, should > resurrection be made illegal? [much stuff] Just a random note: What if we had a __del__ with zombie behavior? Assume an instance that is about to be destructed. Then __del__ is called via normal method lookup. What we want is to let this happen only once. Here the Zombie: After method lookup, place a dummy __del__ into the to-be-deleted instance dict, and we are sure that this does not harm. Kinda "yes, it's there, but a broken link". The zombie always works by doing nothing. Makes some sense? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From gstein at lyra.org Sat Mar 4 00:09:48 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 3 Mar 2000 15:09:48 -0800 (PST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> Message-ID: You may as well remove the entire "vi" concept from ConfigParser. Since "vi" can be *only* a '=' or ':', you aren't truly checking anything in the "if" statement. Further, "vi" is used nowhere else, so that variable and the corresponding regex group can be nuked altogether. IMO, I'm not sure why the ";" comment form was initially restricted to just one option format in the first place. Cheers, -g On Fri, 3 Mar 2000, Jeremy Hylton wrote: > Update of /projects/cvsroot/python/dist/src/Lib > In directory bitdiddle:/home/jhylton/python/src/Lib > > Modified Files: > ConfigParser.py > Log Message: > allow comments beginning with ; in key: value as well as key = value > > > Index: ConfigParser.py > =================================================================== > RCS file: /projects/cvsroot/python/dist/src/Lib/ConfigParser.py,v > retrieving revision 1.16 > retrieving revision 1.17 > diff -C2 -r1.16 -r1.17 > *** ConfigParser.py 2000/02/28 23:23:55 1.16 > --- ConfigParser.py 2000/03/03 20:43:57 1.17 > *************** > *** 359,363 **** > optname, vi, optval = mo.group('option', 'vi', 'value') > optname = string.lower(optname) > ! if vi == '=' and ';' in optval: > # ';' is a comment delimiter only if it follows > # a spacing character > --- 359,363 ---- > optname, vi, optval = mo.group('option', 'vi', 'value') > optname = string.lower(optname) > !
if vi in ('=', ':') and ';' in optval: > # ';' is a comment delimiter only if it follows > # a spacing character > > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://www.python.org/mailman/listinfo/python-checkins > -- Greg Stein, http://www.lyra.org/ From jeremy at cnri.reston.va.us Sat Mar 4 00:15:32 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Fri, 3 Mar 2000 18:15:32 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> Message-ID: <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> Thanks for catching that. I didn't look at the context. I'm going to wait, though, until I talk to Fred to mess with the code any more. General question for python-dev readers: What are your experiences with ConfigParser? I just used it to build a simple config parser for IDLE and found it hard to use for several reasons. The biggest problem was that the file format is undocumented. I also found it clumsy to have to specify section and option arguments. I ended up writing a proxy that specializes on section so that get takes only an option argument. It sounds like ConfigParser code and docs could use a general cleanup. Are there any other issues to take care of as part of that cleanup? Jeremy From gstein at lyra.org Sat Mar 4 00:35:09 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 3 Mar 2000 15:35:09 -0800 (PST) Subject: [Python-Dev] ConfigParser stuff (was: CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17) In-Reply-To: <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> Message-ID: On Fri, 3 Mar 2000, Jeremy Hylton wrote: > Thanks for catching that. I didn't look at the context. I'm going to > wait, though, until I talk to Fred to mess with the code any more. Not a problem. I'm glad that diffs are now posted to -checkins. 
:-) > General question for python-dev readers: What are your experiences > with ConfigParser? Love it! > I just used it to build a simple config parser for > IDLE and found it hard to use for several reasons. The biggest > problem was that the file format is undocumented. In my most complex use of ConfigParser, I had to override SECTCRE to allow periods in the section name. Of course, that was quite interesting since the variable is __SECTRE in 1.5.2 (i.e. I had to compensate for the munging). I also changed OPTCRE to allow a few more characters ("@" in particular, which even the update doesn't do). Not a problem nowadays since those are public. My subclass also defines a set() method and a delsection() method. These are used because I write the resulting changes back out to a file. It might be nice to have a method which writes out a config file (with an "AUTOGENERATED BY ConfigParser.py -- DO NOT EDIT BY HAND"; or maybe "... BY ..."). > I also found it > clumsy to have to specify section and option arguments. I found these were critical in my application. I also take advantage of the sections in my "edna" application for logical organization. > I ended up > writing a proxy that specializes on section so that get takes only an > option argument. > > It sounds like ConfigParser code and docs could use a general cleanup. > Are there any other issues to take care of as part of that cleanup? A set() method and a writefile() type of method would be nice. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one at email.msn.com Sat Mar 4 02:38:43 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 3 Mar 2000 20:38:43 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <200003031650.LAA21647@eric.cnri.reston.va.us> Message-ID: <000001bf857a$60b45ac0$c6a0143f@tim> [Guido] > ... > Someone (Tim?)
in the past suggested a different solution (probably > found in another language): for objects that are collected as part of > a cycle, the destructor isn't called at all. The memory is freed > (since it's no longer reachable), but the destructor is not called -- > it is as if the object lives on forever. Stroustrup has written in favor of this for C++. It's exactly the kind of overly slick "good argument" he would never accept from anyone else <0.1 wink>. > This is theoretically superior, but not practical: when I have an > object that creates a temp file, I want to be able to reliably delete > the temp file in my destructor, even when I'm part of a cycle! A member of the C++ committee assured me Stroustrup is overwhelmingly opposed on this. I don't even agree it's theoretically superior: it relies on the fiction that gc "may never occur", and that's just silly in practice. You're moving down the Java path. I can't possibly do a better job of explaining the Java rules than the Java Language Spec. does for itself. So pick that up and study section 12.6 (Finalization of Class Instances). The end result makes little sense to users, but is sufficient to guarantee that Java itself never blows up. Note, though, that there is NO good answer to finalizers in cycles! The implementation cannot be made smart enough to both avoid trouble and "do the right thing" from the programmer's POV, because the latter is unknowable. Somebody has to lose, one way or another. Rather than risk doing a wrong thing, the BDW collector lets cycles with finalizers leak. But it also has optional hacks to support exceptions for use with C++ (which sometimes creates self-cycles) and Java. See http://reality.sgi.com/boehm_mti/finalization.html for Boehm's best concentrated thoughts on the subject. The only principled approach I know of comes out of the Scheme world. Scheme has no finalizers, of course. 
But it does have gc, and the concept of "guardians" was invented to address all gc finalization problems in one stroke. It's extremely Scheme-like in providing a perfectly general mechanism with no policy whatsoever. You (the Scheme programmer) can create guardian objects, and "register" other objects with a guardian. At any time, you can ask a guardian whether some object registered with it is "ready to die" (i.e., the only thing keeping it alive is its registration with the guardian). If so, you can ask it to give you one. Everything else is up to you: if you want to run a finalizer, your problem. If there are cycles, also your problem. Even if there are simple non-cyclic dependencies, your problem. Etc. So those are the extremes: BDW avoids blame by refusing to do anything. Java avoids blame by exposing an impossibly baroque implementation-driven finalization model. Scheme avoids blame by refusing to do anything "by magic", but helps you to shoot yourself with the weapon of your choice. The bad news is that I don't know of a scheme *not* at an extreme! It's extremely un-Pythonic to let things leak (despite that it has let things leak for a decade ), but also extremely un-Pythonic to make some wild-ass guess. So here's what I'd consider doing: explicit is better than implicit, and in the face of ambiguity refuse the temptation to guess. If a trash cycle contains a finalizer (my, but that has to be rare in practice, in well-designed code!), don't guess, but make it available to the user. A gc.guardian() call could expose such beasts, or perhaps a callback could be registered, invoked when gc finds one of these things. Anyone crazy enough to create cyclic trash with finalizers then has to take responsibility for breaking the cycle themself. This puts the burden on the person creating the problem, and they can solve it in the way most appropriate to *their* specific needs.
IOW, the only people who lose under this scheme are the ones begging to lose, and their "loss" consists of taking responsibility. when-a-problem-is-impossible-to-solve-favor-sanity-ly y'rs - tim From gstein at lyra.org Sat Mar 4 03:59:26 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 3 Mar 2000 18:59:26 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <000001bf857a$60b45ac0$c6a0143f@tim> Message-ID: On Fri, 3 Mar 2000, Tim Peters wrote: >... > Note, though, that there is NO good answer to finalizers in cycles! The "Note" ?? Not just a note, but I'd say an axiom :-) By definition, you have two objects referring to each other in some way. How can you *definitely* know how to break the link between them? Do you call A's finalizer or B's first? If they're instances, do you just whack their __dict__ and hope for the best? >... > So here's what I'd consider doing: explicit is better than implicit, and in > the face of ambiguity refuse the temptation to guess. If a trash cycle > contains a finalizer (my, but that has to be rare. in practice, in > well-designed code!), don't guess, but make it available to the user. A > gc.guardian() call could expose such beasts, or perhaps a callback could be > registered, invoked when gc finds one of these things. Anyone crazy enough > to create cyclic trash with finalizers then has to take responsibility for > breaking the cycle themself. This puts the burden on the person creating > the problem, and they can solve it in the way most appropriate to *their* > specific needs. IOW, the only people who lose under this scheme are the > ones begging to lose, and their "loss" consists of taking responsibility. I'm not sure if Tim is saying the same thing, but I'll write down a concrete idea for cleaning garbage cycles. First, a couple observations: * Some objects can always be reliably "cleaned": lists, dicts, tuples.
They just drop their contents, with no invocations against any of them. Note that an instance without a __del__ has no opinion on how it is cleaned. (this is related to Tim's point about whether a cycle has a finalizer) * The other objects may need to *use* their referenced objects in some way to clean out cycles. Since the second set of objects (possibly) need more care during their cleanup, we must concentrate on how to solve their problem. Back up a step: to determine where an object falls, let's define a tp_clean type slot. It returns an integer and takes one parameter: an operation integer. Py_TPCLEAN_CARE_CHECK /* check whether care is needed */ Py_TPCLEAN_CARE_EXEC /* perform the careful cleaning */ Py_TPCLEAN_EXEC /* perform a non-careful cleaning */ Given a set of objects that require special cleaning mechanisms, there is no way to tell where to start first. So... just pick the first one. Call its tp_clean type slot with CARE_EXEC. For instances, this maps to __clean__. If the instance does not have a __clean__, then tp_clean returns FALSE meaning that it could not clean this object. The algorithm moves on to the next object in the set. If tp_clean returns TRUE, then the object has been "cleaned" and is moved to the "no special care needed" list of objects, awaiting its reference count to hit zero. Note that objects in the "care" and "no care" lists may disappear during the careful-cleaning process. If the careful-cleaning algorithm hits the end of the careful set of objects and the set is non-empty, then throw an exception: GCImpossibleError. The objects in this set each said they could not be cleaned carefully AND they were not dealloc'd during other objects' cleaning. [ it could be possible to define a *dynamic* CARE_EXEC that will succeed if you call it during a second pass; I'm not sure this is a Good Thing to allow, however. 
] This also implies that a developer should almost *always* consider writing a __clean__ method whenever they write a __del__ method. That method MAY be called when cycles need to be broken; the object should delete any non-essential variables in such a way that integrity is retained (e.g. it fails gracefully when methods are called and __del__ won't raise an error). For example, __clean__ could call a self.close() to shut down its operation. Whatever... you get the idea. At the end of the iteration of the "care" set, then you may have objects remaining in the "no care" set. By definition, these objects don't care about their internal references to other objects (they don't need them during deallocation). We iterate over this set, calling tp_clean(EXEC). For lists, dicts, and tuples, the tp_clean(EXEC) call simply clears out the references to other objects (but does not dealloc the object!). Again: objects in the "no care" set will go away during this process. By the end of the iteration over the "no care" set, it should be empty. [ note: the iterations over these sets should probably INCREF/DECREF across the calls; otherwise, the object could be dealloc'd during the tp_clean call. ] [ if the set is NOT empty, then tp_clean(EXEC) did not remove all possible references to other objects; not sure what this means. is it an error? maybe you just force a tp_dealloc on the remaining objects. ] Note that the tp_clean mechanism could probably be used during the Python finalization, where Python does a bunch of special-casing to clean up modules. Specifically: a module does not care about its contents during its deallocation, so it is a "no care" object; it responds to tp_clean(EXEC) by clearing its dictionary. Class objects are similar: they can clear their dict (which contains a module reference which usually causes a loop) during tp_clean(EXEC). 
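Greg's care/no-care passes can be modeled in a few lines of present-day Python. Everything in this sketch — tp_clean as an ordinary method, the op constants, and the clean_cycle() driver — is an illustrative stand-in, not a real CPython interface:

```python
# A toy model of the proposed two-pass cleaning; all names here are
# illustrative stand-ins for the tp_clean type slot, not a real API.
CARE_CHECK, CARE_EXEC, EXEC = range(3)

class GCImpossibleError(MemoryError):
    """An object in a trash cycle refused careful cleaning."""

class Careful:
    # stands in for an instance whose __clean__ maps to CARE_EXEC
    def __init__(self):
        self.closed = False

    def tp_clean(self, op):
        if op == CARE_CHECK:
            return True            # yes, I need careful cleaning
        if op == CARE_EXEC:
            self.closed = True     # e.g. self.close()
            return True            # cleaned successfully
        return True                # EXEC: nothing left to drop

class Carefree:
    # stands in for a list/dict/tuple: it just drops its references
    def __init__(self):
        self.refs = [object(), object()]

    def tp_clean(self, op):
        if op == CARE_CHECK:
            return False           # no special care needed
        if op == EXEC:
            self.refs = []         # clear references, don't dealloc
            return True
        return False

def clean_cycle(objs):
    care = [o for o in objs if o.tp_clean(CARE_CHECK)]
    no_care = [o for o in objs if not o.tp_clean(CARE_CHECK)]
    # pass 1: give each "care" object a chance to clean itself
    for o in list(care):
        if o.tp_clean(CARE_EXEC):
            care.remove(o)
            no_care.append(o)
    if care:                       # they also weren't dealloc'd
        raise GCImpossibleError("cycle cannot be cleaned carefully")
    # pass 2: remaining objects just drop their internal references
    for o in no_care:
        o.tp_clean(EXEC)
```

In the real proposal the two lists shrink as reference counts hit zero during pass 1; the toy version ignores that wrinkle.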
Module cleanup is easy once objects with CARE_CHECK have been handled -- all that funny logic in there is to deal with "care" objects. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one at email.msn.com Sat Mar 4 04:26:54 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 3 Mar 2000 22:26:54 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message-ID: <000401bf8589$7d1364e0$c6a0143f@tim> [Tim] > Note, though, that there is NO good answer to finalizers in cycles! The [Greg Stein] > "Note" ?? Not just a note, but I'd say an axiom :-) An axiom is accepted without proof: we have plenty of proof that there's no thoroughly good answer (i.e., every language that has ever addressed this issue -- along with every language that ever will ). > By definition, you have two objects referring to each other in some way. > How can you *definitely* know how to break the link between them? Do you > call A's finalizer or B's first? If they're instances, do you just whack > their __dict__ and hope for the best? Exactly. The *programmer* may know the right thing to do, but the Python implementation can't possibly know. Facing both facts squarely constrains the possibilities to the only ones that are all of understandable, predictable and useful. Cycles with finalizers must be a Magic-Free Zone else you lose at least one of those three: even Guido's kung fu isn't strong enough to outguess this. [a nice implementation sketch, of what seems an overly elaborate scheme, if you believe cycles with finalizers are rare in intelligently designed code) ] Provided Guido stays interested in this, he'll make his own fun. I'm just inviting him to move in a sane direction <0.9 wink>. One caution: > ... > If the careful-cleaning algorithm hits the end of the careful set of > objects and the set is non-empty, then throw an exception: > GCImpossibleError. Since gc "can happen at any time", this is very severe (c.f. 
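The kind of cycle being argued over is easy to build. As a hedged aside: in today's CPython the cycle collector does run both finalizers (at most once each, per the much later PEP 442), while in the Python of this thread such a cycle was simply left uncollected:

```python
import gc

log = []

class Node:
    # two of these referring to each other form the problem cycle
    def __init__(self, name):
        self.name = name
        self.peer = None

    def __del__(self):
        # the collector has no way to know whether A's or B's
        # finalizer "should" run first
        log.append(self.name)

a = Node("A")
b = Node("B")
a.peer, b.peer = b, a      # the cycle
del a, b                   # unreachable, but refcounts stay nonzero
gc.collect()               # only cycle collection can reclaim them
```

The order in which the two finalizers run is unspecified, which is exactly the ambiguity under discussion.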
Guido's objection to making resurrection illegal). Hand a trash cycle back to the programmer instead, via callback or request or whatever, and it's all explicit without more cruft in the implementation. It's alive again when they get it back, and they can do anything they want with it (including resurrecting it, or dropping it again, or breaking cycles -- anything). I'd focus on the cycles themselves, not on the types of objects involved. I'm not pretending to address the "order of finalization at shutdown" question, though (although I'd agree they're deeply related: how do you follow a topological sort when there *isn't* one? well, you don't, because you can't). realistically y'rs - tim From gstein at lyra.org Sat Mar 4 09:43:45 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 4 Mar 2000 00:43:45 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <000401bf8589$7d1364e0$c6a0143f@tim> Message-ID: On Fri, 3 Mar 2000, Tim Peters wrote: >... > [a nice implementation sketch, of what seems an overly elaborate scheme, > if you believe cycles with finalizers are rare in intelligently designed > code) > ] Nah. Quite simple to code up, but a bit longer to explain in English :-) The hardest part is finding the cycles, but Guido already posted a long explanation about that. Once that spits out the doubly-linked list of objects, then you're set. 1) scan the list calling tp_clean(CARE_CHECK), shoving "care needed" objects to a second list 2) scan the care-needed list calling tp_clean(CARE_EXEC). if TRUE is returned, then the object was cleaned and moves to the "no care" list. 3) assert len(care-needed list) == 0 4) scan the no-care list calling tp_clean(EXEC) 5) (questionable) assert len(no-care list) == 0 The background makes it longer. The short description of the algorithm is easy. Step (1) could probably be merged right into one of the scans in the GC algorithm (e.g. 
during the placement into the "these are cyclical garbage" list) > Provided Guido stays interested in this, he'll make his own fun. I'm just > inviting him to move in a sane direction <0.9 wink>. hehe... Agreed. > One caution: > > > ... > > If the careful-cleaning algorithm hits the end of the careful set of > > objects and the set is non-empty, then throw an exception: > > GCImpossibleError. > > Since gc "can happen at any time", this is very severe (c.f. Guido's > objection to making resurrection illegal). GCImpossibleError would simply be a subclass of MemoryError. Makes sense to me, and definitely allows for its "spontaneity." > Hand a trash cycle back to the > programmer instead, via callback or request or whatever, and it's all > explicit without more cruft in the implementation. It's alive again when > they get it back, and they can do anything they want with it (including > resurrecting it, or dropping it again, or breaking cycles -- anything). I'd > focus on the cycles themselves, not on the types of objects involved. I'm > not pretending to address the "order of finalization at shutdown" question, > though (although I'd agree they're deeply related: how do you follow a > topological sort when there *isn't* one? well, you don't, because you > can't). I disagree. I don't think a Python-level function is going to have a very good idea of what to do. IMO, this kind of semantics belong down in the interpreter with a specific, documented algorithm. Throwing it out to Python won't help -- that function will still have to use a "standard pattern" for getting the cyclical objects to toss themselves. I think that standard pattern should be a language definition. Without a standard pattern, then you're saying the application will know what to do, but that is kind of weird -- what happens when an unexpected cycle arrives? 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Sat Mar 4 10:50:19 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 4 Mar 2000 11:50:19 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> Message-ID: On Fri, 3 Mar 2000, Jeremy Hylton wrote: > It sounds like ConfigParser code and docs could use a general cleanup. > Are there any other issues to take care of as part of that cleanup? One thing that bothered me once: I want to be able to have something like: [section] tag = 1 tag = 2 And be able to retrieve ("section", "tag") -> ["1", "2"]. Can be awfully useful for things that make sense several times. Perhaps there should be two functions, one that reads a single-tag and one that reads a multi-tag? File format: I'm sure I'm going to get yelled at, but why don't we make it XML? Hard to edit, yadda, yadda, but you can easily write a special purpose widget to edit XConfig (that's what we'll call the DTD) files. hopefull-yet-not-naive-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From gstein at lyra.org Sat Mar 4 11:05:15 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 4 Mar 2000 02:05:15 -0800 (PST) Subject: [Python-Dev] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Moshe Zadka wrote: > On Fri, 3 Mar 2000, Jeremy Hylton wrote: > > It sounds like ConfigParser code and docs could use a general cleanup. > > Are there any other issues to take care of as part of that cleanup? > > One thing that bothered me once: > > I want to be able to have something like: > > [section] > tag = 1 > tag = 2 > > And be able to retrieve ("section", "tag") -> ["1", "2"]. > Can be awfully useful for things that make sense several times. > Perhaps there should be two functions, one that reads a single-tag and > one that reads a multi-tag?
Structured values would be nice. Several times, I've needed to decompose the right hand side into lists. > File format: I'm sure I'm going to get yelled at, but why don't we > make it XML? Hard to edit, yadda, yadda, but you can easily write a > special purpose widget to edit XConfig (that's what we'll call the DTD) > files. Write a whole new module. ConfigParser is for files that look like the above. There isn't a reason to NOT use XML, but it shouldn't go into ConfigParser. I find the above style much easier for *humans*, than an XML file, to specify options. XML is good for computers; not so good for humans. Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Sat Mar 4 11:46:40 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 4 Mar 2000 12:46:40 +0200 (IST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <000001bf857a$60b45ac0$c6a0143f@tim> Message-ID: [Tim Peters] > ...If a trash cycle > contains a finalizer (my, but that has to be rare. in practice, in > well-designed code!), This shows something Tim himself has often said -- he never programmed a GUI. It's very hard to build a GUI (especially with Tkinter) which is cycle-less, but the classes implementing the GUI often have __del__'s to break system-allocated resources. So, it's not as rare as we would like to believe, which is the reason I haven't given this answer. which-is-not-the-same-thing-as-disagreeing-with-it-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From moshez at math.huji.ac.il Sat Mar 4 12:16:19 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 4 Mar 2000 13:16:19 +0200 (IST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Greg Stein wrote: > I disagree. I don't think a Python-level function is going to have a very > good idea of what to do Much better than the Python interpreter...
> Throwing it out to Python won't help > what happens when an unexpected cycle arrives? Don't delete it. It's as simple as that, since it's a bug. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From moshez at math.huji.ac.il Sat Mar 4 12:29:33 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 4 Mar 2000 13:29:33 +0200 (IST) Subject: [Python-Dev] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Greg Stein wrote: > Write a whole new module. ConfigParser is for files that look like the > above. Gotcha. One problem: two configuration modules might cause the classic "which should I use?" confusion. > > I find the above style much easier for *humans*, than an XML file, to > specify options. XML is good for computers; not so good for humans. > Of course: what human could delimit his text with and ? oh-no-another-c.l.py-bot-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From gstein at lyra.org Sat Mar 4 12:38:46 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 4 Mar 2000 03:38:46 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Moshe Zadka wrote: > On Sat, 4 Mar 2000, Greg Stein wrote: > > I disagree. I don't think a Python-level function is going to have a very > > good idea of what to do > > > Much better than the Python interpreter... If your function receives two instances (A and B), what are you going to do? How can you know what their policy is for cleaning up in the face of a cycle? I maintain that you would call the equivalent of my proposed __clean__. There isn't much else you'd be able to do, unless you had a completely closed system, you expected cycles between specific types of objects, and you knew a way to clean them up. Even then, you would still be calling something like __clean__ to let the objects do whatever they needed.
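A class following the proposed __clean__ protocol might look like the sketch below. To be clear about assumptions: __clean__ is the hypothetical, never-adopted method from this thread, not a real Python special method, and the class and path are invented:

```python
# '__clean__' here is the protocol proposed in the thread, not a
# real Python special method; the class and file name are invented.
class TempFileUser:
    def __init__(self, path):
        self.path = path          # pretend this names a temp file
        self.closed = False

    def close(self):
        if not self.closed:
            self.closed = True    # a real class would remove the file

    def __clean__(self):
        # called (under the proposal) when this object sits in a
        # trash cycle; afterwards __del__ must not be able to fail
        self.close()
        return True               # "I cleaned myself successfully"

    def __del__(self):
        self.close()              # now guaranteed to be a safe no-op
```

Under the proposal, the cycle collector would try __clean__ first and only give up (GCImpossibleError) on objects that lack it.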
I'm suggesting that __clean__ should be formalized (as part of tp_clean). Throwing the handling "up to Python" isn't going to do much for you. Seriously... I'm all for coding more stuff in Python rather than C, but this just doesn't feel right. Getting the objects GC'd is a language feature, and a specific pattern/method/recommendation is best formulated as an interpreter mechanism. > > > Throwing it out to Python won't help > > > what happens when an unexpected cycle arrives? > > Don't delete it. > It's as simple as that, since it's a bug. The point behind this stuff is to get rid of it, rather than let it linger on. If the objects have finalizers (which is how we get to this step!), then it typically means there is a resource they must release. Getting the object cleaned and dealloc'd becomes quite important. Cheers, -g p.s. did you send in a patch for the instance_contains() thing yet? -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sat Mar 4 12:43:12 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 4 Mar 2000 03:43:12 -0800 (PST) Subject: [Python-Dev] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Moshe Zadka wrote: > On Sat, 4 Mar 2000, Greg Stein wrote: > > Write a whole new module. ConfigParser is for files that look like the > > above. > > Gotcha. > > One problem: two configuration modules might cause the classic "which > should I use?" confusion. Nah. They wouldn't *both* be called ConfigParser. And besides, I see the XML format more as a persistence mechanism rather than a configuration mechanism. I'd call the module something like "XMLPersist". > > > > I find the above style much easier for *humans*, than an XML file, to > > specify options. XML is good for computers; not so good for humans. > > > > Of course: what human could delimit his text with and ? Feh. As a communication mechanism, dropping in that stuff... it's easy. ButI wouldnotwant ... bleck.
I wouldn't want to use XML for configuration stuff. It just gets ugly. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gvwilson at nevex.com Sat Mar 4 17:46:24 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Sat, 4 Mar 2000 11:46:24 -0500 (EST) Subject: [Python-Dev] HTMLgen-style interface to SQL? Message-ID: [short form] I'm looking for an object-oriented toolkit that will do for SQL what Perl's CGI.pm module, or Python's HTMLgen, does for HTML. Pointers, examples, or expressions of interest would be welcome. [long form] Lincoln Stein's CGI.pm module for Perl allows me to build HTML in an object-oriented way, instead of getting caught in the Turing tarpit of string substitution and printf. DOM does the same (in a variety of languages) for XML. Right now, if I want to interact with an SQL database from Perl or Python, I have to embed SQL strings in my programs. I would like to have a DOM-like ability to build and manipulate queries as objects, then call a method that translates the query structure into SQL to send to the database. Alternatively, if there is an XML DTD for SQL (how's that for a chain of TLAs?), and some tool to convert the XML/SQL to pure SQL, so that I could build my query using DOM, that would be cool too. RSVP, Greg Wilson gvwilson at nevex.com From moshez at math.huji.ac.il Sat Mar 4 19:02:54 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 4 Mar 2000 20:02:54 +0200 (IST) Subject: [Python-Dev] Re: [Patches] selfnanny.py: checking for "self" in every method In-Reply-To: <200003041724.MAA05053@eric.cnri.reston.va.us> Message-ID: On Sat, 4 Mar 2000, Guido van Rossum wrote: > Before we all start writing nannies and checkers, how about a standard > API design first? I thoroughly agree -- we should have a standard API.
I tried to write selfnanny so it could be callable from any API possible (e.g., it can take either a file, a string, an ast or a tuple representation) > I will want to call various nannies from a "Check" > command that I plan to add to IDLE. Very cool: what I imagine is a sort of modular PyLint. > I already did this with tabnanny, > and found that it's barely possible -- it's really written to run like > a script. Mine definitely isn't: it's designed to run both like a script and like a module. One outstanding bug: no docos. To be supplied upon request <0.5 wink>. I just wanted to float it out and see if people think that this particular nanny is worthwhile. > Since parsing is expensive, we probably want to share the parse tree. Yes. Probably as an AST, and transform to tuples/lists inside the checkers. > Ideas? Here's a strawman API: There's a package called Nanny Every module in that package should have a function called check_ast. Its argument is an AST object, and its output should be a list of three-tuples: (line-number, error-message, None) or (line-number, error-message, (column-begin, column-end)) (each tuple can be a different form). Problems? (I'm CCing to python-dev. Please follow up to that discussion to python-dev only, as I don't believe it belongs in patches) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From gvwilson at nevex.com Sat Mar 4 19:26:20 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Sat, 4 Mar 2000 13:26:20 -0500 (EST) Subject: [Python-Dev] Re: selfnanny.py / nanny architecture In-Reply-To: Message-ID: > > Guido van Rossum wrote: > > Before we all start writing nannies and checkers, how about a standard > > API design first? > Moshe Zadka wrote: > Here's a strawman API: > There's a package called Nanny > Every module in that package should have a function called check_ast.
> It's argument is an AST object, and it's output should be a list > of three-tuples: (line-number, error-message, None) or > (line-number, error-message, (column-begin, column-end)) (each tuple can > be a different form). Greg Wilson wrote: The SUIF (Stanford University Intermediate Format) group has been working on an extensible compiler framework for about ten years now. The framework is based on an extensible AST spec; anyone can plug in a new analysis or optimization algorithm by writing one or more modules that read and write decorated ASTs. (See http://suif.stanford.edu for more information.) Based on their experience, I'd suggest that every nanny take an AST as an argument, and add complaints in place as decorations to the nodes. A terminal nanny could then collect these and display them to the user. I think this architecture will make it simpler to write meta-nannies. I'd further suggest that the AST be something that can be manipulated through DOM, since (a) it's designed for tree-crunching, (b) it's already documented reasonably well, (c) it'll save us re-inventing a wheel, and (d) generating human-readable output in a variety of customizable formats ought to be simple (well, simpler than the alternatives). Greg From jeremy at cnri.reston.va.us Sun Mar 5 03:10:28 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Sat, 4 Mar 2000 21:10:28 -0500 (EST) Subject: [Python-Dev] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: References: Message-ID: <14529.49684.219826.466310@bitdiddle.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: MZ> On Sat, 4 Mar 2000, Greg Stein wrote: >> Write a whole new module. ConfigParser is for files that look >> like the above. MZ> Gotcha. MZ> One problem: two configurations modules might cause the classic MZ> "which should I use?" confusion. I don't think this is a hard decision to make. ConfigParser is good for simple config files that are going to be maintained by humans with a text editor. 
An XML-based configuration file is probably the right solution when humans aren't going to maintain the config files by hand. Perhaps XML will eventually be the right solution in both cases, but only if XML editors are widely available. >> I find the above style much easier for *humans*, than an >> XML file, to specify options. XML is good for computers; not so >> good for humans. MZ> Of course: what human could delimit his text with and MZ> ? Could? I'm sure there are more ways on Linux and Windows to mark up text than are dreamt of in your philosophy, Moshe . The question is what is easiest to read and understand? Jeremy From tim_one at email.msn.com Sun Mar 5 03:22:16 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 21:22:16 -0500 Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" in every method In-Reply-To: <200003041724.MAA05053@eric.cnri.reston.va.us> Message-ID: <000201bf8649$a17383e0$f42d153f@tim> [Guido van Rossum] > Before we all start writing nannies and checkers, how about a standard > API design first? I will want to call various nannies from a "Check" > command that I plan to add to IDLE. I already did this with tabnanny, > and found that it's barely possible -- it's really written to run like > a script. I like Moshe's suggestion fine, except with an abstract base class named Nanny with a virtual method named check_ast. Nannies should (of course) derive from that. > Since parsing is expensive, we probably want to share the parse tree. What parse tree? Python's parser module produces an AST not nearly "A enough" for reasonably productive nanny writing. GregS & BillT have improved on that, but it's not in the std distrib. Other "problems" include the lack of original source lines in the trees, and lack of column-number info. Note that by the time Python has produced a parse tree, all evidence of the very thing tabnanny is looking for has been removed. That's why she used the tokenize module to begin with. 
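Tim's abstract-base-class variant of Moshe's strawman could be sketched like this. The Nanny/check_ast names come from the thread; the toy "tree" shape is invented for illustration, standing in for whatever parse tree ends up being shared:

```python
# Nanny and check_ast are the names from the thread; the shape of
# 'tree' below is invented, since no real shared tree exists yet.
class Nanny:
    def check_ast(self, tree):
        """Return [(line_number, error_message, extent), ...] where
        extent is None or a (column_begin, column_end) pair."""
        raise NotImplementedError

class SelfNanny(Nanny):
    # toy checker: 'tree' is just (line_number, first_arg_name)
    # pairs, one per method definition found in the source
    def check_ast(self, tree):
        return [(line, "first argument is not 'self'", None)
                for line, first_arg in tree
                if first_arg != "self"]
```

A "Check" command could then instantiate every Nanny subclass in the package, feed each the shared tree, and merge the tuple lists.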
God knows tokenize is too funky to use too when life gets harder (check out checkappend.py's tokeneater state machine for a preliminary taste of that). So the *only* solution is to adopt Christian's Stackless so I can rewrite tokenize as a coroutine like God intended . Seriously, I don't know of anything that produces a reasonably usable (for nannies) parse tree now, except via modifying a Python grammar for use with John Aycock's SPARK; the latter also comes with very pleasant & powerful tree pattern-matching abilities. But it's probably too slow for everyday "just folks" use. Grabbing the GregS/BillT enhancement is probably the most practical thing we could build on right now (but tabnanny will have to remain a special case). unsure-about-the-state-of-simpleparse-on-mxtexttools-for-this-ly y'rs - tim From tim_one at email.msn.com Sun Mar 5 04:24:18 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 22:24:18 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <38BE1B69.E0B88B41@lemburg.com> Message-ID: <000301bf8652$4aadaf00$f42d153f@tim> Just noting that two instances of this were found in Zope. [/F] > append = list.append > for x in something: > append(...) [Tim] > As detailed in a c.l.py posting, I have yet to find a single instance of > this actually called with multiple arguments. Pointing out that it's > *possible* isn't the same as demonstrating it's an actual problem. I'm > quite willing to believe that it is, but haven't yet seen evidence of it. From fdrake at acm.org Sun Mar 5 04:55:27 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Sat, 4 Mar 2000 22:55:27 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> Message-ID: <14529.55983.263225.691427@weyr.cnri.reston.va.us> Jeremy Hylton writes: > Thanks for catching that. I didn't look at the context. I'm going to > wait, though, until I talk to Fred to mess with the code any more. I did it that way since the .ini format allows comments after values (the ';' comments after a '='; '#' comments are a ConfigParser thing), but there's no equivalent concept for RFC822 parsing, other than '(...)' in addresses. The code was trying to allow what was expected from the .ini crowd without breaking the "native" use of ConfigParser. > General question for python-dev readers: What are your experiences > with ConfigParser? I just used it to build a simple config parser for > IDLE and found it hard to use for several reasons. The biggest > problem was that the file format is undocumented. I also found it > clumsy to have to specify section and option arguments. I ended up > writing a proxy that specializes on section so that get takes only an > option argument. > > It sounds like ConfigParser code and docs could use a general cleanup. > Are there any other issues to take care of as part of that cleanup? I agree that the API to ConfigParser sucks, and I think also that the use of it as a general solution is a big mistake. It's a messy bit of code that doesn't need to be, supports a really nasty mix of syntaxes, and can easily bite users who think they're getting something .ini-like (the magic names and interpolation is a bad idea!). While it suited the original application well enough, something with .ini syntax and interpolation from a subclass would have been *much* better.
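Jeremy's "proxy that specializes on section" is a small wrapper; a sketch using the modern configparser module (SectionView is an invented name -- the 2000-era module under discussion was spelled ConfigParser):

```python
import configparser

class SectionView:
    """Hypothetical proxy: bind a section once, then get() needs only
    the option name."""

    def __init__(self, parser, section):
        self._parser = parser
        self._section = section

    def get(self, option, default=None):
        if self._parser.has_option(self._section, option):
            return self._parser.get(self._section, option)
        return default

cp = configparser.ConfigParser()
cp.read_string("[main]\neditor = idle\nindent = 4\n")
main = SectionView(cp, "main")
```

Today's configparser grew exactly this idea: `cp["main"]["editor"]` gives mapping-style access to a single section.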
I think we should create a new module, inilib, that implements exactly .ini syntax in a base class that can be intelligently extended. ConfigParser should be deprecated. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From tim_one at email.msn.com Sun Mar 5 05:11:12 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 23:11:12 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: <200003031705.MAA21700@eric.cnri.reston.va.us> Message-ID: <000601bf8658$d81d34e0$f42d153f@tim> [Guido] > OK, so we're down to this one point: if __del__ resurrects the object, > should __del__ be called again later? Additionally, should > resurrection be made illegal? I give up on the latter, so it really is just one. > I can easily see how __del__ could *accidentally* resurrect the object > as part of its normal cleanup ... > In this example, the helper routine will eventually delete the object > from its cache, at which point it is truly deleted. It would be > harmful, not helpful, if __del__ was called again at this point. If this is something that happens easily, and current behavior is harmful, don't you think someone would have complained about it by now? That is, __del__ *is* "called again at this point" now, and has been for years & years. And if it happens easily, it *is* happening now, and in an unknown amount of existing code. (BTW, I doubt it happens at all -- people tend to write very simple __del__ methods, so far as I've ever seen) > Now, it is true that the current docs for __del__ imply that > resurrection is possible. "imply" is too weak. The Reference Manual's "3.3.1 Basic customization" flat-out says it's possible ("though not recommended"). The precise meaning of the word "may" in the following sentence is open to debate, though. 
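The resurrection being debated is easy to demonstrate. A sketch in today's Python -- note that since PEP 442 (Python 3.4) __del__ runs at most once per object, which is how the "called again?" question was eventually settled; Phoenix is an invented example:

```python
class Phoenix:
    graveyard = []
    deaths = 0

    def __del__(self):
        Phoenix.deaths += 1
        # Resurrect: stash a fresh reference somewhere reachable again.
        Phoenix.graveyard.append(self)

p = Phoenix()
del p       # refcount hits zero, __del__ runs, the object is resurrected
```

After this runs, the object is alive again in Phoenix.graveyard; dropping it a second time does not re-run __del__ on a modern CPython.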
> The intention of that note was to warn __del__ writers that in the case > of accidental resurrection Sorry, but I can't buy this: saying that *accidents* are "not recommended" is just too much of a stretch . > __del__ might be called again. That's a plausible reading of the following "may", but not the only one. I believe it's the one you intended, but it's not the meaning I took prior to this. > The intention certainly wasn't to allow or encourage intentional resurrection. Well, I think it plainly says it's supported ("though not recommended"). I used it intentionally at KSR, and even recommended it on c.l.py in the dim past (in one of those "dark & useless" threads ). > Would there really be someone out there who uses *intentional* > resurrection? I severely doubt it. I've never heard of this. Why would anyone tell you about something that *works*?! You rarely hear the good stuff, you know. I gave the typical pattern in the preceding msg. To flesh out the motivation more, you have some external resource that's very expensive to set up (in KSR's case, it was an IPC connection to a remote machine). Rights to use that resource are handed out in the form of an object. When a client is done using the resource, they *should* explicitly use the object's .release() method, but you can't rely on that. So the object's __del__ method looks like (for example):

    def __del__(self):
        # Code not shown to figure out whether to disconnect: the downside to
        # disconnecting is that it can cost a bundle to create a new connection.
        # If the whole app is shutting down, then of course we want to disconnect.
        # Or if a timestamp trace shows that we haven't been making good use of
        # all the open connections lately, we may want to disconnect too.
        if decided_to_disconnect:
            self.external_resource.disconnect()
        else:
            # keep the connection alive for reuse
            global_available_connection_objects.append(self)

This is simple & effective, and it relies on both intentional resurrection and __del__ getting called repeatedly. I don't claim there's no other way to write it, just that there's *been* no problem doing this for a millennium . Note that MAL spontaneously sketched similar examples, although I can't say whether he's actually done stuff like this. Going back up a level, in another msg you finally admitted that you want "__del__ called only once" for the same reason Java wants it: because gc has no idea what to do when faced with finalizers in a trash cycle, and settles for an unprincipled scheme whose primary virtue is that "it doesn't blow up" -- and "__del__ called only once" happens to be convenient for that scheme. But toss such cycles back to the user to deal with at the Python level, and all those problems go away (along with the artificial need to change __del__). The user can break the cycles in an order that makes sense to the app (or they can let 'em leak! up to them).

    >>> print gc.get_cycle.__doc__
    Return a list of objects comprising a single garbage cycle; [] if none.
    At least one of the objects has a finalizer, so Python can't determine
    the intended order of destruction.  If you don't break the cycle, Python
    will neither run any finalizers for the contained objects nor reclaim
    their memory.  If you do break the cycle, and dispose of the list,
    Python will follow its normal reference-counting rules for running
    finalizers and reclaiming memory.

That this "won't blow up" either is just the least of its virtues . you-break-it-you-buy-it-ly y'rs - tim

From tim_one at email.msn.com Sun Mar 5 05:56:54 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 23:56:54 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage?
In-Reply-To: Message-ID: <000001bf865f$3acb99a0$432d153f@tim> [Tim sez "toss insane cycles back on the user"] [Greg Stein] > I disagree. I don't think a Python-level function is going to have a very > good idea of what to do. You've already assumed that Python coders know exactly what to do, else they couldn't have coded the new __clean__ method your proposal relies on. I'm taking what strikes me as the best part of Scheme's Guardian idea: don't assume *anything* about what users "should" do to clean up their trash. Leave it up to them: their problem, their solution. I think finalizers in trash cycles should be so rare in well-written code that it's just not worth adding much of anything in the implementation to cater to it. > IMO, this kind of semantics belong down in the interpreter with a > specific, documented algorithm. Throwing it out to Python won't help > -- that function will still have to use a "standard pattern" for getting > the cyclical objects to toss themselves. They can use any pattern they want, and if the pattern doesn't *need* to be coded in C as part of the implementation, it shouldn't be. > I think that standard pattern should be a language definition. I distrust our ability to foresee everything users may need over the next 10 years: how can we know today that the first std pattern you dreamed up off the top of your head is the best approach to an unbounded number of problems we haven't yet seen a one of ? > Without a standard pattern, then you're saying the application will know > what to do, but that is kind of weird -- what happens when an unexpected > cycle arrives? With the hypothetical gc.get_cycle() function I mentioned before, they should inspect objects in the list they get back, and if they find they don't know what to do with them, they can still do anything they want. 
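The hypothetical gc.get_cycle() never appeared, but CPython's collector later grew a close cousin of the hand-it-back idea: with gc.DEBUG_SAVEALL, unreachable objects land in gc.garbage instead of being freed, and the program decides how to break them. A sketch with the modern gc module (Node is invented for illustration):

```python
import gc

class Node:
    def __init__(self, name):
        self.name = name
        self.partner = None

gc.set_debug(gc.DEBUG_SAVEALL)   # hand trash back instead of freeing it

a, b = Node("a"), Node("b")
a.partner, b.partner = b, a      # a <=> b reference cycle
del a, b
gc.collect()

# Inspect what came back, then break the cycle in an order the
# *program* chooses -- exactly the "their problem, their solution" model.
cycle = [o for o in gc.garbage if isinstance(o, Node)]
for node in cycle:
    node.partner = None
gc.garbage.clear()
gc.set_debug(0)
```

Anything left in gc.garbage that the program does not break simply stays alive, which matches the "let 'em leak! up to them" option above.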
Examples include raising an exception, dialing my home pager at 3am to insist I come in to look at it, or simply let the list go away (at which point the objects in the list will again become a trash cycle containing a finalizer). If several distinct third-party modules get into this act, I *can* see where it could become a mess. That's why Scheme "guardians" is plural: a given module could register its "problem objects" in advance with a specific guardian of its own, and query only that guardian later for things ready to die. This probably can't be implemented in Python, though, without support for weak references (or lots of brittle assumptions about specific refcount values). agreeably-disagreeing-ly y'rs - tim From tim_one at email.msn.com Sun Mar 5 05:56:58 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 23:56:58 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message-ID: <000101bf865f$3cb0d460$432d153f@tim> [Tim] > ...If a trash cycle contains a finalizer (my, but that has to be rare. > in practice, in well-designed code!), [Moshe Zadka] > This shows something Tim himself has often said -- he never programmed a > GUI. It's very hard to build a GUI (especially with Tkinter) which is > cycle-less, but the classes implementing the GUI often have __del__'s > to break system-allocated resources. > > So, it's not as rare as we would like to believe, which is the reason > I haven't given this answer. I wrote Cyclops.py when trying to track down leaks in IDLE. The extraordinary thing we discovered is that "even real gc" would not have reclaimed the cycles. They were legitimately reachable, because, indeed, "everything points to everything else". Guido fixed almost all of them by explicitly calling new "close" methods. I believe IDLE has no __del__ methods at all now. Tkinter.py currently contains two. 
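The explicit-close fix Guido applied to IDLE looks like this in miniature (Widget and its parent/child links are invented for illustration, not IDLE's actual classes):

```python
import gc
import weakref

class Widget:
    """Toy GUI node: parent and children point at each other (a cycle)."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def close(self):
        # Break the parent <-> child cycles explicitly, no __del__ needed.
        for child in list(self.children):
            child.close()
        self.children = []
        self.parent = None

root = Widget()
leaf = Widget(root)
probe = weakref.ref(leaf)

root.close()          # break the cycles before dropping the references
del root, leaf
gc.collect()
```

Once close() has run, plain reference counting reclaims everything; the probe weak reference goes dead without any cycle collector heroics.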
so-they-contained-__del__-but-weren't-trash-ly y'rs - tim From tim_one at email.msn.com Sun Mar 5 07:05:24 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 5 Mar 2000 01:05:24 -0500 Subject: [Python-Dev] Unicode mapping tables In-Reply-To: <38BCD71C.3592E6A@lemburg.com> Message-ID: <000601bf8668$cbbdd640$432d153f@tim> [M.-A. Lemburg] > ... > Here's what I'll do: > > * implement .capitalize() in the traditional way for Unicode > objects (simply convert the first char to uppercase) Given .title(), is .capitalize() of use for Unicode strings? Or is it just a temptation to do something senseless in the Unicode world? If it doesn't make sense, leave it out (this *seems* like compulsion to implement all current string methods in *some* way for Unicode, whether or not they make sense). From moshez at math.huji.ac.il Sun Mar 5 07:16:22 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 5 Mar 2000 08:16:22 +0200 (IST) Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" in every method In-Reply-To: <000201bf8649$a17383e0$f42d153f@tim> Message-ID: On Sat, 4 Mar 2000, Tim Peters wrote: > I like Moshe's suggestion fine, except with an abstract base class named > Nanny with a virtual method named check_ast. Nannies should (of course) > derive from that. Why? The C++ you're programming damaged your common sense cycles? > > Since parsing is expensive, we probably want to share the parse tree. > > What parse tree? Python's parser module produces an AST not nearly "A > enough" for reasonably productive nanny writing. As a note, selfnanny uses the parser module AST. > GregS & BillT have > improved on that, but it's not in the std distrib. Other "problems" include > the lack of original source lines in the trees, The parser module has source lines. > and lack of column-number info. Yes, that sucks. > Note that by the time Python has produced a parse tree, all evidence of the > very thing tabnanny is looking for has been removed. 
That's why she used > the tokenize module to begin with. Well, it's one of the few nannies which would be in that position. > God knows tokenize is too funky to use too when life gets harder (check out > checkappend.py's tokeneater state machine for a preliminary taste of that). Why doesn't checkappend.py uses the parser module? > Grabbing the GregS/BillT enhancement is probably the most > practical thing we could build on right now You got some pointers? > (but tabnanny will have to remain a special case). tim-will-always-be-a-special-case-in-our-hearts-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From tim_one at email.msn.com Sun Mar 5 08:01:12 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 5 Mar 2000 02:01:12 -0500 Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" inevery method In-Reply-To: Message-ID: <000901bf8670$97d8f320$432d153f@tim> [Tim] >> [make Nanny a base class] [Moshe Zadka] > Why? Because it's an obvious application for OO design. A common base class formalizes the interface and can provide useful utilities for subclasses. > The C++ you're programming damaged your common sense cycles? Yes, very, but that isn't relevant here . It's good Python sense too. >> [parser module produces trees far too concrete for comfort] > As a note, selfnanny uses the parser module AST. Understood, but selfnanny has a relatively trivial task. Hassling with tuples nested dozens deep for even relatively simple stmts is both a PITA and a time sink. >> [parser doesn't give source lines] > The parser module has source lines. No, it does not (it only returns terminals, as isolated strings). The tokenize module does deliver original source lines in their entirety (as well as terminals, as isolated strings; and column numbers). >> and lack of column-number info. > Yes, that sucks. > ... > Why doesn't checkappend.py uses the parser module? 
Because it wanted to display the actual source line containing an offending "append" (which, again, the parse module does not supply). Besides, it was a trivial variation on tabnanny.py, of which I have approximately 300 copies on my disk . >> Grabbing the GregS/BillT enhancement is probably the most >> practical thing we could build on right now > You got some pointers? Download python2c (http://www.mudlib.org/~rassilon/p2c/) and grab transformer.py from the zip file. The latter supplies a very useful post-processing pass over the parse module's output, squashing it *way* down.

From moshez at math.huji.ac.il Sun Mar 5 08:08:41 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 5 Mar 2000 09:08:41 +0200 (IST) Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" inevery method In-Reply-To: <000901bf8670$97d8f320$432d153f@tim> Message-ID: On Sun, 5 Mar 2000, Tim Peters wrote: > [Tim] > >> [make Nanny a base class] > > [Moshe Zadka] > > Why? > > Because it's an obvious application for OO design. A common base class > formalizes the interface and can provide useful utilities for subclasses. The interface is just one function. You're welcome to have a do-nothing nanny that people *can* derive from: I see no point in making them derive from a base class. > > As a note, selfnanny uses the parser module AST. > > Understood, but selfnanny has a relatively trivial task. That it does, and it was painful. > >> [parser doesn't give source lines] > > > The parser module has source lines. > > No, it does not (it only returns terminals, as isolated strings). Sorry, misunderstanding: it seemed obvious to me you wanted line numbers. For lines, use the linecache module... > > You got some pointers? > > Download python2c (http://www.mudlib.org/~rassilon/p2c/) and grab > transformer.py from the zip file. I'll have a look. Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html

From effbot at telia.com Sun Mar 5 10:24:37 2000 From: effbot at telia.com (Fredrik Lundh) Date: Sun, 5 Mar 2000 10:24:37 +0100 Subject: [Python-Dev] return statements in lambda Message-ID: <006f01bf8686$391ced80$34aab5d4@hagrid> from "Python for Lisp Programmers": http://www.norvig.com/python-lisp.html > Don't forget return. Writing def twice(x): x+x is tempting > and doesn't signal a warning or exception, but you probably > meant to have a return in there. This is particularly irksome > because in a lambda you are prohibited from writing return, > but the semantics is to do the return. maybe adding an (optional but encouraged) "return" to lambda would be an improvement? lambda x: x + 10 vs. lambda x: return x + 10 or is this just more confusing... opinions?

From guido at python.org Sun Mar 5 13:04:56 2000 From: guido at python.org (Guido van Rossum) Date: Sun, 05 Mar 2000 07:04:56 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: Your message of "Sat, 04 Mar 2000 22:55:27 EST." <14529.55983.263225.691427@weyr.cnri.reston.va.us> References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> <14529.55983.263225.691427@weyr.cnri.reston.va.us> Message-ID: <200003051204.HAA05367@eric.cnri.reston.va.us> [Fred] > I agree that the API to ConfigParser sucks, and I think also that > the use of it as a general solution is a big mistake. It's a messy > bit of code that doesn't need to be, supports a really nasty mix of > syntaxes, and can easily bite users who think they're getting > something .ini-like (the magic names and interpolation is a bad > idea!). While it suited the original application well enough, > something with .ini syntax and interpolation from a subclass would > have been *much* better.
> I think we should create a new module, inilib, that implements > exactly .ini syntax in a base class that can be intelligently > extended. ConfigParser should be deprecated. Amen. Some thoughts:

- You could put it all in ConfigParser.py but with new classnames. (Not sure though, since the ConfigParser class, which is really a kind of weird variant, will be assumed to be the main class because its name is that of the module.)

- Variants on the syntax could be given through some kind of option system rather than through subclassing -- they should be combinable independently. Some possible options (maybe I'm going overboard here) could be:

  - comment characters: ('#', ';', both, others?)
  - comments after variables allowed? on sections?
  - variable characters: (':', '=', both, others?)
  - quoting of values with "..." allowed?
  - backslashes in "..." allowed?
  - does backslash-newline mean a continuation?
  - case sensitivity for section names (default on)
  - case sensitivity for option names (default off)
  - variables allowed before first section name?
  - first section name? (default "main")
  - character set allowed in section names
  - character set allowed in variable names
  - %(...) substitution?

(Well maybe the whole substitution thing should really be done through a subclass -- it's too weird for normal use.) --Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Sun Mar 5 13:17:31 2000 From: guido at python.org (Guido van Rossum) Date: Sun, 05 Mar 2000 07:17:31 -0500 Subject: [Python-Dev] Unicode mapping tables In-Reply-To: Your message of "Sun, 05 Mar 2000 01:05:24 EST." <000601bf8668$cbbdd640$432d153f@tim> References: <000601bf8668$cbbdd640$432d153f@tim> Message-ID: <200003051217.HAA05395@eric.cnri.reston.va.us> > [M.-A. Lemburg] > > ...
> > Here's what I'll do: > > > > * implement .capitalize() in the traditional way for Unicode > > objects (simply convert the first char to uppercase) [Tim] > Given .title(), is .capitalize() of use for Unicode strings? Or is it just > a temptation to do something senseless in the Unicode world? If it doesn't > make sense, leave it out (this *seems* like compulsion to implement > all current string methods in *some* way for Unicode, whether or not they > make sense). The intention of this is to make code that does something using strings do exactly the same thing if those strings happen to be Unicode strings with the same values. The capitalize method returns self[0].upper() + self[1:] -- that may not make sense for e.g. Japanese, but it certainly does for Russian or Greek. It also does this in JPython. --Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Sun Mar 5 13:24:41 2000 From: guido at python.org (Guido van Rossum) Date: Sun, 05 Mar 2000 07:24:41 -0500 Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" inevery method In-Reply-To: Your message of "Sun, 05 Mar 2000 02:01:12 EST." <000901bf8670$97d8f320$432d153f@tim> References: <000901bf8670$97d8f320$432d153f@tim> Message-ID: <200003051224.HAA05410@eric.cnri.reston.va.us> > >> [parser doesn't give source lines] > > > The parser module has source lines. > > No, it does not (it only returns terminals, as isolated strings). The > tokenize module does deliver original source lines in their entirety (as > well as terminals, as isolated strings; and column numbers). Moshe meant line numbers -- it has those. > > Why doesn't checkappend.py uses the parser module? > > Because it wanted to display the actual source line containing an offending > "append" (which, again, the parse module does not supply). Besides, it was > a trivial variation on tabnanny.py, of which I have approximately 300 copies > on my disk . Of course another argument for making things more OO.
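Tim's point -- tokenize keeps the whole source line and the column numbers that the parser-module trees drop -- is easy to check with the modern tokenize module:

```python
import io
import tokenize

src = "x = [1]\nx.append(2, 3)\n"

hits = []
for tok in tokenize.generate_tokens(io.StringIO(src).readline):
    if tok.type == tokenize.NAME and tok.string == "append":
        # tok.start is (row, column); tok.line is the whole source line
        hits.append((tok.start, tok.line))
```

This is exactly the information a checker needs to print an offending line with a caret under the bad spot.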
(The code used in tabnanny.py to process files and recursively directories from sys.argv is replicated a thousand times in various scripts of mine -- Tim took it from my now-defunct takpolice.py. This should be in the std library somehow...) > >> Grabbing the GregS/BillT enhancement is probably the most > >> practical thing we could build on right now > > > You got some pointers? > > Download python2c (http://www.mudlib.org/~rassilon/p2c/) and grab > transformer.py from the zip file. The latter supplies a very useful > post-processing pass over the parse module's output, squashing it *way* > down. Those of you who have seen the compiler-sig should know that Jeremy made an improvement which will find its way into p2c. It's currently on display in the Python CVS tree in the nondist branch: see http://www.python.org/pipermail/compiler-sig/2000-February/000011.html and the ensuing thread for more details. --Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Sun Mar 5 14:46:13 2000 From: guido at python.org (Guido van Rossum) Date: Sun, 05 Mar 2000 08:46:13 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Your message of "Fri, 03 Mar 2000 22:26:54 EST." <000401bf8589$7d1364e0$c6a0143f@tim> References: <000401bf8589$7d1364e0$c6a0143f@tim> Message-ID: <200003051346.IAA05539@eric.cnri.reston.va.us> I'm beginning to believe that handing cycles with finalizers to the user is better than calling __del__ with a different meaning, and I tentatively withdraw my proposal to change the rules for when __del__ is called (even when __init__ fails; I haven't had any complaints about that either). There seem to be two competing suggestions for solutions: (1) call some kind of __cleanup__ (Marc-Andre) or tp_clean (Greg) method of the object; (2) Tim's proposal of an interface to ask the garbage collector for a trash cycle with a finalizer (or for an object with a finalizer in a trash cycle?).
Somehow Tim's version looks less helpful to me, because it *seems* that whoever gets to handle the cycle (the main code of the program?) isn't necessarily responsible for creating it (some library you didn't even know was used under the covers of some other library you called). Of course, it's also possible that a trash cycle is created by code outside the responsibility of the finalizer. But still, I have a hard time understanding how Tim's version would be used. Greg or Marc-Andre's version I understand. What keeps nagging me though is what to do when there's a finalizer but no cleanup method. I guess the trash cycle remains alive. Is this acceptable? (I guess so, because we've given the programmer a way to resolve the trash: provide a cleanup method.) If we detect individual cycles (the current algorithm doesn't do that yet, though it seems easy enough to do another scan), could we special-case cycles with only one finalizer and no cleaner-upper? (I'm tempted to call the finalizer because it seems little harm can be done -- but then of course there's the problem of the finalizer being called again when the refcount really goes to zero. :-( ) > Exactly. The *programmer* may know the right thing to do, but the Python > implementation can't possibly know. Facing both facts squarely constrains > the possibilities to the only ones that are all of understandable, > predictable and useful. Cycles with finalizers must be a Magic-Free Zone > else you lose at least one of those three: even Guido's kung fu isn't > strong enough to outguess this. > > [a nice implementation sketch, of what seems an overly elaborate scheme, > if you believe cycles with finalizers are rare in intelligently designed > code) > ] > > Provided Guido stays interested in this, he'll make his own fun. I'm just > inviting him to move in a sane direction <0.9 wink>.
My current tendency is to go with the basic __cleanup__ and nothing more, calling each instance's __cleanup__ before clobbering dictionaries and lists -- which should break all cycles safely. > One caution: > > ... > If the careful-cleaning algorithm hits the end of the careful set of > objects and the set is non-empty, then throw an exception: > GCImpossibleError. > Since gc "can happen at any time", this is very severe (c.f. Guido's > objection to making resurrection illegal). Not quite. Cycle detection is presumably only called every once in a while on memory allocation, and memory *allocation* (as opposed to deallocation) is allowed to fail. Of course, this will probably run into various coding bugs where allocation failure isn't dealt with properly, because in practice this happens so rarely... > Hand a trash cycle back to the
And even if we introduced a tp_clean protocol that would clear dicts and lists and call __cleanup__ for instances, we'd still want to call it first for instances, because an instance depends on its __dict__ for its __cleanup__ to succeed (but the __dict__ doesn't depend on the instance for its cleanup). Greg's 3-phase tp_clean protocol seems indeed overly elaborate but I guess it deals with such dependencies in the most general fashion. > I'd focus on the cycles themselves, not on the types of objects > involved. I'm not pretending to address the "order of finalization > at shutdown" question, though (although I'd agree they're deeply > related: how do you follow a topological sort when there *isn't* > one? well, you don't, because you can't). In theory, you just delete the last root (a C global pointing to sys.modules) and you run the garbage collector. It might be more complicated in practiceto track down all roots. Another practical consideration is that now there are cycles of the form <=> which suggests that we should make function objects traceable. Also, modules can cross-reference, so module objects should be made traceable. I don't think that this will grow the sets of traced objects by too much (since the dicts involved are already traced, and a typical program has way fewer functions and modules than it has class instances). On the other hand, we may also have to trace (un)bound method objects, and these may be tricky because they are allocated and deallocated at high rates (once per typical method call). Back to the drawing board... --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Sun Mar 5 17:42:30 2000 From: skip at mojam.com (Skip Montanaro) Date: Sun, 5 Mar 2000 10:42:30 -0600 (CST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? 
In-Reply-To: <200003051346.IAA05539@eric.cnri.reston.va.us> References: <000401bf8589$7d1364e0$c6a0143f@tim> <200003051346.IAA05539@eric.cnri.reston.va.us> Message-ID: <14530.36471.11654.666900@beluga.mojam.com> Guido> What keeps nagging me though is what to do when there's a Guido> finalizer but no cleanup method. I guess the trash cycle remains Guido> alive. Is this acceptable? (I guess so, because we've given the Guido> programmer a way to resolve the trash: provide a cleanup method.) That assumes the programmer even knows there's a cycle, right? I'd like to see this scheme help provide debugging assistance. If a cycle is discovered but the programmer hasn't declared a cleanup method for the object it wants to cleanup, a default cleanup method is called if it exists (e.g. sys.default_cleanup), which would serve mostly as an alert (print magic hex values to stderr, popup a Tk bomb dialog, raise the blue screen of death, ...) as opposed to actually breaking any cycles. Presumably the programmer would define sys.default_cleanup during development and leave it undefined during production. Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From paul at prescod.net Sat Mar 4 02:04:43 2000 From: paul at prescod.net (Paul Prescod) Date: Fri, 03 Mar 2000 17:04:43 -0800 Subject: [Python-Dev] breaking list.append() References: <38BC86E1.53F69776@prescod.net> <200003010411.XAA12988@eric.cnri.reston.va.us> Message-ID: <38C0612B.7C92F8C4@prescod.net> Guido van Rossum wrote: > > .. > Multi-arg > append probably won't be the only reason why e.g. Digital Creations > may need to release an update to Zope for Python 1.6. Zope comes with > its own version of Python anyway, so they have control over when they > make the switch. My concernc is when I want to build an application with a module that only works with Python 1.5.2 and another one that only works with Python 1.6. If we can avoid that situation by making 1.6 compatible with 1.5.2. we should. 
By the time 1.7 comes around I will accept that everyone has had enough time to update their modules. Remember that many module authors are just part-time volunteers. They may only use Python every few months when they get a spare weekend! I really hope that Andrew is wrong when he predicts that there may be lots of different places where Python 1.6 breaks code! I'm in favor of being a total jerk when it comes to Py3K but Python has been pretty conservative thus far. Could someone remind me in one sentence what the downside is for treating this as a warning condition as Java does with its deprecated features? Then the CP4E people don't get into bad habits and those same CP4E people trying to use older modules don't run into frustrating runtime errors. Do it for the CP4E people! (how's that for rhetoric) -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "We still do not know why mathematics is true and whether it is certain. But we know what we do not know in an immeasurably richer way than we did. And learning this has been a remarkable achievement, among the greatest and least known of the modern era." - from "Advent of the Algorithm" David Berlinski http://www.opengroup.com/mabooks/015/0151003386.shtml From jeremy at cnri.reston.va.us Sun Mar 5 18:46:14 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Sun, 5 Mar 2000 12:46:14 -0500 (EST) Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" in every method In-Reply-To: <000901bf8670$97d8f320$432d153f@tim> References: <000901bf8670$97d8f320$432d153f@tim> Message-ID: <14530.40294.593407.777859@bitdiddle.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: >>> Grabbing the GregS/BillT enhancement is probably the most >>> practical thing we could build on right now >> You got some pointers? TP> Download python2c (http://www.mudlib.org/~rassilon/p2c/) and TP> grab transformer.py from the zip file.
TP> The latter supplies a very useful post-processing pass over the TP> parser module's output, squashing it *way* down. The compiler tools in python/nondist/src/Compiler include Bill & Greg's transformer code, a class-based AST (each node is a subclass of the generic node), and a visitor framework for walking the AST. The APIs and organization are in a bit of flux; Mark Hammond suggested some reorganization that I've not finished yet. I may finish it up this evening. The transformer module does a good job of including line numbers, but I've occasionally run into a node that didn't have a lineno attribute when I expected it would. I haven't taken the time to figure out if my expectation was unreasonable or if the transformer should be fixed. The compiler-sig might be a good place to discuss this further. A warning framework was one of my original goals for the SIG. I imagine we could convince Guido to move warnings + compiler tools into the standard library if they end up being useful. Jeremy From mal at lemburg.com Sun Mar 5 20:57:32 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 05 Mar 2000 20:57:32 +0100 Subject: [Python-Dev] Unicode mapping tables References: <000601bf8668$cbbdd640$432d153f@tim> Message-ID: <38C2BC2C.FFEB72C3@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > ... > > Here's what I'll do: > > > > * implement .capitalize() in the traditional way for Unicode > > objects (simply convert the first char to uppercase) > > Given .title(), is .capitalize() of use for Unicode strings? Or is it just > a temptation to do something senseless in the Unicode world? If it doesn't > make sense, leave it out (this *seems* like compulsion to implement > all current string methods in *some* way for Unicode, whether or not they > make sense).
.capitalize() only touches the first char of the string - not sure whether it makes sense in both worlds ;-) Anyhow, the difference is there but subtle: string.capitalize() will use C's toupper() which is locale dependent, while unicode.capitalize() uses Unicode's toTitleCase() for the first character. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Sun Mar 5 21:15:47 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 05 Mar 2000 21:15:47 +0100 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? References: <000601bf8658$d81d34e0$f42d153f@tim> Message-ID: <38C2C073.CD51688@lemburg.com> Tim Peters wrote: > > [Guido] > > Would there really be someone out there who uses *intentional* > > resurrection? I severely doubt it. I've never heard of this. > > Why would anyone tell you about something that *works*?! You rarely hear > the good stuff, you know. I gave the typical pattern in the preceding msg. > To flesh out the motivation more, you have some external resource that's > very expensive to set up (in KSR's case, it was an IPC connection to a > remote machine). Rights to use that resource are handed out in the form of > an object. When a client is done using the resource, they *should* > explicitly use the object's .release() method, but you can't rely on that. > So the object's __del__ method looks like (for example):
>
>     def __del__(self):
>         # Code not shown to figure out whether to disconnect: the downside to
>         # disconnecting is that it can cost a bundle to create a new connection.
>         # If the whole app is shutting down, then of course we want to disconnect.
>         # Or if a timestamp trace shows that we haven't been making good use of
>         # all the open connections lately, we may want to disconnect too.
>         if decided_to_disconnect:
>             self.external_resource.disconnect()
>         else:
>             # keep the connection alive for reuse
>             global_available_connection_objects.append(self)
>
> This is simple & effective, and it relies on both intentional resurrection > and __del__ getting called repeatedly. I don't claim there's no other way > to write it, just that there's *been* no problem doing this for a millennium. > > Note that MAL spontaneously sketched similar examples, although I can't say > whether he's actually done stuff like this. Not exactly this, but similar things in the weak reference implementation of mxProxy. The idea came from a different area: the C implementation of Python uses free lists a lot and these are basically implementations of the same idiom: save an allocated resource for reviving it at some later point. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From nascheme at enme.ucalgary.ca Mon Mar 6 01:27:54 2000 From: nascheme at enme.ucalgary.ca (nascheme at enme.ucalgary.ca) Date: Sun, 5 Mar 2000 17:27:54 -0700 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <000001bf857a$60b45ac0$c6a0143f@tim>; from tim_one@email.msn.com on Fri, Mar 03, 2000 at 08:38:43PM -0500 References: <200003031650.LAA21647@eric.cnri.reston.va.us> <000001bf857a$60b45ac0$c6a0143f@tim> Message-ID: <20000305172754.A14998@acs.ucalgary.ca> On Fri, Mar 03, 2000 at 08:38:43PM -0500, Tim Peters wrote: > So here's what I'd consider doing: explicit is better than implicit, and in > the face of ambiguity refuse the temptation to guess. I like Marc's suggestion. Here is my proposal: Allow classes to have a new method, __cleanup__ or whatever you want to call it. When tp_clear is called for an instance, it checks for this method. If it exists, call it, otherwise delete the container objects from the instance's dictionary.
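Neil's rule -- prefer a user-supplied hook, fall back to clearing the instance dictionary -- can be sketched at the Python level. Both the `__cleanup__` protocol and the `clear_instance` helper below are hypothetical, mirroring the proposal rather than any real interpreter hook:

```python
# Python-level sketch of the proposed tp_clear behaviour for instances.
# The __cleanup__ name and clear_instance() are hypothetical here.
def clear_instance(obj):
    cleanup = getattr(obj, "__cleanup__", None)
    if cleanup is not None:
        cleanup()             # let the object break its own cycles
    else:
        obj.__dict__.clear()  # default: drop references so the cycle dies

class Holder:
    def __init__(self, other=None):
        self.other = other
        self.cleaned = False

    def __cleanup__(self):
        self.other = None
        self.cleaned = True

a = Holder()
b = Holder(a)
a.other = b                   # a <-> b reference cycle
clear_instance(a)             # what the collector would do to one member
```

After the call, `a` no longer references `b`, so the cycle is broken while `a` itself remains a usable (if emptied-out) object.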
When collecting cycles, call tp_clear for instances first. It's simple and allows the programmer to cleanly break cycles if they insist on creating them and using __del__ methods. Neil From tim_one at email.msn.com Mon Mar 6 08:13:21 2000 From: tim_one at email.msn.com (Tim Peters) Date: Mon, 6 Mar 2000 02:13:21 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <38C0612B.7C92F8C4@prescod.net> Message-ID: <000401bf873b$745f8320$ea2d153f@tim> [Paul Prescod] > ... > Could someone remind me in one sentence what the downside is for treating > this as a warning condition as Java does with its deprecated features? Simply the lack of anything to build on: Python has no sort of runtime warning system now, and nobody has volunteered to create one. If you do, remember that stdout & stderr may go to the bit bucket in a GUI app. The bit about dropping the "L" suffix on longs seems unwarnable-about in any case (short of warning every time anyone uses long()). remember-that-you-asked-for-the-problems-not-for-solutions-ly y'rs - tim From tim_one at email.msn.com Mon Mar 6 08:33:49 2000 From: tim_one at email.msn.com (Tim Peters) Date: Mon, 6 Mar 2000 02:33:49 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: <38C2C073.CD51688@lemburg.com> Message-ID: <000701bf873e$5032eca0$ea2d153f@tim> [M.-A. Lemburg, on the resurrection/multiple-__del__ "idiom"] > ... > The idea came from a different area: the C implementation > of Python uses free lists a lot and these are basically > implementations of the same idiom: save an allocated > resource for reviving it at some later point. Excellent analogy! Thanks. Now that you phrased it in this clarifying way, I recall that very much the same point was raised in the papers that resulted in the creation of guardians in Scheme. I don't know that anyone is actually using Python __del__ this way today (I am not), but you reminded me why I thought it was natural at one time.
generally-__del__-aversive-now-except-in-c++-where-destructors-are- guaranteed-to-be-called-when-you-expect-them-to-be-ly y'rs - tim From tim_one at email.msn.com Mon Mar 6 09:12:06 2000 From: tim_one at email.msn.com (Tim Peters) Date: Mon, 6 Mar 2000 03:12:06 -0500 Subject: [Python-Dev] return statements in lambda In-Reply-To: <006f01bf8686$391ced80$34aab5d4@hagrid> Message-ID: <000901bf8743$a9f61aa0$ea2d153f@tim> [/F] > maybe adding an (optional but encouraged) "return" > to lambda would be an improvement? > > lambda x: x + 10 > > vs. > > lambda x: return x + 10 > > or is this just more confusing... opinions? It was an odd complaint to begin with, since Lisp-heads aren't used to using "return" anyway. More of a symptom of taking a shallow syntactic approach to a new (to them) language. For non-Lisp heads, I think it's more confusing in the end, blurring the distinction between stmts and expressions ("the body of a lambda must be an expression" ... "ok, I lied, unless it's a 'return' stmt"). If Guido had it to do over again, I vote he rejects the original patch. Short of that, it would have been better if the lambda arglist required parens, and if the body were required to be a single return stmt (that would sure end the "lambda x: print x" FAQ -- few would *expect* "return print x" to work!). hindsight-is-great-ly y'rs - tim From tim_one at email.msn.com Mon Mar 6 10:09:45 2000 From: tim_one at email.msn.com (Tim Peters) Date: Mon, 6 Mar 2000 04:09:45 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <200003051346.IAA05539@eric.cnri.reston.va.us> Message-ID: <000b01bf874b$b6fe9da0$ea2d153f@tim> [Guido] > I'm beginning to believe that handing cycles with finalizers to the > user is better than calling __del__ with a different meaning, You won't be sorry: Python has the chance to be the first language that's both useful and sane here!
> and I tentatively withdraw my proposal to change the rules for when > __del__ is called (even when __init__ fails; I haven't had any complaints > about that either). Well, everyone liked the parenthetical half of that proposal, although Jack's example did point out a real surprise with it. > There seem to be two competing suggestions for solutions: (1) call > some kind of __cleanup__ (Marc-Andre) or tp_clean (Greg) method of the > object; (2) Tim's proposal of an interface to ask the garbage > collector for a trash cycle with a finalizer (or for an object with a > finalizer in a trash cycle?). Or a maximal strongly-connected component, or *something* -- unsure. > Somehow Tim's version looks less helpful to me, because it *seems* > that whoever gets to handle the cycle (the main code of the program?) > isn't necessarily responsible for creating it (some library you didn't > even know was used under the covers of some other library you called). Yes, to me too. This is the Scheme "guardian" idea in a crippled form (Scheme supports as many distinct guardians as the programmer cares to create), and even in its full-blown form it supplies "a perfectly general mechanism with no policy whatsoever". Greg convinced me (although I haven't admitted this yet) that "no policy whatsoever" is un-Pythonic too. *Some* policy is helpful, so I won't be pushing the guardian idea any more (although see immediately below for an immediate backstep on that). > ... > What keeps nagging me though is what to do when there's a finalizer > but no cleanup method. I guess the trash cycle remains alive. Is > this acceptable? (I guess so, because we've given the programmer a > way to resolve the trash: provide a cleanup method.) BDW considers it better to leak than to risk doing a wrong thing, and I agree wholeheartedly with that. GC is one place you want to have a "100% language".
This is where something like a guardian can remain useful: while leaking is OK because you've given them an easy & principled alternative, leaking without giving them a clear way to *know* about it is not OK. If gc pushes the leaked stuff off to the side, the gc module should (say) supply an entry point that returns all the leaked stuff in a list. Then users can *know* they're leaking, know how badly they're leaking, and examine exactly the objects that are leaking. Then they've got the info they need to repair their program (or at least track down the 3rd-party module that's leaking). As with a guardian, they *could* also build a reclamation scheme on top of it, but that would no longer be the main (or even an encouraged) thrust. > If we detect individual cycles (the current algorithm doesn't do that > yet, though it seems easy enough to do another scan), could we > special-case cycles with only one finalizer and no cleaner-upper? > (I'm tempted to call the finalizer because it seems little harm can be > done -- but then of course there's the problem of the finalizer being > called again when the refcount really goes to zero. :-( ) "Better safe than sorry" is my immediate view on this -- you can't know that the finalizer won't resurrect the cycle, and "finalizer called iff refcount hits 0" is a wonderfully simple & predictable rule. That's worth a lot to preserve, unless & until it proves to be a disaster in practice. As to the details of cleanup, I haven't succeeded in making the time to understand all the proposals. But I've done my primary job here if I've harassed everyone into not repeating the same mistakes all previous languages have made <0.9 wink>. > ... 
> I wish we didn't have to special-case finalizers on class instances > (since each dealloc function is potentially a combination of a > finalizer and a deallocation routine), but the truth is that they > *are* special -- __del__ has no responsibility for deallocating > memory, only for deallocating external resources (such as temp files). And the problem is that __del__ can do anything whatsoever that can be expressed in Python, so there's not a chance in hell of outguessing it. > ... > Another practical consideration is that now there are cycles of the form > > <=> > > which suggests that we should make function objects traceable. Also, > modules can cross-reference, so module objects should be made > traceable. I don't think that this will grow the sets of traced > objects by too much (since the dicts involved are already traced, and > a typical program has way fewer functions and modules than it has > class instances). On the other hand, we may also have to trace > (un)bound method objects, and these may be tricky because they are > allocated and deallocated at high rates (once per typical method > call). This relates to what I was trying to get at with my response to your gc implementation sketch: mark-&-sweep needs to chase *everything*, so the set of chased types is maximal from the start. Adding chased types to the "indirectly infer what's unreachable via accounting for internal refcounts within the transitive closure" scheme can end up touching nearly as much as a full M-&-S pass per invocation. I don't know where the break-even point is, but the more stuff you chase in the latter scheme the less often you want to run it. About high rates, so long as a doubly-linked list allows efficient removal of stuff that dies via refcount exhaustion, you won't actually *chase* many bound method objects (i.e., they'll usually go away by themselves).
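As an aside for later readers: the "push the leaked stuff off to the side and let the program examine it" entry point suggested earlier in this message is roughly what the gc module came to expose as `gc.garbage`. A small illustration with today's API, using `DEBUG_SAVEALL` so that *all* unreachable objects are set aside rather than freed:

```python
import gc

class Node:
    def __init__(self):
        self.ref = None

# With DEBUG_SAVEALL, everything the collector finds unreachable is
# appended to gc.garbage instead of being freed, so a program can know
# it is leaking and examine the exact objects involved.
gc.set_debug(gc.DEBUG_SAVEALL)

a, b = Node(), Node()
a.ref, b.ref = b, a    # a reference cycle refcounting cannot reclaim
del a, b

n = gc.collect()       # number of unreachable objects found
gc.set_debug(0)        # restore normal collection
```

A program can then scan `gc.garbage` to learn that, and how badly, it is leaking, exactly as proposed above.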
Note in passing that bound method objects often showed up in cycles in IDLE, although you usually managed to break those in other ways. > Back to the drawing board... Good! That means you're making real progress. glad-someone-is-ly y'rs - tim From mal at lemburg.com Mon Mar 6 11:01:31 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 06 Mar 2000 11:01:31 +0100 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? References: <200003031650.LAA21647@eric.cnri.reston.va.us> <000001bf857a$60b45ac0$c6a0143f@tim> <20000305172754.A14998@acs.ucalgary.ca> Message-ID: <38C381FB.E222D6E4@lemburg.com> nascheme at enme.ucalgary.ca wrote: > > On Fri, Mar 03, 2000 at 08:38:43PM -0500, Tim Peters wrote: > > So here's what I'd consider doing: explicit is better than implicit, and in > > the face of ambiguity refuse the temptation to guess. > > I like Marc's suggestion. Here is my proposal: > > Allow classes to have a new method, __cleanup__ or whatever you > want to call it. When tp_clear is called for an instance, it > checks for this method. If it exists, call it, otherwise delete > the container objects from the instance's dictionary. When > collecting cycles, call tp_clear for instances first. > > It's simple and allows the programmer to cleanly break cycles if > they insist on creating them and using __del__ methods. Right :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Mar 6 12:57:29 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 06 Mar 2000 12:57:29 +0100 Subject: [Python-Dev] Unicode character property methods Message-ID: <38C39D29.A29CE67F@lemburg.com> As you may have noticed, the Unicode objects provide new methods .islower(), .isupper() and .istitle(). Finn Bock mentioned that Java also provides .isdigit() and .isspace().
Question: should Unicode also provide these character property methods: .isdigit(), .isnumeric(), .isdecimal() and .isspace() ? Plus maybe .digit(), .numeric() and .decimal() for the corresponding decoding ? Similar APIs are already available through the unicodedata module, but could easily be moved to the Unicode object (they cause the builtin interpreter to grow a bit in size due to the new mapping tables). BTW, string.atoi et al. are currently not mapped to string methods... should they be ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Mon Mar 6 14:29:04 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 06 Mar 2000 08:29:04 -0500 Subject: [Python-Dev] Unicode character property methods In-Reply-To: Your message of "Mon, 06 Mar 2000 12:57:29 +0100." <38C39D29.A29CE67F@lemburg.com> References: <38C39D29.A29CE67F@lemburg.com> Message-ID: <200003061329.IAA09529@eric.cnri.reston.va.us> > As you may have noticed, the Unicode objects provide > new methods .islower(), .isupper() and .istitle(). Finn Bock > mentioned that Java also provides .isdigit() and .isspace(). > > Question: should Unicode also provide these character > property methods: .isdigit(), .isnumeric(), .isdecimal() > and .isspace() ? Plus maybe .digit(), .numeric() and > .decimal() for the corresponding decoding ? What would be the difference between isdigit, isnumeric, isdecimal? I'd say don't do more than Java. I don't understand what the "corresponding decoding" refers to. What would "3".decimal() return? > Similar APIs are already available through the unicodedata > module, but could easily be moved to the Unicode object > (they cause the builtin interpreter to grow a bit in size > due to the new mapping tables). > > BTW, string.atoi et al. are currently not mapped to > string methods... should they be ? They are mapped to int() c.s. 
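The distinction Guido asks about is already observable through the unicodedata module: decimal() is the strictest of the three database fields, numeric() the loosest. A quick, runnable illustration:

```python
import unicodedata

# An ordinary digit satisfies all three notions.
assert unicodedata.decimal("3") == 3

# SUPERSCRIPT TWO is a digit, but not a decimal digit...
assert unicodedata.digit("\u00b2") == 2
raised = False
try:
    unicodedata.decimal("\u00b2")
except ValueError:
    raised = True
assert raised

# ...and VULGAR FRACTION ONE FIFTH is numeric only.
assert unicodedata.numeric("\u2155") == 0.2
```

So "decimal" means usable in ordinary base-10 positional notation, "digit" admits compatibility forms such as superscripts, and "numeric" covers anything with a numeric value at all, fractions included.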
--Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Mon Mar 6 16:09:55 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 6 Mar 2000 10:09:55 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <200003051204.HAA05367@eric.cnri.reston.va.us> References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> <14529.55983.263225.691427@weyr.cnri.reston.va.us> <200003051204.HAA05367@eric.cnri.reston.va.us> Message-ID: <14531.51779.650532.881626@weyr.cnri.reston.va.us> Guido van Rossum writes: > - You could put it all in ConfigParser.py but with new classnames. > (Not sure though, since the ConfigParser class, which is really a > kind of weird variant, will be assumed to be the main class because > its name is that of the module.) The ConfigParser class could be clearly marked as deprecated both in the source/docstring and in the documentation. But the class itself should not be used in any way. > - Variants on the syntax could be given through some kind of option > system rather than through subclassing -- they should be combinable > independently. Some possible options (maybe I'm going overboard here) > could be: Yes, you are going overboard. It should contain exactly what's right for .ini files, and that's it. There are really three aspects to the beast: reading, using, and writing. I think there should be a class which does the right thing for using the information in the file, and reading & writing can be handled through functions or helper classes. That separates the parsing issues from the use issues, and alternate syntaxes will be easy enough to implement by subclassing the helper or writing a new function. An "editable" version that allows loading & saving without throwing away comments, ordering, etc.
would require a largely separate implementation of all three aspects (or at least the reader and writer). > (Well maybe the whole substitution thing should really be done through > a subclass -- it's too weird for normal use.) That and the ad hoc syntax are my biggest beefs with ConfigParser. But it can easily be added by a subclass as long as the method to override is clearly specified in the documentation (it should only require one!). -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Mon Mar 6 18:47:44 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 6 Mar 2000 12:47:44 -0500 (EST) Subject: [Python-Dev] PyBufferProcs Message-ID: <14531.61248.941076.803617@weyr.cnri.reston.va.us> While working on the documentation, I've noticed a naming inconsistency regarding PyBufferProcs; its peers are all named Py*Methods (PySequenceMethods, PyNumberMethods, etc.). I'd like to propose that a synonym, PyBufferMethods, be made for PyBufferProcs, and use that in the core implementations and the documentation. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From jeremy at cnri.reston.va.us Mon Mar 6 20:28:12 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Mon, 6 Mar 2000 14:28:12 -0500 (EST) Subject: [Python-Dev] example checkers based on compiler package Message-ID: <14532.1740.90292.440395@goon.cnri.reston.va.us> There was some discussion on python-dev over the weekend about generating warnings, and Moshe Zadke posted a selfnanny that warned about methods that didn't have self as the first argument. I think these kinds of warnings are useful, and I'd like to see a more general framework for them built around the Python abstract syntax originally from P2C. Ideally, they would be available as command line tools and integrated into GUIs like IDLE in some useful way.
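For readers on current Pythons: the compiler package is long gone, but the same self-nanny is easy to write against the stdlib `ast` module. A minimal sketch (only direct methods of a class are checked; decorators such as @staticmethod are ignored):

```python
import ast

def check_self(source, filename="<string>"):
    """Warn about methods whose first argument is missing or not 'self'."""
    warnings = []
    tree = ast.parse(source, filename)
    for cls in ast.walk(tree):
        if not isinstance(cls, ast.ClassDef):
            continue
        for func in cls.body:            # direct methods only
            if not isinstance(func, ast.FunctionDef):
                continue
            args = func.args.args
            if not args:
                warnings.append("%s:%d %s.%s: no arguments"
                                % (filename, func.lineno, cls.name, func.name))
            elif args[0].arg != "self":
                warnings.append("%s:%d %s.%s: self slot is named %s"
                                % (filename, func.lineno, cls.name,
                                   func.name, args[0].arg))
    return warnings

warnings = check_self("class Foo:\n    def bar(this): pass\n")
```

The warning texts mirror the ones in checkself.py below; the `check_self` name and its single-string interface are choices made for this sketch, not part of any real tool.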
I've included a couple of quick examples I coded up last night based on the compiler package (recently re-factored) that is resident in python/nondist/src/Compiler. The analysis on the one that checks for name errors is a bit of a mess, but the overall structure seems right. I'm hoping to collect a few more examples of checkers and generalize from them to develop a framework for checking for errors and reporting them. Jeremy

------------ checkself.py ------------
"""Check for methods that do not have self as the first argument"""

from compiler import parseFile, walk, ast, misc

class Warning:
    def __init__(self, filename, klass, method, lineno, msg):
        self.filename = filename
        self.klass = klass
        self.method = method
        self.lineno = lineno
        self.msg = msg

    _template = "%(filename)s:%(lineno)s %(klass)s.%(method)s: %(msg)s"

    def __str__(self):
        return self._template % self.__dict__

class NoArgsWarning(Warning):
    super_init = Warning.__init__

    def __init__(self, filename, klass, method, lineno):
        self.super_init(filename, klass, method, lineno, "no arguments")

class NotSelfWarning(Warning):
    super_init = Warning.__init__

    def __init__(self, filename, klass, method, lineno, argname):
        self.super_init(filename, klass, method, lineno,
                        "self slot is named %s" % argname)

class CheckSelf:
    def __init__(self, filename):
        self.filename = filename
        self.warnings = []
        self.scope = misc.Stack()

    def inClass(self):
        if self.scope:
            return isinstance(self.scope.top(), ast.Class)
        return 0

    def visitClass(self, klass):
        self.scope.push(klass)
        self.visit(klass.code)
        self.scope.pop()
        return 1

    def visitFunction(self, func):
        if self.inClass():
            classname = self.scope.top().name
            if len(func.argnames) == 0:
                w = NoArgsWarning(self.filename, classname, func.name,
                                  func.lineno)
                self.warnings.append(w)
            elif func.argnames[0] != "self":
                w = NotSelfWarning(self.filename, classname, func.name,
                                   func.lineno, func.argnames[0])
                self.warnings.append(w)
        self.scope.push(func)
        self.visit(func.code)
        self.scope.pop()
        return 1

def check(filename):
    global p, check
    p = parseFile(filename)
    check = CheckSelf(filename)
    walk(p, check)
    for w in check.warnings:
        print w

if __name__ == "__main__":
    import sys
    # XXX need to do real arg processing
    check(sys.argv[1])

------------ badself.py ------------
def foo():
    return 12

class Foo:
    def __init__():
        pass

    def foo(self, foo):
        pass

    def bar(this, that):
        def baz(this=that):
            return this
        return baz

def bar():
    class Quux:
        def __init__(self):
            self.sum = 1
        def quam(x, y):
            self.sum = self.sum + (x * y)
    return Quux()

------------ checknames.py ------------
"""Check for NameErrors"""

from compiler import parseFile, walk
from compiler.misc import Stack, Set
import __builtin__
from UserDict import UserDict

class Warning:
    def __init__(self, filename, funcname, lineno):
        self.filename = filename
        self.funcname = funcname
        self.lineno = lineno

    def __str__(self):
        return self._template % self.__dict__

class UndefinedLocal(Warning):
    super_init = Warning.__init__

    def __init__(self, filename, funcname, lineno, name):
        self.super_init(filename, funcname, lineno)
        self.name = name

    _template = "%(filename)s:%(lineno)s %(funcname)s undefined local %(name)s"

class NameError(UndefinedLocal):
    _template = "%(filename)s:%(lineno)s %(funcname)s undefined name %(name)s"

class NameSet(UserDict):
    """Track names and the line numbers where they are referenced"""
    def __init__(self):
        self.data = self.names = {}

    def add(self, name, lineno):
        l = self.names.get(name, [])
        l.append(lineno)
        self.names[name] = l

class CheckNames:
    def __init__(self, filename):
        self.filename = filename
        self.warnings = []
        self.scope = Stack()
        self.gUse = NameSet()
        self.gDef = NameSet()
        # _locals is the stack of local namespaces
        # locals is the top of the stack
        self._locals = Stack()
        self.lUse = None
        self.lDef = None
        self.lGlobals = None  # var declared global
        # holds scope,def,use,global triples for later analysis
        self.todo = []

    def enterNamespace(self, node):
        ## print node.name
        self.scope.push(node)
        self.lUse = use = NameSet()
        self.lDef = _def = NameSet()
        self.lGlobals = gbl = NameSet()
        self._locals.push((use, _def, gbl))

    def exitNamespace(self):
        ## print
        self.todo.append((self.scope.top(), self.lDef, self.lUse,
                          self.lGlobals))
        self.scope.pop()
        self._locals.pop()
        if self._locals:
            self.lUse, self.lDef, self.lGlobals = self._locals.top()
        else:
            self.lUse = self.lDef = self.lGlobals = None

    def warn(self, warning, funcname, lineno, *args):
        args = (self.filename, funcname, lineno) + args
        self.warnings.append(apply(warning, args))

    def defName(self, name, lineno, local=1):
        ## print "defName(%s, %s, local=%s)" % (name, lineno, local)
        if self.lUse is None:
            self.gDef.add(name, lineno)
        elif local == 0:
            self.gDef.add(name, lineno)
            self.lGlobals.add(name, lineno)
        else:
            self.lDef.add(name, lineno)

    def useName(self, name, lineno, local=1):
        ## print "useName(%s, %s, local=%s)" % (name, lineno, local)
        if self.lUse is None:
            self.gUse.add(name, lineno)
        elif local == 0:
            self.gUse.add(name, lineno)
            self.lUse.add(name, lineno)
        else:
            self.lUse.add(name, lineno)

    def check(self):
        for s, d, u, g in self.todo:
            self._check(s, d, u, g, self.gDef)
        # XXX then check the globals

    def _check(self, scope, _def, use, gbl, globals):
        # check for NameError
        # a name is defined iff it is in def.keys()
        # a name is global iff it is in gdefs.keys()
        gdefs = UserDict()
        gdefs.update(globals)
        gdefs.update(__builtin__.__dict__)
        defs = UserDict()
        defs.update(gdefs)
        defs.update(_def)
        errors = Set()
        for name in use.keys():
            if not defs.has_key(name):
                firstuse = use[name][0]
                self.warn(NameError, scope.name, firstuse, name)
                errors.add(name)
        # check for UndefinedLocalNameError
        # order == use & def sorted by lineno
        # elements are lineno, flag, name
        # flag = 0 if use, flag = 1 if def
        order = []
        for name, lines in use.items():
            if gdefs.has_key(name) and not _def.has_key(name):
                # this is a global ref, we can skip it
                continue
            for lineno in lines:
                order.append(lineno, 0, name)
        for name, lines in _def.items():
            for lineno in lines:
                order.append(lineno, 1, name)
        order.sort()
        # ready contains names that have been defined or warned about
        ready = Set()
        for lineno, flag, name in order:
            if flag == 0:  # use
                if not ready.has_elt(name) and not errors.has_elt(name):
                    self.warn(UndefinedLocal, scope.name, lineno, name)
                    ready.add(name)  # don't warn again
            else:
                ready.add(name)

    # below are visitor methods
    def visitFunction(self, node, noname=0):
        for expr in node.defaults:
            self.visit(expr)
        if not noname:
            self.defName(node.name, node.lineno)
        self.enterNamespace(node)
        for name in node.argnames:
            self.defName(name, node.lineno)
        self.visit(node.code)
        self.exitNamespace()
        return 1

    def visitLambda(self, node):
        return self.visitFunction(node, noname=1)

    def visitClass(self, node):
        for expr in node.bases:
            self.visit(expr)
        self.defName(node.name, node.lineno)
        self.enterNamespace(node)
        self.visit(node.code)
        self.exitNamespace()
        return 1

    def visitName(self, node):
        self.useName(node.name, node.lineno)

    def visitGlobal(self, node):
        for name in node.names:
            self.defName(name, node.lineno, local=0)

    def visitImport(self, node):
        for name in node.names:
            self.defName(name, node.lineno)

    visitFrom = visitImport

    def visitAssName(self, node):
        self.defName(node.name, node.lineno)

def check(filename):
    global p, checker
    p = parseFile(filename)
    checker = CheckNames(filename)
    walk(p, checker)
    checker.check()
    for w in checker.warnings:
        print w

if __name__ == "__main__":
    import sys
    # XXX need to do real arg processing
    check(sys.argv[1])

------------ badnames.py ------------
# XXX can we detect race conditions on accesses to global variables?
# probably can (conservatively) by noting variables _created_ by
# global decls in funcs

import string
import time

def foo(x):
    return x + y

def foo2(x):
    return x + z

a = 4

def foo3(x):
    a, b = x, a

def bar(x):
    z = x
    global z

def bar2(x):
    f = string.strip
    a = f(x)
    import string
    return string.lower(a)

def baz(x, y):
    return x + y + z

def outer(x):
    def inner(y):
        return x + y
    return inner

From gstein at lyra.org Mon Mar 6 22:09:33 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 6 Mar 2000 13:09:33 -0800 (PST) Subject: [Python-Dev] PyBufferProcs In-Reply-To: <14531.61248.941076.803617@weyr.cnri.reston.va.us> Message-ID: On Mon, 6 Mar 2000, Fred L. Drake, Jr. wrote: > While working on the documentation, I've noticed a naming > inconsistency regarding PyBufferProcs; its peers are all named > Py*Methods (PySequenceMethods, PyNumberMethods, etc.). > I'd like to propose that a synonym, PyBufferMethods, be made for > PyBufferProcs, and use that in the core implementations and the > documentation. +0 Although.. I might say that it should be renamed, and a synonym (#define or typedef?) be provided for the old name. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Mon Mar 6 23:04:14 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 06 Mar 2000 23:04:14 +0100 Subject: [Python-Dev] Unicode character property methods References: <38C39D29.A29CE67F@lemburg.com> <200003061329.IAA09529@eric.cnri.reston.va.us> Message-ID: <38C42B5E.42801755@lemburg.com> Guido van Rossum wrote: > > > As you may have noticed, the Unicode objects provide > > new methods .islower(), .isupper() and .istitle(). Finn Bock > > mentioned that Java also provides .isdigit() and .isspace(). > > > > Question: should Unicode also provide these character > > property methods: .isdigit(), .isnumeric(), .isdecimal() > > and .isspace() ? Plus maybe .digit(), .numeric() and > > .decimal() for the corresponding decoding ?
> > What would be the difference between isdigit, isnumeric, isdecimal? > I'd say don't do more than Java. I don't understand what the > "corresponding decoding" refers to. What would "3".decimal() return? These originate in the Unicode database; see

    ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.html

Here are the descriptions:

"""
6       Decimal digit value     normative
        This is a numeric field. If the character has the decimal digit
        property, as specified in Chapter 4 of the Unicode Standard, the
        value of that digit is represented with an integer value in this
        field

7       Digit value             normative
        This is a numeric field. If the character represents a digit, not
        necessarily a decimal digit, the value is here. This covers digits
        which do not form decimal radix forms, such as the compatibility
        superscript digits

8       Numeric value           normative
        This is a numeric field. If the character has the numeric
        property, as specified in Chapter 4 of the Unicode Standard, the
        value of that character is represented with an integer or rational
        number in this field. This includes fractions as, e.g., "1/5" for
        U+2155 VULGAR FRACTION ONE FIFTH. Also included are numerical
        values for compatibility characters such as circled numbers.
"""

u"3".decimal() would return 3. u"\u2155".

Some more examples from the unicodedata module (which makes all fields of the database available in Python):

>>> unicodedata.decimal(u"3")
3
>>> unicodedata.decimal(u"?")
2
>>> unicodedata.digit(u"?")
2
>>> unicodedata.numeric(u"?")
2.0
>>> unicodedata.numeric(u"\u2155")
0.2
>>> unicodedata.numeric(u'\u215b')
0.125

> > Similar APIs are already available through the unicodedata > > module, but could easily be moved to the Unicode object > > (they cause the builtin interpreter to grow a bit in size > > due to the new mapping tables). > > > > BTW, string.atoi et al. are currently not mapped to > > string methods... should they be ? > > They are mapped to int() c.s. Hmm, I just noticed that int() et friends don't like Unicode...
shouldn't they use the "t" parser marker instead of requiring a string or tp_int compatible type ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Tue Mar 7 00:12:33 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 06 Mar 2000 18:12:33 -0500 Subject: [Python-Dev] Unicode character property methods In-Reply-To: Your message of "Mon, 06 Mar 2000 23:04:14 +0100." <38C42B5E.42801755@lemburg.com> References: <38C39D29.A29CE67F@lemburg.com> <200003061329.IAA09529@eric.cnri.reston.va.us> <38C42B5E.42801755@lemburg.com> Message-ID: <200003062312.SAA11697@eric.cnri.reston.va.us> [MAL] > > > As you may have noticed, the Unicode objects provide > > > new methods .islower(), .isupper() and .istitle(). Finn Bock > > > mentioned that Java also provides .isdigit() and .isspace(). > > > > > > Question: should Unicode also provide these character > > > property methods: .isdigit(), .isnumeric(), .isdecimal() > > > and .isspace() ? Plus maybe .digit(), .numeric() and > > > .decimal() for the corresponding decoding ? [Guido] > > What would be the difference between isdigit, isnumeric, isdecimal? > > I'd say don't do more than Java. I don't understand what the > > "corresponding decoding" refers to. What would "3".decimal() return? [MAL] > These originate in the Unicode database; see > > ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.html > > Here are the descriptions: > > """ > 6 > Decimal digit value > normative > This is a numeric field. If the > character has the decimal digit > property, as specified in Chapter > 4 of the Unicode Standard, the > value of that digit is represented > with an integer value in this field > 7 > Digit value > normative > This is a numeric field. If the > character represents a digit, not > necessarily a decimal digit, the > value is here. 
This covers digits > which do not form decimal radix > forms, such as the compatibility > superscript digits > 8 > Numeric value > normative > This is a numeric field. If the > character has the numeric > property, as specified in Chapter > 4 of the Unicode Standard, the > value of that character is > represented with an integer or > rational number in this field. This > includes fractions as, e.g., "1/5" for > U+2155 VULGAR FRACTION > ONE FIFTH Also included are > numerical values for compatibility > characters such as circled > numbers. > > u"3".decimal() would return 3. u"\u2155". > > Some more examples from the unicodedata module (which makes > all fields of the database available in Python): > > >>> unicodedata.decimal(u"3") > 3 > >>> unicodedata.decimal(u"?") > 2 > >>> unicodedata.digit(u"?") > 2 > >>> unicodedata.numeric(u"?") > 2.0 > >>> unicodedata.numeric(u"\u2155") > 0.2 > >>> unicodedata.numeric(u'\u215b') > 0.125 Hm, very Unicode centric. Probably best left out of the general string methods. Isspace() seems useful, and an isdigit() that is only true for ASCII '0' - '9' also makes sense. What about "123".isdigit()? What does Java say? Or do these only apply to single chars there? I think "123".isdigit() should be true if "abc".islower() is true. > > > Similar APIs are already available through the unicodedata > > > module, but could easily be moved to the Unicode object > > > (they cause the builtin interpreter to grow a bit in size > > > due to the new mapping tables). > > > > > > BTW, string.atoi et al. are currently not mapped to > > > string methods... should they be ? > > > > They are mapped to int() c.s. > > Hmm, I just noticed that int() et friends don't like > Unicode... shouldn't they use the "t" parser marker > instead of requiring a string or tp_int compatible > type ? Good catch. Go ahead. 
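The change approved here is easy to demonstrate; a minimal sketch of the intended behaviour in today's spelling, where the numeric constructors accept any text object (the u"" prefixes are redundant in modern Python and kept only to match the discussion):

```python
# The numeric constructors accept text objects, not just 8-bit strings.
assert int(u"3") == 3
assert float(u"0.125") == 0.125
# Leading/trailing whitespace is stripped, as for plain strings.
assert int(u"  42  ") == 42
```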
--Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at math.huji.ac.il Tue Mar 7 06:25:43 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 7 Mar 2000 07:25:43 +0200 (IST) Subject: [Python-Dev] Re: example checkers based on compiler package In-Reply-To: <14532.1740.90292.440395@goon.cnri.reston.va.us> Message-ID: On Mon, 6 Mar 2000, Jeremy Hylton wrote: > I think these kinds of warnings are useful, and I'd like to see a more > general framework for them built around Python abstract syntax originally > from P2C. Ideally, they would be available as command line tools and > integrated into GUIs like IDLE in some useful way. Yes! Guido already suggested we have a standard API to them. One thing I suggested was that the abstract API include not only the input (one form or another of an AST), but the output: so IDE's wouldn't have to parse strings, but get a warning class. Something like: an output of a warning can be a subclass of GeneralWarning, and should implement the following methods:

1. line-no() -- returns an integer
2. columns() -- returns either a pair of integers, or None
3. message() -- returns a string containing a message
4. __str__() -- comes for free if inheriting GeneralWarning, and formats the warning message.

> I've included a couple of quick examples I coded up last night based > on the compiler package (recently re-factored) that is resident in > python/nondist/src/Compiler. The analysis on the one that checks for > name errors is a bit of a mess, but the overall structure seems right. One thing I had trouble with is that in my implementation of selfnanny, I used Python's stack for recursion while you used an explicit stack. It's probably because of the visitor pattern, which is just another argument for co-routines and generators. > I'm hoping to collect a few more examples of checkers and generalize > from them to develop a framework for checking for errors and reporting > them. Cool!
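The proposed base class is small enough to sketch; the names below follow the message (with line-no() respelled as a valid Python identifier), and are hypothetical, not an actual Python API:

```python
# Hypothetical sketch of the GeneralWarning base class described above.
class GeneralWarning:
    def __init__(self, filename, lineno, cols, msg):
        self.filename = filename
        self.lineno = lineno
        self.cols = cols          # (start, end) pair of columns, or None
        self.msg = msg

    def line_no(self):
        return self.lineno

    def columns(self):
        return self.cols

    def message(self):
        return self.msg

    def __str__(self):
        # comes "for free" for subclasses: one uniform warning format
        return "%s:%d: %s" % (self.filename, self.line_no(), self.message())

class SelfNannyWarning(GeneralWarning):
    pass

w = SelfNannyWarning("badnames.py", 12, None, "method has no 'self' argument")
print(w)    # badnames.py:12: method has no 'self' argument
```

An IDE could then sort and display warnings from any checker without parsing strings, which is the point of standardizing the output side of the API.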
Brainstorming: what kind of warnings would people find useful? In selfnanny, I wanted to include checking for assignment to self, and checking for "possible use before definition of local variables" sounds good. Another check could be a CP4E "checking that no two identifiers differ only by case". I might code up a few if I have the time... What I'd really want (but it sounds really hard) is a framework for partial ASTs: warning people as they write code. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From mwh21 at cam.ac.uk Tue Mar 7 09:31:23 2000 From: mwh21 at cam.ac.uk (Michael Hudson) Date: 07 Mar 2000 08:31:23 +0000 Subject: [Python-Dev] Re: [Compiler-sig] Re: example checkers based on compiler package In-Reply-To: Moshe Zadka's message of "Tue, 7 Mar 2000 07:25:43 +0200 (IST)" References: Message-ID: Moshe Zadka writes: > On Mon, 6 Mar 2000, Jeremy Hylton wrote: > > > I think these kinds of warnings are useful, and I'd like to see a more > > general framework for them built around Python abstract syntax originally > > from P2C. Ideally, they would be available as command line tools and > > integrated into GUIs like IDLE in some useful way. > > Yes! Guido already suggested we have a standard API to them. One thing > I suggested was that the abstract API include not only the input (one form > or another of an AST), but the output: so IDE's wouldn't have to parse > strings, but get a warning class. That would be seriously cool. > Something like a: > > An output of a warning can be a subclass of GeneralWarning, and should > implement the following methods: > > 1. line-no() -- returns an integer > 2. columns() -- returns either a pair of integers, or None > 3. message() -- returns a string containing a message > 4. __str__() -- comes for free if inheriting GeneralWarning, > and formats the warning message. Wouldn't it make sense to include function/class name here too? A checker is likely to know, and it would save reparsing to find it out.
[little snip] > > I'm hoping to collect a few more examples of checkers and generalize > > from them to develop a framework for checking for errors and reporting > > them. > > Cool! > Brainstorming: what kind of warnings would people find useful? In > selfnanny, I wanted to include checking for assigment to self, and > checking for "possible use before definition of local variables" sounds > good. Another check could be a CP4E "checking that no two identifiers > differ only by case". I might code up a few if I have the time... Is there stuff in the current Compiler code to do control flow analysis? You'd need that to check for use before definition in meaningful cases, and also if you ever want to do any optimisation... > What I'd really want (but it sounds really hard) is a framework for > partial ASTs: warning people as they write code. I agree (on both points). Cheers, M. -- very few people approach me in real life and insist on proving they are drooling idiots. -- Erik Naggum, comp.lang.lisp From mal at lemburg.com Tue Mar 7 10:14:25 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 07 Mar 2000 10:14:25 +0100 Subject: [Python-Dev] Unicode character property methods References: <38C39D29.A29CE67F@lemburg.com> <200003061329.IAA09529@eric.cnri.reston.va.us> <38C42B5E.42801755@lemburg.com> <200003062312.SAA11697@eric.cnri.reston.va.us> Message-ID: <38C4C871.F47E17A3@lemburg.com> Guido van Rossum wrote: > [MAL about adding .isdecimal(), .isdigit() and .isnumeric()] > > Some more examples from the unicodedata module (which makes > > all fields of the database available in Python): > > > > >>> unicodedata.decimal(u"3") > > 3 > > >>> unicodedata.decimal(u"?") > > 2 > > >>> unicodedata.digit(u"?") > > 2 > > >>> unicodedata.numeric(u"?") > > 2.0 > > >>> unicodedata.numeric(u"\u2155") > > 0.2 > > >>> unicodedata.numeric(u'\u215b') > > 0.125 > > Hm, very Unicode centric. Probably best left out of the general > string methods. 
Isspace() seems useful, and an isdigit() that is only > true for ASCII '0' - '9' also makes sense. Well, how about having all three on Unicode objects and only .isdigit() on string objects ? > What about "123".isdigit()? What does Java say? Or do these only > apply to single chars there? I think "123".isdigit() should be true > if "abc".islower() is true. In the current uPython implementation u"123".isdigit() is true; same for the other two methods. > > > > Similar APIs are already available through the unicodedata > > > > module, but could easily be moved to the Unicode object > > > > (they cause the builtin interpreter to grow a bit in size > > > > due to the new mapping tables). > > > > > > > > BTW, string.atoi et al. are currently not mapped to > > > > string methods... should they be ? > > > > > > They are mapped to int() c.s. > > > > Hmm, I just noticed that int() et friends don't like > > Unicode... shouldn't they use the "t" parser marker > > instead of requiring a string or tp_int compatible > > type ? > > Good catch. Go ahead. Done. float(), int() and long() now accept charbuf compatible objects as argument. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Mar 7 10:23:35 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 07 Mar 2000 10:23:35 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects Message-ID: <38C4CA97.5D0AA9D@lemburg.com> Before starting to code away, I would like to know which of the new Unicode methods should also be available on string objects. 
Here are the currently available methods:

    Unicode objects      string objects
    ------------------------------------
    capitalize           capitalize
    center
    count                count
    encode
    endswith             endswith
    expandtabs
    find                 find
    index                index
    isdecimal
    isdigit
    islower
    isnumeric
    isspace
    istitle
    isupper
    join                 join
    ljust
    lower                lower
    lstrip               lstrip
    replace              replace
    rfind                rfind
    rindex               rindex
    rjust
    rstrip               rstrip
    split                split
    splitlines
    startswith           startswith
    strip                strip
    swapcase             swapcase
    title                title
    translate            translate (*)
    upper                upper
    zfill

(*) The two have slightly different implementations, e.g. deletions are handled differently. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik at pythonware.com Tue Mar 7 12:54:56 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 7 Mar 2000 12:54:56 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects References: <38C4CA97.5D0AA9D@lemburg.com> Message-ID: <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> > Unicode objects string objects > expandtabs yes. I'm pretty sure there's "expandtabs" code in the strop module. maybe barry missed it? > center > ljust > rjust probably. the implementation is trivial, and ljust/rjust are somewhat useful, so you might as well add them all (just cut and paste from the unicode class). what about rguido and lguido, btw? > zfill no. From guido at python.org Tue Mar 7 14:52:00 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 07 Mar 2000 08:52:00 -0500 Subject: [Python-Dev] finalization again Message-ID: <200003071352.IAA13571@eric.cnri.reston.va.us> Warning: long message. If you're not interested in reading all this, please skip to "Conclusion" at the end. At Tim's recommendation I had a look at what section 12.6 of the Java language spec says about finalizers. The stuff there is sure seductive for language designers...
Have a look at the diagram at http://java.sun.com/docs/books/jls/html/12.doc.html#48746. In all its (seeming) complexity, it helped me understand some of the issues of finalization better. Rather than the complex 8-state state machine that it appears to be, think of it as a simple 3x3 table. The three rows represent the categories reachable, finalizer-reachable (abbreviated in the diagram as f-reachable), and unreachable. These categories correspond directly to categories of objects that the Schemenauer-Tiedemann cycle-reclamation scheme deals with: after moving all the reachable objects to the second list (first the roots and then the objects reachable from the roots), the first list is left with the unreachable and finalizer-reachable objects. If we want to distinguish between unreachable and finalizer-reachable at this point, a straightforward application of the same algorithm will work well: Create a third list (this will contain the finalizer-reachable objects). Start by filling it with all the objects from the first list (which contains the potential garbage at this point) that have a finalizer. We can look for objects that have __del__ or __clean__ or for which tp_clean(CARE_EXEC)==true, it doesn't matter here.(*) Then walk through the third list, following each object's references, and move all referenced objects that are still in the first list to the third list. Now, we have:

List 1: truly unreachable objects. These have no finalizers and can be discarded right away.

List 2: truly reachable objects. (Roots and objects reachable from roots.) Leave them alone.

List 3: finalizer-reachable objects. This contains objects that are unreachable but have a finalizer, and objects that are only reachable through those.

We now have to decide on a policy for invoking finalizers. Java suggests the following: Remember the "roots" of the third list -- the nodes that were moved there directly from the first list because they have a finalizer.
These objects are marked *finalizable* (a category corresponding to the second *column* of the Java diagram). The Java spec allows the Java garbage collector to call all of these finalizers in any order -- even simultaneously in separate threads. Java never allows an object to go back from the finalizable to the unfinalized state (there are no arrows pointing left in the diagram). The first finalizer that is called could make its object reachable again (up arrow), thereby possibly making other finalizable objects reachable too. But this does not cancel their scheduled finalization! The conclusion is that Java can sometimes call finalization on reachable objects -- but only if those objects have gone through a phase in their life where they were unreachable or at least finalizer-unreachable. I agree that this is the best that Java can do: if there are cycles containing multiple objects with finalizers, there is no way (short of asking the programmer(s)) to decide which object to finalize first. We could pick one at random, run its finalizer, and start garbage collection all over -- if the finalizer doesn't resurrect anything, this will give us the same set of unreachable objects, from which we could pick the next finalizable object, and so on. That looks very inefficient, might not terminate (the same object could repeatedly show up as the candidate for finalization), and it's still arbitrary: the programmer(s) still can't predict which finalizer in a cycle with multiple finalizers will be called first. Assuming the recommended characteristics of finalizers (brief and robust), it won't make much difference if we call all finalizers (of the now-finalizable objects) "without looking back". Sure, some objects may find themselves in a perfectly reachable position with their finalizer called -- but they did go through a "near-death experience". I don't find this objectionable, and I don't see how Java could possibly do better for cycles with multiple finalizers.
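The two-pass partition described above fits in a few lines of modern, illustrative Python; the dict-based object graph, the roots list, and the has_finalizer predicate are toy stand-ins for the collector's real data structures, not CPython internals:

```python
def partition(graph, roots, has_finalizer):
    """Split a {obj: referents} graph into the three lists described above."""
    # Pass 1: everything reachable from the roots is truly reachable (list 2).
    reachable = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj not in reachable:
            reachable.add(obj)
            stack.extend(graph[obj])
    # Pass 2: seed the third list with unreachable objects that carry a
    # finalizer, then pull in everything they reference (same algorithm).
    f_reachable = set()
    stack = [o for o in graph if o not in reachable and has_finalizer(o)]
    while stack:
        obj = stack.pop()
        if obj not in f_reachable and obj not in reachable:
            f_reachable.add(obj)
            stack.extend(graph[obj])
    # Whatever is left is truly unreachable (list 1) and can be discarded.
    unreachable = set(graph) - reachable - f_reachable
    return unreachable, reachable, f_reachable

# a <-> b is a dead cycle whose "a" has a finalizer, with "c" hanging off it;
# r is the root; d <-> e is a dead cycle with no finalizers at all.
graph = {"r": [], "a": ["b", "c"], "b": ["a"], "c": [], "d": ["e"], "e": ["d"]}
dead, live, f_reach = partition(graph, ["r"], lambda o: o == "a")
# d and e can be reclaimed at once; a, b and c must wait for a's finalizer.
```

Running the toy example yields dead == {"d", "e"} and f_reach == {"a", "b", "c"}, matching the three lists above.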
Now let's look again at the rule that an object's finalizer will be called at most once automatically by the garbage collector. The transitions between the columns of the Java diagram enforce this: the columns are labeled from left to right with unfinalized, finalizable, and finalized, and there are no transition arrows pointing left. (In my description above, already finalized objects are considered not to have a finalizer.) I think this rule makes a lot of sense given Java's multi-threaded garbage collection: the invocation of finalizers could run concurrently with another garbage collection, and we don't want this to find some of the same finalizable objects and call their finalizers again! We could mark them with a "finalization in progress" flag only while their finalizer is running, but in a cycle with multiple finalizers it seems we should keep this flag set until *all* finalizers for objects in the cycle have run. But we don't actually know exactly what the cycles are: all we know is "these objects are involved in trash cycles". More detailed knowledge would require yet another sweep, plus a more hairy two-dimensional data structure (a list of separate cycles). And for what? As soon as we run finalizers from two separate cycles, those cycles could be merged again (e.g. the first finalizer could resurrect its cycle, and the second one could link to it). Now we have a pool of objects that are marked "finalization in progress" until all their finalizations terminate. For an incremental concurrent garbage collector, this seems a pain, since it may continue to find new finalizable objects and add them to the pile. Java takes the logical conclusion: the "finalization in progress" flag is never cleared -- and renamed to "finalized".

Conclusion
----------

Are the Java rules complex? Yes. Are there better rules possible? I'm not so sure, given the requirement of allowing concurrent incremental garbage collection algorithms that haven't even been invented yet.
(Plus the implied requirement that finalizers in trash cycles should be invoked.) Are the Java rules difficult for the user? Only for users who think they can trick finalizers into doing things for them that they were not designed to do. I would think the following guidelines should do nicely for the rest of us:

1. Avoid finalizers if you can; use them only to release *external* (e.g. OS) resources.

2. Write your finalizer as robust as you can, with as little use of other objects as you can.

3. You only get one chance. Use it.

Unlike Scheme guardians or the proposed __cleanup__ mechanism, you don't have to know whether your object is involved in a cycle -- your finalizer will still be called. I am reconsidering using the __del__ method as the finalizer. As a compromise to those who want their __del__ to run whenever the reference count reaches zero, the finalized flag can be cleared explicitly. I am considering using the following implementation: after retrieving the __del__ method, but before calling it, self.__del__ is set to None (better, self.__dict__['__del__'] = None, to avoid confusing __setattr__ hooks). The object can remove self.__del__ to clear the finalized flag. I think I'll use the same mechanism to prevent __del__ from being called upon a failed initialization. Final note: the semantics "__del__ is called whenever the reference count reaches zero" cannot be defended in the light of a migration to different forms of garbage collection (e.g. JPython). There may not be a reference count. --Guido van Rossum (home page: http://www.python.org/~guido/) ____ (*) Footnote: there's one complication: to ask a Python class instance if it has a finalizer, we have to use PyObject_Getattr(obj, ...). If the object's class has a __getattr__ hook, this can invoke arbitrary Python code -- even if the answer to the question is "no"! This can make the object reachable again (in the Java diagram, arrows pointing up or up and right).
We could either use instance_getattr1(), which avoids the __getattr__ hook, or mark all class instances as finalizable until proven innocent. From gward at cnri.reston.va.us Tue Mar 7 15:04:30 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Tue, 7 Mar 2000 09:04:30 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <200003051204.HAA05367@eric.cnri.reston.va.us>; from guido@python.org on Sun, Mar 05, 2000 at 07:04:56AM -0500 References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> <14529.55983.263225.691427@weyr.cnri.reston.va.us> <200003051204.HAA05367@eric.cnri.reston.va.us> Message-ID: <20000307090430.A16948@cnri.reston.va.us> On 05 March 2000, Guido van Rossum said:

> - Variants on the syntax could be given through some kind of option
>   system rather than through subclassing -- they should be combinable
>   independently.  Some possible options (maybe I'm going overboard here)
>   could be:
>
>   - comment characters: ('#', ';', both, others?)
>   - comments after variables allowed? on sections?
>   - variable characters: (':', '=', both, others?)
>   - quoting of values with "..." allowed?
>   - backslashes in "..." allowed?
>   - does backslash-newline mean a continuation?
>   - case sensitivity for section names (default on)
>   - case sensitivity for option names (default off)
>   - variables allowed before first section name?
>   - first section name? (default "main")
>   - character set allowed in section names
>   - character set allowed in variable names
>   - %(...) substitution?

I agree with Fred that this level of flexibility is probably overkill for a config file parser; you don't want every application author who uses the module to have to explain his particular variant of the syntax. However, if you're interested in a class that *does* provide some of the above flexibility, I have written such a beast.
It's currently used to parse the Distutils MANIFEST.in file, and I've considered using it for the mythical Distutils config files. (And it also gets heavy use in my day job.) It's really a class for reading a file in preparation for "text processing the Unix way", though: it doesn't say anything about syntax, it just worries about blank lines, comments, continuations, and a few other things. Here's the class docstring:

class TextFile:

    """Provides a file-like object that takes care of all the things you
       commonly want to do when processing a text file that has some
       line-by-line syntax: strip comments (as long as "#" is your
       comment character), skip blank lines, join adjacent lines by
       escaping the newline (ie. backslash at end of line), strip
       leading and/or trailing whitespace, and collapse internal
       whitespace.  All of these are optional and independently
       controllable.

       Provides a 'warn()' method so you can generate warning messages
       that report physical line number, even if the logical line in
       question spans multiple physical lines.  Also provides
       'unreadline()' for implementing line-at-a-time lookahead.

       Constructor is called as:

           TextFile (filename=None, file=None, **options)

       It bombs (RuntimeError) if both 'filename' and 'file' are None;
       'filename' should be a string, and 'file' a file object (or
       something that provides 'readline()' and 'close()' methods).  It
       is recommended that you supply at least 'filename', so that
       TextFile can include it in warning messages.  If 'file' is not
       supplied, TextFile creates its own using the 'open()' builtin.

       The options are all boolean, and affect the value returned by
       'readline()':

         strip_comments [default: true]
           strip from "#" to end-of-line, as well as any whitespace
           leading up to the "#" -- unless it is escaped by a backslash

         lstrip_ws [default: false]
           strip leading whitespace from each line before returning it

         rstrip_ws [default: true]
           strip trailing whitespace (including line terminator!) from
           each line before returning it

         skip_blanks [default: true]
           skip lines that are empty *after* stripping comments and
           whitespace.  (If both lstrip_ws and rstrip_ws are true, then
           some lines may consist of solely whitespace: these will *not*
           be skipped, even if 'skip_blanks' is true.)

         join_lines [default: false]
           if a backslash is the last non-newline character on a line
           after stripping comments and whitespace, join the following
           line to it to form one "logical line"; if N consecutive lines
           end with a backslash, then N+1 physical lines will be joined
           to form one logical line.

         collapse_ws [default: false]
           after stripping comments and whitespace and joining physical
           lines into logical lines, all internal whitespace (strings of
           whitespace surrounded by non-whitespace characters, and not
           at the beginning or end of the logical line) will be
           collapsed to a single space.

       Note that since 'rstrip_ws' can strip the trailing newline, the
       semantics of 'readline()' must differ from those of the builtin
       file object's 'readline()' method!  In particular, 'readline()'
       returns None for end-of-file: an empty string might just be a
       blank line (or an all-whitespace line), if 'rstrip_ws' is true
       but 'skip_blanks' is not."""

Interested in having something like this in the core? Adding more options is possible, but the code is already on the hairy side to support all of these. And I'm not a big fan of the subtle difference in semantics with file objects, but honestly couldn't think of a better way at the time. If you're interested, you can download it from http://www.mems-exchange.org/exchange/software/python/text_file/ or just use the version in the Distutils CVS tree. Greg From mal at lemburg.com Tue Mar 7 15:38:09 2000 From: mal at lemburg.com (M.-A.
Lemburg) Date: Tue, 07 Mar 2000 15:38:09 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects References: <38C4CA97.5D0AA9D@lemburg.com> <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> Message-ID: <38C51451.D38B21FE@lemburg.com> Fredrik Lundh wrote: > > > Unicode objects string objects > > expandtabs > > yes. > > I'm pretty sure there's "expandtabs" code in the > strop module. maybe barry missed it? > > > center > > ljust > > rjust > > probably. > > the implementation is trivial, and ljust/rjust are > somewhat useful, so you might as well add them > all (just cut and paste from the unicode class). > > what about rguido and lguido, btw? Ooops, forgot those, thanks :-) > > zfill > > no. Why not ? Since the string implementation had all of the above marked as TBD, I added all four. What about the other new methods (.isXXX() and .splitlines()) ? .isXXX() are mostly needed due to the extended character properties in Unicode. They would be new to the string object world. .splitlines() is Unicode aware and also treats CR/LF combinations across platforms:

S.splitlines([maxsplit]) -> list of strings

Return a list of the lines in S, breaking at line boundaries. If maxsplit is given, at most maxsplit are done. Line breaks are not included in the resulting list.

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Tue Mar 7 16:38:18 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 07 Mar 2000 10:38:18 -0500 Subject: [Python-Dev] Adding Unicode methods to string objects In-Reply-To: Your message of "Tue, 07 Mar 2000 15:38:09 +0100." <38C51451.D38B21FE@lemburg.com> References: <38C4CA97.5D0AA9D@lemburg.com> <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> <38C51451.D38B21FE@lemburg.com> Message-ID: <200003071538.KAA13977@eric.cnri.reston.va.us> > > > zfill > > > > no. > > Why not ?
Zfill is (or ought to be) deprecated. It stems from times before we had things like "%08d" % x and no longer serves a useful purpose. I doubt anyone would miss it. (Of course, now /F will claim that PIL will break in 27 places because of this. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Tue Mar 7 18:07:40 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 7 Mar 2000 12:07:40 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <200003071352.IAA13571@eric.cnri.reston.va.us> Message-ID: <000701bf8857$a56ed660$a72d153f@tim> [Guido] > ... > Conclusion > ---------- > > Are the Java rules complex? Yes. Are there better rules possible? I'm > not so sure, given the requirement of allowing concurrent incremental > garbage collection algorithms that haven't even been invented > yet. Guy Steele worked his ass off on Java's rules. He had as much real-world experience with implementing GC as anyone, via his long & deep Lisp implementation background (both SW & HW), and indeed invented several key techniques in high-performance GC. But he had no background in GC with user-defined finalizers -- and it shows! > (Plus the implied requirement that finalizers in trash cycles > should be invoked.) Are the Java rules difficult for the user? Only > for users who think they can trick finalizers into doing things for > them that they were not designed to do. This is so implementation-centric it's hard to know what to say <0.5 wink>. The Java rules weren't designed to do much of anything except guarantee that Java (1) would eventually reclaim all unreachable objects, and (2) wouldn't expose dangling pointers to user finalizers, or chase any itself. Whatever *useful* finalizer semantics may remain are those that just happened to survive. > ... > Unlike Scheme guardians or the proposed __cleanup__ mechanism, you > don't have to know whether your object is involved in a cycle -- your > finalizer will still be called. 
This is like saying a user doesn't have to know whether the new drug prescribed for them by their doctor has potentially fatal side effects -- they'll be forced to take it regardless . > ... > Final note: the semantics "__del__ is called whenever the reference > count reaches zero" cannot be defended in the light of a migration to > different forms of garbage collection (e.g. JPython). There may not > be a reference count. 1. I don't know why JPython doesn't execute __del__ methods at all now, but have to suspect that the Java rules imply an implementation so grossly inefficient in the presence of __del__ that Barry simply doesn't want to endure the speed complaints. The Java spec itself urges implementations to special-case the snot out of classes that don't override the default do-nothing finalizer, for "go fast" reasons too. 2. The "refcount reaches 0" rule in CPython is merely a wonderfully concrete way to get across the idea of "destruction occurs in an order consistent with a topological sort of the points-to graph". The latter is explicit in the BDW collector, which has no refcounts; the topsort concept is applicable and thoroughly natural in all languages; refcounts in CPython give an exploitable hint about *when* collection will occur, but add no purely semantic constraint beyond the topsort requirement (they neatly *imply* the topsort requirement). There is no topsort in the presence of cycles, so cycles create problems in all languages. The same "throw 'em back at the user" approach makes just as much sense from the topsort view as the RC view; it doesn't rely on RC at all. stop-the-insanity-ly y'rs - tim From guido at python.org Tue Mar 7 18:33:31 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 07 Mar 2000 12:33:31 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Tue, 07 Mar 2000 12:07:40 EST." 
<000701bf8857$a56ed660$a72d153f@tim> References: <000701bf8857$a56ed660$a72d153f@tim> Message-ID: <200003071733.MAA14926@eric.cnri.reston.va.us> [Tim tells Guido again that he finds the Java rules bad, slinging some mud at Guy Steele, but without explaining what the problem with them is, and then asks:] > 1. I don't know why JPython doesn't execute __del__ methods at all now, but > have to suspect that the Java rules imply an implementation so grossly > inefficient in the presence of __del__ that Barry simply doesn't want to > endure the speed complaints. The Java spec itself urges implementations to > special-case the snot out of classes that don't override the default > do-nothing finalizer, for "go fast" reasons too. Something like that, yes, although it was Jim Hugunin. I have a feeling it has to do with the dynamic nature of __del__ -- this would imply that *all* Python class instances would appear to Java to have a finalizer -- just in most cases it would do a failing lookup of __del__ and bail out quickly. Maybe some source code or class analysis looking for a __del__ could fix this, at the cost of not allowing one to patch __del__ into an existing class after instances have already been created. I don't find that breach of dynamicism a big deal -- e.g. CPython keeps copies of __getattr__, __setattr__ and __delattr__ in the class for similar reasons. > 2. The "refcount reaches 0" rule in CPython is merely a wonderfully concrete > way to get across the idea of "destruction occurs in an order consistent > with a topological sort of the points-to graph". The latter is explicit in > the BDW collector, which has no refcounts; the topsort concept is applicable > and thoroughly natural in all languages; refcounts in CPython give an > exploitable hint about *when* collection will occur, but add no purely > semantic constraint beyond the topsort requirement (they neatly *imply* the > topsort requirement).
There is no topsort in the presence of cycles, so > cycles create problems in all languages. The same "throw 'em back at the > user" approach makes just as much sense from the topsort view as the RC > view; it doesn't rely on RC at all. Indeed. I propose to throw it back at the user by calling __del__. The typical user defines __del__ because they want to close a file, say goodbye nicely on a socket connection, or delete a temp file. That sort of thing. This is what finalizers are *for*. As an author of this kind of finalizer, I don't see why I need to know whether I'm involved in a cycle or not. I want my finalizer called when my object goes away, and I don't want my object kept alive by unreachable cycles. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Tue Mar 7 18:39:15 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 07 Mar 2000 18:39:15 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects References: <38C4CA97.5D0AA9D@lemburg.com> <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> <38C51451.D38B21FE@lemburg.com> Message-ID: <38C53EC3.5292ECF@lemburg.com> I've ported most of the Unicode methods to strings now. Here's the new table:

    Unicode objects        string objects
    ------------------------------------------------------------
    capitalize             capitalize
    center                 center
    count                  count
    encode
    endswith               endswith
    expandtabs             expandtabs
    find                   find
    index                  index
    isdecimal
    isdigit                isdigit
    islower                islower
    isnumeric
    isspace                isspace
    istitle                istitle
    isupper                isupper
    join                   join
    ljust                  ljust
    lower                  lower
    lstrip                 lstrip
    replace                replace
    rfind                  rfind
    rindex                 rindex
    rjust                  rjust
    rstrip                 rstrip
    split                  split
    splitlines             splitlines
    startswith             startswith
    strip                  strip
    swapcase               swapcase
    title                  title
    translate              translate
    upper                  upper
    zfill                  zfill

I don't think that .isdecimal() and .isnumeric() are needed for strings since most of the added mappings refer to Unicode char points.
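MAL's three classification predicates did eventually land on the built-in string type; in a current Python 3 interpreter (used here purely as illustration, long after this thread) the decimal/digit/numeric distinction he describes can be checked directly:

```python
# Unicode classification, narrowest to widest: decimal < digit < numeric.
samples = ["7", "\u00b2", "\u00bd"]  # '7', SUPERSCRIPT TWO, VULGAR FRACTION ONE HALF
for ch in samples:
    print(ch, ch.isdecimal(), ch.isdigit(), ch.isnumeric())
# '7' passes all three; superscript two is a digit and numeric but not
# decimal; one-half is numeric only.
```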
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Mar 7 18:42:53 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 07 Mar 2000 18:42:53 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects References: <38C4CA97.5D0AA9D@lemburg.com> <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> <38C51451.D38B21FE@lemburg.com> <200003071538.KAA13977@eric.cnri.reston.va.us> Message-ID: <38C53F9D.44C3A0F3@lemburg.com> Guido van Rossum wrote: > > > > > zfill > > > > > > no. > > > > Why not ? > > Zfill is (or ought to be) deprecated. It stems from times before we > had things like "%08d" % x and no longer serves a useful purpose. > I doubt anyone would miss it. > > (Of course, now /F will claim that PIL will break in 27 places because > of this. :-) Ok, I'll remove it from both implementations again... (there was some email overlap). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From bwarsaw at cnri.reston.va.us Tue Mar 7 20:24:39 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 7 Mar 2000 14:24:39 -0500 (EST) Subject: [Python-Dev] finalization again References: <200003071352.IAA13571@eric.cnri.reston.va.us> <000701bf8857$a56ed660$a72d153f@tim> Message-ID: <14533.22391.447739.901802@anthem.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: TP> 1. I don't know why JPython doesn't execute __del__ methods at TP> all now, but have to suspect that the Java rules imply an TP> implementation so grossly inefficient in the presence of TP> __del__ that Barry simply doesn't want to endure the speed TP> complaints. Actually, it was JimH that discovered this performance gotcha. 
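The optimization Barry goes on to describe -- look up __del__ once when the class is defined, cache the answer, and accept that a __del__ patched in later goes unnoticed -- can be sketched in plain (modern) Python; needs_finalizer is a hypothetical helper for illustration, not JPython's actual API:

```python
# Decide once per class whether instances need finalization, instead of
# paying a __del__ lookup on every instance creation.
_finalizer_cache = {}

def needs_finalizer(cls):
    try:
        return _finalizer_cache[cls]
    except KeyError:
        result = any("__del__" in c.__dict__ for c in cls.__mro__)
        _finalizer_cache[cls] = result
        return result

class Plain:
    pass

class Closing:
    def __del__(self):
        pass

# The trade-off Barry accepts: patching __del__ into Plain *after* the
# first check is never noticed, because the cached answer stands.
```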
The problem is that if you want to support __del__, you've got to take the finalize() hit for every instance (i.e. PyInstance object) and it's just not worth it. I just realized that it would be relatively trivial to add a subclass of PyInstance differing only in that it has a finalize() method which would invoke __del__(). Now when the class gets defined, the __del__() would be mined and cached and we'd look at that cache when creating an instance. If there's a function there, we create a PyFinalizableInstance, otherwise we create a PyInstance. The cache means you couldn't dynamically add a __del__ later, but I don't think that's a big deal. It wouldn't be hard to look up the __del__ every time, but that'd be a hit for every instance creation (as opposed to class creation), so again, it's probably not worth it. I just did a quick and dirty hack and it seems at first blush to work. I'm sure there's something I'm missing :). For those of you who don't care about JPython, you can skip the rest. Okay, first the Python script to exercise this, then the PyFinalizableInstance.java file, and then the diffs to PyClass.java. JPython-devers, is it worth adding this?

-------------------- snip snip -------------------- del.py

class B:
    def __del__(self):
        print 'In my __del__'

b = B()
del b

from java.lang import System
System.gc()

-------------------- snip snip -------------------- PyFinalizableInstance.java

// Copyright (C) Corporation for National Research Initiatives

// These are just like normal instances, except that their classes included
// a definition for __del__(), i.e. Python's finalizer.  These two instance
// types have to be separated due to Java performance issues.

package org.python.core;

public class PyFinalizableInstance extends PyInstance {
    public PyFinalizableInstance(PyClass iclass) {
        super(iclass);
    }

    // __del__ method is invoked upon object finalization.
    protected void finalize() {
        __class__.__del__.__call__(this);
    }
}

-------------------- snip snip --------------------

Index: PyClass.java
===================================================================
RCS file: /projects/cvsroot/jpython/dist/org/python/core/PyClass.java,v
retrieving revision 2.8
diff -c -r2.8 PyClass.java
*** PyClass.java 1999/10/04 20:44:28 2.8
--- PyClass.java 2000/03/07 19:02:29
***************
*** 21,27 ****
      // Store these methods for performance optimization
      // These are only used by PyInstance
!     PyObject __getattr__, __setattr__, __delattr__, __tojava__;
      // Holds the classes for which this is a proxy
      // Only used when subclassing from a Java class
--- 21,27 ----
      // Store these methods for performance optimization
      // These are only used by PyInstance
!     PyObject __getattr__, __setattr__, __delattr__, __tojava__, __del__;
      // Holds the classes for which this is a proxy
      // Only used when subclassing from a Java class
***************
*** 111,116 ****
--- 111,117 ----
          __setattr__ = lookup("__setattr__", false);
          __delattr__ = lookup("__delattr__", false);
          __tojava__ = lookup("__tojava__", false);
+         __del__ = lookup("__del__", false);
      }
      protected void findModule(PyObject dict) {
***************
*** 182,188 ****
      }
      public PyObject __call__(PyObject[] args, String[] keywords) {
!         PyInstance inst = new PyInstance(this);
          inst.__init__(args, keywords);
          return inst;
      }
--- 183,194 ----
      }
      public PyObject __call__(PyObject[] args, String[] keywords) {
!         PyInstance inst;
!         if (__del__ == null)
!             inst = new PyInstance(this);
!         else
!             // the class defined an __del__ method
!             inst = new PyFinalizableInstance(this);
          inst.__init__(args, keywords);
          return inst;
      }

From bwarsaw at cnri.reston.va.us Tue Mar 7 20:35:44 2000 From: bwarsaw at cnri.reston.va.us (Barry A.
Warsaw) Date: Tue, 7 Mar 2000 14:35:44 -0500 (EST) Subject: [Python-Dev] finalization again References: <000701bf8857$a56ed660$a72d153f@tim> <200003071733.MAA14926@eric.cnri.reston.va.us> Message-ID: <14533.23056.517661.633574@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> Maybe some source code or class analysis looking for a GvR> __del__ could fix this, at the cost of not allowing one to GvR> patch __del__ into an existing class after instances have GvR> already been created. I don't find that breach of dynamicism GvR> a big deal -- e.g. CPython keeps copies of __getattr__, GvR> __setattr__ and __delattr__ in the class for similar reasons. For those of you who enter the "Being Guido van Rossum" door like I just did, please keep in mind that it dumps you out not on the NJ Turnpike, but in the little ditch back behind CNRI. Stop by and say hi after you brush yourself off. -Barry From Tim_Peters at Dragonsys.com Tue Mar 7 23:30:16 2000 From: Tim_Peters at Dragonsys.com (Tim_Peters at Dragonsys.com) Date: Tue, 7 Mar 2000 17:30:16 -0500 Subject: [Python-Dev] finalization again Message-ID: <8525689B.007AB2BA.00@notes-mta.dragonsys.com> [Guido] > Tim tells Guido again that he finds the Java rules bad, slinging some > mud at Guy Steele, but without explaining what the problem with them > is ... Slinging mud? Let's back off here. You've read the Java spec and were impressed. That's fine -- it is impressive . But go on from there and see where it leads in practice. That Java's GC model did a masterful job but includes a finalization model users dislike is really just conventional wisdom in the Java world. My sketch of Guy Steele's involvement was an attempt to explain why both halves of that are valid. I didn't think "explaining the problem" was necessary, as it's been covered in depth multiple times in c.l.py threads, by Java programmers as well as by me. 
Searching the web for articles about this turns up many; the first one I hit is typical: http://www.quoininc.com/quoininc/Design_Java0197.html eventually concludes Consequently we recommend that [Java] programmers support but do not rely on finalization. That is, place all finalization semantics in finalize() methods, but call those methods explicitly and in the order required. The points below provide more detail. That's par for the Java course: advice to write finalizers to survive being called multiple times, call them explicitly, and do all you can to ensure that the "by magic" call is a nop. The lack of ordering rules in the language forces people to "do it by hand" (as the Java spec acknowledges: "It is straightforward to implement a Java class that will cause a set of finalizer-like methods to be invoked in a specified order for a set of objects when all the objects become unreachable. Defining such a class is left as an exercise for the reader." But from what I've seen, that exercise is beyond the imagination of most Java programmers! The perceived need for ordering is not.). It's fine that you want to restrict finalizers to "simple" cases; it's not so fine if the language can't ensure that simple cases are the only ones the user can write, & can neither detect & complain at runtime about cases it didn't intend to support. The Java spec is unhelpful here too: Therefore, we recommend that the design of finalize methods be kept simple and that they be programmed defensively, so that they will work in all cases. Mom and apple pie, but what does it mean, exactly? The spec realizes that you're going to be tempted to try things that won't work, but can't really explain what those are in terms simpler than the full set of implementation consequences. As a result, users hate it -- but don't take my word for that! 
If you look & don't find that Java's finalization rules are widely viewed as "a problem to be wormed around" by serious Java programmers, fine -- then you've got a much better search engine than mine . As for why I claim following topsort rules is very likely to work out better, they follow from the nature of the problem, and can be explained as such, independent of implementation details. See the Boehm reference for more about topsort. will-personally-use-python-regardless-ly y'rs - tim From guido at python.org Wed Mar 8 01:50:38 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 07 Mar 2000 19:50:38 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Tue, 07 Mar 2000 17:30:16 EST." <8525689B.007AB2BA.00@notes-mta.dragonsys.com> References: <8525689B.007AB2BA.00@notes-mta.dragonsys.com> Message-ID: <200003080050.TAA19264@eric.cnri.reston.va.us> > [Guido] > > Tim tells Guido again that he finds the Java rules bad, slinging some > > mud at Guy Steele, but without explaining what the problem with them > > is ... > > Slinging mud? Let's back off here. You've read the Java spec and were > impressed. That's fine -- it is impressive . But go on from > there and see where it leads in practice. That Java's GC model did a > masterful job but includes a finalization model users dislike is really > just conventional wisdom in the Java world. My sketch of Guy Steele's > involvement was an attempt to explain why both halves of that are valid. Granted. I can read Java code and sometimes I write some, but I'm not a Java programmer by any measure, and I wasn't aware that finalize() has a general bad rep. > I didn't think "explaining the problem" was necessary, as it's been > covered in depth multiple times in c.l.py threads, by Java programmers > as well as by me. 
Searching the web for articles about this turns up > many; the first one I hit is typical: > > http://www.quoininc.com/quoininc/Design_Java0197.html > > eventually concludes > > Consequently we recommend that [Java] programmers support but do > not rely on finalization. That is, place all finalization semantics > in finalize() methods, but call those methods explicitly and in the > order required. The points below provide more detail. > > That's par for the Java course: advice to write finalizers to survive > being called multiple times, call them explicitly, and do all you can > to ensure that the "by magic" call is a nop. It seems the authors make one big mistake: they recommend calling finalize() explicitly. This may be par for the Java course: the quality of the materials is often poor, and that has to be taken into account when certain features have gotten a bad rep. (These authors also go on at length about the problems of GC in a real-time situation -- attempts to use Java in situations for which it is inappropriate are also par for the course, inspired by all the hype.) Note that e.g. Bruce Eckel in "Thinking in Java" makes it clear that you should never call finalize() explicitly (except that you should always call super.finalize() in your finalize() method). (Bruce goes on at length explaining that there aren't a lot of things you should use finalize() for -- except to observe the garbage collector. :-) > The lack of ordering > rules in the language forces people to "do it by hand" (as the Java > spec acknowledges: "It is straightforward to implement a Java class > that will cause a set of finalizer-like methods to be invoked in a > specified order for a set of objects when all the objects become > unreachable. Defining such a class is left as an exercise for the > reader." But from what I've seen, that exercise is beyond the > imagination of most Java programmers! The perceived need for ordering > is not.).
True, but note that Python won't have the ordering problem, at least not as long as we stick to reference counting as the primary means of GC. The ordering problem in Python will only happen when there are cycles, and there you really can't blame the poor GC design! > It's fine that you want to restrict finalizers to "simple" cases; it's > not so fine if the language can't ensure that simple cases are the only > ones the user can write, & can neither detect & complain at runtime > about cases it didn't intend to support. The Java spec is unhelpful > here too: > > Therefore, we recommend that the design of finalize methods be kept > simple and that they be programmed defensively, so that they will > work in all cases. > > Mom and apple pie, but what does it mean, exactly? The spec realizes > that you're going to be tempted to try things that won't work, but > can't really explain what those are in terms simpler than the full set > of implementation consequences. As a result, users hate it -- but > don't take my word for that! If you look & don't find that Java's > finalization rules are widely viewed as "a problem to be wormed around" > by serious Java programmers, fine -- then you've got a much better > search engine than mine . Hm. Of course programmers hate finalizers. They hate GC as well. But they hate even more not to have it (witness the relentless complaints about Python's "lack of GC" -- and Java's GC is often touted as one of the reasons for its superiority over C++). I think this stuff is just hard! (Otherwise why would we be here having this argument?) > As for why I claim following topsort rules is very likely to work out > better, they follow from the nature of the problem, and can be > explained as such, independent of implementation details. See the > Boehm reference for more about topsort. Maybe we have a disconnect? We *are* using topsort -- for non-cyclical data structures. Reference counting ensures that. Nothing in my design changes that.
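Guido's "typical user" finalizer -- close a file, say goodbye on a socket, delete a temp file -- looks like this in practice (a minimal modern-Python sketch; ScratchFile is an invented name):

```python
import os
import tempfile

class ScratchFile:
    """Typical __del__ duty: remove a temp file when the object dies."""
    def __init__(self):
        fd, self.path = tempfile.mkstemp()
        os.close(fd)

    def __del__(self):
        # The finalizer neither knows nor cares whether the object was
        # ever part of a cycle -- exactly Guido's point.
        try:
            os.unlink(self.path)
        except OSError:
            pass
```

Under reference counting the unlink happens as soon as the last reference is dropped; a tracing collector only promises it eventually.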
The issue at hand is what to do with *cyclical* data structures, where topsort doesn't help. Boehm, on http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html, says: "Cycles involving one or more finalizable objects are never finalized." The question remains, what to do with trash cycles? I find having a separate __cleanup__ protocol cumbersome. I think that the "finalizer only called once by magic" rule is reasonable. I believe that the ordering problems will be much less than in Java, because we use topsort whenever we can. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Wed Mar 8 07:25:56 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 8 Mar 2000 01:25:56 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <200003080050.TAA19264@eric.cnri.reston.va.us> Message-ID: <001401bf88c7$29f2a320$452d153f@tim> [Guido] > Granted. I can read Java code and sometimes I write some, but I'm not > a Java programmer by any measure, and I wasn't aware that finalize() > has a general bad rep. It does, albeit often for bad reasons. 1. C++ programmers seeking to emulate techniques based on C++'s rigid specification of the order and timing of destruction of autos. 2. People pushing the limits (as in the URL I happened to post). 3. People trying to do anything . Java's finalization semantics are very weak, and s-l-o-w too (under most current implementations). Now I haven't used Java for real in about two years, and avoided finalizers completely when I did use it. I can't recall any essential use of __del__ I make in Python code, either. So what Python does here makes no personal difference to me. However, I frequently respond to complaints & questions on c.l.py, and don't want to get stuck trying to justify Java's uniquely baroque rules outside of comp.lang.java <0.9 wink>. 
>> [Tim, passes on the first relevant URL he finds: >> http://www.quoininc.com/quoininc/Design_Java0197.html] > It seems the authors make one big mistake: they recommend to call > finalize() explicitly. This may be par for the Java course: the > quality of the materials is often poor, and that has to be taken into > account when certain features have gotten a bad rep. Well, in the "The Java Programming Language", Gosling recommends to: a) Add a method called close(), that tolerates being called multiple times. b) Write a finalize() method whose body calls close(). People tended to do that at first, but used a bunch of names other than "close" too. I guess people eventually got weary of having two methods that did the same thing, so decided to just use the single name Java guaranteed would make sense. > (These authors also go on at length about the problems of GC in a real- > time situation -- attempts to use Java in sutations for which it is > inappropriate are also par for the course, inspired by all the hype.) I could have picked any number of other URLs, but don't regret picking this one: you can't judge a ship in smooth waters, and people will push *all* features beyond their original intents. Doing so exposes weaknesses. Besides, Sun won't come out & say Java is unsuitable for real-time, no matter how obvious it is . > Note that e.g. Bruce Eckel in "Thinking in Java" makes it clear that > you should never call finalize() explicitly (except that you should > always call super.fuinalize() in your finalize() method). You'll find lots of conflicting advice here, be it about Java or C++. Java may be unique, though, in the universality of the conclusion Bruce draws here: > (Bruce goes on at length explaining that there aren't a lot of things > you should use finalize() for -- except to observe the garbage collector. :-) Frankly, I think Java would be better off without finalizers. 
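Gosling's a)-plus-b) recipe -- an idempotent close() with the finalizer as a mere backstop -- transliterates directly into Python (a sketch; Connection is an invented name):

```python
class Connection:
    """close() tolerates being called multiple times, explicitly or
    from the finalizer; only the first call does real work."""
    def __init__(self):
        self.closed = False

    def close(self):
        if not self.closed:
            self.closed = True
            # ... release the underlying resource here ...

    def __del__(self):
        # the "by magic" call becomes a nop if close() already ran
        self.close()
```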
Python could do fine without __del__ too -- if you and I were the only users <0.6 wink>. [on Java's lack of ordering promises] > True, but note that Python won't have the ordering problem, at least > not as long as we stick to reference counting as the primary means of > GC. The ordering problem in Python will only happen when there are > cycles, and there you really can't blame the poor GC design! I cannot. Nor do I intend to. The cyclic ordering problem isn't GC's fault, it's the program's; but GC's *response* to it is entirely GC's responsibility. >> ... The Java spec is unhelpful here too: >> >> Therefore, we recommend that the design of finalize methods be kept >> simple and that they be programmed defensively, so that they will >> work in all cases. >> >> Mom and apple pie, but what does it mean, exactly? The spec realizes >> that you're going to be tempted to try things that won't work, but >> can't really explain what those are in terms simpler than the full set >> of implementation consequences. As a result, users hate it -- but >> don't take my word for that! If you look & don't find that Java's >> finalization rules are widely viewed as "a problem to be wormed around" >> by serious Java programmers, fine -- then you've got a much better >> search engine than mine . > Hm. Of course programmers hate finalizers. Oh no! C++ programmers *love* destructors! I mean it, they're absolutely gaga over them. I haven't detected signs that CPython programmers hate __del__ either, except at shutdown time. Regardless of language, they love them when they're predictable and work as expected, they hate them when they're unpredictable and confusing. C++ auto destructors are extremely predictable (e.g., after "{SomeClass a, b; ...}", b is destructed before a, and both destructions are guaranteed before leaving the block they're declared in, regardless of whether via return, exception, goto or falling off the end). 
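The LIFO guarantee Tim credits to C++ autos has a close Python analogue in nested context managers; a small sketch (modern Python, with an invented Resource class):

```python
from contextlib import ExitStack

log = []

class Resource:
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        log.append("acquire " + self.name)
        return self

    def __exit__(self, *exc):
        log.append("release " + self.name)
        return False

with ExitStack() as stack:
    a = stack.enter_context(Resource("a"))
    b = stack.enter_context(Resource("b"))
# Like C++ block exit: cleanup runs in reverse order of construction
# (b before a), regardless of how the block is left.
```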
CPython's __del__ is largely predictable (modulo shutdown, cycles, and sometimes exceptions). The unhappiness in the Java world comes from Java finalizers' unpredictability and consequent all-around uselessness in messy real life. > They hate GC as well. Yes, when it's unpredictable and confusing . > But they hate even more not to have it (witness the relentless > complaints about Python's "lack of GC" -- and Java's GC is often > touted as one of the reasons for its superiority over C++). Back when JimF & I were looking at gc, we may have talked each other into really believing that paying careful attention to RC issues leads to cleaner and more robust designs. In fact, I still believe that, and have never clamored for "real gc" in Python. Jim now may even be opposed to "real gc". But Jim and I and you all think a lot about the art of programming, and most users just don't have time or inclination for that -- the slowly changing nature of c.l.py is also clear evidence of this. I'm afraid this makes growing "real GC" a genuine necessity for Python's continued growth. It's not a *bad* thing in any case. Think of it as a marketing requirement <0.7 wink>. > I think this stuff is just hard! (Otherwise why would we be here > having this argument?) Honest to Guido, I think it's because you're sorely tempted to go down an un-Pythonic path here, and I'm fighting that. I said early on there are no thoroughly good answers (yes, it's hard), but that's nothing new for Python! We're having this argument solely because you're confusing Python with some other language . [a 2nd or 3rd plug for taking topsort seriously] > Maybe we have a disconnect? Not in the technical analysis, but in what conclusions to take from it. > We *are* using topsort -- for non-cyclical data structures. Reference > counting ensure that. Nothing in my design changes that. And it's great! 
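The predictability both sides credit to reference counting is easy to demonstrate: in CPython (and only by virtue of refcounting -- other implementations defer this), dropping the last reference runs the finalizer on the spot:

```python
class Tracked:
    deleted = False

    def __del__(self):
        Tracked.deleted = True

t = Tracked()
assert not Tracked.deleted
del t  # refcount hits zero: __del__ runs immediately, in CPython
assert Tracked.deleted
```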
Everyone understands the RC rules pretty quickly, lots of people like them a whole lot, and if it weren't for cyclic trash everything would be peachy. > The issue at hand is what to do with *cyclical* data structures, where > topsort doesn't help. Boehm, on > http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html, > says: "Cycles involving one or more finalizable objects are never > finalized." This is like some weird echo chamber, where the third time I shout something the first one comes back without any distortion at all . Yes, Boehm's first rule is "Do No Harm". It's a great rule. Python follows the same rule all over the place; e.g., when you see x = "4" + 2 you can't possibly know what was intended, so you refuse to guess: you would rather *kill* the program than make a blind guess! I see cycles with finalizers as much the same: it's plain wrong to guess when you can't possibly know what was intended. Because topsort is the only principled way to decide order of finalization, and they've *created* a situation where a topsort doesn't exist, what they're handing you is no less amibiguous than in trying to add a string to an int. This isn't the time to abandon topsort as inconvenient, it's the time to defend it as inviolate principle! The only throughly rational response is "you know, this doesn't make sense -- since I can't know what you want here, I refuse to pretend that I can". Since that's "the right" response everywhere else in Python, what the heck is so special about this case? It's like you decided Python *had* to allow adding strings to ints, and now we're going to argue about whether Perl, Awk or Tcl makes the best unprincipled guess . > The question remains, what to do with trash cycles? A trash cycle without a finalizer isn't a problem, right? In that case, topsort rules have no visible consquence so it doesn't matter in what order you merely reclaim the memory. 
If it has an object with a finalizer, though, at the very worst you can let it leak, and make the collection of leaked objects available for inspection. Even that much is a *huge* "improvement" over what they have today: most cycles won't have a finalizer and so will get reclaimed, and for the rest they'll finally have a simple way to identify exactly where the problem is, and a simple criterion for predicting when it will happen. If that's not "good enough", then without abandoning principle the user needs to have some way to reduce such a cycle *to* a topsort case themself. > I find having a separate __cleanup__ protocol cumbersome. Same here, but if you're not comfortable leaking, and you agree Python is not in the business of guessing in inherently ambiguous situations, maybe that's what it takes! MAL and GregS both gravitated to this kind of thing at once, and that's at least suggestive; and MAL has actually been using his approach. It's explicit, and that's Pythonic on the face of it. > I think that the "finalizer only called once by magic" rule is reasonable. If it weren't for its specific use in emulating Java's scheme, would you still be in favor of that? It's a little suspicious that it never came up before. > I believe that the ordering problems will be much less than in Java, because > we use topsort whenever we can. No argument here, except that I believe there's never sufficient reason to abandon topsort ordering. Note that BDW's adamant refusal to yield on this hasn't stopped "why doesn't Python use BDW?" from becoming a FAQ. a-case-where-i-expect-adhering-to-principle-is-more-pragmatic-in-the-end-ly y'rs - tim From tim_one at email.msn.com Wed Mar 8 08:48:24 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 8 Mar 2000 02:48:24 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? Message-ID: <001801bf88d2$af0037c0$452d153f@tim> Mike has a darned good point here.
Anyone have a darned good answer ? -----Original Message----- From: python-list-admin at python.org [mailto:python-list-admin at python.org] On Behalf Of Mike Fletcher Sent: Tuesday, March 07, 2000 2:08 PM To: Python Listserv (E-mail) Subject: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? Just reading one more post (and a FAQ) on the win32 pipe breakage (sewage all over the hard-disk, traffic rerouted through the bit-bucket, you aren't getting to work anytime soon Mrs. Programmer) and wondering why we have a FAQ instead of having the win32pipe stuff rolled into the os module to fix it. Is there some incompatibility? Is there a licensing problem? Ideas? Mike __________________________________ Mike C. Fletcher Designer, VR Plumber http://members.home.com/mcfletch -- http://www.python.org/mailman/listinfo/python-list From mal at lemburg.com Wed Mar 8 09:36:57 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 09:36:57 +0100 Subject: [Python-Dev] finalization again References: <8525689B.007AB2BA.00@notes-mta.dragonsys.com> <200003080050.TAA19264@eric.cnri.reston.va.us> Message-ID: <38C61129.2F8C9E95@lemburg.com> > [Guido] > The question remains, what to do with trash cycles? I find having a > separate __cleanup__ protocol cumbersome. I think that the "finalizer > only called once by magic" rule is reasonable. I believe that the > ordering problems will be much less than in Java, because we use > topsort whenever we can. Note that the __cleanup__ protocol is intended to break cycles *before* calling the garbage collector. After those cycles are broken, ordering is not a problem anymore and because __cleanup__ can do its task on a per-object basis all magic is left in the hands of the programmer. The __cleanup__ protocol as I use it is designed to be called in situations where the system knows that all references into a cycle are about to be dropped (I typically use small cyclish object systems in my application, e.g. 
ones that create and reference namespaces which include a reference to the hosting object itself). In my application that is done by using mxProxies at places where I know these cyclic object subsystems are being referenced. In Python the same could be done whenever the interpreter knows that a certain object is about to be deleted, e.g. during shutdown (important for embedding Python in other applications such as Apache) or some other major subsystem finalization, e.g. unload of a module or killing of a thread (yes, I know these are no-nos, but they could be useful, esp. the thread kill operation in multi-threaded servers). After __cleanup__ has done its thing, the finalizer can either choose to leave all remaining cycles in memory (and leak) or apply its own magic to complete the task. In any case, __del__ should be called when the refcount reaches 0. (I find it somewhat strange that people are arguing to keep external resources alive even though there is a chance of freeing them.) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Mar 8 09:46:14 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 09:46:14 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <001801bf88d2$af0037c0$452d153f@tim> Message-ID: <38C61356.E0598DBF@lemburg.com> Tim Peters wrote: > > Mike has a darned good point here. Anyone have a darned good answer ? > > -----Original Message----- > From: python-list-admin at python.org [mailto:python-list-admin at python.org] > On Behalf Of Mike Fletcher > Sent: Tuesday, March 07, 2000 2:08 PM > To: Python Listserv (E-mail) > Subject: Fixing os.popen on Win32 => is the win32pipe stuff going to be > adopted?
> > Just reading one more post (and a FAQ) on the win32 pipe breakage (sewage > all over the hard-disk, traffic rerouted through the bit-bucket, you aren't > getting to work anytime soon Mrs. Programmer) and wondering why we have a > FAQ instead of having the win32pipe stuff rolled into the os module to fix > it. Is there some incompatibility? Is there a licensing problem? > > Ideas? I'd suggest moving the popen from the C modules into os.py as Python API and then applying all necessary magic to either use the win32pipe implementation (if available) or the native C one from the posix module in os.py. Unless, of course, the win32 stuff (or some of it) makes it into the core. I'm mostly interested in this for my platform.py module... BTW, is there any interest of moving it into the core ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Wed Mar 8 13:10:53 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 08 Mar 2000 07:10:53 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: Your message of "Wed, 08 Mar 2000 09:46:14 +0100." <38C61356.E0598DBF@lemburg.com> References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> Message-ID: <200003081210.HAA19931@eric.cnri.reston.va.us> > Tim Peters wrote: > > > > Mike has a darned good point here. Anyone have a darned good answer ? > > Subject: Fixing os.popen on Win32 => is the win32pipe stuff going to be > > adopted? > > > > Just reading one more post (and a FAQ) on the win32 pipe breakage (sewage > > all over the hard-disk, traffic rerouted through the bit-bucket, you aren't > > getting to work anytime soon Mrs. Programmer) and wondering why we have a > > FAQ instead of having the win32pipe stuff rolled into the os module to fix > > it. Is there some incompatibility? 
Is there a licensing problem? MAL: > I'd suggest moving the popen from the C modules into os.py > as Python API and then applying all necessary magic to either > use the win32pipe implementation (if available) or the native > C one from the posix module in os.py. > > Unless, of course, the win32 stuff (or some of it) makes it into > the core. No concrete plans -- except that I think the registry access is supposed to go in. Haven't seen the code on patches at python.org yet though. > I'm mostly interested in this for my platform.py module... > BTW, is there any interest of moving it into the core ? "it" == platform.py? Little interest from me personally; I suppose it could go in Tools/scripts/... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Mar 8 15:06:53 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 08 Mar 2000 09:06:53 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Wed, 08 Mar 2000 01:25:56 EST." <001401bf88c7$29f2a320$452d153f@tim> References: <001401bf88c7$29f2a320$452d153f@tim> Message-ID: <200003081406.JAA20033@eric.cnri.reston.va.us> > A trash cycle without a finalizer isn't a problem, right? In that case, > topsort rules have no visible consquence so it doesn't matter in what order > you merely reclaim the memory. When we have a pile of garbage, we don't know whether it's all connected or whether it's lots of little cycles. So if we find [objects with -- I'm going to omit this] finalizers, we have to put those on a third list and put everything reachable from them on that list as well (the algorithm I described before). What's left on the first list then consists of finalizer-free garbage. We dispose of this garbage by clearing dicts and lists. Hopefully this makes the refcount of some of the finalizers go to zero -- those are finalized in the normal way. And now we have to deal with the inevitable: finalizers that are part of cycles. 
It makes sense to reduce the graph of objects to a graph of finalizers only. Example: A <=> b -> C <=> d A and C have finalizers. C is part of a cycle (C-d) that contains no other finalizers, but C is also reachable from A. A is part of a cycle (A-b) that keeps it alive. The interesting thing here is that if we only look at the finalizers, there are no cycles! If we reduce the graph to only finalizers (setting aside for now the problem of how to do that -- we may need to allocate more memory to hold the reduced graph), we get: A -> C We can now finalize A (even though its refcount is nonzero!). And that's really all we can do! A could break its own cycle, thereby disposing of itself and b. It could also break C's cycle, disposing of C and d. It could do nothing. Or it could resurrect A, thereby resurrecting all of A, b, C, and d. This leads to (there's that weird echo again :-) Boehm's solution: Call A's finalizer and leave the rest to the next time the garbage collection runs. Note that we're now calling finalizers on objects with a non-zero refcount. At some point (probably as a result of finalizing A) its refcount will go to zero. We should not finalize it again -- this would serve no purpose. Possible solution:

    INCREF(A);
    A->__del__();
    if (A->ob_refcnt == 1)
        A->__class__ = NULL;  /* Make A finalizer-less */
    DECREF(A);

This avoids finalizing twice if the first finalization broke all cycles in which A is involved. But if it doesn't, A is still cyclical garbage with a finalizer! Even if it didn't resurrect itself. Instead of the code fragment above, we could mark A as "just finalized" and when it shows up at the head of the tree (of finalizers in cyclical trash) again on the next garbage collection, discard it without calling the finalizer again (because this clearly means that it didn't resurrect itself -- at least not for a very long time).
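[Editorial aside: the reduction step Guido describes can be sketched in a few lines. The helper names and graph encoding below are invented for illustration; this is not the collector's actual code:]

```python
def reduced_finalizer_graph(refs, finalizers):
    """Collapse a full reference graph to edges between finalizer-bearing
    objects: F1 -> F2 iff F2 is reachable from F1 through any chain of
    references. refs maps name -> set of referenced names."""
    def reachable(start):
        seen, stack = set(), list(refs[start])
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(refs[n])
        return seen
    return {f: {g for g in reachable(f) if g in finalizers and g != f}
            for f in finalizers}

def finalize_first(reduced):
    """The 'roots' to finalize first: finalizers that no other
    finalizer can reach in the reduced graph."""
    has_incoming = {t for targets in reduced.values() for t in targets}
    return sorted(f for f in reduced if f not in has_incoming)
```

On Guido's example (A <=> b -> C <=> d, with finalizers on A and C) this yields the reduced graph A -> C, and A as the one object to finalize first.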
I would be happier if we could still have a rule that says that a finalizer is called only once by magic -- even if we have two forms of magic: refcount zero or root of the tree. Tim: I don't know if you object to this rule as a matter of principle (for the sake of finalizers that resurrect the object) or if your objection is really against the unordered calling of finalizers legitimized by Java's rules. I hope the latter, since I think that this rule (__del__ called only once by magic) by itself is easy to understand and easy to deal with, and I believe it may be necessary to guarantee progress for the garbage collector. The problem is that the collector can't easily tell whether A has resurrected itself. Sure, if the refcount is 1 after the finalizer run, I know it didn't resurrect itself. But even if it's higher than before, that doesn't mean it's resurrected: it could have linked to itself. Without doing a full collection I can't tell the difference. If I wait until a full collection happens again naturally, and look at the "just finalized flag", I can't tell the difference between the case whereby the object resurrected itself but died again before the next collection, and the case where it was dead already. So I don't know how many times it was expecting the "last rites" to be performed, and the object can't know whether to expect them again or not. This seems worse than the only-once rule to me. Even if someone once found a good use for resurrecting inside __del__, against all recommendations, I don't mind breaking their code, if it's for a good cause. The Java rules aren't a good cause. But top-sorted finalizer calls seem a worthy cause. So now we get to discuss what to do with multi-finalizer cycles, like: A <=> b <=> C Here the reduced graph is: A <=> C About this case you say: > If it has an object with a finalizer, though, at the very worst you can let > it leak, and make the collection of leaked objects available for > inspection.
Even that much is a *huge* "improvement" over what they have > today: most cycles won't have a finalizer and so will get reclaimed, and > for the rest they'll finally have a simple way to identify exactly where the > problem is, and a simple criterion for predicting when it will happen. If > that's not "good enough", then without abandoning principle the user needs > to have some way to reduce such a cycle *to* a topsort case themself. > > > I find having a separate __cleanup__ protocol cumbersome. > > Same here, but if you're not comfortable leaking, and you agree Python is > not in the business of guessing in inherently ambiguous situations, maybe > that's what it takes! MAL and GregS both gravitated to this kind of thing > at once, and that's at least suggestive; and MAL has actually been using his > approach. It's explicit, and that's Pythonic on the face of it. > > > I think that the "finalizer only called once by magic" rule is reasonable. > > If it weren't for its specific use in emulating Java's scheme, would you > still be in favor of that? It's a little suspicious that it never came up > before. Suspicious or not, it still comes up. I still like it. I still think that playing games with resurrection is evil. (Maybe my spiritual beliefs shine through here -- I'm a convinced atheist. :-) Anyway, once-only rule aside, we still need a protocol to deal with cyclical dependencies between finalizers. The __cleanup__ approach is one solution, but it also has a problem: we have a set of finalizers. Whose __cleanup__ do we call? Any? All? Suggestions? Note that I'd like some implementation freedom: I may not want to bother with the graph reduction algorithm at first (which seems very hairy) so I'd like to have the right to use the __cleanup__ API as soon as I see finalizers in cyclical trash.
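[Editorial aside: the explicit __cleanup__ protocol under discussion might look roughly like this from the user's side. The method name is the one proposed in the thread, but the collector glue and class below are invented for illustration:]

```python
import warnings

class Node:
    """A user object that knows how to break its own cycles explicitly."""
    def __init__(self, name):
        self.name = name
        self.other = None          # will point into a cycle

    def __cleanup__(self):
        self.other = None          # break the cycle; no ordering needed

    def __del__(self):
        pass                       # ordinary finalizer, runs at refcount zero

def run_cleanup(cycle_objects):
    """What a collector might do with a trash cycle containing finalizers:
    call __cleanup__ in arbitrary order, warning when no object in the
    cycle defines it."""
    found = False
    for obj in cycle_objects:
        cleanup = getattr(obj, "__cleanup__", None)
        if cleanup is not None:
            cleanup()
            found = True
    if not found:
        warnings.warn("cycle with finalizers but no __cleanup__")
    return found
```

Because each object only breaks its *own* references, the arbitrary calling order is harmless -- which is the whole appeal of the approach.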
I don't mind disposing of finalizer-free cycles first, but once I have more than one finalizer left in the remaining cycles, I'd like the right not to reduce the graph for topsort reasons -- that algorithm seems hard. So we're back to the __cleanup__ design. Strawman proposal: for all finalizers in a trash cycle, call their __cleanup__ method, in arbitrary order. After all __cleanup__ calls are done, if the objects haven't all disposed of themselves, they are all garbage-collected without calling __del__. (This seems to require another garbage collection cycle -- so perhaps there should also be a once-only rule for __cleanup__?) Separate question: what if there is no __cleanup__? This should probably be reported: "You have cycles with finalizers, buddy! What do you want to do about them?" This same warning could be given when there is a __cleanup__ but it doesn't break all cycles. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed Mar 8 14:34:06 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 14:34:06 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> Message-ID: <38C656CE.B0ACFF35@lemburg.com> Guido van Rossum wrote: > > > Tim Peters wrote: > > > > > > Mike has a darned good point here. Anyone have a darned good answer ? > > > Subject: Fixing os.popen on Win32 => is the win32pipe stuff going to be > > > adopted? > > > > > > Just reading one more post (and a FAQ) on the win32 pipe breakage (sewage > > > all over the hard-disk, traffic rerouted through the bit-bucket, you aren't > > > getting to work anytime soon Mrs. Programmer) and wondering why we have a > > > FAQ instead of having the win32pipe stuff rolled into the os module to fix > > > it. Is there some incompatibility? Is there a licensing problem?
> > MAL: > > I'd suggest moving the popen from the C modules into os.py > > as Python API and then applying all necessary magic to either > > use the win32pipe implementation (if available) or the native > > C one from the posix module in os.py. > > > > Unless, of course, the win32 stuff (or some of it) makes it into > > the core. > > No concrete plans -- except that I think the registry access is > supposed to go in. Haven't seen the code on patches at python.org yet > though. Ok, what about the optional "use win32pipe if available" idea then ? > > I'm mostly interested in this for my platform.py module... > > BTW, is there any interest of moving it into the core ? > > "it" == platform.py? Right. > Little interest from me personally; I suppose it > could go in Tools/scripts/... Hmm, it wouldn't help much in there I guess... after all, it defines APIs which are to be queried by other scripts. The default action to print the platform information to stdout is just a useful addition. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Wed Mar 8 15:33:53 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 08 Mar 2000 09:33:53 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: Your message of "Wed, 08 Mar 2000 14:34:06 +0100." <38C656CE.B0ACFF35@lemburg.com> References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> <38C656CE.B0ACFF35@lemburg.com> Message-ID: <200003081433.JAA20177@eric.cnri.reston.va.us> > > MAL: > > > I'd suggest moving the popen from the C modules into os.py > > > as Python API and then applying all necessary magic to either > > > use the win32pipe implementation (if available) or the native > > > C one from the posix module in os.py. 
> > > > > > Unless, of course, the win32 stuff (or some of it) makes it into > > > the core. [Guido] > > No concrete plans -- except that I think the registry access is > > supposed to go in. Haven't seen the code on patches at python.org yet > > though. > > Ok, what about the optional "use win32pipe if available" idea then ? Sorry, I meant please send me the patch! --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Wed Mar 8 15:59:46 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 8 Mar 2000 09:59:46 -0500 (EST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <200003081210.HAA19931@eric.cnri.reston.va.us> References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> Message-ID: <14534.27362.139106.701784@weyr.cnri.reston.va.us> Guido van Rossum writes: > "it" == platform.py? Little interest from me personally; I suppose it > could go in Tools/scripts/... I think platform.py is pretty nifty, but I'm not entirely sure how it's expected to be used. Perhaps Marc-Andre could explain further the motivation behind the module? My biggest requirement is that it be accompanied by documentation. The coolness factor and shared use of hackerly knowledge would probably get *me* to put it in, but there are a lot of things about which I'll disagree with Guido just to hear his (well-considered) thoughts on the matter. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal at lemburg.com Wed Mar 8 18:37:43 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 18:37:43 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 ... code for thought. 
References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> <38C656CE.B0ACFF35@lemburg.com> <200003081433.JAA20177@eric.cnri.reston.va.us> Message-ID: <38C68FE7.63943C5C@lemburg.com> Guido van Rossum wrote: > > > > MAL: > > > > I'd suggest moving the popen from the C modules into os.py > > > > as Python API and then applying all necessary magic to either > > > > use the win32pipe implementation (if available) or the native > > > > C one from the posix module in os.py. > > > > > > > > Unless, of course, the win32 stuff (or some of it) makes it into > > > > the core. > [Guido] > > > No concrete plans -- except that I think the registry access is > > > supposed to go in. Haven't seen the code on patches at python.org yet > > > though. > > > > Ok, what about the optional "use win32pipe if available" idea then ? > > Sorry, I meant please send me the patch! Here's the popen() interface I use in platform.py. It should serve well as a basis for an os.popen patch... (don't have time to do it myself right now):

class _popen:

    """ Fairly portable (alternative) popen implementation.

        This is mostly needed in case os.popen() is not available, or
        doesn't work as advertised, e.g. in Win9X GUI programs like
        PythonWin or IDLE.

        XXX Writing to the pipe is currently not supported.
    """
    tmpfile = ''
    pipe = None
    bufsize = None
    mode = 'r'

    def __init__(self, cmd, mode='r', bufsize=None):
        if mode != 'r':
            raise ValueError, 'popen()-emulation only supports read mode'
        import tempfile
        self.tmpfile = tmpfile = tempfile.mktemp()
        os.system(cmd + ' > %s' % tmpfile)
        self.pipe = open(tmpfile, 'rb')
        self.bufsize = bufsize
        self.mode = mode

    def read(self):
        return self.pipe.read()

    def readlines(self):
        # Note: bufsize is ignored by this emulation; the original
        # posting only returned lines when bufsize was given, which
        # made readlines() return None in the default case.
        return self.pipe.readlines()

    def close(self, remove=os.unlink, error=os.error):
        if self.pipe:
            rc = self.pipe.close()
        else:
            rc = 255
        if self.tmpfile:
            try:
                remove(self.tmpfile)
            except error:
                pass
        return rc

    # Alias
    __del__ = close

def popen(cmd, mode='r', bufsize=None):

    """ Portable popen() interface.
    """
    # Find a working popen implementation preferring win32pipe.popen
    # over os.popen over _popen
    popen = None
    if os.environ.get('OS', '') == 'Windows_NT':
        # On NT win32pipe should work; on Win9x it hangs due to bugs
        # in the MS C lib (see MS KnowledgeBase article Q150956)
        try:
            import win32pipe
        except ImportError:
            pass
        else:
            popen = win32pipe.popen
    if popen is None:
        if hasattr(os, 'popen'):
            popen = os.popen
            # Check whether it works... it doesn't in GUI programs
            # on Windows platforms
            if sys.platform == 'win32':  # XXX Others too ?
                try:
                    popen('')
                except os.error:
                    popen = _popen
        else:
            popen = _popen
    if bufsize is None:
        return popen(cmd, mode)
    else:
        return popen(cmd, mode, bufsize)

if __name__ == '__main__':
    print """ I confirm that, to the best of my knowledge and belief, this contribution is free of any claims of third parties under copyright, patent or other rights or interests ("claims").
To the extent that I have any such claims, I hereby grant to CNRI a nonexclusive, irrevocable, royalty-free, worldwide license to reproduce, distribute, perform and/or display publicly, prepare derivative versions, and otherwise use this contribution as part of the Python software and its related documentation, or any derivative versions thereof, at no cost to CNRI or its licensed users, and to authorize others to do so. I acknowledge that CNRI may, at its sole discretion, decide whether or not to incorporate this contribution in the Python software and its related documentation. I further grant CNRI permission to use my name and other identifying information provided to CNRI by me for use in connection with the Python software and its related documentation. """ -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Mar 8 18:44:59 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 18:44:59 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> <14534.27362.139106.701784@weyr.cnri.reston.va.us> Message-ID: <38C6919B.EA3EE2E7@lemburg.com> "Fred L. Drake, Jr." wrote: > > Guido van Rossum writes: > > "it" == platform.py? Little interest from me personally; I suppose it > > could go in Tools/scripts/... > > I think platform.py is pretty nifty, but I'm not entirely sure how > it's expected to be used. Perhaps Marc-Andre could explain further > the motivation behind the module? It was first intended to provide a way to format a platform identifying file name for the mxCGIPython project and then quickly moved on to provide many different APIs to query platform specific information. 
architecture(executable='/usr/local/bin/python', bits='', linkage='') :

    Queries the given executable (defaults to the Python interpreter binary) for various architecture information. Returns a tuple (bits, linkage) which contains information about the bit architecture and the linkage format used for the executable. Both values are returned as strings. Values that cannot be determined are returned as given by the parameter presets. If bits is given as '', the sizeof(long) is used as indicator for the supported pointer size. The function relies on the system's "file" command to do the actual work. This is available on most if not all Unix platforms. On some non-Unix platforms, and then only if the executable points to the Python interpreter, defaults from _default_architecture are used.

dist(distname='', version='', id='') :

    Tries to determine the name of the OS distribution. The function first looks for a distribution release file in /etc and then reverts to _dist_try_harder() in case no suitable files are found. Returns a tuple (distname, version, id) which defaults to the args given as parameters.

java_ver(release='', vendor='', vminfo=('', '', ''), osinfo=('', '', '')) :

    Version interface for JPython. Returns a tuple (release, vendor, vminfo, osinfo) with vminfo being a tuple (vm_name, vm_release, vm_vendor) and osinfo being a tuple (os_name, os_version, os_arch). Values which cannot be determined are set to the defaults given as parameters (which all default to '').

libc_ver(executable='/usr/local/bin/python', lib='', version='') :

    Tries to determine the libc version against which the file executable (defaults to the Python interpreter) is linked. Returns a tuple of strings (lib, version) which default to the given parameters in case the lookup fails. Note that the function has intimate knowledge of how different libc versions add symbols to the executable and is probably only usable for executables compiled using gcc. The file is read and scanned in chunks of chunksize bytes.
mac_ver(release='', versioninfo=('', '', ''), machine='') :

    Get MacOS version information and return it as tuple (release, versioninfo, machine) with versioninfo being a tuple (version, dev_stage, non_release_version). Entries which cannot be determined are set to ''. All tuple entries are strings. Thanks to Mark R. Levinson for mailing documentation links and code examples for this function. Documentation for the gestalt() API is available online at: http://www.rgaros.nl/gestalt/

machine() :

    Returns the machine type, e.g. 'i386'. An empty string is returned if the value cannot be determined.

node() :

    Returns the computer's network name (may not be fully qualified !). An empty string is returned if the value cannot be determined.

platform(aliased=0, terse=0) :

    Returns a single string identifying the underlying platform with as much useful information as possible (but no more :). The output is intended to be human readable rather than machine parseable. It may look different on different platforms and this is intended. If "aliased" is true, the function will use aliases for various platforms that report system names which differ from their common names, e.g. SunOS will be reported as Solaris. The system_alias() function is used to implement this. Setting terse to true causes the function to return only the absolute minimum information needed to identify the platform.

processor() :

    Returns the (true) processor name, e.g. 'amdk6'. An empty string is returned if the value cannot be determined. Note that many platforms do not provide this information or simply return the same value as for machine(), e.g. NetBSD does this.

release() :

    Returns the system's release, e.g. '2.2.0' or 'NT'. An empty string is returned if the value cannot be determined.

system() :

    Returns the system/OS name, e.g. 'Linux', 'Windows' or 'Java'. An empty string is returned if the value cannot be determined.
system_alias(system, release, version) :

    Returns (system, release, version) aliased to common marketing names used for some systems. It also does some reordering of the information in some cases where it would otherwise cause confusion.

uname() :

    Fairly portable uname interface. Returns a tuple of strings (system, node, release, version, machine, processor) identifying the underlying platform. Note that unlike the os.uname function this also returns possible processor information as an additional tuple entry. Entries which cannot be determined are set to ''.

version() :

    Returns the system's release version, e.g. '#3 on degas'. An empty string is returned if the value cannot be determined.

win32_ver(release='', version='', csd='', ptype='') :

    Get additional version information from the Windows Registry and return a tuple (version, csd, ptype) referring to version number, CSD level and OS type (multi/single processor). As a hint: ptype returns 'Uniprocessor Free' on single processor NT machines and 'Multiprocessor Free' on multi processor machines. The 'Free' refers to the OS version being free of debugging code. It could also state 'Checked' which means the OS version uses debugging code, i.e. code that checks arguments, ranges, etc. (Thomas Heller). Note: this function only works if Mark Hammond's win32 package is installed and obviously only runs on Win32 compatible platforms. XXX Is there any way to find out the processor type on WinXX ? XXX Is win32 available on Windows CE ? Adapted from code posted by Karl Putland to comp.lang.python.

> My biggest requirement is that it be accompanied by documentation. > The coolness factor and shared use of hackerly knowledge would > probably get *me* to put it in, but there are a lot of things about > which I'll disagree with Guido just to hear his (well-considered) > thoughts on the matter. ;) The module is doc-string documented (see above). This should serve well as a basis for the latex docs.
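[Editorial aside: typical use of the query APIs documented above looks like this, assuming MAL's platform.py is importable as `platform`; the example calls only the documented entry points:]

```python
# Query the platform module for identification strings. Each function
# degrades to '' (or the given default) when the value can't be determined.
import platform

info = platform.platform(aliased=1)      # human-readable platform string
sysname = platform.system()              # e.g. 'Linux', 'Windows' or 'Java'
bits, linkage = platform.architecture()  # e.g. ('32bit', 'ELF')
```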
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From DavidA at ActiveState.com Wed Mar 8 19:36:01 2000 From: DavidA at ActiveState.com (David Ascher) Date: Wed, 8 Mar 2000 10:36:01 -0800 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <200003081210.HAA19931@eric.cnri.reston.va.us> Message-ID: > "it" == platform.py? Little interest from me personally; I suppose it > could go in Tools/scripts/... FWIW, I think it belongs in the standard path. It allows one to do the equivalent of if sys.platform == '...' but in a much more useful way. --david From mhammond at skippinet.com.au Wed Mar 8 22:36:12 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 9 Mar 2000 08:36:12 +1100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <200003081210.HAA19931@eric.cnri.reston.va.us> Message-ID: > No concrete plans -- except that I think the registry access is > supposed to go in. Haven't seen the code on patches at python.org yet > though. FYI, that is off with Trent who is supposed to be testing it on the Alpha. Re win32pipe - I responded to that post suggesting that we do with os.pipe and win32pipe what was done with os.path.abspath/win32api - optionally try to import the win32 specific module and use it. My only "concern" is that this then becomes more code for Guido to maintain in the core, even though Guido has expressed a desire to get out of the installers business. Assuming the longer term plan is for other people to put together installation packages, and that these people are free to redistribute win32api/win32pipe, Im wondering if it is worth bothering with? Mark.
From trentm at ActiveState.com  Wed Mar  8 15:42:06 2000
From: trentm at ActiveState.com (Trent Mick)
Date: Wed, 8 Mar 2000 14:42:06 -0000
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
In-Reply-To: <38C6919B.EA3EE2E7@lemburg.com>
Message-ID: 

MAL:
> architecture(executable='/usr/local/bin/python', bits='',
>              linkage='') :
>
>     Values that cannot be determined are returned as given by the
>     parameter presets. If bits is given as '', the sizeof(long) is
>     used as indicator for the supported pointer size.

Just a heads up, using sizeof(long) will not work on the forthcoming WIN64 (LLP64 data model) to determine the supported pointer size.  You would want to use the 'P' struct format specifier instead, I think (I am speaking in relative ignorance).  However, the docs say that a PyInt is used to store the 'P' specified value, which, as a C long, will not hold a pointer on LLP64.  Hmmmm.  The keyword perhaps is "forthcoming".

This is the code in question in platform.py:

    # Use the sizeof(long) as default number of bits if nothing
    # else is given as default.
    if not bits:
        import struct
        bits = str(struct.calcsize('l')*8) + 'bit'

Guido:
> > No concrete plans -- except that I think the registry access is
> > supposed to go in.  Haven't seen the code on patches at python.org yet
> > though.

Mark Hammond:
> FYI, that is off with Trent who is supposed to be testing it on the Alpha.

My Alpha is in pieces right now! I will get to it soon.  I will try it on Win64 as well, if I can.

Trent

Trent Mick
trentm at activestate.com

From guido at python.org  Thu Mar  9 03:59:51 2000
From: guido at python.org (Guido van Rossum)
Date: Wed, 08 Mar 2000 21:59:51 -0500
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
In-Reply-To: Your message of "Thu, 09 Mar 2000 08:36:12 +1100."
References: Message-ID: <200003090259.VAA20928@eric.cnri.reston.va.us> > My only "concern" is that this then becomes more code for Guido to maintain > in the core, even though Guido has expressed a desire to get out of the > installers business. Theoretically, it shouldn't need much maintenance. I'm more concerned that it will have different semantics than on Unix so that in practice you'd need to know about the platform anyway (apart from the fact that the installed commands are different, of course). > Assuming the longer term plan is for other people to put together > installation packages, and that these people are free to redistribute > win32api/win32pipe, Im wondering if it is worth bothering with? So that everybody could use os.popen() regardless of whether they're on Windows or Unix. --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond at skippinet.com.au Thu Mar 9 04:31:21 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 9 Mar 2000 14:31:21 +1100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <200003090259.VAA20928@eric.cnri.reston.va.us> Message-ID: [Me] > > Assuming the longer term plan is for other people to put together > > installation packages, and that these people are free to redistribute > > win32api/win32pipe, Im wondering if it is worth bothering with? [Guido] > So that everybody could use os.popen() regardless of whether they're > on Windows or Unix. Sure. But what I meant was "should win32pipe code move into the core, or should os.pipe() just auto-detect and redirect to win32pipe if installed?" 
I was suggesting that over the longer term, it may be reasonable to assume that win32pipe _will_ be installed, as everyone who releases installers for Python should include it :-)  It could also be written in such a way that it prints a warning message when win32pipe doesn't exist, so in 99% of cases, it will answer the FAQ before they have had a chance to ask it :-)

It also should be noted that the win32pipe support for popen on Windows 95/98 includes a small, dedicated .exe - this just adds to the maintenance burden.

But it doesn't worry me at all what happens - I was just trying to save you work.  Anyone is free to take win32pipe and move the relevant code into the core anytime they like, with my and Bill's blessing.  It quite suits me that people have to download win32all to get this working, so I doubt I will get around to it any time soon :-)

Mark.

From tim_one at email.msn.com  Thu Mar  9 04:52:58 2000
From: tim_one at email.msn.com (Tim Peters)
Date: Wed, 8 Mar 2000 22:52:58 -0500
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
In-Reply-To: 
Message-ID: <000401bf897a$f5a7e620$0d2d153f@tim>

I had another take on all this, which I'll now share since nobody seems inclined to fold in the Win32 popen: perhaps os.popen should not be supported at all under Windows!

The current function is a mystery wrapped in an enigma -- sometimes it works, sometimes it doesn't, and I've never been able to outguess which one will obtain (there's more to it than just whether a console window is attached).  If it's not reliable (it's not), and we can't document the conditions under which it can be used safely (I can't), Python shouldn't expose it.

Failing that, the os.popen docs should caution it's "use at your own risk" under Windows, and that this is directly inherited from MS's popen implementation.
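[Archive note: the optional-import arrangement Mark describes above -- mirroring what os.path.abspath does with win32api -- might look like this. The wrapper function itself is hypothetical; `win32pipe.popen` is the win32all entry point under discussion, and the warning text is illustrative.]

```python
import os

def popen(cmd, mode='r'):
    # Hypothetical wrapper: prefer the win32pipe implementation when it
    # is installed, warn and fall back to the C runtime popen otherwise.
    if os.name == 'nt':
        try:
            import win32pipe
        except ImportError:
            print('Warning: win32pipe not installed; '
                  'falling back to the unreliable MS popen')
        else:
            return win32pipe.popen(cmd, mode)
    return os.popen(cmd, mode)
```

On non-Windows platforms this is just os.popen, so portable code can call it unconditionally.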
From tim_one at email.msn.com  Thu Mar  9 10:40:26 2000
From: tim_one at email.msn.com (Tim Peters)
Date: Thu, 9 Mar 2000 04:40:26 -0500
Subject: [Python-Dev] finalization again
In-Reply-To: <200003081406.JAA20033@eric.cnri.reston.va.us>
Message-ID: <000701bf89ab$80cb8e20$0d2d153f@tim>

[Guido, with some implementation details and nice examples]

Normally I'd eat this up -- today I'm gasping for air trying to stay afloat.  I'll have to settle for sketching the high-level approach I've had in the back of my mind.  I start with the pile of incestuous stuff Toby/Neil discovered to have no external references.  It consists of dead cycles, and perhaps also non-cycles reachable only from dead cycles.

1. The "points to" relation on this pile defines a graph G.

2. From any graph G, we can derive a related graph G' consisting of the maximal strongly connected components (SCCs) of G.  Each (super)node of G' is an SCC of G, where (super)node A' of G' points to (super)node B' of G' iff there exists a node A in A' that points to (wrt G) some node B in B'.  It's not obvious, but the SCCs can be found in linear time (via Tarjan's algorithm, which is simple but subtle; Cyclops.py uses a much dumber brute-force approach, which is nevertheless perfectly adequate in the absence of massively large cycles -- premature optimization is the root etc <0.5 wink>).

3. G' is necessarily a DAG.  For if distinct A' and B' are both reachable from each other in G', then every pair of A in A' and B in B' are reachable from each other in G, contradicting that A' and B' are distinct maximal SCCs (that is, the union of A' and B' is also an SCC).

4. The point to all this:  Every DAG can be topsorted.  Start with the nodes of G' without predecessors.  There must be at least one, because G' is a DAG.

5. For every node A' in G' without predecessors (wrt G'), it either does or does not contain an object with a potentially dangerous finalizer.  If it does not, let's call it a safe node.
If there are no safe nodes without predecessors, GC is stuck, and for good reason: every object in the whole pile is reachable from an object with a finalizer, which could change the topology in near-arbitrary ways.  The unsafe nodes without predecessors (and again, by #4, there must be at least one) are the heart of the problem, and this scheme identifies them precisely.

6. Else there is a safe node A'.  For each A in A', reclaim it, following the normal refcount rules (or in an implementation w/o RC, by following a topsort of "points to" in the original G).  This *may* cause reclamation of an object X with a finalizer outside of A'.  But doing so cannot cause resurrection of anything in A' (X is reachable from A' else cleaning up A' couldn't have affected X, and if anything in A' were also reachable from X, X would have been in A' to begin with (SCC!), contradicting that A' is safe).  So the objects in A' can get reclaimed without difficulty.

7. The simplest thing to do now is just stop: rebuild it from scratch the next time the scheme is invoked.  If it was *possible* to make progress without guessing, we did; and if it was impossible, we identified the precise SCC(s) that stopped us.  Anything beyond that is optimization <0.6 wink>.

Seems the most valuable optimization would be to keep track of whether an object with a finalizer gets reclaimed in step 6 (so long as that doesn't happen, the mutations that can occur to the structure of G' seem nicely behaved enough that it should be possible to loop back to step #5 without crushing pain).

On to Guido's msg:

[Guido]
> When we have a pile of garbage, we don't know whether it's all
> connected or whether it's lots of little cycles.  So if we find
> [objects with -- I'm going to omit this] finalizers, we have to put
> those on a third list and put everything reachable from them on that
> list as well (the algorithm I described before).

SCC determination gives precise answers to all that.
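[Archive note: the linear-time SCC computation Tim mentions in step 2 can be sketched compactly. This is an illustrative recursive Tarjan, not the Cyclops.py code; `graph` maps each node to the nodes it points to. A convenient property: Tarjan emits the SCCs of G' in reverse topological order, sinks first.]

```python
def tarjan_scc(graph):
    """Return the SCCs of graph, in reverse topological order of G'."""
    index, low, stack, on_stack, sccs = {}, {}, [], set(), []
    counter = [0]

    def strongconnect(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:           # tree edge: recurse
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:          # edge back into the current stack
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:           # v is the root of an SCC
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return sccs

# The A <=> b -> C <=> d example from Guido's message:
g = {'A': ['b'], 'b': ['A', 'C'], 'C': ['d'], 'd': ['C']}
print(tarjan_scc(g))   # two SCCs: {C, d} (a sink) emitted before {A, b}
```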
> What's left on the first list then consists of finalizer-free garbage. > We dispose of this garbage by clearing dicts and lists. Hopefully > this makes the refcount of some of the finalizers go to zero -- those > are finalized in the normal way. In Python it's even possible for a finalizer to *install* a __del__ method that didn't previously exist, into the class of one of the objects on your "first list". The scheme above is meant to be bulletproof in the face of abuses even I can't conceive of . More mundanely, clearing an item on your first list can cause a chain of events that runs a finalizer, which in turn can resurrect one of the objects on your first list (and so it should *not* get reclaimed). Without doing the SCC bit, I don't think you can out-think that (the reasoning above showed that the finalizer can't resurrect something in the *same* SCC as the object that started it all, but that argument cannot be extended to objects in other safe SCCs: they're vulnerable). > And now we have to deal with the inevitable: finalizers that are part > of cycles. It makes sense to reduce the graph of objects to a graph > of finalizers only. Example: > > A <=> b -> C <=> d > > A and C have finalizers. C is part of a cycle (C-d) that contains no > other finalizers, but C is also reachable from A. A is part of a > cycle (A-b) that keeps it alive. The interesting thing here is that > if we only look at the finalizers, there are no cycles! The scheme above derives G': A' -> C' where A' consists of the A<=>b cycle and C' the C<=>d cycle. That there are no cycles in G' isn't surprising, it's just the natural consequence of doing the natural analysis . The scheme above refuses to do anything here, because the only node in G' without a predecessor (namely A') isn't "safe". 
> If we reduce the graph to only finalizers (setting aside for now the
> problem of how to do that -- we may need to allocate more memory to
> hold the reduced graph), we get:
>
>     A -> C

You should really have self-loops on both A and C, right?  (because A is reachable from itself via chasing pointers; ditto for C)

> We can now finalize A (even though its refcount is nonzero!).  And
> that's really all we can do!  A could break its own cycle, thereby
> disposing of itself and b.  It could also break C's cycle, disposing
> of C and d.  It could do nothing.  Or it could resurrect A, thereby
> resurrecting all of A, b, C, and d.
>
> This leads to (there's that weird echo again :-) Boehm's solution:
> Call A's finalizer and leave the rest to the next time the garbage
> collection runs.

This time the echo came back distorted:

[Boehm]
    Cycles involving one or more finalizable objects are never finalized.

A<=>b is "a cycle involving one or more finalizable objects", so he won't touch it.  The scheme at the top doesn't either.  If you handed him your *derived* graph (but also without the self-loops), he would; me too.  KISS!

> Note that we're now calling finalizers on objects with a non-zero
> refcount.

I don't know why you want to do this.  As the next several paragraphs confirm, it creates real headaches for the implementation, and I'm unclear on what it buys in return.  Is "we'll do something by magic for cycles with no more than one finalizer" a major gain for the user over "we'll do something by magic for cycles with no finalizer"?  0, 1 and infinity *are* the only interesting numbers, but the difference between 0 and 1 *here* doesn't seem to me worth signing up for any pain at all.

> At some point (probably as a result of finalizing A) its
> refcount will go to zero.  We should not finalize it again -- this
> would serve no purpose.
I don't believe BDW (or the scheme at the top) has this problem (simply because the only way to run a finalizer in a cycle under them is for the user to break the cycle explicitly -- so if an object's finalizer gets run, the user caused it directly, and so can never claim surprise).

> Possible solution:
>
>     INCREF(A);
>     A->__del__();
>     if (A->ob_refcnt == 1)
>         A->__class__ = NULL; /* Make A finalizer-less */
>     DECREF(A);

> This avoids finalizing twice if the first finalization broke all
> cycles in which A is involved.  But if it doesn't, A is still cyclical
> garbage with a finalizer!  Even if it didn't resurrect itself.
>
> Instead of the code fragment above, we could mark A as "just
> finalized" and when it shows up at the head of the tree (of finalizers
> in cyclical trash) again on the next garbage collection, to discard it
> without calling the finalizer again (because this clearly means that
> it didn't resurrect itself -- at least not for a very long time).

I don't think you need to do any of this -- unless you think you need to do the thing that created the need for this, which I didn't think you needed to do either.

> I would be happier if we could still have a rule that says that a
> finalizer is called only once by magic -- even if we have two forms of
> magic: refcount zero or root of the tree.  Tim: I don't know if you
> object against this rule as a matter of principle (for the sake of
> finalizers that resurrect the object) or if your objection is really
> against the unordered calling of finalizers legitimized by Java's
> rules.  I hope the latter, since I think that this rule (__del__
> called only once by magic) by itself is easy to understand and easy to
> deal with, and I believe it may be necessary to guarantee progress for
> the garbage collector.

My objections to Java's rules have been repeated enough.  I would have no objection to "__del__ called only once" if it weren't that Python currently does something different.
I don't know whether people rely on that now; if they do, it's a much more dangerous thing to change than adding a new keyword (the compiler gives automatic 100% coverage of the latter; but nothing mechanical can help people track down reliance -- whether deliberate or accidental -- on the former).

My best *guess* is that __del__ is used rarely; e.g., there are no more than 40 instances of it in the whole CVS tree, including demo directories; and they all look benign (at least three have bodies consisting of "pass"!).  The most complicated one I found in my own code is:

    def __del__(self):
        self.break_cycles()

    def break_cycles(self):
        for rule in self.rules:
            if rule is not None:
                rule.cleanse()

But none of this self-sampling is going to comfort some guy in France who has a megaline of code relying on it.  Good *bet*, though.

> [and another cogent explanation of why breaking the "leave cycles with
> finalizers alone" injunction creates headaches]
> ...
> Even if someone once found a good use for resurrecting inside __del__,
> against all recommendations, I don't mind breaking their code, if it's
> for a good cause.  The Java rules aren't a good cause.  But top-sorted
> finalizer calls seem a worthy cause.

They do to me too, except that I say even a cycle involving but a single object (w/ finalizer) looping on itself is the user's problem.

> So now we get to discuss what to do with multi-finalizer cycles, like:
>
>     A <=> b <=> C
>
> Here the reduced graph is:
>
>     A <=> C

The SCC reduction is simply to A and, right, the scheme at the top punts.

> [more on the once-only rule chopped]
> ...
> Anyway, once-only rule aside, we still need a protocol to deal with
> cyclical dependencies between finalizers.  The __cleanup__ approach is
> one solution, but it also has a problem: we have a set of finalizers.
> Whose __cleanup__ do we call?  Any?  All?  Suggestions?
This is why a variant of guardians were more appealing to me at first: I could ask a guardian for the entire SCC, so I get the *context* of the problem as well as the final microscopic symptom. I see Marc-Andre already declined to get sucked into the magical part of this . Greg should speak for his scheme, and I haven't made time to understand it fully; my best guess is to call x.__cleanup__ for every object in the SCC (but there's no clear way to decide which order to call them in, and unless they're more restricted than __del__ methods they can create all the same problems __del__ methods can!). > Note that I'd like some implementation freedom: I may not want to > bother with the graph reduction algorithm at first (which seems very > hairy) so I'd like to have the right to use the __cleanup__ API > as soon as I see finalizers in cyclical trash. I don't mind disposing > of finalizer-free cycles first, but once I have more than one > finalizer left in the remaining cycles, I'd like the right not to > reduce the graph for topsort reasons -- that algorithm seems hard. I hate to be realistic , but modern GC algorithms are among the hardest you'll ever see in any field; even the outer limits of what we've talked about here is baby stuff. Sun's Java group (the one in Chelmsford, MA, down the road from me) had a group of 4+ people (incl. the venerable Mr. Steele) working full-time for over a year on the last iteration of Java's GC. The simpler BDW is a megabyte of code spread over 100+ files. Etc -- state of the art GC can be crushingly hard. So I've got nothing against taking shortcuts at first -- there's actually no realistic alternative. I think we're overlooking the obvious one, though: if any finalizer appears in any trash cycle, tough luck. Python 3000 -- which may be a spelling of 1.7 , but doesn't *need* to be a spelling of 1.6. > So we're back to the __cleanup__ design. 
> Strawman proposal: for all
> finalizers in a trash cycle, call their __cleanup__ method, in
> arbitrary order.  After all __cleanup__ calls are done, if the objects
> haven't all disposed of themselves, they are all garbage-collected
> without calling __del__.  (This seems to require another garbage
> collection cycle -- so perhaps there should also be a once-only rule
> for __cleanup__?)
>
> Separate question: what if there is no __cleanup__?  This should
> probably be reported: "You have cycles with finalizers, buddy!  What
> do you want to do about them?"  This same warning could be given when
> there is a __cleanup__ but it doesn't break all cycles.

If I *ever* have a trash cycle with a finalizer in my code (> 0 -- "exactly 1" isn't special to me), I will consider it to be a bug.  So I want a way to get it back from gc, so I can see what the heck it is, so I can fix my code (or harass whoever did it to me).  __cleanup__ suffices for that, so the very act of calling it is all I'm really after ("Python invoked __cleanup__ == Tim has a bug").

But after I outgrow that, I'll certainly want the option to get another kind of complaint if __cleanup__ doesn't break the cycles, and after *that* I couldn't care less.  I've given you many gracious invitations to say that you don't mind leaking in the face of a buggy program, but as you've declined so far, I take it that never hearing another gripe about leaking is a Primary Life Goal.  So collection without calling __del__ is fine -- but so is collection with calling it!  If we're going to (at least implicitly) approve of this stuff, it's probably better *to* call __del__, if for no other reason than to catch your case of some poor innocent object caught in a cycle not of its making that expects its __del__ to abort starting World War III if it becomes unreachable.

whatever-we-don't-call-a-mistake-is-a-feature-ly y'rs  - tim

From fdrake at acm.org  Thu Mar  9 15:25:35 2000
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 9 Mar 2000 09:25:35 -0500 (EST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <000401bf897a$f5a7e620$0d2d153f@tim> References: <000401bf897a$f5a7e620$0d2d153f@tim> Message-ID: <14535.46175.991970.135642@weyr.cnri.reston.va.us> Tim Peters writes: > Failing that, the os.popen docs should caution it's "use at your own risk" > under Windows, and that this is directly inherited from MS's popen > implementation. Tim (& others), Would this additional text be sufficient for the os.popen() documentation? \strong{Note:} This function behaves unreliably under Windows due to the native implementation of \cfunction{popen()}. If someone cares to explain what's weird about it, that might be appropriate as well, but I've never used this under Windows. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal at lemburg.com Thu Mar 9 15:42:37 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 09 Mar 2000 15:42:37 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: Message-ID: <38C7B85D.E6090670@lemburg.com> Trent Mick wrote: > > MAL: > > architecture(executable='/usr/local/bin/python', bits='', > > linkage='') : > > > > Values that cannot be determined are returned as given by the > > parameter presets. If bits is given as '', the sizeof(long) is > > used as indicator for the supported pointer size. > > Just a heads up, using sizeof(long) will not work on forthcoming WIN64 > (LLP64 data model) to determine the supported pointer size. You would want > to use the 'P' struct format specifier instead, I think (I am speaking in > relative ignorance). However, the docs say that a PyInt is used to store 'P' > specified value, which as a C long, will not hold a pointer on LLP64. Hmmmm. > The keyword perhaps is "forthcoming". 
>
> This is the code in question in platform.py:
>
>     # Use the sizeof(long) as default number of bits if nothing
>     # else is given as default.
>     if not bits:
>         import struct
>         bits = str(struct.calcsize('l')*8) + 'bit'

Python < 1.5.2 doesn't support 'P', but anyway, I'll change those lines according to your suggestion.

Does struct.calcsize('P')*8 return 64 on 64bit-platforms as it should (probably ;) ?

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/

From jim at interet.com  Thu Mar  9 16:45:54 2000
From: jim at interet.com (James C. Ahlstrom)
Date: Thu, 09 Mar 2000 10:45:54 -0500
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
References: <000401bf897a$f5a7e620$0d2d153f@tim>
Message-ID: <38C7C732.D9086C34@interet.com>

Tim Peters wrote:
>
> I had another take on all this, which I'll now share since nobody
> seems inclined to fold in the Win32 popen: perhaps os.popen should not be
> supported at all under Windows!
>
> The current function is a mystery wrapped in an enigma -- sometimes it
> works, sometimes it doesn't, and I've never been able to outguess which one
> will obtain (there's more to it than just whether a console window is
> attached).  If it's not reliable (it's not), and we can't document the
> conditions under which it can be used safely (I can't), Python shouldn't
> expose it.

OK, I admit I don't understand this either, but here goes...

It looks like Python popen() uses the Windows _popen() function.  The _popen() docs say that it creates a spawned copy of the command processor (shell) with the given string argument.  It further states that it does NOT work in a Windows program and ONLY works when called from a Windows Console program.
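[Archive note: MAL's question above -- whether struct.calcsize('P')*8 gives 64 on 64-bit platforms -- and Trent's LLP64 point can both be checked directly. A sketch; on LP64 Unix both sizes come out 64-bit, while on LLP64 Win64 'l' stays 32-bit and only 'P' reports the true pointer width.]

```python
import struct

long_bits = struct.calcsize('l') * 8     # sizeof(long)
pointer_bits = struct.calcsize('P') * 8  # sizeof(void *)

# LP64 Unix: 64/64.  LLP64 Win64: 32/64 -- so 'P' is the right
# probe for pointer width, as Trent suggests.
print('long: %dbit, pointer: %dbit' % (long_bits, pointer_bits))
```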
From tim_one at email.msn.com  Thu Mar  9 18:14:17 2000
From: tim_one at email.msn.com (Tim Peters)
Date: Thu, 9 Mar 2000 12:14:17 -0500
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
In-Reply-To: <38C7C732.D9086C34@interet.com>
Message-ID: <000401bf89ea$e6e54180$79a0143f@tim>

[James C. Ahlstrom]
> OK, I admit I don't understand this either, but here goes...
>
> It looks like Python popen() uses the Windows _popen() function.
> The _popen() docs say ...

Screw the docs.  Pretend you're a newbie and *try* it.  Here:

    import os
    p = os.popen("dir")
    while 1:
        line = p.readline()
        if not line:
            break
        print line

Type that in by hand, or stick it in a file & run it from a cmdline python.exe (which is a Windows console program).  Under Win95 the process freezes solid, and even trying to close the DOS box doesn't work.  You have to bring up the task manager and kill it that way.  I once traced this under the debugger -- it's hung inside an MS DLL.  "dir" is not entirely arbitrary here: for *some* cmds it works fine, for others not.  The set of which work appears to vary across Windows flavors.  Sometimes you can worm around it by wrapping "a bad" cmd in a .bat file, and popen'ing the latter instead; but sometimes not.  After hours of poke-&-hope (in the past), as I said, I've never been able to predict which cases will work.

> ...
> It further states that it does NOT work in a Windows program and ONLY
> works when called from a Windows Console program.

The latter is a necessary condition but not sufficient; don't know what *is* sufficient, and AFAIK nobody else does either.

> From this I assume that popen() works from python.exe (it is a Console
> app) if the command can be directly executed by the shell (like "dir"),

See above for a counterexample to both.  I actually have much better luck with cmds command.com *doesn't* know anything about.  So this appears to vary by shell too.

> ...
> If there is something wrong with _popen() then the way to fix it is
> to avoid using it and create the pipes directly.

libc pipes are as flaky as libc popen under Windows, Jim!  MarkH has the only versions of these things that come close to working under Windows (he wraps the native Win32 spellings of these things; MS's libc entry points (which Python uses now) are much worse).

> ...
> Of course, the strength of Python is portable code.  popen() should be
> fixed the right way.

pipes too, but users get baffled by popen much more often simply because they try popen much more often.

there's-no-question-about-whether-it-works-right-it-doesn't-ly y'rs  - tim

From gstein at lyra.org  Thu Mar  9 18:47:23 2000
From: gstein at lyra.org (Greg Stein)
Date: Thu, 9 Mar 2000 09:47:23 -0800 (PST)
Subject: [Python-Dev] platform.py (was: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?)
In-Reply-To: <38C7B85D.E6090670@lemburg.com>
Message-ID: 

On Thu, 9 Mar 2000, M.-A. Lemburg wrote:
>...
> Python < 1.5.2 doesn't support 'P', but anyway, I'll change
> those lines according to your suggestion.
>
> Does struct.calcsize('P')*8 return 64 on 64bit-platforms as
> it should (probably ;) ?

Yes.  It returns sizeof(void *).

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

From mal at lemburg.com  Thu Mar  9 15:55:36 2000
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 09 Mar 2000 15:55:36 +0100
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
References: <000401bf897a$f5a7e620$0d2d153f@tim> <14535.46175.991970.135642@weyr.cnri.reston.va.us>
Message-ID: <38C7BB68.9FAE3BE9@lemburg.com>

"Fred L. Drake, Jr." wrote:
>
> Tim Peters writes:
> > Failing that, the os.popen docs should caution it's "use at your own risk"
> > under Windows, and that this is directly inherited from MS's popen
> > implementation.
>
>   Tim (& others),
>   Would this additional text be sufficient for the os.popen()
> documentation?
>
>     \strong{Note:} This function behaves unreliably under Windows
>     due to the native implementation of \cfunction{popen()}.
>
> If someone cares to explain what's weird about it, that might be
> appropriate as well, but I've never used this under Windows.

Ehm, hasn't anyone looked at the code I posted yesterday ?

It goes a long way to deal with these inconsistencies... even though it's not perfect (yet ;).

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/

From fdrake at acm.org  Thu Mar  9 19:52:40 2000
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 9 Mar 2000 13:52:40 -0500 (EST)
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
In-Reply-To: <38C7BB68.9FAE3BE9@lemburg.com>
References: <000401bf897a$f5a7e620$0d2d153f@tim> <14535.46175.991970.135642@weyr.cnri.reston.va.us> <38C7BB68.9FAE3BE9@lemburg.com>
Message-ID: <14535.62200.158087.102380@weyr.cnri.reston.va.us>

M.-A. Lemburg writes:
> Ehm, hasn't anyone looked at the code I posted yesterday ?
> It goes a long way to deal with these inconsistencies... even
> though it's not perfect (yet ;).

I probably sent that before I'd read everything, and I'm not the one to change the popen() implementation.  At this point, I'm waiting for someone who understands the details to decide what happens (if anything) to the implementation before I check in any changes to the docs.  My inclination is to fix popen() on Windows to do the right thing, but I don't know enough about pipes & process management on Windows to get into that fray.

-Fred

-- 
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From nascheme at enme.ucalgary.ca  Thu Mar  9 20:37:31 2000
From: nascheme at enme.ucalgary.ca (nascheme at enme.ucalgary.ca)
Date: Thu, 9 Mar 2000 12:37:31 -0700
Subject: [Python-Dev] finalization again
Message-ID: <20000309123731.A3664@acs.ucalgary.ca>

[Tim, explaining something I was thinking about more clearly than I ever could]
> It's not obvious, but the SCCs can be found in linear time (via Tarjan's
> algorithm, which is simple but subtle;

Wow, it seems like it should be more expensive than that.  What are the space requirements?  Also, does the simple algorithm you used in Cyclops have a name?

> If there are no safe nodes without predecessors, GC is stuck,
> and for good reason: every object in the whole pile is reachable
> from an object with a finalizer, which could change the topology
> in near-arbitrary ways.  The unsafe nodes without predecessors
> (and again, by #4, there must be at least one) are the heart of
> the problem, and this scheme identifies them precisely.

Exactly.  What is our policy on these unsafe nodes?  Guido seems to feel that it is okay for the programmer to create them and Python should have a way of collecting them.  Tim seems to feel that the programmer should not create them in the first place.  I agree with Tim.

If topological finalization is used, it is possible for the programmer to design their classes so that this problem does not happen.  This is explained on Hans Boehm's finalization web page.  If the programmer cannot or does not redesign their classes I don't think it is unreasonable to leak memory.  We can link these cycles to a global list of garbage or print a debugging message.  This is a large improvement over the current situation (i.e. leaking memory with no debugging even for cycles without finalizers).

Neil

-- 
"If you're a great programmer, you make all the routines depend on each other, so little mistakes can really hurt you." -- Bill Gates, ca. 1985.
From gstein at lyra.org Thu Mar 9 20:50:29 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 11:50:29 -0800 (PST) Subject: [Python-Dev] finalization again In-Reply-To: <20000309123731.A3664@acs.ucalgary.ca> Message-ID: On Thu, 9 Mar 2000 nascheme at enme.ucalgary.ca wrote: >... > If the programmer can or does not redesign their classes I don't > think it is unreasonable to leak memory. We can link these > cycles to a global list of garbage or print a debugging message. > This is a large improvement over the current situation (ie. > leaking memory with no debugging even for cycles without > finalizers). I think we throw an error (as a subclass of MemoryError). As an alternative, is it possible to move those cycles to the garbage list and then never look at them again? That would speed up future collection processing. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Thu Mar 9 20:51:46 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 09 Mar 2000 14:51:46 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Thu, 09 Mar 2000 11:50:29 PST." References: Message-ID: <200003091951.OAA26184@eric.cnri.reston.va.us> > As an alternative, is it possible to move those cycles to the garbage list > and then never look at them again? That would speed up future collection > processing. With the current approach, that's almost automatic :-) I'd rather reclaim the memory too. --Guido van Rossum (home page: http://www.python.org/~guido/) From gmcm at hypernet.com Thu Mar 9 20:54:16 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Thu, 9 Mar 2000 14:54:16 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <000401bf89ea$e6e54180$79a0143f@tim> References: <38C7C732.D9086C34@interet.com> Message-ID: <1259490837-400325@hypernet.com> [Tim re popen on Windows] ... > the debugger -- it's hung inside an MS DLL. 
"dir" is not entirely arbitrary > here: for *some* cmds it works fine, for others not. The set of which work > appears to vary across Windows flavors. Sometimes you can worm around it by > wrapping "a bad" cmd in a .bat file, and popen'ing the latter instead; but > sometimes not. It doesn't work for commands builtin to whatever "shell" you're using. That's different between cmd and command, and the various flavors, versions and extensions thereof. FWIW, I gave up a long time ago. I use redirection and a tempfile. The few times I've wanted "interactive" control, I've used Win32Process, dup'ed, inherited handles... the whole 9 yards. Why? Look at all the questions about popen and child processes in general, on platforms where it *works*, (if it weren't for Donn Cave, nobody'd get it to work anywhere ). To reiterate Tim's point: *none* of the c runtime routines for process control on Windows are adequate (beyond os.system and living with a DOS box popping up). The raw Win32 CreateProcess does everything you could possibly want, but takes a week or more to understand, (if this arg is a that, then that arg is a whatsit, and the next is limited to the values X and Z unless...). your-brain-on-Windows-ly y'rs - Gordon From guido at python.org Thu Mar 9 20:55:23 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 09 Mar 2000 14:55:23 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Thu, 09 Mar 2000 04:40:26 EST." <000701bf89ab$80cb8e20$0d2d153f@tim> References: <000701bf89ab$80cb8e20$0d2d153f@tim> Message-ID: <200003091955.OAA26217@eric.cnri.reston.va.us> [Tim describes a more formal approach based on maximal strongly connected components (SCCs).] I like the SCC approach -- it's what I was struggling to invent but came short of discovering. However: [me] > > What's left on the first list then consists of finalizer-free garbage. > > We dispose of this garbage by clearing dicts and lists. 
Hopefully > > this makes the refcount of some of the finalizers go to zero -- those > > are finalized in the normal way. [Tim] > In Python it's even possible for a finalizer to *install* a __del__ method > that didn't previously exist, into the class of one of the objects on your > "first list". The scheme above is meant to be bulletproof in the face of > abuses even I can't conceive of . Are you *sure* your scheme deals with this? Let's look at an example. (Again, lowercase nodes have no finalizers.) Take G: a <=> b -> C This is G' (a and b are strongly connected): a' -> C' C is not reachable from any root node. We decide to clear a and b. Let's suppose we happen to clear b first. This removes the last reference to C, C's finalizer runs, and it installs a finalizer on a.__class__. So now a' has turned into A', and we're halfway committing a crime we said we would never commit (touching cyclical trash with finalizers). I propose to disregard this absurd possibility, except to the extent that Python shouldn't crash -- but we make no guarantees to the user. > More mundanely, clearing an item on your first list can cause a chain of > events that runs a finalizer, which in turn can resurrect one of the objects > on your first list (and so it should *not* get reclaimed). Without doing > the SCC bit, I don't think you can out-think that (the reasoning above > showed that the finalizer can't resurrect something in the *same* SCC as the > object that started it all, but that argument cannot be extended to objects > in other safe SCCs: they're vulnerable). I don't think so. While my poor wording ("finalizer-free garbage") didn't make this clear, my references to earlier algorithms were intended to imply that this is garbage that consists of truly unreachable objects. I have three lists: let's call them T(rash), R(oot-reachable), and F(inalizer-reachable). The Schemenauer c.s. algorithm moves all reachable nodes to R. 
I then propose to move all finalizers to F, and to run another pass of Schemenauer c.s. to also move all finalizer-reachable (but not root-reachable) nodes to F. I truly believe that (barring the absurdity of installing a new __del__) the objects on T at this point cannot be resurrected by a finalizer that runs, since they aren't reachable from any finalizers: by virtue of Schemenauer c.s. (which computes a reachability closure given some roots) anything reachable from a finalizer is on F by now (if it isn't on R -- again, nothing on T is reachable from R, because R is calculated as a closure). So, unless there's still a bug in my thinking here, I think that as long as we only want to clear SCCs with 0 finalizers, T is exactly the set of nodes we're looking for. > This time the echo came back distorted : > > [Boehm] > Cycles involving one or more finalizable objects are never finalized. > > A<=>b is "a cycle involving one or more finalizable objects", so he won't > touch it. The scheme at the top doesn't either. If you handed him your > *derived* graph (but also without the self-loops), he would; me too. KISS! > > > Note that we're now calling finalizers on objects with a non-zero > > refcount. > > I don't know why you want to do this. As the next several paragraphs > confirm, it creates real headaches for the implementation, and I'm unclear > on what it buys in return. Is "we'll do something by magic for cycles with > no more than one finalizer" a major gain for the user over "we'll do > something by magic for cycles with no finalizer"? 0, 1 and infinity *are* > the only interesting numbers , but the difference between 0 and 1 > *here* doesn't seem to me worth signing up for any pain at all. I do have a reason: if a maximal SCC has only one finalizer, there can be no question about the ordering between finalizer calls. And isn't the whole point of this discussion to have predictable ordering of finalizer calls in the light of trash recycling? 
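Guido's three-list partition is easy to mock up on a toy graph. A hedged sketch (the data model and names here are invented for illustration, not the proposed C implementation):

```python
def reachable(starts, edges):
    """Closure of `starts` under `edges` (dict: node -> set of successors)."""
    seen, todo = set(), list(starts)
    while todo:
        v = todo.pop()
        if v not in seen:
            seen.add(v)
            todo.extend(edges.get(v, ()))
    return seen

def partition(nodes, edges, roots, finalizers):
    R = reachable(roots, edges)             # root-reachable
    F = reachable(finalizers, edges) - R    # finalizer-reachable, not rooted
    T = set(nodes) - R - F                  # truly unreachable: safe to clear
    return T, R, F

# The a <=> b -> C example, with a finalizer only on C and no roots:
edges = {'a': {'b'}, 'b': {'a', 'C'}, 'C': set()}
T, R, F = partition(edges, edges, roots=(), finalizers={'C'})
print(sorted(T), sorted(R), sorted(F))  # -> ['a', 'b'] [] ['C']
```

Clearing T may still drop the last reference to something on F (here, clearing b finalizes C), but no finalizer can reach, and hence resurrect, anything on T -- which is the invariant the argument above rests on.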
> I would have no objection to "__del__ called only once" if it weren't for > that Python currently does something different. I don't know whether people > rely on that now; if they do, it's a much more dangerous thing to change > than adding a new keyword (the compiler gives automatic 100% coverage of the > latter; but nothing mechanical can help people track down reliance-- whether > deliberate or accidental --on the former). [...] > But none of this self-sampling is going to comfort some guy in France who > has a megaline of code relying on it. Good *bet*, though . OK -- so your objection is purely about backwards compatibility. Apart from that, I strongly feel that the only-once rule is a good one. And I don't think that the compatibility issue weighs very strongly here (given all the other problems that typically exist with __del__). > I see Marc-Andre already declined to get sucked into the magical part of > this . Greg should speak for his scheme, and I haven't made time to > understand it fully; my best guess is to call x.__cleanup__ for every object > in the SCC (but there's no clear way to decide which order to call them in, > and unless they're more restricted than __del__ methods they can create all > the same problems __del__ methods can!). Yes, but at least since we're defining a new API (in a reserved portion of the method namespace) there are no previous assumptions to battle. > > Note that I'd like some implementation freedom: I may not want to > > bother with the graph reduction algorithm at first (which seems very > > hairy) so I'd like to have the right to use the __cleanup__ API > > as soon as I see finalizers in cyclical trash. I don't mind disposing > > of finalizer-free cycles first, but once I have more than one > > finalizer left in the remaining cycles, I'd like the right not to > > reduce the graph for topsort reasons -- that algorithm seems hard. 
> > I hate to be realistic , but modern GC algorithms are among the > hardest you'll ever see in any field; even the outer limits of what we've > talked about here is baby stuff. Sun's Java group (the one in Chelmsford, > MA, down the road from me) had a group of 4+ people (incl. the venerable Mr. > Steele) working full-time for over a year on the last iteration of Java's > GC. The simpler BDW is a megabyte of code spread over 100+ files. Etc -- > state of the art GC can be crushingly hard. > > So I've got nothing against taking shortcuts at first -- there's actually no > realistic alternative. I think we're overlooking the obvious one, though: > if any finalizer appears in any trash cycle, tough luck. Python 3000 -- > which may be a spelling of 1.7 , but doesn't *need* to be a spelling > of 1.6. Kind of sad though -- finally knowing about cycles and then not being able to do anything about them. > > So we're back to the __cleanup__ design. Strawman proposal: for all > > finalizers in a trash cycle, call their __cleanup__ method, in > > arbitrary order. After all __cleanup__ calls are done, if the objects > > haven't all disposed of themselves, they are all garbage-collected > > without calling __del__. (This seems to require another garbage > > collection cycle -- so perhaps there should also be a once-only rule > > for __cleanup__?) > > > > Separate question: what if there is no __cleanup__? This should > > probably be reported: "You have cycles with finalizers, buddy! What > > do you want to do about them?" This same warning could be given when > > there is a __cleanup__ but it doesn't break all cycles. > > If I *ever* have a trash cycle with a finalizer in my code (> 0 -- "exactly > 1" isn't special to me), I will consider it to be a bug. So I want a way to > get it back from gc, so I can see what the heck it is, so I can fix my code > (or harass whoever did it to me). 
__cleanup__ suffices for that, so the > very act of calling it is all I'm really after ("Python invoked __cleanup__ > == Tim has a bug"). > > But after I outgrow that , I'll certainly want the option to get > another kind of complaint if __cleanup__ doesn't break the cycles, and after > *that* I couldn't care less. I've given you many gracious invitations to > say that you don't mind leaking in the face of a buggy program , but > as you've declined so far, I take it that never hearing another gripe about > leaking is a Primary Life Goal. So collection without calling __del__ is > fine -- but so is collection with calling it! If we're going to (at least > implicitly) approve of this stuff, it's probably better *to* call __del__, > if for no other reason than to catch your case of some poor innocent object > caught in a cycle not of its making that expects its __del__ to abort > starting World War III if it becomes unreachable . I suppose we can print some obnoxious message to stderr like """Your program has created cyclical trash involving one or more objects with a __del__ method; calling their __cleanup__ method didn't resolve the cycle(s). I'm going to call the __del__ method(s) but I can't guarantee that they will be called in a meaningful order, because of the cyclical dependencies.""" But I'd still like to reclaim the memory. If this is some long-running server process that is executing arbitrary Python commands sent to it by clients, it's not nice to leak, period. (Because of this, I will also need to trace functions, methods and modules -- these create massive cycles that currently require painful cleanup. Of course I also need to track down all the roots then... 
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Thu Mar 9 20:59:48 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 11:59:48 -0800 (PST) Subject: [Python-Dev] finalization again In-Reply-To: <200003091951.OAA26184@eric.cnri.reston.va.us> Message-ID: On Thu, 9 Mar 2000, Guido van Rossum wrote: > > As an alternative, is it possible to move those cycles to the garbage list > > and then never look at them again? That would speed up future collection > > processing. > > With the current approach, that's almost automatic :-) > > I'd rather reclaim the memory too. Well, yah. I would too :-) I'm at ApacheCon right now, so haven't read the thread in detail, but it seems that people saw my algorithm as a bit too complex. Bah. IMO, it's a pretty straightforward way for the interpreter to get cycles cleaned up. (whether the objects in the cycles are lists/dicts, class instances, or extension types!) Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Thu Mar 9 21:18:06 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 12:18:06 -0800 (PST) Subject: [Python-Dev] finalization again In-Reply-To: <200003091955.OAA26217@eric.cnri.reston.va.us> Message-ID: On Thu, 9 Mar 2000, Guido van Rossum wrote: >... > I don't think so. While my poor wording ("finalizer-free garbage") > didn't make this clear, my references to earlier algorithms were > intended to imply that this is garbage that consists of truly > unreachable objects. I have three lists: let's call them T(rash), > R(oot-reachable), and F(inalizer-reachable). The Schemenauer > c.s. algorithm moves all reachable nodes to R. I then propose to move > all finalizers to F, and to run another pass of Schemenauer c.s. to > also move all finalizer-reachable (but not root-reachable) nodes to F. >... > [Tim Peters] > > I see Marc-Andre already declined to get sucked into the magical part of > > this . 
Greg should speak for his scheme, and I haven't made time to > > understand it fully; my best guess is to call x.__cleanup__ for every object > > in the SCC (but there's no clear way to decide which order to call them in, > > and unless they're more restricted than __del__ methods they can create all > > the same problems __del__ methods can!). My scheme was to identify objects in F, but only those with a finalizer (not the closure). Then call __cleanup__ on each of them, in arbitrary order. If any are left after the sequence of __cleanup__ calls, then I call it an error. [ note that my proposal defined checking for a finalizer by calling tp_clean(TPCLEAN_CARE_CHECK); this accounts for class instances and for extension types with "heavy" processing in tp_dealloc ] The third step was to use tp_clean to try and clean all other objects in a safe fashion. Specifically: the objects have no finalizers, so there is no special care needed in finalizing, so this third step should nuke references that are stored in the object. This means object pointers are still valid (we haven't dealloc'd), but the insides have been emptied. If the third step does not remove all cycles, then one of the PyType objects did not remove all references during the tp_clean call. >... > > If I *ever* have a trash cycle with a finalizer in my code (> 0 -- "exactly > > 1" isn't special to me), I will consider it to be a bug. So I want a way to > > get it back from gc, so I can see what the heck it is, so I can fix my code > > (or harass whoever did it to me). __cleanup__ suffices for that, so the > > very act of calling it is all I'm really after ("Python invoked __cleanup__ > > == Tim has a bug"). Agreed. >... > I suppose we can print some obnoxious message to stderr like A valid alternative to raising an exception, but it falls into the whole trap of "where does stderr go?" >... > But I'd still like to reclaim the memory. 
If this is some > long-running server process that is executing arbitrary Python > commands sent to it by clients, it's not nice to leak, period. If an exception is raised, the top-level server loop can catch it, log the error, and keep going. But yes: it will leak. > (Because of this, I will also need to trace functions, methods and > modules -- these create massive cycles that currently require painful > cleanup. Of course I also need to track down all the roots > then... :-) Yes. It would be nice to have these participate in the "cleanup protocol" that I've described. It should help a lot at Python finalization time, effectively moving some special casing from import.c to the objects themselves. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jim at interet.com Thu Mar 9 21:20:23 2000 From: jim at interet.com (James C. Ahlstrom) Date: Thu, 09 Mar 2000 15:20:23 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <000401bf89ea$e6e54180$79a0143f@tim> Message-ID: <38C80787.7791A1A6@interet.com> Tim Peters wrote:

> Screw the docs. Pretend you're a newbie and *try* it.

I did try it.

> import os
> p = os.popen("dir")
> while 1:
>     line = p.readline()
>     if not line:
>         break
>     print line

> Type that in by hand, or stick it in a file & run it from a cmdline python.exe (which is a Windows console program). Under Win95 the process freezes solid, and even trying to close the DOS box doesn't work. You have to bring up the task manager and kill it that way. I once traced this under the debugger -- it's hung inside an MS DLL.

Point on the curve: This program works perfectly on my machine running NT.

> libc pipes are as flaky as libc popen under Windows, Jim! MarkH has the only versions of these things that come close to working under Windows (he wraps the native Win32 spellings of these things; MS's libc entry points (which Python uses now) are much worse).

I believe you when you say popen() is flakey. 
It is a little harder to believe it is not possible to write a _popen() replacement using pipes which works. Of course I wanted you to do it instead of me! Well, if I get any time before 1.6 comes out... JimA From gstein at lyra.org Thu Mar 9 21:31:38 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 12:31:38 -0800 (PST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C80787.7791A1A6@interet.com> Message-ID: On Thu, 9 Mar 2000, James C. Ahlstrom wrote: >... > > libc pipes ares as flaky as libc popen under Windows, Jim! MarkH has the > > only versions of these things that come close to working under Windows (he > > wraps the native Win32 spellings of these things; MS's libc entry points > > (which Python uses now) are much worse). > > I believe you when you say popen() is flakey. It is a little > harder to believe it is not possible to write a _popen() > replacement using pipes which works. > > Of course I wanted you to do it instead of me! Well, if > I get any time before 1.6 comes out... It *has* been done. Bill Tutt did it a long time ago. That's what win32pipe is all about. -g -- Greg Stein, http://www.lyra.org/ From jim at interet.com Thu Mar 9 22:04:59 2000 From: jim at interet.com (James C. Ahlstrom) Date: Thu, 09 Mar 2000 16:04:59 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipestuff going to be adopted? References: Message-ID: <38C811FB.B6096FA4@interet.com> Greg Stein wrote: > > On Thu, 9 Mar 2000, James C. Ahlstrom wrote: > > Of course I wanted you to do it instead of me! Well, if > > I get any time before 1.6 comes out... > > It *has* been done. Bill Tutt did it a long time ago. That's what > win32pipe is all about. Thanks for the heads up! Unfortunately, win32pipe is not in the core, and probably covers more ground than just popen() and so might be a maintenance problem. And popen() is not written in it anyway. So we are Not There Yet (TM). 
Which I guess was Tim's original point. JimA From mhammond at skippinet.com.au Thu Mar 9 22:36:14 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 10 Mar 2000 08:36:14 +1100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C80787.7791A1A6@interet.com> Message-ID: > Point on the curve: This program works perfectly on my > machine running NT. And running from Python.exe. I bet you didn't try it from a GUI. The situation is worse WRT Windows 95. MS has a knowledge base article describing the bug, and telling you how to work around it by using a dedicated .EXE. So, out of the box, popen works only on NT from a console - pretty sorry state of affairs :-( > I believe you when you say popen() is flakey. It is a little > harder to believe it is not possible to write a _popen() > replacement using pipes which works. Which is what I believe win32pipe.popen* are. Mark. From guido at python.org Fri Mar 10 02:13:51 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 09 Mar 2000 20:13:51 -0500 Subject: [Python-Dev] writelines() not thread-safe Message-ID: <200003100113.UAA27337@eric.cnri.reston.va.us> Christian Tismer just did an exhaustive search for thread unsafe use of Python operations, and found two weaknesses. One is posix.listdir(), which I had already found; the other is file.writelines(). Here's a program that demonstrates the bug; basically, while writelines is walking down the list, another thread could truncate the list, causing PyList_GetItem() to fail or a string object to be deallocated while writelines is using it. On my Solaris 7 system it typically crashes in the first or second iteration. It's easy to fix: just don't release the interpreter lock (get rid of Py_BEGIN_ALLOW_THREADS c.s.). This would however prevent other threads from doing any work while this thread may be blocked for I/O. 
An alternative solution is to put Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS just around the fwrite() call. This is safe, but would require a lot of lock operations and would probably slow things down too much. Ideas? --Guido van Rossum (home page: http://www.python.org/~guido/)

import os
import sys
import thread
import random
import time
import tempfile

def good_guy(fp, list):
    t0 = time.time()
    fp.seek(0)
    fp.writelines(list)
    t1 = time.time()
    print fp.tell(), "bytes written"
    return t1-t0

def bad_guy(dt, list):
    time.sleep(random.random() * dt)
    del list[:]

def main():
    infn = "/usr/dict/words"
    if sys.argv[1:]:
        infn = sys.argv[1]
    print "reading %s..." % infn
    fp = open(infn)
    list = fp.readlines()
    fp.close()
    print "read %d lines" % len(list)
    tfn = tempfile.mktemp()
    fp = None
    try:
        fp = open(tfn, "w")
        print "calibrating..."
        dt = 0.0
        n = 3
        for i in range(n):
            dt = dt + good_guy(fp, list)
        dt = dt / n  # average time it took to write the list to disk
        print "dt =", round(dt, 3)
        i = 0
        while 1:
            i = i+1
            print "test", i
            copy = map(lambda x: x[1:], list)
            thread.start_new_thread(bad_guy, (dt, copy))
            good_guy(fp, copy)
    finally:
        if fp:
            fp.close()
        try:
            os.unlink(tfn)
        except os.error:
            pass

main()

From tim_one at email.msn.com Fri Mar 10 03:13:51 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 21:13:51 -0500 Subject: [Python-Dev] writelines() not thread-safe In-Reply-To: <200003100113.UAA27337@eric.cnri.reston.va.us> Message-ID: <000601bf8a36$46ebf880$58a2143f@tim> [Guido van Rossum] > Christian Tismer just did an exhaustive search for thread unsafe use > of Python operations, and found two weaknesses. One is > posix.listdir(), which I had already found; the other is > file.writelines(). Here's a program that demonstrates the bug; > basically, while writelines is walking down the list, another thread > could truncate the list, causing PyList_GetItem() to fail or a string > object to be deallocated while writelines is using it. 
On my Solaris > 7 system it typically crashes in the first or second iteration. > > It's easy to fix: just don't release the interpreter lock (get rid > of Py_BEGIN_ALLOW_THREADS c.s.). This would however prevent other > threads from doing any work while this thread may be blocked for I/O. > > An alternative solution is to put Py_BEGIN_ALLOW_THREADS and > Py_END_ALLOW_THREADS just around the fwrite() call. This is safe, but > would require a lot of lock operations and would probably slow things > down too much. > > Ideas?

2.5:

1: Before releasing the lock, make a shallow copy of the list.

1.5: As in #1, but iteratively peeling off "the next N" values, for some N balancing the number of lock operations against the memory burden (I don't care about the speed of a shallow copy here ...).

2. Pull the same trick list.sort() uses: make the list object immutable for the duration (I know you think that's a hack, and it is , but it costs virtually nothing and would raise an appropriate error when they attempted the insane mutation).

I actually like #2 best now, but won't in the future, because file_writelines() should really accept an argument of any sequence type. This makes 1.5 a better long-term hack. although-adding-1.5-to-1.6-is-confusing-ly y'rs - tim From tim_one at email.msn.com Fri Mar 10 03:52:26 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 21:52:26 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <1259490837-400325@hypernet.com> Message-ID: <000901bf8a3b$ab314660$58a2143f@tim> [Gordon McM, aspires to make sense of the mess] > It doesn't work for commands builtin to whatever "shell" you're > using. That's different between cmd and command, and the > various flavors, versions and extensions thereof. It's not that simple, either; e.g., old apps invoking the 16-bit subsystem can screw up too. 
Look at Tcl's man page for "exec" and just *try* to wrap your brain around all the caveats they were left with after throwing a few thousand lines of C at this under their Windows port . > FWIW, I gave up a long time ago. I use redirection and a > tempfile. The few times I've wanted "interactive" control, I've > used Win32Process, dup'ed, inherited handles... the whole 9 > yards. Why? Look at all the questions about popen and child > processes in general, on platforms where it *works*, (if it > weren't for Donn Cave, nobody'd get it to work anywhere ). Donn is downright scary that way. I stopped using 'em too, of course. > To reiterate Tim's point: *none* of the c runtime routines for > process control on Windows are adequate (beyond os.system > and living with a DOS box popping up). No, os.system is a problem under command.com flavors of Windows too, as system spawns a new shell and command.com's exit code is *always* 0. So Python's os.system returns 0 no matter what app the user *thinks* they were running, and whether it worked or set the baby on fire. > The raw Win32 CreateProcess does everything you could possibly want, but > takes a week or more to understand, (if this arg is a that, then that arg > is a whatsit, and the next is limited to the values X and Z unless...). Except that CreateProcess doesn't handle shell metacharacters, right? Tcl is the only language I've seen that really works hard at making cmdline-style process control portable. so-all-we-need-to-do-is-a-single-createprocess-to-invoke-tcl-ly y'rs - tim From tim_one at email.msn.com Fri Mar 10 03:52:24 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 21:52:24 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <14535.46175.991970.135642@weyr.cnri.reston.va.us> Message-ID: <000801bf8a3b$aa0c4e60$58a2143f@tim> [Fred L. Drake, Jr.] 
> Tim (& others), > Would this additional text be sufficient for the os.popen() > documentation? > > \strong{Note:} This function behaves unreliably under Windows > due to the native implementation of \cfunction{popen()}. Yes, that's good! If Mark/Bill's alternatives don't make it in, would also be good to point to the PythonWin extensions (although MarkH will have to give us the Official Name for that). > If someone cares to explain what's weird about it, that might be > appropriate as well, but I've never used this under Windows. As the rest of this thread should have made abundantly clear by now <0.9 wink>, it's such a mess across various Windows flavors that nobody can explain it. From tim_one at email.msn.com Fri Mar 10 04:15:18 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 22:15:18 -0500 Subject: [Python-Dev] RE: finalization again In-Reply-To: <20000309123731.A3664@acs.ucalgary.ca> Message-ID: <000a01bf8a3e$dc8878c0$58a2143f@tim> Quickie: [Tim] >> It's not obvious, but the SCCs can be found in linear time (via Tarjan's >> algorithm, which is simple but subtle; [NeilS] > Wow, it seems like it should be more expensive than that. Oh yes! Many bright people failed to discover the trick; Tarjan didn't discover it until (IIRC) the early 70's, and it was a surprise. It's just a few lines of simple code added to an ordinary depth-first search. However, while the code is simple, a correctness proof is not. BTW, if it wasn't clear, when talking about graph algorithms "linear" is usually taken to mean "in the sum of the number of nodes and edges". Cyclops.py finds all the cycles in linear time in that sense, too (but does not find the SCCs in linear time, at least not in theory -- in practice you can't tell the difference ). > What are the space requirements? Same as depth-first search, plus a way to associate an SCC id with each node, plus a single global "id" vrbl. So it's worst-case linear (in the number of nodes) space. 
See, e.g., any of the books in Sedgewick's "Algorithms in [Language du Jour]" series for working code. > Also, does the simple algorithm you used in Cyclops have a name? Not officially, but it answers to "hey, dumb-ass!" . then-again-so-do-i-so-make-eye-contact-ly y'rs - tim From bwarsaw at cnri.reston.va.us Fri Mar 10 05:21:46 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 9 Mar 2000 23:21:46 -0500 (EST) Subject: [Python-Dev] finalization again References: <000701bf89ab$80cb8e20$0d2d153f@tim> <200003091955.OAA26217@eric.cnri.reston.va.us> Message-ID: <14536.30810.720836.886023@anthem.cnri.reston.va.us> Okay, I had a flash of inspiration on the way home from my gig tonight. Of course, I'm also really tired so I'm sure Tim will shoot this down in his usual witty but humbling way. I just had to get this out or I wouldn't sleep tonight. What if you timestamp instances when you create them? Then when you have trash cycles with finalizers, you sort them and finalize in chronological order. The nice thing here is that the user can have complete control over finalization order by controlling object creation order. Some random thoughts: - Finalization order of cyclic finalizable trash is completely deterministic. - Given sufficient resolution of your system clock, you should never have two objects with the same timestamp. - You could reduce the memory footprint by only including a timestamp for objects whose classes have __del__'s at instance creation time. Sticking an __del__ into your class dynamically would have no effect on objects that are already created (and I wouldn't poke you with a pointy stick if even post-twiddle instances didn't get timestamped). Thus, such objects would never be finalized -- tough luck. - FIFO order /seems/ more natural to me than FILO, but then I rarely create cyclic objects, and almost never use __del__, so this whole argument has been somewhat academic to me :). 
- The rule seems easy enough to implement, describe, and understand. I think I came up with a few more points on the drive home, but my post jam, post lightbulb endorphodrenalin rush is quickly subsiding, so I leave the rest until tomorrow. its-simply-a-matter-of-time-ly y'rs, -Barry From moshez at math.huji.ac.il Fri Mar 10 06:32:41 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 10 Mar 2000 07:32:41 +0200 (IST) Subject: [Python-Dev] finalization again In-Reply-To: Message-ID: On Thu, 9 Mar 2000, Greg Stein wrote: > > But I'd still like to reclaim the memory. If this is some > > long-running server process that is executing arbitrary Python > > commands sent to it by clients, it's not nice to leak, period. > > If an exception is raised, the top-level server loop can catch it, log the > error, and keep going. But yes: it will leak. And Tim's version stops the leaking if the server is smart enough: occasionally, it will call gc.get_dangerous_cycles(), and nuke everything it finds there. (E.g., clean up dicts and lists). Some destructor raises an exception? Ignore it (or whatever). And no willy-nilly "but I'm using a silly OS which has hardly any concept of stderr" problems! If the server wants, it can just send a message to the log. rooting-for-tim-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From tim_one at email.msn.com Fri Mar 10 09:18:29 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 10 Mar 2000 03:18:29 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <200003091955.OAA26217@eric.cnri.reston.va.us> Message-ID: <000001bf8a69$37d57b40$812d153f@tim> This is getting to be fun, but afraid I can only make time for the first easy one tonight: [Tim, conjures a horrid vision of finalizers installing new __del__ methods, then sez ... ] > The scheme above is meant to be bulletproof in the face of abuses even > I can't conceive of . [Guido] > Are you *sure* your scheme deals with this? 
Never said it did -- only that it *meant* to. Ya, you got me. The things I thought I had *proved* I put in the numbered list, and in a rush put the speculative stuff in the reply body.

One practical thing I think I can prove today: after finding SCCs, and identifying the safe nodes without predecessors, all such nodes S1, S2, ... can be cleaned up without fear of resurrection, or of cleaning something in Si causing anything in Sj (i!=j) to get reclaimed either (at the time I wrote it, I could only prove that cleaning *one* Si was non-problematic). Barring, of course, this "__del__ from hell" pathology.

Also suspect that this claim is isomorphic to your later elaboration on why the objects on T at this point cannot be resurrected by a finalizer that runs, since they aren't reachable from any finalizers. That is, exactly the same is true of "the safe (SCC super)nodes without predecessors", so I expect we've just got two ways of identifying the same set here. Perhaps yours is bigger, though (I realize that isn't clear; later).

> Let's look at an example.
> (Again, lowercase nodes have no finalizers.) Take G:
>
> a <=> b -> C
>
> [and cleaning b can trigger C.__del__ which can create
> a.__class__.__del__ before a is decref'ed ...]
>
> ... and we're halfway committing a crime we said we would never commit
> (touching cyclical trash with finalizers).

Wholly agreed.

> I propose to disregard this absurd possibility,

How come you never propose to just shoot people <0.9 wink>?

> except to the extent that Python shouldn't crash -- but we make no
> guarantees to the user.

"Shouldn't crash" is essential, sure. Carry it another step: after C is finalized, we get back to the loop clearing b.__dict__, and the refcount on "a" falls to 0 next. So the new a.__del__ gets called.
Since b was visible to a, it's possible for a.__del__ to resurrect b, which latter is now in some bizarre (from the programmer's POV) cleared state (or even in the bit bucket, if we optimistically reclaim b's memory "early"!). I can't (well, don't want to ) believe it will be hard to stop this. It's just irksome to need to think about it at all. making-java's-gc-look-easy?-ly y'rs - tim From guido at python.org Fri Mar 10 14:46:43 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Mar 2000 08:46:43 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Thu, 09 Mar 2000 23:21:46 EST." <14536.30810.720836.886023@anthem.cnri.reston.va.us> References: <000701bf89ab$80cb8e20$0d2d153f@tim> <200003091955.OAA26217@eric.cnri.reston.va.us> <14536.30810.720836.886023@anthem.cnri.reston.va.us> Message-ID: <200003101346.IAA27847@eric.cnri.reston.va.us> > What if you timestamp instances when you create them? Then when you > have trash cycles with finalizers, you sort them and finalize in > chronological order. The nice thing here is that the user can have > complete control over finalization order by controlling object > creation order. > > Some random thoughts: > > - Finalization order of cyclic finalizable trash is completely > deterministic. > > - Given sufficient resolution of your system clock, you should never > have two objects with the same timestamp. Forget the clock -- just use a counter that is incremented on each allocation. > - You could reduce the memory footprint by only including a timestamp > for objects whose classes have __del__'s at instance creation time. > Sticking an __del__ into your class dynamically would have no effect > on objects that are already created (and I wouldn't poke you with a > pointy stick if even post-twiddle instances didn't get > timestamped). Thus, such objects would never be finalized -- tough > luck. 
> > - FIFO order /seems/ more natural to me than FILO, but then I rarely > create cyclic objects, and almost never use __del__, so this whole > argument has been somewhat academic to me :). Ai, there's the rub. Suppose I have a tree with parent and child links. And suppose I have a rule that children need to be finalized before their parents (maybe they represent a Unix directory tree, where you must rm the files before you can rmdir the directory). This suggests that we should choose LIFO: you must create the parents first (you have to create a directory before you can create files in it). However, now we add operations to move nodes around in the tree. Suddenly you can have a child that is older than its parent! Conclusion: the creation time is useless; the application logic and actual link relationships are needed. > - The rule seems easy enough to implement, describe, and understand. > > I think I came up with a few more points on the drive home, but my > post jam, post lightbulb endorphodrenalin rush is quickly subsiding, > so I leave the rest until tomorrow. > > its-simply-a-matter-of-time-ly y'rs, > -Barry Time flies like an arrow -- fruit flies like a banana. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Mar 10 16:06:48 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Mar 2000 10:06:48 -0500 Subject: [Python-Dev] writelines() not thread-safe In-Reply-To: Your message of "Thu, 09 Mar 2000 21:13:51 EST." <000601bf8a36$46ebf880$58a2143f@tim> References: <000601bf8a36$46ebf880$58a2143f@tim> Message-ID: <200003101506.KAA28358@eric.cnri.reston.va.us> OK, here's a patch for writelines() that supports arbitrary sequences and fixes the lock problem using Tim's solution #1.5 (slicing 1000 items at a time). It contains a fast path for when the argument is a list, using PyList_GetSlice; otherwise it uses PyObject_GetItem and a fixed list. Please have a good look at this; I've only tested it lightly. 
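In Python terms, the strategy in the patch's comment can be sketched as follows. This is a hypothetical pure-Python analogue for illustration only; the real code is C, uses PyList_GetSlice as a fast path for lists, and releases the interpreter lock around the fwrite loop:

```python
CHUNKSIZE = 1000  # same chunk size the patch uses

def writelines_chunked(write, seq):
    """Write items of seq using write(), CHUNKSIZE items at a time."""
    index = 0
    while True:
        # Slurp up to CHUNKSIZE items into a private list, treating
        # IndexError as end-of-sequence, as PySequence_GetItem does.
        chunk = []
        for j in range(CHUNKSIZE):
            try:
                item = seq[index + j]
            except IndexError:
                break
            if not isinstance(item, str):
                raise TypeError("writelines() requires sequence of strings")
            chunk.append(item)
        if not chunk:
            break
        for line in chunk:
            write(line)  # in the C code this loop runs without the lock
        if len(chunk) < CHUNKSIZE:
            break
        index += CHUNKSIZE
```

The point of the private list is that another thread may mutate the argument while the lock is released; the writing loop only ever touches the snapshot.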
--Guido van Rossum (home page: http://www.python.org/~guido/)

Index: fileobject.c
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Objects/fileobject.c,v
retrieving revision 2.70
diff -c -r2.70 fileobject.c
*** fileobject.c	2000/02/29 13:59:28	2.70
--- fileobject.c	2000/03/10 14:55:47
***************
*** 884,923 ****
  	PyFileObject *f;
  	PyObject *args;
  {
! 	int i, n;
  	if (f->f_fp == NULL)
  		return err_closed();
! 	if (args == NULL || !PyList_Check(args)) {
  		PyErr_SetString(PyExc_TypeError,
! 			   "writelines() requires list of strings");
  		return NULL;
  	}
! 	n = PyList_Size(args);
! 	f->f_softspace = 0;
! 	Py_BEGIN_ALLOW_THREADS
! 	errno = 0;
! 	for (i = 0; i < n; i++) {
! 		PyObject *line = PyList_GetItem(args, i);
! 		int len;
! 		int nwritten;
! 		if (!PyString_Check(line)) {
! 			Py_BLOCK_THREADS
! 			PyErr_SetString(PyExc_TypeError,
! 				   "writelines() requires list of strings");
  			return NULL;
  		}
! 		len = PyString_Size(line);
! 		nwritten = fwrite(PyString_AsString(line), 1, len, f->f_fp);
! 		if (nwritten != len) {
! 			Py_BLOCK_THREADS
! 			PyErr_SetFromErrno(PyExc_IOError);
! 			clearerr(f->f_fp);
! 			return NULL;
  		}
  	}
! 	Py_END_ALLOW_THREADS
  	Py_INCREF(Py_None);
! 	return Py_None;
  }
  
  static PyMethodDef file_methods[] = {
--- 884,975 ----
  	PyFileObject *f;
  	PyObject *args;
  {
! #define CHUNKSIZE 1000
! 	PyObject *list, *line;
! 	PyObject *result;
! 	int i, j, index, len, nwritten, islist;
! 
  	if (f->f_fp == NULL)
  		return err_closed();
! 	if (args == NULL || !PySequence_Check(args)) {
  		PyErr_SetString(PyExc_TypeError,
! 			   "writelines() requires sequence of strings");
  		return NULL;
  	}
! 	islist = PyList_Check(args);
! 
! 	/* Strategy: slurp CHUNKSIZE lines into a private list,
! 	   checking that they are all strings, then write that list
! 	   without holding the interpreter lock, then come back for more. */
! 	index = 0;
! 	if (islist)
! 		list = NULL;
! 	else {
! 		list = PyList_New(CHUNKSIZE);
! 		if (list == NULL)
  			return NULL;
+ 	}
+ 	result = NULL;
+ 
+ 	for (;;) {
+ 		if (islist) {
+ 			Py_XDECREF(list);
+ 			list = PyList_GetSlice(args, index, index+CHUNKSIZE);
+ 			if (list == NULL)
+ 				return NULL;
+ 			j = PyList_GET_SIZE(list);
  		}
! 		else {
! 			for (j = 0; j < CHUNKSIZE; j++) {
! 				line = PySequence_GetItem(args, index+j);
! 				if (line == NULL) {
! 					if (PyErr_ExceptionMatches(PyExc_IndexError)) {
! 						PyErr_Clear();
! 						break;
! 					}
! 					/* Some other error occurred.
! 					   Note that we may lose some output. */
! 					goto error;
! 				}
! 				if (!PyString_Check(line)) {
! 					PyErr_SetString(PyExc_TypeError,
! 				"writelines() requires sequences of strings");
! 					goto error;
! 				}
! 				PyList_SetItem(list, j, line);
! 			}
! 		}
! 		if (j == 0)
! 			break;
! 
! 		Py_BEGIN_ALLOW_THREADS
! 		f->f_softspace = 0;
! 		errno = 0;
! 		for (i = 0; i < j; i++) {
! 			line = PyList_GET_ITEM(list, i);
! 			len = PyString_GET_SIZE(line);
! 			nwritten = fwrite(PyString_AS_STRING(line),
! 					  1, len, f->f_fp);
! 			if (nwritten != len) {
! 				Py_BLOCK_THREADS
! 				PyErr_SetFromErrno(PyExc_IOError);
! 				clearerr(f->f_fp);
! 				Py_DECREF(list);
! 				return NULL;
! 			}
  		}
+ 		Py_END_ALLOW_THREADS
+ 
+ 		if (j < CHUNKSIZE)
+ 			break;
+ 		index += CHUNKSIZE;
  	}
! 	Py_INCREF(Py_None);
! 	result = Py_None;
! error:
! 	Py_XDECREF(list);
! 	return result;
  }
  
  static PyMethodDef file_methods[] = {

From skip at mojam.com  Fri Mar 10 16:28:13 2000
From: skip at mojam.com (Skip Montanaro)
Date: Fri, 10 Mar 2000 09:28:13 -0600
Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler
Message-ID: <200003101528.JAA15951@beluga.mojam.com>

Consider the following snippet of code from MySQLdb.py:

    try:
        self._query(query % escape_row(args, qc))
    except TypeError:
        self._query(query % escape_dict(args, qc))

It's not quite right. There are at least four reasons I can think of why the % operator might raise a TypeError:

1. query has not enough format specifiers
2. query has too many format specifiers
3. argument type mismatch between individual format specifier and
   corresponding argument
4.
query expects dict-style interpolation

The except clause only handles the last case. That leaves the other three cases mishandled. The above construct pretends that all TypeErrors possible are handled by calling escape_dict() instead of escape_row(). I stumbled on case 2 yesterday and got a fairly useless error message when the code in the except clause also bombed. Took me a few minutes of head scratching to see that I had an extra %s in my format string. A note to Andy Dustman, MySQLdb's author, yielded the following modified version:

    try:
        self._query(query % escape_row(args, qc))
    except TypeError, m:
        if m.args[0] == "not enough arguments for format string":
            raise
        if m.args[0] == "not all arguments converted":
            raise
        self._query(query % escape_dict(args, qc))

This will do the trick for me for the time being. Note, however, that the only way for Andy to decide which of the cases occurred (case 3 still isn't handled above, but should occur very rarely in MySQLdb since it only uses the more accommodating %s as a format specifier) is to compare the string value of the message to see which of the four cases was raised.

This strong coupling via the error message text between the exception being raised (in C code, in this case) and the place where it's caught seems bad to me and encourages authors to either not recover from errors or to recover from them in the crudest fashion. If Guido decides to tweak the TypeError message in any fashion, perhaps to include the count of arguments in the format string and argument tuple, this code will break.

It makes me wonder if there's not a better mechanism waiting to be discovered. Would it be possible to publish an interface of some sort via the exceptions module that would allow symbolic names or dictionary references to be used to decide which case is being handled? I envision something like the following in exceptions.py:

    UNKNOWN_ERROR_CATEGORY = 0
    TYP_SHORT_FORMAT = 1
    TYP_LONG_FORMAT = 2
    ...
    IND_BAD_RANGE = 1

    message_map = { # leave
        (TypeError, ("not enough arguments for format string",)):
            TYP_SHORT_FORMAT,
        (TypeError, ("not all arguments converted",)): TYP_LONG_FORMAT,
        ...
        (IndexError, ("list index out of range",)): IND_BAD_RANGE,
        ...
    }

This would isolate the raw text of exception strings to just a single place (well, just one place on the exception handling side of things). It would be used something like

    try:
        self._query(query % escape_row(args, qc))
    except TypeError, m:
        from exceptions import *
        exc_case = message_map.get((TypeError, m.args),
                                   UNKNOWN_ERROR_CATEGORY)
        if exc_case in [UNKNOWN_ERROR_CATEGORY, TYP_SHORT_FORMAT,
                        TYP_LONG_FORMAT]:
            raise
        self._query(query % escape_dict(args, qc))

This could be added to exceptions.py without breaking existing code. Does this (or something like it) seem like a reasonable enhancement for Py3K? If we can narrow things down to an implementable solution I'll create a patch.

Skip Montanaro | http://www.mojam.com/
skip at mojam.com | http://www.musi-cal.com/

From guido at python.org  Fri Mar 10 17:17:56 2000
From: guido at python.org (Guido van Rossum)
Date: Fri, 10 Mar 2000 11:17:56 -0500
Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler
In-Reply-To: Your message of "Fri, 10 Mar 2000 09:28:13 CST." <200003101528.JAA15951@beluga.mojam.com>
References: <200003101528.JAA15951@beluga.mojam.com>
Message-ID: <200003101617.LAA28722@eric.cnri.reston.va.us>

> Consider the following snippet of code from MySQLdb.py:

Skip, I'm not familiar with MySQLdb.py, and I have no idea what your example is about. From the rest of the message I feel it's not about MySQLdb at all, but about string formatting, but the point escapes me because you never quite show what's in the format string and what error that gives. Could you give some examples based on first principles? A simple interactive session showing the various errors would be helpful...
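For reference, the four failure modes are easy to reproduce interactively. The snippet below was checked against a much later CPython, where some of the message texts have been reworded since the 1.5.2 era -- that drift is itself the fragility under discussion:

```python
cases = [
    ('"%s" % ("a", "b")', lambda: "%s" % ("a", "b")),  # too many args
    ('"%s %s" % "a"',     lambda: "%s %s" % "a"),      # too few args
    ('"%(a)s" % ("a",)',  lambda: "%(a)s" % ("a",)),   # needs a mapping
    ('"%d" % {"a": 1}',   lambda: "%d" % {"a": 1}),    # type mismatch
]
for src, thunk in cases:
    try:
        thunk()
    except TypeError as exc:
        print("%-20s -> TypeError: %s" % (src, exc))
```

All four raise the same exception class, so an except clause can only tell them apart by inspecting the message text.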
--Guido van Rossum (home page: http://www.python.org/~guido/)

From gward at cnri.reston.va.us  Fri Mar 10 20:05:04 2000
From: gward at cnri.reston.va.us (Greg Ward)
Date: Fri, 10 Mar 2000 14:05:04 -0500
Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler
In-Reply-To: <200003101617.LAA28722@eric.cnri.reston.va.us>; from guido@python.org on Fri, Mar 10, 2000 at 11:17:56AM -0500
References: <200003101528.JAA15951@beluga.mojam.com> <200003101617.LAA28722@eric.cnri.reston.va.us>
Message-ID: <20000310140503.A8619@cnri.reston.va.us>

On 10 March 2000, Guido van Rossum said:
> Skip, I'm not familiar with MySQLdb.py, and I have no idea what your
> example is about. From the rest of the message I feel it's not about
> MySQLdb at all, but about string formatting, but the point escapes me
> because you never quite show what's in the format string and what
> error that gives. Could you give some examples based on first
> principles? A simple interactive session showing the various errors
> would be helpful...

I think Skip's point was just this: "TypeError" isn't expressive enough. If you catch TypeError on a statement with multiple possible type errors, you don't know which one you caught. Same holds for any exception type, really: a given statement could blow up with ValueError for any number of reasons. Etc., etc.

One possible solution, and I think this is what Skip was getting at, is to add an "error code" to the exception object that identifies the error more reliably than examining the error message. It's just the errno/strerror dichotomy: strerror is for users, errno is for code. I think Skip is just saying that Python exception objects need an errno (although it doesn't have to be a number). It would probably only make sense to define error codes for exceptions that can be raised by Python itself, though.
Greg

From skip at mojam.com  Fri Mar 10 21:17:30 2000
From: skip at mojam.com (Skip Montanaro)
Date: Fri, 10 Mar 2000 14:17:30 -0600 (CST)
Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler
In-Reply-To: <200003101617.LAA28722@eric.cnri.reston.va.us>
References: <200003101528.JAA15951@beluga.mojam.com> <200003101617.LAA28722@eric.cnri.reston.va.us>
Message-ID: <14537.22618.656740.296408@beluga.mojam.com>

Guido> Skip, I'm not familiar with MySQLdb.py, and I have no idea what
Guido> your example is about. From the rest of the message I feel it's
Guido> not about MySQLdb at all, but about string formatting,

My apologies. You're correct, it's really not about MySQLdb. It's about handling multiple cases raised by the same exception. First, a more concrete example that just uses simple string formats:

    code                 exception
    "%s" % ("a", "b")    TypeError: 'not all arguments converted'
    "%s %s" % "a"        TypeError: 'not enough arguments for format string'
    "%(a)s" % ("a",)     TypeError: 'format requires a mapping'
    "%d" % {"a": 1}      TypeError: 'illegal argument type for built-in operation'

Let's presume hypothetically that it's possible to recover from some subset of the TypeErrors that are raised, but not all of them. Now, also presume that the format strings and the tuple, string or dict literals I've given above can be stored in variables (which they can).

If we wrap the code in a try/except statement, we can catch the TypeError exception and try to do something sensible. This is precisely the trick that Andy Dustman uses in MySQLdb: first try expanding the format string using a tuple as the RH operand, then try with a dict if that fails. Unfortunately, as you can see from the above examples, there are four cases that need to be handled. To distinguish them currently, you have to compare the message you get with the exception to string literals that are generally defined in C code in the interpreter.
Here's what Andy's original code looked like stripped of the MySQLdb-ese:

    try:
        x = format % tuple_generating_function(...)
    except TypeError:
        x = format % dict_generating_function(...)

That doesn't handle the first two cases above. You have to inspect the message that raise sends out:

    try:
        x = format % tuple_generating_function(...)
    except TypeError, m:
        if m.args[0] == "not all arguments converted":
            raise
        if m.args[0] == "not enough arguments for format string":
            raise
        x = format % dict_generating_function(...)

This comparison of except arguments with hard-coded strings (especially ones the programmer has no direct control over) seems fragile to me. If you decide to reword the error message strings, you break someone's code.

In my previous message I suggested collecting this fragility in the exceptions module where it can be better isolated. My solution is a bit cumbersome, but could probably be cleaned up somewhat, but basically looks like

    try:
        x = format % tuple_generating_function(...)
    except TypeError, m:
        import exceptions
        msg_case = exceptions.message_map.get((TypeError, m.args),
                                              exceptions.UNKNOWN_ERROR_CATEGORY)
        # punt on the cases we can't recover from
        if msg_case == exceptions.TYP_SHORT_FORMAT: raise
        if msg_case == exceptions.TYP_LONG_FORMAT: raise
        if msg_case == exceptions.UNKNOWN_ERROR_CATEGORY: raise
        # handle the one we can
        x = format % dict_generating_function(...)

In private email that crossed my original message, Andy suggested defining more standard exceptions, e.g.:

    class FormatError(TypeError): pass
    class TooManyElements(FormatError): pass
    class TooFewElements(FormatError): pass

then raising the appropriate error based on the circumstance. Code that catches TypeError exceptions would still work. So there are two possible changes on the table:

1. define more standard exceptions so you can distinguish classes of errors
   on a more fine-grained basis using just the first argument of the except
   clause.
2.
provide some machinery in exceptions.py to allow programmers a measure of
   uncoupling from using hard-coded strings to distinguish cases.

Skip

From skip at mojam.com  Fri Mar 10 21:21:11 2000
From: skip at mojam.com (Skip Montanaro)
Date: Fri, 10 Mar 2000 14:21:11 -0600 (CST)
Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler
In-Reply-To: <20000310140503.A8619@cnri.reston.va.us>
References: <200003101528.JAA15951@beluga.mojam.com> <200003101617.LAA28722@eric.cnri.reston.va.us> <20000310140503.A8619@cnri.reston.va.us>
Message-ID: <14537.22839.664131.373727@beluga.mojam.com>

Greg> One possible solution, and I think this is what Skip was getting
Greg> at, is to add an "error code" to the exception object that
Greg> identifies the error more reliably than examining the error
Greg> message. It's just the errno/strerror dichotomy: strerror is for
Greg> users, errno is for code. I think Skip is just saying that
Greg> Python exception objects need an errno (although it doesn't have
Greg> to be a number). It would probably only make sense to define
Greg> error codes for exceptions that can be raised by Python itself,
Greg> though.

I'm actually allowing the string to be used as the error code. If you raise TypeError with "not all arguments converted" as the argument, then that string literal will appear in the definition of exceptions.message_map as part of a key. The programmer would only refer to the args attribute of the object being raised.
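Andy's subclassing suggestion is easy to prototype in pure Python. Everything here is illustrative rather than an agreed-upon API (and it uses present-day except/raise syntax); the message substrings are then the one place where string matching would live:

```python
class FormatError(TypeError):
    """Base class for finer-grained string-formatting type errors."""

class TooFewElements(FormatError):
    pass

class TooManyElements(FormatError):
    pass

def checked_format(format, args):
    """Apply format % args, translating the two recoverable
    TypeError messages into the subclasses above."""
    try:
        return format % args
    except TypeError as exc:
        msg = exc.args[0] if exc.args else ""
        if "not enough arguments" in msg:
            raise TooFewElements(msg) from exc
        if "not all arguments converted" in msg:
            raise TooManyElements(msg) from exc
        raise

# Old code that catches TypeError keeps working; new code can be precise:
try:
    checked_format("%s %s", ("a",))
except TooFewElements:
    pass  # recoverable: e.g. retry with a dict argument
```

Since the subclasses derive from TypeError, existing except TypeError clauses are unaffected, which is exactly the backward-compatibility property Andy's suggestion relies on.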
either-or-makes-no-real-difference-to-me-ly y'rs, Skip From bwarsaw at cnri.reston.va.us Fri Mar 10 21:56:45 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 10 Mar 2000 15:56:45 -0500 (EST) Subject: [Python-Dev] finalization again References: <000701bf89ab$80cb8e20$0d2d153f@tim> <200003091955.OAA26217@eric.cnri.reston.va.us> <14536.30810.720836.886023@anthem.cnri.reston.va.us> <200003101346.IAA27847@eric.cnri.reston.va.us> Message-ID: <14537.24973.579056.533282@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: >> Given sufficient resolution of your system >> clock, you should never have two objects with the same >> timestamp. GvR> Forget the clock -- just use a counter that is incremented on GvR> each allocation. Good idea. GvR> Suppose I have a tree with parent and child links. And GvR> suppose I have a rule that children need to be finalized GvR> before their parents (maybe they represent a Unix directory GvR> tree, where you must rm the files before you can rmdir the GvR> directory). This suggests that we should choose LIFO: you GvR> must create the parents first (you have to create a directory GvR> before you can create files in it). However, now we add GvR> operations to move nodes around in the tree. Suddenly you GvR> can have a child that is older than its parent! Conclusion: GvR> the creation time is useless; the application logic and GvR> actual link relationships are needed. One potential way to solve this is to provide an interface for refreshing the counter; for discussion purposes, I'll call this sys.gcrefresh(obj). Throws a TypeError if obj isn't a finalizable instance. Otherwise, it sets the "timestamp" to the current counter value and increments the counter. Thus, in your example, when the child node is reparented, you sys.gcrefresh(child) and now the parent is automatically older. Of course, what if the child has its own children? 
You've now got an age graph like this:

    parent > child < grandchild

with the wrong age relationship between the parent and grandchild. So when you refresh, you've got to walk down the containment tree making sure your grandkids are "younger" than yourself. E.g.:

    class Node:
        ...
        def __del__(self):
            ...
        def reparent(self, node):
            self.parent = node
            self.refresh()
        def refresh(self):
            sys.gcrefresh(self)
            for c in self.children:
                c.refresh()

The point to all this is that it gives explicit control of the finalizable cycle reclamation order to the user, via a fairly easy to understand, and manipulate mechanism.

twas-only-a-flesh-wound-but-waiting-for-the-next-stroke-ly y'rs,
-Barry

From jim at interet.com  Fri Mar 10 22:14:45 2000
From: jim at interet.com (James C. Ahlstrom)
Date: Fri, 10 Mar 2000 16:14:45 -0500
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
References: <000801bf8a3b$aa0c4e60$58a2143f@tim>
Message-ID: <38C965C4.B164C2D5@interet.com>

Tim Peters wrote:
>
> [Fred L. Drake, Jr.]
> > Tim (& others),
> > Would this additional text be sufficient for the os.popen()
> > documentation?
> >
> > \strong{Note:} This function behaves unreliably under Windows
> > due to the native implementation of \cfunction{popen()}.
>
> Yes, that's good! If Mark/Bill's alternatives don't make it in, would also
> be good to point to the PythonWin extensions (although MarkH will have to
> give us the Official Name for that).

Well, it looks like this thread has fizzled out. But what did we decide? Changing the docs to say popen() "doesn't work reliably" is a little weak. Maybe removing popen() is better, and demanding that Windows users use win32pipe.

I played around with a patch to posixmodule.c which eliminates _popen() and implements os.popen() using CreatePipe(). It sort of works on NT and fails on 95. Anyway, I am stuck on how to make a Python file object from a pipe handle.
Would it be a good idea to extract the Wisdom from win32pipe and re-implement os.popen() either in C or by using win32pipe directly? Using C is simple and to the point. I feel Tim's original complaint that popen() is a Problem still hasn't been fixed. JimA From moshez at math.huji.ac.il Fri Mar 10 22:29:05 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 10 Mar 2000 23:29:05 +0200 (IST) Subject: [Python-Dev] finalization again In-Reply-To: <14537.24973.579056.533282@anthem.cnri.reston.va.us> Message-ID: On Fri, 10 Mar 2000 bwarsaw at cnri.reston.va.us wrote: > One potential way to solve this is to provide an interface for > refreshing the counter; for discussion purposes, I'll call this > sys.gcrefresh(obj). Barry, there are other problems with your scheme, but I won't even try to point those out: having to call a function whose purpose can only be described in terms of a concrete implementation of a garbage collection scheme is simply unacceptable. I can almost see you shouting "Come back here, I'll bite your legs off" . > The point to all this is that it gives explicit control of the > finalizable cycle reclamation order to the user, via a fairly easy to > understand, and manipulate mechanism. Oh? This sounds like the most horrendus mechanism alive.... you-probably-jammed-a-*little*-too-loud-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From bwarsaw at cnri.reston.va.us Fri Mar 10 23:15:27 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 10 Mar 2000 17:15:27 -0500 (EST) Subject: [Python-Dev] finalization again References: <14537.24973.579056.533282@anthem.cnri.reston.va.us> Message-ID: <14537.29695.532507.197580@anthem.cnri.reston.va.us> Just throwing out ideas. 
From DavidA at ActiveState.com Fri Mar 10 23:20:45 2000 From: DavidA at ActiveState.com (David Ascher) Date: Fri, 10 Mar 2000 14:20:45 -0800 Subject: [Python-Dev] finalization again In-Reply-To: Message-ID: Moshe, some _arguments_ backing your feelings might give them more weight... As they stand, they're just insults, and if I were Barry I'd ignore them. --david ascher Moshe Zadka: > Barry, there are other problems with your scheme, but I won't even try to > point those out: having to call a function whose purpose can only be > described in terms of a concrete implementation of a garbage collection > scheme is simply unacceptable. I can almost see you shouting "Come back > here, I'll bite your legs off" . > [...] > Oh? This sounds like the most horrendus mechanism alive.... From skip at mojam.com Fri Mar 10 23:40:02 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 10 Mar 2000 16:40:02 -0600 Subject: [Python-Dev] on the suitability of ideas tossed out to python-dev Message-ID: <200003102240.QAA07881@beluga.mojam.com> Folks, let's not forget that python-dev is a place where oftentimes half-baked ideas will get advanced. I came up with an idea about decoupling error handling from exception message strings. I don't expect my idea to be adopted as is. Similarly, Barry's ideas about object timestamps were admittedly conceived late at night in the thrill following an apparently good gig. (I like the idea that every object has a modtime, but for other reasons than Barry suggested.) My feeling is that bad ideas will get winnowed out or drastically modified quickly enough anyway. Think of these early ideas as little more than brainstorms. 
A lot of times if I have an idea, I feel I need to put it down on my virtual whiteboard quickly, because a) I often don't have a lot of time to pursue stuff (do it now or it won't get done), b) because bad ideas can be the catalyst for better ideas, and c) if I don't do it immediately, I'll probably forget the idea altogether, thus missing the opportunity for reason b altogether. Try and collect a bunch of ideas before shooting any down and see what falls out. The best ideas will survive. When people start proving things and using fancy diagrams like "a <=> b -> C", then go ahead and get picky... ;-)

Have a relaxing, thought provoking weekend. I'm going to go see a movie this evening with my wife and youngest son, appropriately enough titled, "My Dog Skip". Enough Pythoneering for one day...

bow-wow-ly y'rs,

Skip Montanaro | http://www.mojam.com/
skip at mojam.com | http://www.musi-cal.com/

From guido at python.org  Sat Mar 11 01:20:01 2000
From: guido at python.org (Guido van Rossum)
Date: Fri, 10 Mar 2000 19:20:01 -0500
Subject: [Python-Dev] Unicode patches checked in
Message-ID: <200003110020.TAA17777@eric.cnri.reston.va.us>

I've just checked in a massive patch from Marc-Andre Lemburg which adds Unicode support to Python. This work was financially supported by Hewlett-Packard. Marc-Andre has done a tremendous amount of work, for which I cannot thank him enough.

We're still awaiting some more things: Marc-Andre gave me documentation patches which will be reviewed by Fred Drake before they are checked in; Fredrik Lundh has developed a new regular expression engine which is Unicode-aware and which should be checked in real soon now. Also, the documentation is probably incomplete and will be updated, and of course there may be bugs -- this should be considered alpha software. However, I believe it is quite good already, otherwise I wouldn't have checked it in!
I'd like to invite everyone with an interest in Unicode or Python 1.6 to check out this new Unicode-aware Python, so that we can ensure a robust code base by the time Python 1.6 is released (planned release date: June 1, 2000). The download links are below.

Links:

http://www.python.org/download/cvs.html
    Instructions on how to get access to the CVS version.
    (David Ascher is making nightly tarballs of the CVS version
    available at http://starship.python.net/crew/da/pythondists/)

http://starship.python.net/crew/lemburg/unicode-proposal.txt
    The latest version of the specification on which Marc has based
    his implementation.

http://www.python.org/sigs/i18n-sig/
    Home page of the i18n-sig (Internationalization SIG), which has
    lots of other links about this and related issues.

http://www.python.org/search/search_bugs.html
    The Python Bugs List. Use this for all bug reports.

Note that next Tuesday I'm going on a 10-day trip, with limited time to read email and no time to solve problems. The usual crowd will take care of urgent updates. See you at the Intel Computing Continuum Conference in San Francisco or at the Python Track at Software Development 2000 in San Jose!

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim_one at email.msn.com  Sat Mar 11 03:03:47 2000
From: tim_one at email.msn.com (Tim Peters)
Date: Fri, 10 Mar 2000 21:03:47 -0500
Subject: [Python-Dev] Finalization in Eiffel
Message-ID: <000701bf8afe$0a0fd800$a42d153f@tim>

Eiffel is Bertrand Meyer's "design by contract" OO language. Meyer took extreme care in its design, and has written extensively and articulately about the design -- agree with him or not, he's always worth reading!

I used Eiffel briefly a few years ago, just out of curiosity. I didn't recall even bumping into a notion of destructors. Turns out it does have them, but they're appallingly (whether relative to Eiffel's usual clarity, or even relative to C++'s usual lack thereof <0.9 wink>) ill-specified.
An Eiffel class can register a destructor by inheriting from the system MEMORY class and overriding the latter's "dispose()". This appears to be viewed as a low-level facility, and neither OOSC (2nd ed) nor "Eiffel: The Language" say much about its semantics. Within dispose, you're explicitly discouraged from invoking methods on *any* other object, and resurrection is right out the window. But the language doesn't appear to check for any of that, which is extremely un-Eiffel-like. Many msgs on comp.lang.eiffel from people who should know suggest that all but one Eiffel implementation pay no attention at all to reachability during gc, and that none support resurrection. If you need ordering during finalization, the advice is to write that part in C/C++. Violations of the vague rules appear to lead to random system damage(!). Looking at various Eiffel pkgs on the web, the sole use of dispose was in one-line bodies that released external resources (like memory & db connections) via calling an external C/C++ function. jealous-&-appalled-at-the-same-time-ly y'rs - tim From tim_one at email.msn.com Sat Mar 11 03:03:50 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 10 Mar 2000 21:03:50 -0500 Subject: [Python-Dev] Conventional wisdom on finalization Message-ID: <000801bf8afe$0b3df7c0$a42d153f@tim> David Chase maintains a well-regarded GC FAQ, at http://www.iecc.com/gclist/GC-faq.html Interested folks should look it up. A couple highlights: On cycles with finalizers: In theory, of course, a cycle in the graph of objects to be finalized will prevent a topological sort from succeeding. In practice, the "right" thing to do appears to be to signal an error (at least when debugging) and let the programmer clean this up. 
People with experience on large systems report that such cycles are in fact exceedingly rare (note, however, that some languages define "finalizers" for almost every object, and that was not the case for the large systems studied -- there, finalizers were not too common). On Java's "finalizer called only once" rule: if an object is revived in finalization, that is fine, but its finalizer will not run a second time. (It isn't clear if this is a matter of design, or merely an accident of the first implementation of the language, but it is in the specification now. Obviously, this encourages careful use of finalization, in much the same way that driving without seatbelts encourages careful driving.) Until today, I had no idea I was so resolutely conventional . seems-we're-trying-to-do-more-than-anyone-other-than-us-expects-ly y'rs - tim From shichang at icubed.com Fri Mar 10 23:33:11 2000 From: shichang at icubed.com (Shichang Zhao) Date: Fri, 10 Mar 2000 22:33:11 -0000 Subject: [Python-Dev] RE: Unicode patches checked in Message-ID: <01BF8AE0.9E911980.shichang@icubed.com> I would love to test the Python 1.6 (Unicode support) in Chinese language aspect, but I don't know where I can get a copy of OS that supports Chinese. Anyone can point me a direction? -----Original Message----- From: Guido van Rossum [SMTP:guido at python.org] Sent: Saturday, March 11, 2000 12:20 AM To: Python mailing list; python-announce at python.org; python-dev at python.org; i18n-sig at python.org; string-sig at python.org Cc: Marc-Andre Lemburg Subject: Unicode patches checked in I've just checked in a massive patch from Marc-Andre Lemburg which adds Unicode support to Python. This work was financially supported by Hewlett-Packard. Marc-Andre has done a tremendous amount of work, for which I cannot thank him enough. 
We're still awaiting some more things: Marc-Andre gave me documentation patches which will be reviewed by Fred Drake before they are checked in; Fredrik Lundh has developed a new regular expression engine which is Unicode-aware and which should be checked in real soon now. Also, the documentation is probably incomplete and will be updated, and of course there may be bugs -- this should be considered alpha software. However, I believe it is quite good already, otherwise I wouldn't have checked it in! I'd like to invite everyone with an interest in Unicode or Python 1.6 to check out this new Unicode-aware Python, so that we can ensure a robust code base by the time Python 1.6 is released (planned release date: June 1, 2000). The download links are below. Links: http://www.python.org/download/cvs.html Instructions on how to get access to the CVS version. (David Ascher is making nightly tarballs of the CVS version available at http://starship.python.net/crew/da/pythondists/) http://starship.python.net/crew/lemburg/unicode-proposal.txt The latest version of the specification on which Marc has based his implementation. http://www.python.org/sigs/i18n-sig/ Home page of the i18n-sig (Internationalization SIG), which has lots of other links about this and related issues. http://www.python.org/search/search_bugs.html The Python Bugs List. Use this for all bug reports. Note that next Tuesday I'm going on a 10-day trip, with limited time to read email and no time to solve problems. The usual crowd will take care of urgent updates. See you at the Intel Computing Continuum Conference in San Francisco or at the Python Track at Software Development 2000 in San Jose!
--Guido van Rossum (home page: http://www.python.org/~guido/) -- http://www.python.org/mailman/listinfo/python-list From moshez at math.huji.ac.il Sat Mar 11 10:10:12 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 11 Mar 2000 11:10:12 +0200 (IST) Subject: [Python-Dev] Unicode: When Things Get Hairy Message-ID: The following "problem" is easy to fix. However, what I wanted to know is if people (Skip and Guido most importantly) think it is a problem: >>> "a" in u"bbba" 1 >>> u"a" in "bbba" Traceback (innermost last): File "<stdin>", line 1, in ? TypeError: string member test needs char left operand Suggested fix: in stringobject.c, explicitly allow a unicode char left operand. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From mal at lemburg.com Sat Mar 11 11:24:26 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 11 Mar 2000 11:24:26 +0100 Subject: [Python-Dev] Unicode: When Things Get Hairy References: Message-ID: <38CA1EDA.423F8A2C@lemburg.com> Moshe Zadka wrote: > > The following "problem" is easy to fix. However, what I wanted to know is > if people (Skip and Guido most importantly) think it is a problem: > > >>> "a" in u"bbba" > 1 > >>> u"a" in "bbba" > Traceback (innermost last): > File "<stdin>", line 1, in ? > TypeError: string member test needs char left operand > > Suggested fix: in stringobject.c, explicitly allow a unicode char left > operand. Hmm, this must have been introduced by your contains code... it did work before. The normal action taken by the Unicode and the string code in these mixed type situations is to first convert everything to Unicode and then retry the operation. Strings are interpreted as UTF-8 during this conversion. To simplify this task, I added method APIs to the Unicode object which do the conversion for you (they apply all the necessary coercion business to all arguments).
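The coercion rule described here — convert both operands to Unicode, treating plain strings as UTF-8, then retry the membership test — can be sketched in a few lines of Python. This is an illustrative sketch only: `unicode_contains` is an invented name, and Python 3's bytes/str pair stands in for the 1.6-era str/unicode pair.

```python
def unicode_contains(container, element):
    # Coerce both operands to Unicode; byte strings (the modern stand-in
    # for 1.x 8-bit strings) are decoded as UTF-8, per the rule above.
    if isinstance(container, bytes):
        container = container.decode("utf-8")
    if isinstance(element, bytes):
        element = element.decode("utf-8")
    # The 2000-era 'in' operator accepted only single-character operands.
    if len(element) != 1:
        raise TypeError("string member test needs char left operand")
    return element in container

# Both mixed-type orders now behave symmetrically -- including the
# u"a" in "bbba" case that raised TypeError:
assert unicode_contains(b"bbba", "a")
assert unicode_contains("bbba", b"a")
```

With this rule in place the asymmetry Moshe reported disappears; the real fix, of course, was done in C in stringobject.c and unicodeobject.c.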
I guess adding another PyUnicode_Contains() wouldn't hurt :-) Perhaps I should also add a tp_contains slot to the Unicode object which then uses the above API as well. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From moshez at math.huji.ac.il Sat Mar 11 12:05:48 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 11 Mar 2000 13:05:48 +0200 (IST) Subject: [Python-Dev] Unicode: When Things Get Hairy In-Reply-To: <38CA1EDA.423F8A2C@lemburg.com> Message-ID: On Sat, 11 Mar 2000, M.-A. Lemburg wrote: > Hmm, this must have been introduced by your contains code... > it did work before. Nope: the string "in" semantics were forever special-cased. Guido beat me soundly for trying to change the semantics... > The normal action taken by the Unicode and the string > code in these mixed type situations is to first > convert everything to Unicode and then retry the operation. > Strings are interpreted as UTF-8 during this conversion. Hmmm....PySequence_Contains doesn't do any conversion of the arguments. Should it? (Again, it didn't before). If it does, then the order of testing for seq_contains and seq_getitem and conversions > Perhaps I should also add a tp_contains slot to the > Unicode object which then uses the above API as well. But that wouldn't help at all for u"a" in "abbbb" PySequence_Contains only dispatches on the container argument :-( (BTW: I discovered it while contemplating adding a seq_contains (not tp_contains) to unicode objects to optimize the searching for a bit.) PS: MAL: thanks for a great birthday present! I'm enjoying the unicode patch a lot. -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html From guido at python.org Sat Mar 11 13:16:06 2000 From: guido at python.org (Guido van Rossum) Date: Sat, 11 Mar 2000 07:16:06 -0500 Subject: [Python-Dev] Unicode: When Things Get Hairy In-Reply-To: Your message of "Sat, 11 Mar 2000 13:05:48 +0200." References: Message-ID: <200003111216.HAA12651@eric.cnri.reston.va.us> [Moshe discovers that u"a" in "bbba" raises TypeError] [Marc-Andre] > > Hmm, this must have been introduced by your contains code... > > it did work before. > > Nope: the string "in" semantics were forever special-cased. Guido beat me > soundly for trying to change the semantics... But I believe that Marc-Andre added a special case for Unicode in PySequence_Contains. I looked for evidence, but the last snapshot that I actually saved and built before Moshe's code was checked in is from 2/18 and it isn't in there. Yet I believe Marc-Andre. The special case needs to be added back to string_contains in stringobject.c. > > The normal action taken by the Unicode and the string > > code in these mixed type situations is to first > > convert everything to Unicode and then retry the operation. > > Strings are interpreted as UTF-8 during this conversion. > > Hmmm....PySequence_Contains doesn't do any conversion of the arguments. > Should it? (Again, it didn't before). If it does, then the order of > testing for seq_contains and seq_getitem and conversions Or it could be done this way. > > Perhaps I should also add a tp_contains slot to the > > Unicode object which then uses the above API as well. Yes. > But that wouldn't help at all for > > u"a" in "abbbb" It could if PySequence_Contains would first look for a string and a unicode argument (in either order) and in that case convert the string to unicode. > PySequence_Contains only dispatches on the container argument :-( > > (BTW: I discovered it while contemplating adding a seq_contains (not > tp_contains) to unicode objects to optimize the searching for a bit.)
You may beat Marc-Andre to it, but I'll have to let him look at the code anyway -- I'm not sufficiently familiar with the Unicode stuff myself yet. BTW, I added a tag "pre-unicode" to the CVS tree to the revisions before the Unicode changes were made. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Sat Mar 11 14:32:57 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 11 Mar 2000 14:32:57 +0100 Subject: [Python-Dev] Unicode: When Things Get Hairy References: <200003111216.HAA12651@eric.cnri.reston.va.us> Message-ID: <38CA4B08.7B13438D@lemburg.com> Guido van Rossum wrote: > > [Moshe discovers that u"a" in "bbba" raises TypeError] > > [Marc-Andre] > > > Hmm, this must have been introduced by your contains code... > > > it did work before. > > > > Nope: the string "in" semantics were forever special-cased. Guido beat me > > soundly for trying to change the semantics... > > But I believe that Marc-Andre added a special case for Unicode in > PySequence_Contains. I looked for evidence, but the last snapshot that > I actually saved and built before Moshe's code was checked in is from > 2/18 and it isn't in there. Yet I believe Marc-Andre. The special > case needs to be added back to string_contains in stringobject.c. Moshe was right: I had probably not checked the code because the obvious combinations worked out of the box... the only combination which doesn't work is "unicode in string". I'll fix it next week. BTW, there's a good chance that the string/Unicode integration is not complete yet: just keep looking for them. > > > The normal action taken by the Unicode and the string > > > code in these mixed type situations is to first > > > convert everything to Unicode and then retry the operation. > > > Strings are interpreted as UTF-8 during this conversion. > > > > Hmmm....PySequence_Contains doesn't do any conversion of the arguments. > > Should it? (Again, it didn't before).
If it does, then the order of > > testing for seq_contains and seq_getitem and conversions > > Or it could be done this way. > > > > Perhaps I should also add a tp_contains slot to the > > > Unicode object which then uses the above API as well. > > Yes. > > > But that wouldn't help at all for > > > > u"a" in "abbbb" > > It could if PySequence_Contains would first look for a string and a > unicode argument (in either order) and in that case convert the string > to unicode. I think the right way to do this is to add a special case to seq_contains in the string implementation. That's how most other auto-coercions work too. Instead of raising an error, the implementation would then delegate the work to PyUnicode_Contains(). > > PySequence_Contains only dispatches on the container argument :-( > > > > (BTW: I discovered it while contemplating adding a seq_contains (not > > tp_contains) to unicode objects to optimize the searching for a bit.) > > You may beat Marc-Andre to it, but I'll have to let him look at the > code anyway -- I'm not sufficiently familiar with the Unicode stuff > myself yet. I'll add that one too. BTW, Happy Birthday, Moshe :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Sat Mar 11 14:57:34 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 11 Mar 2000 14:57:34 +0100 Subject: [Python-Dev] Unicode: When Things Get Hairy References: <200003111216.HAA12651@eric.cnri.reston.va.us> <38CA4B08.7B13438D@lemburg.com> Message-ID: <38CA50CE.BEEFAB5E@lemburg.com> I couldn't resist :-) Here's the patch... BTW, how should we proceed with future patches ? Should I wrap them together about once a week, or send them as soon as they are done ?
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ -------------- next part -------------- diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Include/unicodeobject.h Python+Unicode/Include/unicodeobject.h --- CVS-Python/Include/unicodeobject.h Fri Mar 10 23:33:05 2000 +++ Python+Unicode/Include/unicodeobject.h Sat Mar 11 14:45:59 2000 @@ -683,6 +683,17 @@ PyObject *args /* Argument tuple or dictionary */ ); +/* Checks whether element is contained in container and return 1/0 + accordingly. + + element has to coerce to an one element Unicode string. -1 is + returned in case of an error. */ + +extern DL_IMPORT(int) PyUnicode_Contains( + PyObject *container, /* Container string */ + PyObject *element /* Element string */ + ); + /* === Characters Type APIs =============================================== */ /* These should not be used directly. 
Use the Py_UNICODE_IS* and diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_unicode.py Python+Unicode/Lib/test/test_unicode.py --- CVS-Python/Lib/test/test_unicode.py Sat Mar 11 00:23:20 2000 +++ Python+Unicode/Lib/test/test_unicode.py Sat Mar 11 14:52:29 2000 @@ -219,6 +219,19 @@ test('translate', u"abababc", u'iiic', {ord('a'):None, ord('b'):ord('i')}) test('translate', u"abababc", u'iiix', {ord('a'):None, ord('b'):ord('i'), ord('c'):u'x'}) +# Contains: +print 'Testing Unicode contains method...', +assert ('a' in 'abdb') == 1 +assert ('a' in 'bdab') == 1 +assert ('a' in 'bdaba') == 1 +assert ('a' in 'bdba') == 1 +assert ('a' in u'bdba') == 1 +assert (u'a' in u'bdba') == 1 +assert (u'a' in u'bdb') == 0 +assert (u'a' in 'bdb') == 0 +assert (u'a' in 'bdba') == 1 +print 'done.' + # Formatting: print 'Testing Unicode formatting strings...', assert u"%s, %s" % (u"abc", "abc") == u'abc, abc' diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Misc/unicode.txt Python+Unicode/Misc/unicode.txt --- CVS-Python/Misc/unicode.txt Sat Mar 11 00:14:11 2000 +++ Python+Unicode/Misc/unicode.txt Sat Mar 11 14:53:37 2000 @@ -743,8 +743,9 @@ stream codecs as available through the codecs module should be used. -XXX There should be a short-cut open(filename,mode,encoding) available which - also assures that mode contains the 'b' character when needed. +The codecs module should provide a short-cut open(filename,mode,encoding) +available which also assures that mode contains the 'b' character when +needed. 
File/Stream Input: @@ -810,6 +811,10 @@ Introduction to Unicode (a little outdated by still nice to read): http://www.nada.kth.se/i18n/ucs/unicode-iso10646-oview.html +For comparison: + Introducing Unicode to ECMAScript -- + http://www-4.ibm.com/software/developer/library/internationalization-support.html + Encodings: Overview: @@ -832,7 +837,7 @@ History of this Proposal: ------------------------- -1.2: +1.2: Removed POD about codecs.open() 1.1: Added note about comparisons and hash values. Added note about case mapping algorithms. Changed stream codecs .read() and .write() method to match the standard file-like object methods diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Objects/stringobject.c Python+Unicode/Objects/stringobject.c --- CVS-Python/Objects/stringobject.c Sat Mar 11 10:55:09 2000 +++ Python+Unicode/Objects/stringobject.c Sat Mar 11 14:47:45 2000 @@ -389,7 +389,9 @@ { register char *s, *end; register char c; - if (!PyString_Check(el) || PyString_Size(el) != 1) { + if (!PyString_Check(el)) + return PyUnicode_Contains(a, el); + if (PyString_Size(el) != 1) { PyErr_SetString(PyExc_TypeError, "string member test needs char left operand"); return -1; diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Objects/unicodeobject.c Python+Unicode/Objects/unicodeobject.c --- CVS-Python/Objects/unicodeobject.c Fri Mar 10 23:53:23 2000 +++ Python+Unicode/Objects/unicodeobject.c Sat Mar 11 14:48:52 2000 @@ -2737,6 +2737,49 @@ return -1; } +int PyUnicode_Contains(PyObject *container, + PyObject *element) +{ + PyUnicodeObject *u = NULL, *v = NULL; + int result; + register const Py_UNICODE 
*p, *e; + register Py_UNICODE ch; + + /* Coerce the two arguments */ + u = (PyUnicodeObject *)PyUnicode_FromObject(container); + if (u == NULL) + goto onError; + v = (PyUnicodeObject *)PyUnicode_FromObject(element); + if (v == NULL) + goto onError; + + /* Check v in u */ + if (PyUnicode_GET_SIZE(v) != 1) { + PyErr_SetString(PyExc_TypeError, + "string member test needs char left operand"); + goto onError; + } + ch = *PyUnicode_AS_UNICODE(v); + p = PyUnicode_AS_UNICODE(u); + e = p + PyUnicode_GET_SIZE(u); + result = 0; + while (p < e) { + if (*p++ == ch) { + result = 1; + break; + } + } + + Py_DECREF(u); + Py_DECREF(v); + return result; + +onError: + Py_XDECREF(u); + Py_XDECREF(v); + return -1; +} + /* Concat to string or Unicode object giving a new Unicode object. */ PyObject *PyUnicode_Concat(PyObject *left, @@ -3817,6 +3860,7 @@ (intintargfunc) unicode_slice, /* sq_slice */ 0, /* sq_ass_item */ 0, /* sq_ass_slice */ + (objobjproc)PyUnicode_Contains, /*sq_contains*/ }; static int From tim_one at email.msn.com Sat Mar 11 21:10:23 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 11 Mar 2000 15:10:23 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <14536.30810.720836.886023@anthem.cnri.reston.va.us> Message-ID: <000e01bf8b95$d52939e0$c72d153f@tim> [Barry A. Warsaw, jamming after hours] > ... > What if you timestamp instances when you create them? Then when you > have trash cycles with finalizers, you sort them and finalize in > chronological order. Well, I strongly agree that would be better than finalizing them in increasing order of storage address . > ... > - FIFO order /seems/ more natural to me than FILO, Forget cycles for a moment, and consider just programs that manipulate *immutable* containers (the simplest kind to think about): at the time you create an immutable container, everything *contained* must already be in existence, so every pointer goes from a newer object (container) to an older one (containee). 
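That new-to-old pointer direction is easy to demonstrate directly; a minimal sketch in modern Python syntax:

```python
# Immutable containers: everything contained must exist before the
# container is created, so a tuple's pointers always run newer -> older.
t1 = (1, 2)
t2 = (t1, 3)        # fine: t1 already exists when t2 is built
# t3 = (t3,)        # impossible: NameError, t3 doesn't exist yet

# Mutation breaks the invariant: an older object can be made to point
# at a newer one, which is exactly how reference cycles arise.
lst = []
lst.append(lst)     # a one-element cycle, old -> new
assert lst[0] is lst
```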
This is the "deep" reason for why, e.g., you can't build a cycle out of pure tuples in Python (if every pointer goes new->old, you can't get a loop, else each node in the loop would be (transitively) older than itself!). Then, since a finalizer can see objects pointed *to*, a finalizer can see only older objects. Since it's desirable that a finalizer see only wholly intact (unfinalized) objects, it is in fact the oldest object ("first in") that needs to be cleaned up last ("last out"). So, under the assumption of immutability, FILO is sufficient, but FIFO dangerous. So your muse inflamed you with an interesting tune, but you fingered the riff backwards . One problem is that it all goes out the window as soon as mutation is allowed. It's *still* desirable that a finalizer see only unfinalized objects, but in the presence of mutation that no longer bears any relationship to relative creation time. Another problem is in Guido's directory example, which we can twist to view as an "immutable container" problem that builds its image of the directory bottom-up, and where a finalizer on each node tries to remove the file (or delete the directory, whichever the node represents). In this case the physical remove/delete/unlink operations have to follow a *postorder* traversal of the container tree, so that "finalizer sees only unfinalized objects" is the opposite of what the app needs! The lesson to take from that is that the implementation can't possibly guess what ordering an app may need in a fancy finalizer. 
At best it can promise to follow a "natural" ordering based on the points-to relationship, and while "finalizer sees only unfinalized objects" is at least clear, it's quite possibly unhelpful (in Guido's particular case, it *can* be exploited, though, by adding a postorder remove/delete/unlink method to nodes, and explicitly calling it from __del__ -- "the rules" guarantee that the root of the tree will get finalized first, and the code can rely on that in its own explicit postorder traversal). > but then I rarely create cyclic objects, and almost never use __del__, > so this whole argument has been somewhat academic to me :). Well, not a one of us creates cycles often in CPython today, simply because we don't want to track down leaks <0.5 wink>. It seems that nobody here uses __del__ much, either; indeed, my primary use of __del__ is simply to call an explicit break_cycles() function from the header node of a graph! The need for that goes away as soon as Python reclaims cycles by itself, and I may never use __del__ at all then in the vast bulk of my code. It's because we've seen no evidence here (and also that I've seen none elsewhere either) that *anyone* is keen on mixing cycles with finalizers that I've been so persistent in saying "screw it -- let it leak, but let the user get at it if they insist on doing it". Seems we're trying to provide slick support for something nobody wants to do. If it happens by accident anyway, well, people sometimes divide by 0 by accident too <0.0 wink>: give them a way to know about it, but don't move heaven & earth trying to treat it like a normal case. 
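The explicit break_cycles() pattern described above can be sketched as follows (the class names are invented for illustration). The point of the pattern is that the header object is referenced by the application but not by any node, so plain reference counting is guaranteed to run its __del__, which then severs the cycles among the nodes:

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.edges = []          # may refer to other Nodes, forming cycles

class Graph:
    """Header object: owns the nodes but is never referenced by them."""
    def __init__(self):
        self.nodes = []

    def add(self, name):
        node = Node(name)
        self.nodes.append(node)
        return node

    def break_cycles(self):
        # Sever every edge so the node structure becomes acyclic and
        # plain reference counting can reclaim it.
        for node in self.nodes:
            node.edges = []
        self.nodes = []

    def __del__(self):
        self.break_cycles()

g = Graph()
a, b = g.add("a"), g.add("b")
a.edges.append(b)
b.edges.append(a)    # a <-> b: a cycle refcounting alone can't free
del g                # header's __del__ runs, break_cycles unlinks a and b
assert a.edges == [] and b.edges == []
```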
if-it-were-easy-to-implement-i-wouldn't-care-ly y'rs - tim From moshez at math.huji.ac.il Sat Mar 11 21:35:43 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 11 Mar 2000 22:35:43 +0200 (IST) Subject: [Python-Dev] finalization again In-Reply-To: <000e01bf8b95$d52939e0$c72d153f@tim> Message-ID: In a continuation (yes, a dangerous word in these parts) of the timbot's looks at the way other languages handle finalization, let me add something from the Sather manual I'm now reading (when I'm done with it, you'll see me begging for iterators here, and having some weird ideas in the types-sig): =============================== Finalization will only occur once, even if new references are created to the object during finalization. Because few guarantees can be made about the environment in which finalization occurs, finalization is considered dangerous and should only be used in the rare cases that conventional coding will not suffice. =============================== (Sather is garbage-collected, BTW) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From tim_one at email.msn.com Sat Mar 11 21:51:47 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 11 Mar 2000 15:51:47 -0500 Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler In-Reply-To: <200003101528.JAA15951@beluga.mojam.com> Message-ID: <001001bf8b9b$9e09d720$c72d153f@tim> [Skip Montanaro, with an expression that may raise TypeError for any of several distinct reasons, and wants to figure out which one after the fact] The existing exception machinery is sufficiently powerful for building a solution, so nothing new is needed in the language. What you really need here is an exhaustive list of all exceptions the language can raise, and when, and why, and a formally supported "detail" field (whether numeric id or string or whatever) that you can rely on to tell them apart at runtime. 
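Lacking such a detail field, the only discrimination possible is matching on the message text. A sketch of that fragile status quo in modern Python — the strings tested are CPython-specific and have changed between releases, which is precisely the problem:

```python
def classify_type_error(exc):
    # Fragile: dispatch on the exception's message text.  A formally
    # supported detail field would make this reliable instead.
    msg = str(exc)
    if "concatenate" in msg or "must be str" in msg:
        return "bad-concat"
    if "not subscriptable" in msg:
        return "bad-subscript"
    return "unknown"

try:
    "a" + 1
except TypeError as exc:
    assert classify_type_error(exc) == "bad-concat"
```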
There are at least a thousand cases that need to be so documented and formalized. That's why not a one of them is now <0.9 wink>. If P3K is a rewrite from scratch, a rational scheme could be built in from the start. Else it would seem to require a volunteer with even less of a life than us . From tim_one at email.msn.com Sat Mar 11 21:51:49 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 11 Mar 2000 15:51:49 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C965C4.B164C2D5@interet.com> Message-ID: <001101bf8b9b$9f37f6e0$c72d153f@tim> [James C. Ahlstrom] > Well, it looks like this thread has fizzled out. But what did we > decide? Far as I could tell, nothing specific. > ... > I feel Tim's original complaint that popen() is a Problem > still hasn't been fixed. I was passing it on from MikeF's c.l.py posting. This isn't a new problem, of course, it just drags on year after year -- which is the heart of MikeF's gripe. People have code that *does* work, but for whatever reasons it never gets moved to the core. In the meantime, the Library Ref implies the broken code that is in the core does work. One or the other has to change, and it looks most likely to me that Fred will change the docs for 1.6. While not ideal, that would be a huge improvement over the status quo. luckily-few-people-expect-windows-to-work-anyway<0.9-wink>-ly y'rs - tim From mhammond at skippinet.com.au Mon Mar 13 04:50:35 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon, 13 Mar 2000 14:50:35 +1100 Subject: [Python-Dev] string.replace behaviour change since Unicode patch. 
Message-ID: Hi, After applying the Unicode changes string.replace() seems to have changed its behaviour: Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> import string >>> string.replace("foo\nbar", "\n", "") 'foobar' >>> But since the Unicode update: Python 1.5.2+ (#0, Feb 2 2000, 16:46:55) [MSC 32 bit (Intel)] on win32 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> import string >>> string.replace("foo\nbar", "\n", "") Traceback (innermost last): File "<stdin>", line 1, in ? File "L:\src\python-cvs\lib\string.py", line 407, in replace return s.replace(old, new, maxsplit) ValueError: empty replacement string >>> The offending check is stringobject.c, line 1578: if (repl_len <= 0) { PyErr_SetString(PyExc_ValueError, "empty replacement string"); return NULL; } Changing the check to "< 0" fixes the immediate problem, but it is unclear why the check was added at all, so I didn't bother submitting a patch... Mark. From mal at lemburg.com Mon Mar 13 10:13:50 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 13 Mar 2000 10:13:50 +0100 Subject: [Python-Dev] string.replace behaviour change since Unicode patch. References: Message-ID: <38CCB14D.C07ACC26@lemburg.com> Mark Hammond wrote: > > Hi, > After applying the Unicode changes string.replace() seems to have changed > its behaviour: > > Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> import string > >>> string.replace("foo\nbar", "\n", "") > 'foobar' > >>> > > But since the Unicode update: > > Python 1.5.2+ (#0, Feb 2 2000, 16:46:55) [MSC 32 bit (Intel)] on win32 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> import string > >>> string.replace("foo\nbar", "\n", "") > Traceback (innermost last): > File "<stdin>", line 1, in ?
> File "L:\src\python-cvs\lib\string.py", line 407, in replace > return s.replace(old, new, maxsplit) > ValueError: empty replacement string > >>> > > The offending check is stringobject.c, line 1578: > if (repl_len <= 0) { > PyErr_SetString(PyExc_ValueError, "empty replacement string"); > return NULL; > } > > Changing the check to "< 0" fixes the immediate problem, but it is unclear > why the check was added at all, so I didn't bother submitting a patch... Dang. Must have been my mistake -- it should read: if (sub_len <= 0) { PyErr_SetString(PyExc_ValueError, "empty pattern string"); return NULL; } Thanks for reporting this... I'll include the fix in the next patch set. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake at acm.org Mon Mar 13 16:43:09 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 13 Mar 2000 10:43:09 -0500 (EST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <001101bf8b9b$9f37f6e0$c72d153f@tim> References: <38C965C4.B164C2D5@interet.com> <001101bf8b9b$9f37f6e0$c72d153f@tim> Message-ID: <14541.3213.590243.359394@weyr.cnri.reston.va.us> Tim Peters writes: > code that is in the core does work. One or the other has to change, and it > looks most likely to me that Fred will change the docs for 1.6. While not > ideal, that would be a huge improvement over the status quo. Actually, I just checked in my proposed change for the 1.5.2 doc update that I'm releasing soon. I'd like to remove it for 1.6, if the appropriate implementation is moved into the core. -Fred -- Fred L. Drake, Jr.
Corporation for National Research Initiatives From gvwilson at nevex.com Mon Mar 13 22:10:52 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Mon, 13 Mar 2000 16:10:52 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request Message-ID: Once 1.6 is out the door, would people be willing to consider extending Python's token set to make HTML/XML-ish spellings using entity references legal? This would make the following 100% legal Python: i = 0 while i &lt; 10: print i &amp; 1 i = i + 1 which would in turn make it easier to embed Python in XML such as config-files-for-whatever-Software-Carpentry-produces-to-replace-make, PMZ, and so on. Greg From skip at mojam.com Mon Mar 13 22:23:17 2000 From: skip at mojam.com (Skip Montanaro) Date: Mon, 13 Mar 2000 15:23:17 -0600 (CST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: References: Message-ID: <14541.23621.89087.357783@beluga.mojam.com> Greg> Once 1.6 is out the door, would people be willing to consider Greg> extending Python's token set to make HTML/XML-ish spellings using Greg> entity references legal? This would make the following 100% legal Greg> Python: Greg> i = 0 Greg> while i &lt; 10: Greg> print i &amp; 1 Greg> i = i + 1 What makes it difficult to pump your Python code through cgi.escape when embedding it? There doesn't seem to be an inverse function to cgi.escape (at least not in the cgi module), but I suspect it could rather easily be written. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From akuchlin at mems-exchange.org Mon Mar 13 22:23:29 2000 From: akuchlin at mems-exchange.org (Andrew M.
Kuchling) Date: Mon, 13 Mar 2000 16:23:29 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: References: Message-ID: <14541.23633.873411.86833@amarok.cnri.reston.va.us> gvwilson at nevex.com writes: >Once 1.6 is out the door, would people be willing to consider extending >Python's token set to make HTML/XML-ish spellings using entity references >legal? This would make the following 100% legal Python: > >i = 0 >while i &lt; 10: > print i &amp; 1 > i = i + 1 I don't think that would be sufficient. What about user-defined entities, as in r&eacute;sultat = max(a,b)? (résultat, in French.) Would Python have to also parse a DTD from somewhere? What about other places when Python and XML syntax collide, as in this contrived example: <![CDATA[ # Python code starts here if a[index[1]]>b: print ... ]]> Oops! The ]]> looks like the end of the CDATA section, but it's legal Python code. IMHO whatever tool is outputting the XML should handle escaping wacky characters in the Python code, which will be undone by the parser when the XML gets parsed. Users certainly won't be writing this XML by hand; writing 'if (i &lt; 10)' is very strange. -- A.M. Kuchling http://starship.python.net/crew/amk/ Art history is the nightmare from which art is struggling to awake. -- Robert Fulford From gvwilson at nevex.com Mon Mar 13 22:58:27 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Mon, 13 Mar 2000 16:58:27 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: <14541.23633.873411.86833@amarok.cnri.reston.va.us> Message-ID: > >Greg Wilson wrote: > >...would people be willing to consider extending > >Python's token set to make HTML/XML-ish spellings using entity references > >legal? > > > >i = 0 > >while i &lt; 10: > > print i &amp; 1 > > i = i + 1 > Skip Montanaro wrote: > What makes it difficult to pump your Python code through cgi.escape when > embedding it? Most non-programmers use WYSIWYG editors, and many of these are moving toward XML-compliant formats.
Parsing the standard character entities seemed like a good first step toward catering to this (large) audience. > Andrew Kuchling wrote: > I don't think that would be sufficient. What about user-defined > entities, as in r&eacute;sultat = max(a,b)? (résultat, in French.) > Would Python have to also parse a DTD from somewhere? Longer term, I believe that someone is going to come out with a programming language that (finally) leaves the flat-ASCII world behind, and lets people use the structuring mechanisms (e.g. XML) that we have developed for everyone else's data. I think it would be to Python's advantage to be first, and if I'm wrong, there's little harm done. User-defined entities, DTD's, and the like are probably part of that, but I don't think I know enough to know what to ask for. Escaping the standard entities seems like an easy start. > Andrew Kuchling also wrote: > What about other places when Python and XML syntax collide, as in this > contrived example: > > # Python code starts here > if a[index[1]]>b: > print ... > > Oops! The ]]> looks like the end of the CDATA section, but it's legal > Python code. Yup; that's one of the reasons I'd like to be able to write: # Python code starts here if a[index[1]]&gt;b: print ... > Users certainly won't be writing this XML by hand; writing 'if (i &lt; > 10)' is very strange. I'd expect my editor to put '&lt;' in the file when I press the '<' key, and to display '<' on the screen when viewing the file. thanks, Greg From beazley at rustler.cs.uchicago.edu Mon Mar 13 23:35:24 2000 From: beazley at rustler.cs.uchicago.edu (David M. Beazley) Date: Mon, 13 Mar 2000 16:35:24 -0600 (CST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: References: Message-ID: <200003132235.QAA08031@rustler.cs.uchicago.edu> gvwilson at nevex.com writes: > Once 1.6 is out the door, would people be willing to consider extending > Python's token set to make HTML/XML-ish spellings using entity references > legal?
This would make the following 100% legal Python: > > i = 0 > while i &lt; 10: > print i &amp; 1 > i = i + 1 > > which would in turn make it easier to embed Python in XML such as > config-files-for-whatever-Software-Carpentry-produces-to-replace-make, > PMZ, and so on. > Sure, and while we're at it, maybe we can add support for C trigraph sequences as well. Maybe I'm missing the point, but why can't you just use a filter (cgi.escape() or something comparable)? I for one, am *NOT* in favor of complicating the Python parser in this most bogus manner. Furthermore, with respect to the editor argument, I can't think of a single reason why any sane programmer would be writing programs in Microsoft Word or whatever it is that you're talking about. Therefore, I don't think that the Python parser should be modified in any way to account for XML tags, entities, or other extraneous markup that's not part of the core language. I know that I, for one, would be extremely pissed if I fired up emacs and had to maintain someone else's code that had all of this garbage in it. Just my 0.02. -- Dave From gvwilson at nevex.com Mon Mar 13 23:48:33 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Mon, 13 Mar 2000 17:48:33 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: <200003132235.QAA08031@rustler.cs.uchicago.edu> Message-ID: > David M. Beazley wrote: > ...and while we're at it, maybe we can add support for C trigraph > sequences as well. I don't know of any mass-market editors that generate C trigraphs. > ...I can't think of a single reason why any sane programmer would be > writing programs in Microsoft Word or whatever it is that you're > talking about. 'S funny --- my non-programmer friends can't figure out why any sane person would use a glorified glass TTY like emacs... or why they should have to, just to program...
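The inverse of cgi.escape that Skip says "could rather easily be written" (and the filter David mentions) really is a few lines; a sketch in present-day Python (unescape is a hypothetical helper, not a cgi-module function), with '&amp;' decoded last so already-escaped ampersands survive:

```python
def unescape(s):
    # Inverse of cgi.escape for the standard character entities.
    # '&amp;' must be decoded last, or '&amp;lt;' would wrongly
    # collapse all the way down to '<'.
    for entity, char in (("&lt;", "<"), ("&gt;", ">"),
                         ("&quot;", '"'), ("&amp;", "&")):
        s = s.replace(entity, char)
    return s
```

For example, unescape("while i &lt; 10: print i &amp; 1") recovers the plain-text form of Greg's proposal.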
I just think that someone's going to do this for some language, some time soon, and I'd rather Python be in the lead than play catch-up. Thanks, Greg From effbot at telia.com Tue Mar 14 00:16:41 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 14 Mar 2000 00:16:41 +0100 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <00ca01bf8d42$6a154500$34aab5d4@hagrid> Greg wrote: > > ...I can't think of a single reason why any sane programmer would be > > writing programs in Microsoft Word or whatever it is that you're > > talking about. > > 'S funny --- my non-programmer friends can't figure out why any sane > person would use a glorified glass TTY like emacs... or why they should > have to, just to program... I just think that someone's going to do this > for some language, some time soon, and I'd rather Python be in the lead > than play catch-up. I don't get it. the XML specification contains a lot of stuff, and I completely fail to see how adding support for a very small part of XML would make it possible to use XML editors to write Python code. what am I missing? From DavidA at ActiveState.com Tue Mar 14 00:15:25 2000 From: DavidA at ActiveState.com (David Ascher) Date: Mon, 13 Mar 2000 15:15:25 -0800 Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: > 'S funny --- my non-programmer friends can't figure out why any sane > person would use a glorified glass TTY like emacs... or why they should > have to, just to program... I just think that someone's going to do this > for some language, some time soon, and I'd rather Python be in the lead > than play catch-up. But the scheme you put forth causes major problems for current Python users who *are* using glass TTYs, so I don't think it'll fly for very basic political reasons nicely illustrated by Dave-the-diplomat's response. 
While storage of Python files in XML documents is a good thing, it's hard to see why XML should be viewed as the only storage format for Python files. I think a much richer XML schema could be useful in some distant future: ... What might be more useful in the short term IMO is to define a _standard_ mechanism for Python-in-XML encoding/decoding, so that all code which encodes Python in XML is done the same way, and so that XML editors can figure out once and for all how to decode Python-in-CDATA. Strawman Encoding # 1: replace < with &lt; and > with &gt; when not in strings, and vice versa on the decoding side. Strawman Encoding # 2: - do Strawman 1, AND - replace space-determined indentation with { and } tokens or other INDENT and DEDENT markers using some rare Unicode characters to work around inevitable bugs in whitespace handling of XML processors. --david From gvwilson at nevex.com Tue Mar 14 00:14:43 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Mon, 13 Mar 2000 18:14:43 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: > David Ascher wrote: > But the scheme you put forth causes major problems for current Python > users who *are* using glass TTYs, so I don't think it'll fly for very > basic political reasons nicely illustrated by Dave's response. Understood. I thought that handling standard entities might be a useful first step toward storage of Python as XML, which in turn would help make Python more accessible to people who don't want to switch editors just to program. I felt that an all-or-nothing approach would be even less likely to get a favorable response than handling entities... :-) Greg From beazley at rustler.cs.uchicago.edu Tue Mar 14 00:12:55 2000 From: beazley at rustler.cs.uchicago.edu (David M.
Beazley) Date: Mon, 13 Mar 2000 17:12:55 -0600 (CST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: References: <200003132235.QAA08031@rustler.cs.uchicago.edu> Message-ID: <200003132312.RAA08107@rustler.cs.uchicago.edu> gvwilson at nevex.com writes: > > 'S funny --- my non-programmer friends can't figure out why any sane > person would use a glorified glass TTY like emacs... or why they should > have to, just to program... Look, I'm all for CP4E and making programming more accessible to the masses, but as a professional programmer, I frankly do not care what non-programmers think about the tools that I (and most of the programming world) use to write software. Furthermore, if all of your non-programmer friends don't want to care about the underlying details, they certainly won't care how programs are represented---including a nice and *simple* text representation without markup, entities, and other syntax that is not an essential part of the language. However, as a professional, I most certainly DO care about how programs are represented--specifically, I want to be able to move them around between machines. Edit them with essentially any editor, transform them as I see fit, and be able to easily read them and have a sense of what is going on. Markup is just going to make this a huge pain in the butt. No, I'm not for this idea one bit. Sorry. > I just think that someone's going to do this > for some language, some time soon, and I'd rather Python be in the lead > than play catch-up. What gives you the idea that Python is behind? What is it playing catch up to? 
-- Dave From DavidA at ActiveState.com Tue Mar 14 00:36:54 2000 From: DavidA at ActiveState.com (David Ascher) Date: Mon, 13 Mar 2000 15:36:54 -0800 Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: > > David Ascher wrote: > > But the scheme you put forth causes major problems for current Python > > users who *are* using glass TTYs, so I don't think it'll fly for very > > basic political reasons nicely illustrated by Dave's response. > > Understood. I thought that handling standard entities might be a > useful first step toward storage of Python as XML, which in turn would > help make Python more accessible to people who don't want to switch > editors just to program. I felt that an all-or-nothing approach would be > even less likely to get a favorable response than handling entities... :-) > > Greg If you propose a transformation between Python Syntax and XML, then you potentially have something which all parties can agree to as being a good thing. Forcing one into the other is denying the history and current practices of both domains and user populations. You cannot ignore the fact that "I can read anyone's Python" is a key selling point of Python among its current practitioners, or that its cleanliness and lack of magic characters ($ is usually invoked, but < is just as magic/ugly) are part of its appeal/success. No XML editor is going to edit all XML documents without custom editors anyway! I certainly don't expect to be drawing SVG diagrams with a keyboard! That's what schemas and custom editors are for. Define a schema for 'encoded Python' (well, first, find a schema notation that will survive), write a plugin to your favorite XML editor, and then your (theoretical? =) users can use the same 'editor' to edit PythonXML or any other XML. Most XML probably won't be edited with a keyboard but with a pointing device or a speech recognizer anyway... 
IMO, you're being seduced by the apparent closeness between XML and Python-in-ASCII. It's only superficial... Think of Python-in-ASCII as a rendering of Python-in-XML, Dave will think of Python-in-XML as a rendering of Python-in-ASCII, and everyone will be happy (as long as everyone agrees on the one-to-one transformation). --david From paul at prescod.net Tue Mar 14 00:43:48 2000 From: paul at prescod.net (Paul Prescod) Date: Mon, 13 Mar 2000 15:43:48 -0800 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <38CD7D34.6569C1AA@prescod.net> You should use your entities in the XML files, and then whatever application actually launches Python (PMZ, your make engine, XMetaL) could decode the data and launch Python. This is already how it works in XMetaL. I've just reinstalled recently so I don't have my macro file. Therefore, please excuse the Javascript (not Python) example. This is in "journalist.mcr" in the "Macros" folder of XMetaL. This already works fine for Python. You change lang="Python" and thanks to the benevalence of Bill Gates and the hard work of Mark Hammond, you can use Python for XMetaL macros. It doesn't work perfectly: exceptions crash XMetaL, last I tried. As long as you don't make mistakes, everything works nicely. :) You can write XMetaL macros in Python and the whole thing is stored as XML. Still, XMetaL is not very friendly as a Python editor. It doesn't have nice whitespace handling! -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Out of timber so crooked as that which man is made nothing entirely straight can be built. 
- Immanuel Kant From paul at prescod.net Tue Mar 14 00:59:23 2000 From: paul at prescod.net (Paul Prescod) Date: Mon, 13 Mar 2000 15:59:23 -0800 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <38CD80DB.39150F33@prescod.net> gvwilson at nevex.com wrote: > > 'S funny --- my non-programmer friends can't figure out why any sane > person would use a glorified glass TTY like emacs... or why they should > have to, just to program... I just think that someone's going to do this > for some language, some time soon, and I'd rather Python be in the lead > than play catch-up. Your goal is worth pursuing but I agree with the others that the syntax change is not the right way. It _is_ possible to teach XMetaL to edit Python programs -- structurally -- just as it does XML. What you do is hook into the macro engine (which already supports Python) and use the Python tokenizer to build a parse tree. You copy that into a DOM using the same elements and attributes you would use if you were doing some kind of batch conversion. Then on "save" you reverse the process. Implementation time: ~3 days. The XMetaL competitor, Documentor has an API specifically designed to make this sort of thing easy. Making either of them into a friendly programmer's editor is a much larger task. I think this is where the majority of the R&D should occur, not at the syntax level. If one invents a fundamentally better way of working with the structures behind Python code, then it would be relatively easy to write code that maps that to today's Python syntax. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Out of timber so crooked as that which man is made nothing entirely straight can be built. 
- Immanuel Kant From moshez at math.huji.ac.il Tue Mar 14 02:14:09 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 14 Mar 2000 03:14:09 +0200 (IST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: On Mon, 13 Mar 2000 gvwilson at nevex.com wrote: > Once 1.6 is out the door, would people be willing to consider extending > Python's token set to make HTML/XML-ish spellings using entity references > legal? This would make the following 100% legal Python: > > i = 0 > while i &lt; 10: > print i &amp; 1 > i = i + 1 > > which would in turn make it easier to embed Python in XML such as > config-files-for-whatever-Software-Carpentry-produces-to-replace-make, > PMZ, and so on. Why? Whatever XML parser you use will output "i&lt;1" as "i<1", so the Python that comes out of the XML parser is quite all right. Why change Python to do an XML parser job? -- Moshe Zadka <moshez at math.huji.ac.il>. http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mhammond at skippinet.com.au Tue Mar 14 02:18:45 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 14 Mar 2000 12:18:45 +1100 Subject: [Python-Dev] unicode objects and C++ Message-ID: I struck a bit of a snag with the Unicode support when trying to use the most recent update in a C++ source file. The problem turned out to be that unicodeobject.h did a #include "wchar.h", but did it while an 'extern "C"' block was open. This upset the MSVC6 wchar.h, as it has special C++ support. Attached below is a patch I made to unicodeobject.h that solved my problem and allowed my compilations to succeed.
Theoretically the same problem could exist for wctype.h, and probably lots of other headers, but this is the immediate problem :-) An alternative patch would be to #include "whcar.h" in PC\config.h outside of any 'extern "C"' blocks - wchar.h on Windows has guards that allows for multiple includes, so the unicodeobject.h include of that file will succeed, but not have the side-effect it has now. Im not sure what the preferred solution is - quite possibly the PC\config.h change, but Ive include the unicodeobject.h patch anyway :-) Mark. *** unicodeobject.h 2000/03/13 23:22:24 2.2 --- unicodeobject.h 2000/03/14 01:06:57 *************** *** 85,91 **** --- 85,101 ---- #endif #ifdef HAVE_WCHAR_H + + #ifdef __cplusplus + } /* Close the 'extern "C"' before bringing in system headers */ + #endif + # include "wchar.h" + + #ifdef __cplusplus + extern "C" { + #endif + #endif #ifdef HAVE_USABLE_WCHAR_T From mal at lemburg.com Tue Mar 14 00:31:30 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 14 Mar 2000 00:31:30 +0100 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <38CD7A52.5709DF5F@lemburg.com> gvwilson at nevex.com wrote: > > > David Ascher wrote: > > But the scheme you put forth causes major problems for current Python > > users who *are* using glass TTYs, so I don't think it'll fly for very > > basic political reasons nicely illustrated by Dave's response. > > Understood. I thought that handling standard entities might be a > useful first step toward storage of Python as XML, which in turn would > help make Python more accessible to people who don't want to switch > editors just to program. I felt that an all-or-nothing approach would be > even less likely to get a favorable response than handling entities... :-) This should be easy to implement provided a hook for compile() is added to e.g. the sys-module which then gets used instead of calling the byte code compiler directly... 
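A compile() hook of the kind Marc-Andre describes can at least be emulated in present-day Python by wrapping the builtin. This is a hypothetical sketch — entity_compile and its entity table are illustrative, not the proposed sys-module hook:

```python
import builtins

def entity_compile(source, filename, mode):
    # Hypothetical pre-compilation codec: decode the standard
    # entities, then hand the result to the real byte-code compiler.
    # '&amp;' is decoded last so '&amp;lt;' yields '&lt;', not '<'.
    for entity, char in (("&lt;", "<"), ("&gt;", ">"), ("&amp;", "&")):
        source = source.replace(entity, char)
    return builtins.compile(source, filename, mode)

code = entity_compile("1 &lt; 2", "<entity-demo>", "eval")
assert eval(code) is True
```

The builtin compiler only ever sees the decoded output, which is exactly the layering being proposed: the codec, not the parser, knows about entities.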
Then you could redirect the compile() arguments to whatever codec you wish (e.g. a SGML entity codec) and the builtin compiler would only see the output of that codec. Well, just a thought... I don't think encoding programs would make life as a programmer easier, but instead harder. It adds one more level of confusion on top of it all. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Mar 14 10:45:49 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 14 Mar 2000 10:45:49 +0100 Subject: [Python-Dev] unicode objects and C++ References: Message-ID: <38CE0A4D.1209B830@lemburg.com> Mark Hammond wrote: > > I struck a bit of a snag with the Unicode support when trying to use the > most recent update in a C++ source file. > > The problem turned out to be that unicodeobject.h did a #include "wchar.h", > but did it while an 'extern "C"' block was open. This upset the MSVC6 > wchar.h, as it has special C++ support. Thanks for reporting this. > Attached below is a patch I made to unicodeobject.h that solved my problem > and allowed my compilations to succeed. Theoretically the same problem > could exist for wctype.h, and probably lots of other headers, but this is > the immediate problem :-) > > An alternative patch would be to #include "whcar.h" in PC\config.h outside > of any 'extern "C"' blocks - wchar.h on Windows has guards that allows for > multiple includes, so the unicodeobject.h include of that file will succeed, > but not have the side-effect it has now. > > Im not sure what the preferred solution is - quite possibly the PC\config.h > change, but Ive include the unicodeobject.h patch anyway :-) > > Mark. 
> > *** unicodeobject.h 2000/03/13 23:22:24 2.2 > --- unicodeobject.h 2000/03/14 01:06:57 > *************** > *** 85,91 **** > --- 85,101 ---- > #endif > > #ifdef HAVE_WCHAR_H > + > + #ifdef __cplusplus > + } /* Close the 'extern "C"' before bringing in system headers */ > + #endif > + > # include "wchar.h" > + > + #ifdef __cplusplus > + extern "C" { > + #endif > + > #endif > > #ifdef HAVE_USABLE_WCHAR_T > I've included this patch (should solve the problem for all inlcuded system header files, since it wraps only the Unicode APIs in extern "C"): --- /home/lemburg/clients/cnri/CVS-Python/Include/unicodeobject.h Fri Mar 10 23:33:05 2000 +++ unicodeobject.h Tue Mar 14 10:38:08 2000 @@ -1,10 +1,7 @@ #ifndef Py_UNICODEOBJECT_H #define Py_UNICODEOBJECT_H -#ifdef __cplusplus -extern "C" { -#endif /* Unicode implementation based on original code by Fredrik Lundh, modified by Marc-Andre Lemburg (mal at lemburg.com) according to the @@ -167,10 +165,14 @@ typedef unsigned short Py_UNICODE; #define Py_UNICODE_MATCH(string, offset, substring)\ (!memcmp((string)->str + (offset), (substring)->str,\ (substring)->length*sizeof(Py_UNICODE))) +#ifdef __cplusplus +extern "C" { +#endif + /* --- Unicode Type ------------------------------------------------------- */ typedef struct { PyObject_HEAD int length; /* Length of raw Unicode data in buffer */ I'll post a complete Unicode update patch by the end of the week for inclusion in CVS. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From ping at lfw.org Tue Mar 14 12:19:59 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 14 Mar 2000 06:19:59 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: On Tue, 14 Mar 2000, Moshe Zadka wrote: > On Mon, 13 Mar 2000 gvwilson at nevex.com wrote: > > legal? 
This would make the following 100% legal Python: > > > > i = 0 > > while i &lt; 10: > > print i &amp; 1 > > i = i + 1 > > Why? Whatever XML parser you use will output "i&lt;1" as "i<1", so > the Python that comes out of the XML parser is quite all right. Why change > Python to do an XML parser job? I totally agree. To me, this is the key issue: it is NOT the responsibility of the programming language to accommodate any particular encoding format. While we're at it, why don't we change Python to accept quoted-printable source code? Or base64-encoded source code? XML already defines a perfectly reasonable mechanism for escaping a plain stream of text -- adding this processing to Python adds nothing but confusion. The possible useful benefit from adding the proposed "feature" is exactly zero. -- ?!ng "This code is better than any code that doesn't work has any right to be." -- Roger Gregory, on Xanadu From ping at lfw.org Tue Mar 14 12:21:59 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 14 Mar 2000 06:21:59 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: On Mon, 13 Mar 2000, David Ascher wrote: > > If you propose a transformation between Python Syntax and XML, then you > potentially have something which all parties can agree to as being a good > thing. Indeed. I know that i wouldn't have any use for it at the moment, but i can see the potential for usefulness of a structured representation for Python source code (like an AST in XML) which could be directly edited in an XML editor, and processed (by an XSL stylesheet?) to produce actual runnable Python. But attempting to mix the two doesn't get you anywhere. -- ?!ng "This code is better than any code that doesn't work has any right to be."
-- Roger Gregory, on Xanadu From effbot at telia.com Tue Mar 14 16:41:01 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 14 Mar 2000 16:41:01 +0100 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <002201bf8dcb$ba9a11c0$34aab5d4@hagrid> Greg: > Understood. I thought that handling standard entities might be a > useful first step toward storage of Python as XML, which in turn would > help make Python more accessible to people who don't want to switch > editors just to program. I felt that an all-or-nothing approach would be > even less likely to get a favorable response than handling entities... :-) well, I would find it easier to support a more aggressive proposal: make sure Python 1.7 can deal with source code written in Unicode, using any supported encoding. with that in place, you can plug in your favourite unicode encoding via the Unicode framework. From effbot at telia.com Tue Mar 14 23:21:38 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 14 Mar 2000 23:21:38 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> Message-ID: <000901bf8e03$abf88420$34aab5d4@hagrid> > I've just checked in a massive patch from Marc-Andre Lemburg which > adds Unicode support to Python. massive, indeed. didn't notice this before, but I just realized that after the latest round of patches, the python15.dll is now 700k larger than it was for 1.5.2 (more than twice the size). my original unicode DLL was 13k. hmm... From akuchlin at mems-exchange.org Tue Mar 14 23:19:44 2000 From: akuchlin at mems-exchange.org (Andrew M. 
Kuchling) Date: Tue, 14 Mar 2000 17:19:44 -0500 (EST) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <000901bf8e03$abf88420$34aab5d4@hagrid> References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> Message-ID: <14542.47872.184978.985612@amarok.cnri.reston.va.us> Fredrik Lundh writes: >didn't notice this before, but I just realized that after the >latest round of patches, the python15.dll is now 700k larger >than it was for 1.5.2 (more than twice the size). Most of that is due to Modules/unicodedata.c, which is 2.1Mb of source code, and produces a 632168-byte .o file on my Sparc. (Will some compiler systems choke on a file that large? Could we read database info from a file instead, or mmap it into memory?) -- A.M. Kuchling http://starship.python.net/crew/amk/ "Are you OK, dressed like that? You don't seem to notice the cold." "I haven't come ten thousand miles to discuss the weather, Mr Moberly." -- Moberly and the Doctor, in "The Seeds of Doom" From mal at lemburg.com Wed Mar 15 09:32:29 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 09:32:29 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> Message-ID: <38CF4A9D.13A0080@lemburg.com> "Andrew M. Kuchling" wrote: > > Fredrik Lundh writes: > >didn't notice this before, but I just realized that after the > >latest round of patches, the python15.dll is now 700k larger > >than it was for 1.5.2 (more than twice the size). > > Most of that is due to Modules/unicodedata.c, which is 2.1Mb of source > code, and produces a 632168-byte .o file on my Sparc. (Will some > compiler systems choke on a file that large? Could we read database > info from a file instead, or mmap it into memory?) That is dues to the unicodedata module being compiled into the DLL statically. 
On Unix you can build it shared too -- there are no direct references to it in the implementation. I suppose that on Windows the same should be done... the question really is whether this is intended or not -- moving the module into a DLL is at least technically no problem (someone would have to supply a patch for the MSVC project files though). Note that unicodedata is only needed by programs which do a lot of Unicode manipulations and in the future probably by some codecs too. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From pf at artcom-gmbh.de Wed Mar 15 11:42:26 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 15 Mar 2000 11:42:26 +0100 (MET) Subject: [Python-Dev] Unicode in Python and Tcl/Tk compared (was Unicode patches checked in...) In-Reply-To: <38CF4A9D.13A0080@lemburg.com> from "M.-A. Lemburg" at "Mar 15, 2000 9:32:29 am" Message-ID: Hi! > > Fredrik Lundh writes: > > >didn't notice this before, but I just realized that after the > > >latest round of patches, the python15.dll is now 700k larger > > >than it was for 1.5.2 (more than twice the size). > > > "Andrew M. Kuchling" wrote: > > Most of that is due to Modules/unicodedata.c, which is 2.1Mb of source > > code, and produces a 632168-byte .o file on my Sparc. (Will some > > compiler systems choke on a file that large? Could we read database > > info from a file instead, or mmap it into memory?) > M.-A. Lemburg wrote: > That is dues to the unicodedata module being compiled > into the DLL statically. On Unix you can build it shared too > -- there are no direct references to it in the implementation. > I suppose that on Windows the same should be done... the > question really is whether this is intended or not -- moving > the module into a DLL is at least technically no problem > (someone would have to supply a patch for the MSVC project > files though). 
> > Note that unicodedata is only needed by programs which do > a lot of Unicode manipulations and in the future probably > by some codecs too. Now as the unicode patches were checked in and as Fredrik Lundh noticed a considerable increase of the size of the python-DLL, which was obviously mostly caused by those tables, I had some fear that a Python/Tcl/Tk based application could eat up much more memory, if we update from Python1.5.2 and Tcl/Tk 8.0.5 to Python 1.6 and Tcl/Tk 8.3.0. As some of you certainly know, some kind of unicode support has also been added to Tcl/Tk since 8.1. So I did some research and would like to share what I have found out so far: Here are the compared sizes of the tcl/tk shared libs on Linux: old: | new: | bloat increase in %: -----------------------+------------------------+--------------------- libtcl8.0.so 533414 | libtcl8.3.so 610241 | 14.4 % libtk8.0.so 714908 | libtk8.3.so 811916 | 13.6 % The addition of unicode wasn't the only change to TclTk. So this seems reasonable. Unfortunately there is no python shared library, so a direct comparison of increased memory consumption is impossible. Nevertheless I've the following figures (stripped binary sizes of the Python interpreter): 1.5.2 382616 CVS_10-02-00 393668 (a month before unicode) CVS_12-03-00 507448 (just after unicode) That is an increase of "only" 111 kBytes. Not so bad but nevertheless a "bloat increase" of 32.6 %. And additionally there is now unicodedata.so 634940 _codecsmodule.so 38955 which (I guess) will also be loaded if the application starts using some of the new features. Since I didn't take care of unicode in the past, I feel unable to compare the implementations of unicode in both systems and what impact they will have on the real memory performance and even more important on the functionality of the combined use of both packages together with Tkinter. 
Tcl/Tk keeps around a sub-directory called 'encoding', which --I guess-- contains information somehow similar or related to that in 'unicodedata.so', but separated into several files? So below I included a shortened excerpts from the 200k+ tcl8.3.0/changes and the tk8.3.0/changes files about unicode. May be someone else more involved with unicode can shed some light on this topic? Do we need some changes to Tkinter.py or _tkinter or both? ---- 8< ---- 8< ---- cut here ---- 8< ---- schnipp ---- 8< ---- schnapp ---- [...] ======== Changes for 8.1 go below this line ======== 6/18/97 (new feature) Tcl now supports international character sets: - All C APIs now accept UTF-8 strings instead of iso8859-1 strings, wherever you see "char *", unless explicitly noted otherwise. - All Tcl strings represented in UTF-8, which is a convenient multi-byte encoding of Unicode. Variable names, procedure names, and all other values in Tcl may include arbitrary Unicode characters. For example, the Tcl command "string length" returns how many Unicode characters are in the argument string. - For Java compatibility, embedded null bytes in C strings are represented as \xC080 in UTF-8 strings, but the null byte at the end of a UTF-8 string remains \0. Thus Tcl strings once again do not contain null bytes, except for termination bytes. - For Java compatibility, "\uXXXX" is used in Tcl to enter a Unicode character. "\u0000" through "\uffff" are acceptable Unicode characters. - "\xXX" is used to enter a small Unicode character (between 0 and 255) in Tcl. - Tcl automatically translates between UTF-8 and the normal encoding for the platform during interactions with the system. - The fconfigure command now supports a -encoding option for specifying the encoding of an open file or socket. Tcl will automatically translate between the specified encoding and UTF-8 during I/O. 
See the directory library/encoding to find out what encodings are supported (eventually there will be an "encoding" command that makes this information more accessible). - There are several new C APIs that support UTF-8 and various encodings. See Utf.3 for procedures that translate between Unicode and UTF-8 and manipulate UTF-8 strings. See Encoding.3 for procedures that create new encodings and translate between encodings. See ToUpper.3 for procedures that perform case conversions on UTF-8 strings. [...] 1/16/98 (new feature) Tk now supports international characters sets: - Font display mechanism overhauled to display Unicode strings containing full set of international characters. You do not need Unicode fonts on your system in order to use tk or see international characters. For those familiar with the Japanese or Chinese patches, there is no "-kanjifont" option. Characters from any available fonts will automatically be used if the widget's originally selected font is not capable of displaying a given character. - Textual widgets are international aware. For instance, cursor positioning commands would now move the cursor forwards/back by 1 international character, not by 1 byte. - Input Method Editors (IMEs) work on Mac and Windows. Unix is still in progress. [...] 10/15/98 (bug fix) Changed regexp and string commands to properly handle case folding according to the Unicode character tables. (stanton) 10/21/98 (new feature) Added an "encoding" command to facilitate translations of strings between different character encodings. See the encoding.n manual entry for more details. (stanton) 11/3/98 (bug fix) The regular expression character classification syntax now includes Unicode characters in the supported classes. (stanton) [...] 11/17/98 (bug fix) "scan" now correctly handles Unicode characters. (stanton) [...] 11/19/98 (bug fix) Fixed menus and titles so they properly display Unicode characters under Windows. [Bug: 819] (stanton) [...] 
4/2/99 (new apis) Made various Unicode utility functions public. Tcl_UtfToUniCharDString, Tcl_UniCharToUtfDString, Tcl_UniCharLen, Tcl_UniCharNcmp, Tcl_UniCharIsAlnum, Tcl_UniCharIsAlpha, Tcl_UniCharIsDigit, Tcl_UniCharIsLower, Tcl_UniCharIsSpace, Tcl_UniCharIsUpper, Tcl_UniCharIsWordChar, Tcl_WinUtfToTChar, Tcl_WinTCharToUtf (stanton) [...] 4/5/99 (bug fix) Fixed handling of Unicode in text searches. The -count option was returning byte counts instead of character counts. [...] 5/18/99 (bug fix) Fixed clipboard code so it handles Unicode data properly on Windows NT and 95. [Bug: 1791] (stanton) [...] 6/3/99 (bug fix) Fixed selection code to handle Unicode data in COMPOUND_TEXT and STRING selections. [Bug: 1791] (stanton) [...] 6/7/99 (new feature) Optimized string index, length, range, and append commands. Added a new Unicode object type. (hershey) [...] 6/14/99 (new feature) Merged string and Unicode object types. Added new public Tcl API functions: Tcl_NewUnicodeObj, Tcl_SetUnicodeObj, Tcl_GetUnicode, Tcl_GetUniChar, Tcl_GetCharLength, Tcl_GetRange, Tcl_AppendUnicodeToObj. (hershey) [...] 6/23/99 (new feature) Updated Unicode character tables to reflect Unicode 2.1 data. (stanton) [...] --- Released 8.3.0, February 10, 2000 --- See ChangeLog for details --- ---- 8< ---- 8< ---- cut here ---- 8< ---- schnipp ---- 8< ---- schnapp ---- Sorry if this was boring old stuff for some of you. Best Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From marangoz at python.inrialpes.fr Wed Mar 15 12:40:21 2000 From: marangoz at python.inrialpes.fr (Vladimir Marangozov) Date: Wed, 15 Mar 2000 12:40:21 +0100 (CET) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38CF4A9D.13A0080@lemburg.com> from "M.-A. Lemburg" at Mar 15, 2000 09:32:29 AM Message-ID: <200003151140.MAA30301@python.inrialpes.fr> M.-A. 
Lemburg wrote: > > Note that unicodedata is only needed by programs which do > a lot of Unicode manipulations and in the future probably > by some codecs too. Perhaps it would make sense to move the Unicode database to the Python side (write it in Python)? Or init the database dynamically in the unicodedata module on import? It's quite big, so if it's possible to avoid the static declaration (and if the unicodedata module is enabled by default), I'd vote for a dynamic initialization of the database from reference (Python ?) file(s). M-A, is something in this spirit doable? -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer at tismer.com Wed Mar 15 13:57:04 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 15 Mar 2000 13:57:04 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> Message-ID: <38CF88A0.CF876A74@tismer.com> "M.-A. Lemburg" wrote: ... > Note that unicodedata is only needed by programs which do > a lot of Unicode manipulations and in the future probably > by some codecs too. Would it be possible to make the Unicode support configurable? My problem is that patches in the CVS are of different kinds. Some are error corrections and enhancements which I would definitely like to use. Others are brand new features like the Unicode support. Absolutely great stuff! But this will most probably change a number of times again, and I think it is a bad idea when I include it into my Stackless distribution. I'd appreciate it very much if I could use the same CVS tree for testing new stuff, and to build my distribution, with new features switched off. Please :-) ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From jim at digicool.com Wed Mar 15 14:35:48 2000 From: jim at digicool.com (Jim Fulton) Date: Wed, 15 Mar 2000 08:35:48 -0500 Subject: [Python-Dev] Finalizers considered questionable ;) Message-ID: <38CF91B4.A36C8C5@digicool.com> Here's my $0.02. I agree with the sentiments that use of finalizers should be discouraged. They are extremely helpful in cases like tempfile.TemporaryFileWrapper, so I think that they should be supported. I do think that the language should not promise a high level of service. Some observations: - I spent a little bit of time on the ANSI Smalltalk committee, where I naively advocated adding finalizers to the language. I was resoundingly told no. :) - Most of the Python objects I deal with these days are persistent. Their lifetimes are a lot more complicated than those of most Python objects. They get created once, but they get loaded into and out of memory many times. In fact, they can be in memory many times simultaneously. :) A couple of years ago I realized that it only made sense to call __init__ when an object was first created, not when it is subsequently (re)loaded into memory. This led to a change in Python pickling semantics and the deprecation of the loathsome __getinitargs__ protocol. :) For me, a similar case can be made against use of __del__ for persistent objects. For persistent objects, a __del__ method should only be used for cleaning up the most volatile of resources. A persistent object __del__ should not perform any semantically meaningful operations because __del__ has no semantic meaning. - Zope has a few uses of __del__. These are all for non-persistent objects. Interestingly, in grepping for __del__, I found a lot of cases where __del__ was used and then commented out. 
Finalizers seem to be the sort of thing that people want initially and then get over. I'm inclined to essentially keep the current rules and simply not promise that __del__ will be able to run correctly. That is, Python should call __del__ and ignore exceptions raised (or provide some *optional* logging or other debugging facility). There is no reason for __del__ to fail unless it depends on cyclically related objects, which should be viewed as a design mistake. OTOH, __del__ should never fail because module globals go away. IMO, the current circular references involving module globals are unnecessary, but that's a different topic. ;) Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From mal at lemburg.com Wed Mar 15 16:00:14 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 16:00:14 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> Message-ID: <38CFA57E.21A3B3EF@lemburg.com> Christian Tismer wrote: > > "M.-A. Lemburg" wrote: > ... > > > Note that unicodedata is only needed by programs which do > > a lot of Unicode manipulations and in the future probably > > by some codecs too. > > Would it be possible to make the Unicode support configurable? This is currently not planned as the Unicode integration touches many different parts of the interpreter to enhance string/Unicode integration... sorry. 
Also, I'm not sure whether adding #ifdefs throughout the code would increase its elegance ;-) > My problem is that patches in the CVS are of different kinds. > Some are error corrections and enhancements which I would > definitely like to use. > Others are brand new features like the Unicode support. > Absolutely great stuff! But this will most probably change > a number of times again, and I think it is a bad idea when > I include it into my Stackless distribution. Why not ? All you have to do is rebuild the distribution every time you push a new version -- just like I did for the Unicode version before the CVS checkin was done. > I'd appreciate it very much if I could use the same CVS tree > for testing new stuff, and to build my distribution, with > new features switched off. Please :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Mar 15 15:57:13 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 15:57:13 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003151140.MAA30301@python.inrialpes.fr> Message-ID: <38CFA4C9.E6B8EB5D@lemburg.com> Vladimir Marangozov wrote: > > M.-A. Lemburg wrote: > > > > Note that unicodedata is only needed by programs which do > > a lot of Unicode manipulations and in the future probably > > by some codecs too. > > Perhaps it would make sense to move the Unicode database to the > Python side (write it in Python)? Or init the database dynamically > in the unicodedata module on import? It's quite big, so if it's > possible to avoid the static declaration (and if the unicodedata module > is enabled by default), I'd vote for a dynamic initialization of the > database from reference (Python ?) file(s). The unicodedatabase module contains the Unicode database as static C data - this makes it shareable among (Python) processes. 
Python modules don't provide this feature: instead a dictionary would have to be built on import which would increase the heap size considerably. Those dicts would *not* be shareable. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tismer at tismer.com Wed Mar 15 16:20:06 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 15 Mar 2000 16:20:06 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> Message-ID: <38CFAA26.2B2F0D01@tismer.com> "M.-A. Lemburg" wrote: > > Christian Tismer wrote: ... > > Absolutely great stuff! But this will most probably change > > a number of times again, and I think it is a bad idea when > > I include it into my Stackless distribution. > > Why not ? All you have to do is rebuild the distribution > every time you push a new version -- just like I did > for the Unicode version before the CVS checkin was done. But how can I then publish my source code, when I always pull Unicode into it? I don't like to be exposed to side effects like 700kb code bloat, just by chance, since it is in the dist right now (and will vanish again). I don't say there must be #ifdefs all and everywhere, but can I build without *using* Unicode? I don't want to introduce something new to my users that they didn't ask for. And I don't want to take care of their installations. Finally I will for sure not replace a 500k DLL by a 1.2M monster, so this is definitely not what I want at the moment. How do I build a dist that doesn't need to change a lot of stuff in the user's installation? Note that Stackless Python is a drop-in replacement, not a Python distribution. Or should it be? 
ciao - chris (who really wants to get SLP 1.1 out) -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From effbot at telia.com Wed Mar 15 17:04:54 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 15 Mar 2000 17:04:54 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> Message-ID: <014001bf8e98$35644480$34aab5d4@hagrid> CT: > How do I build a dist that doesn't need to change a lot of > stuff in the user's installation? somewhere in this thread, Guido wrote: > BTW, I added a tag "pre-unicode" to the CVS tree to the revisions > before the Unicode changes were made. maybe you could base SLP on that one? From marangoz at python.inrialpes.fr Wed Mar 15 17:27:36 2000 From: marangoz at python.inrialpes.fr (Vladimir Marangozov) Date: Wed, 15 Mar 2000 17:27:36 +0100 (CET) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38CFA4C9.E6B8EB5D@lemburg.com> from "M.-A. Lemburg" at Mar 15, 2000 03:57:13 PM Message-ID: <200003151627.RAA32543@python.inrialpes.fr> > [me] > > > > Perhaps it would make sense to move the Unicode database on the > > Python side (write it in Python)? Or init the database dynamically > > in the unicodedata module on import? It's quite big, so if it's > > possible to avoid the static declaration (and if the unicodata module > > is enabled by default), I'd vote for a dynamic initialization of the > > database from reference (Python ?) file(s). 
[Marc-Andre] > > The unicodedatabase module contains the Unicode database > as static C data - this makes it shareable among (Python) > processes. The static data is shared if the module is a shared object (.so). If unicodedata is not a .so, then you'll have a separate copy of the database in each process. > > Python modules don't provide this feature: instead a dictionary > would have to be built on import which would increase the heap > size considerably. Those dicts would *not* be shareable. I haven't mentioned dicts, have I? I suggested that the entries in the C version of the database be rewritten in Python (or a text file). The unicodedata module would, in its init function, allocate memory for the database and would populate it before returning "import okay" to Python -- this is one way to init the db dynamically, among others. As to sharing the database among different processes, this is a classic IPC problem, which has nothing to do with the static C declaration of the db. Or, hmmm, one of us is royally confused. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer at tismer.com Wed Mar 15 17:22:42 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 15 Mar 2000 17:22:42 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> <014001bf8e98$35644480$34aab5d4@hagrid> Message-ID: <38CFB8D2.537FCAD9@tismer.com> Fredrik Lundh wrote: > > CT: > > How do I build a dist that doesn't need to change a lot of > > stuff in the user's installation? > > somewhere in this thread, Guido wrote: > > > BTW, I added a tag "pre-unicode" to the CVS tree to the revisions > > before the Unicode changes were made. 
> > maybe you could base SLP on that one? I have no idea how this works. Would this mean that I cannot get patches which come after unicode? Meanwhile, I've looked into the sources. It is easy for me to get rid of the problem by supplying my own unicodedata.c, where I replace all functions by some unimplemented exception. Furthermore, I wondered about the data format. Is the unicode database used in your package as well? Otherwise, I see only references from unicodedata.c, and that means the data structure can be massively enhanced. At the moment, that baby is 64k entries long, with four bytes and an optional string. This is a big waste. The strings are almost all some distinct prefixes, together with a list of hex smallwords. This is done as strings, and probably this makes up 80 percent of the space. The only function that uses the "decomposition" field (namely the string) is unicodedata_decomposition. It does nothing more than to wrap it into a PyObject. We can do a little better here. I guess I can bring it down to a third of this space without much effort, just by using - binary encoding for the tags as enumeration - binary encoding of the hexed entries - omission of the spaces Instead of 64k of structures which contain pointers anyway, I can use a 64k pointer array with offsets into one packed table. The unicodedata access functions would change *slightly*, just building some hex strings and so on. I guess this is not a time critical section? Should I try this evening? :-) cheers - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From mal at lemburg.com Wed Mar 15 17:04:43 2000 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 15 Mar 2000 17:04:43 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> Message-ID: <38CFB49B.885B8B16@lemburg.com> Christian Tismer wrote: > > "M.-A. Lemburg" wrote: > > > > Christian Tismer wrote: > ... > > > Absolutely great stuff! But this will most probably change > > > a number of times again, and I think it is a bad idea when > > > I include it into my Stackless distribution. > > > > Why not ? All you have to do is rebuild the distribution > > every time you push a new version -- just like I did > > for the Unicode version before the CVS checkin was done. > > But how can I then publish my source code, when I always > pull Unicode into it. I don't like to be exposed to > side effects like 700kb code bloat, just by chance, since it > is in the dist right now (and will vanish again). All you have to do is build the unicodedata module shared and not statically bound into python.dll. This one module causes most of the code bloat... > I don't say there must be #ifdefs all and everywhere, but > can I build without *using* Unicode? I don't want to > introduce something new to my users what they didn't ask for. > And I don't want to take care about their installations. > Finally I will for sure not replace a 500k DLL by a 1.2M > monster, so this is definately not what I want at the moment. > > How do I build a dist that doesn't need to change a lot of > stuff in the user's installation? I don't think that the Unicode stuff will disable the running environment... (haven't tried this though). 
The unicodedata module is not used by the interpreter and the rest is imported on-the-fly, not during init time, so at least in theory, not using Unicode will result in Python not looking for e.g. the encodings package. > Note that Stackless Python is a drop-in replacement, > not a Python distribution. Or should it be? Probably... I think it's simply easier to install and probably also easier to maintain because it doesn't cause dependencies on other "default" installations. The user will then explicitly know that she is installing something a little different from the default distribution... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Mar 15 18:26:15 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 18:26:15 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> <014001bf8e98$35644480$34aab5d4@hagrid> <38CFB8D2.537FCAD9@tismer.com> Message-ID: <38CFC7B7.A1ABD51C@lemburg.com> Christian Tismer wrote: > > Fredrik Lundh wrote: > > > > CT: > > > How do I build a dist that doesn't need to change a lot of > > > stuff in the user's installation? > > > > somewhere in this thread, Guido wrote: > > > > > BTW, I added a tag "pre-unicode" to the CVS tree to the revisions > > > before the Unicode changes were made. > > > > maybe you could base SLP on that one? > > I have no idea how this works. Would this mean that I cannot > get patctes which come after unicode? > > Meanwhile, I've looked into the sources. 
It is easy for me > to get rid of the problem by supplying my own unicodedata.c, > where I replace all functions by some unimplemented exception. No need (see my other posting): simply disable the module altogether... this shouldn't hurt any part of the interpreter as the module is a user-land only module. > Furthermore, I wondered about the data format. Is the unicode > database used in your package as well? Otherwise, I see > only references from unicodedata.c, and that means the data > structure can be massively enhanced. > At the moment, that baby is 64k entries long, with four bytes > and an optional string. > This is a big waste. The strings are almost all some distinct > prefixes, together with a list of hex smallwords. This > is done as strings, probably this makes 80 percent of the space. I have made no attempt to optimize the structure... (due to lack of time mostly) the current implementation is really not much different from a rewrite of the UnicodeData.txt file available at the unicode.org site. If you want to, I can mail you the marshalled Python dict version of that database to play with. > The only function that uses the "decomposition" field (namely > the string) is unicodedata_decomposition. It does nothing > more than to wrap it into a PyObject. > We can do a little better here. I guess I can bring it down > to a third of this space without much effort, just by using > - binary encoding for the tags as enumeration > - binary encoding of the hexed entries > - omission of the spaces > Instead of 64k of structures which contain pointers anyway, > I can use a 64k pointer array with offsets into one packed > table. > > The unicodedata access functions would change *slightly*, > just building some hex strings and so on. I guess this > is not a time critical section? It may be if these functions are used in codecs, so you should pay attention to speed too... > Should I try this evening? :-) Sure :-) go ahead... 
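[Editor's sketch: the packing scheme discussed above -- tag prefixes stored as a small enumeration, code points stored in binary, and an offset table pointing into one packed buffer -- can be prototyped in a few lines of Python. Everything here is illustrative only: the three sample entries mimic the UnicodeData.txt decomposition field, and the tag list and helper names are invented, not the actual unicodedata internals.]

```python
import struct

# Invented sample entries in UnicodeData.txt decomposition-field style;
# the real table has one slot per BMP code point (~64k entries).
SAMPLES = {
    0x00C0: "0041 0300",                # A-grave -> A + combining grave
    0x00BC: "<fraction> 0031 2044 0034",
    0xFB01: "<compat> 0066 0069",       # fi ligature -> f + i
}

# Step 1 of the proposal: enumerate the tag prefixes once
# instead of storing them as strings in every entry.
TAGS = ["", "<fraction>", "<compat>"]

def pack(entries):
    """Pack all decompositions into one byte buffer plus an offset map."""
    data, offsets = b"", {}
    for cp, decomp in entries.items():
        parts = decomp.split()
        tag = parts[0] if parts[0].startswith("<") else ""
        points = [int(p, 16) for p in (parts[1:] if tag else parts)]
        offsets[cp] = len(data)
        # one byte of tag enum, one byte of length, then 16-bit code points
        data += struct.pack("<BB", TAGS.index(tag), len(points))
        data += struct.pack("<%dH" % len(points), *points)
    return data, offsets

def unpack(data, offsets, cp):
    """Rebuild the textual form, as unicodedata.decomposition() returns it."""
    off = offsets[cp]
    tag_idx, n = struct.unpack_from("<BB", data, off)
    points = struct.unpack_from("<%dH" % n, data, off + 2)
    text = " ".join("%04X" % p for p in points)
    return ("%s %s" % (TAGS[tag_idx], text)) if tag_idx else text

data, offsets = pack(SAMPLES)
for cp, original in SAMPLES.items():
    assert unpack(data, offsets, cp) == original   # lossless round-trip
```

For these three entries the packed buffer is 20 bytes against 47 bytes of raw strings, roughly the two-thirds saving estimated above; access cost is one struct decode plus hex formatting per lookup.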
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Mar 15 18:39:14 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 18:39:14 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003151627.RAA32543@python.inrialpes.fr> Message-ID: <38CFCAC2.7690DF55@lemburg.com> Vladimir Marangozov wrote: > > > [me] > > > > > > Perhaps it would make sense to move the Unicode database on the > > > Python side (write it in Python)? Or init the database dynamically > > > in the unicodedata module on import? It's quite big, so if it's > > > possible to avoid the static declaration (and if the unicodata module > > > is enabled by default), I'd vote for a dynamic initialization of the > > > database from reference (Python ?) file(s). > > [Marc-Andre] > > > > The unicodedatabase module contains the Unicode database > > as static C data - this makes it shareable among (Python) > > processes. > > The static data is shared if the module is a shared object (.so). > If unicodedata is not a .so, then you'll have a seperate copy of the > database in each process. Uhm, comparing the two versions Python 1.5 and the current CVS Python I get these figures on Linux: Executing : ./python -i -c '1/0' Python 1.5: 1208kB / 728 kB (resident/shared) Python CVS: 1280kB / 808 kB ("/") Not much of a change if you ask me and the CVS version has the unicodedata module linked statically... so there's got to be some sharing and load-on-demand going on behind the scenes: this is what I was referring to when I mentioned static C data. The OS can much better deal with these sharing techniques and delayed loads than anything we could implement on top of it in C or Python. But perhaps this is Linux-specific... 
> > Python modules don't provide this feature: instead a dictionary > > would have to be built on import which would increase the heap > > size considerably. Those dicts would *not* be shareable. > > I haven't mentioned dicts, have I? I suggested that the entries in the > C version of the database be rewritten in Python (or a text file). > The unicodedata module would, in its init function, allocate memory > for the database and would populate it before returning "import okay" > to Python -- this is one way to init the db dynamically, among others. I'm leaving this as an exercise for the interested reader ;-) Really, if you have better ideas for the unicodedata module, please go ahead. > As to sharing the database among different processes, this is a classic > IPC problem, which has nothing to do with the static C declaration of the db. > Or, hmmm, one of us is royally confused. Could you check this on other platforms ? Perhaps Linux is doing more than other OSes are in this field. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From effbot at telia.com Wed Mar 15 19:23:59 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 15 Mar 2000 19:23:59 +0100 Subject: [Python-Dev] first public SRE snapshot now available! References: <200003151627.RAA32543@python.inrialpes.fr> <38CFCAC2.7690DF55@lemburg.com> Message-ID: <01f901bf8eab$a353e780$34aab5d4@hagrid> I just uploaded the first public SRE snapshot to: http://w1.132.telia.com/~u13208596/sre.htm -- this kit contains windows binaries only (make sure you have built the interpreter from a recent CVS version) -- the engine fully supports unicode target strings. (not sure about the pattern compiler, though...) -- it's probably buggy as hell. 
for things I'm working on at this very moment, see: http://w1.132.telia.com/~u13208596/sre/status.htm I hope to get around to fixing the core dump (it crashes halfway through sre_fulltest.py, for no apparent reason) and the backreferencing problem later today. stay tuned. PS. note that "public" doesn't really mean "suitable for the c.l.python crowd", or "suitable for production use". in other words, let's keep this one on this list for now. thanks! From tismer at tismer.com Wed Mar 15 19:15:27 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 15 Mar 2000 19:15:27 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> <014001bf8e98$35644480$34aab5d4@hagrid> <38CFB8D2.537FCAD9@tismer.com> <38CFC7B7.A1ABD51C@lemburg.com> Message-ID: <38CFD33F.3C02BF43@tismer.com> "M.-A. Lemburg" wrote: > > Christian Tismer wrote: [the old data compression guy has been reanimated] > If you want to, I can mail you the marshalled Python dict version of > that database to play with. ... > > Should I try this evening? :-) > > Sure :-) go ahead... Thank you. Meanwhile I've heard that there is some well-known bot working on that under the hood, with a much better approach than mine. So I'll take your advice, and continue to write silly stackless enhancements. They say this is my destiny :-) ciao - continuous -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From DavidA at ActiveState.com Wed Mar 15 19:21:40 2000 From: DavidA at ActiveState.com (David Ascher) Date: Wed, 15 Mar 2000 10:21:40 -0800 Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38CFA4C9.E6B8EB5D@lemburg.com> Message-ID: > The unicodedatabase module contains the Unicode database > as static C data - this makes it shareable among (Python) > processes. > > Python modules don't provide this feature: instead a dictionary > would have to be built on import which would increase the heap > size considerably. Those dicts would *not* be shareable. I know it's complicating things, but wouldn't an mmap'ed buffer allow inter-process sharing while keeping DLL size down and everything on-disk until needed? Yes, I know, mmap calls aren't uniform across platforms and mmap isn't supported on all platforms -- I still think that it's silly not to use it on those platforms where it is available, and I'd like to see mmap unification move forward, so this is as good a motivation as any to bite the bullet. Just a thought, --david From jim at digicool.com Wed Mar 15 19:24:53 2000 From: jim at digicool.com (Jim Fulton) Date: Wed, 15 Mar 2000 13:24:53 -0500 Subject: [Python-Dev] Allowing multiple socket maps in asyncore (and asynchat) Message-ID: <38CFD575.A0536439@digicool.com> I find asyncore to be quite useful; however, it is currently geared to having a single main loop. It uses a global socket map that all asyncore dispatchers register with. I have an application in which I want to have multiple socket maps. I propose that we start moving toward a model in which selection of a socket map and control of the asyncore loop is a bit more explicit. If no one objects, I'll work up some initial patches. Who should I submit these to? Sam?
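[The explicit-map model being proposed can be sketched roughly as follows — a toy select() loop where each dispatcher registers with a map passed in, rather than with one module-global dict. The names here are illustrative assumptions, not asyncore's or Medusa's actual code:]

```python
# Minimal event-loop skeleton with per-instance socket maps instead of a
# single module-global one (illustrative names, not asyncore's real API).
import select
import socket

class Dispatcher:
    def __init__(self, sock, sock_map):
        self.socket = sock
        sock_map[sock.fileno()] = self      # register with *this* map only

    def handle_read(self):
        print("readable:", self.socket.fileno())

def poll(sock_map, timeout=0.0):
    """Run one select() pass over the given map only."""
    if not sock_map:
        return
    fds = list(sock_map)
    readable, _, _ = select.select(fds, [], [], timeout)
    for fd in readable:
        sock_map[fd].handle_read()

# Two independent applications, two independent maps:
map_a, map_b = {}, {}
a_sock, a_peer = socket.socketpair()
Dispatcher(a_sock, map_a)
b_sock, b_peer = socket.socketpair()
Dispatcher(b_sock, map_b)
a_peer.send(b"x")
poll(map_a, 1.0)    # only map_a's sockets get polled here
```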
Should the medusa public CVS form the basis? Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From jcw at equi4.com Wed Mar 15 20:39:37 2000 From: jcw at equi4.com (Jean-Claude Wippler) Date: Wed, 15 Mar 2000 20:39:37 +0100 Subject: [Python-Dev] Unicode patches checked in References: Message-ID: <38CFE6F9.3E8E9385@equi4.com> David Ascher wrote: [shareable unicodedatabase] > I know it's complicating things, but wouldn't an mmap'ed buffer allow > inter-process sharing while keeping DLL size down and everything > on-disk until needed? AFAIK, on platforms which support mmap, static data already gets mmap'ed in by the OS (just like all code), so this might have little effect. I'm more concerned by the distribution size increase. -jcw From bwarsaw at cnri.reston.va.us Wed Mar 15 19:41:00 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 15 Mar 2000 13:41:00 -0500 (EST) Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> <014001bf8e98$35644480$34aab5d4@hagrid> Message-ID: <14543.55612.969101.206695@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> somewhere in this thread, Guido wrote: >> BTW, I added a tag "pre-unicode" to the CVS tree to the >> revisions before the Unicode changes were made. FL> maybe you could base SLP on that one? /F's got it exactly right. 
Check out a new directory using a stable tag (maybe you want to base your changes on the pre-unicode tag, or python 1.5.2?). Patch in that subtree and then eventually you'll have to merge your changes into the head of the branch. -Barry From rushing at nightmare.com Thu Mar 16 02:52:22 2000 From: rushing at nightmare.com (Sam Rushing) Date: Wed, 15 Mar 2000 17:52:22 -0800 (PST) Subject: [Python-Dev] Allowing multiple socket maps in asyncore (and asynchat) In-Reply-To: <38CFD575.A0536439@digicool.com> References: <38CFD575.A0536439@digicool.com> Message-ID: <14544.15958.546712.466506@seattle.nightmare.com> Jim Fulton writes: > I find asyncore to be quite useful, however, it is currently > geared to having a single main loop. It uses a global socket > map that all asyncore dispatchers register with. > > I have an application in which I want to have multiple > socket maps. But still only a single event loop, yes? Why do you need multiple maps? For a priority system of some kind? > I propose that we start moving toward a model in which selection of > a socket map and control of the asyncore loop is a bit more > explicit. > > If no one objects, I'll work up some initial patches. If it can be done in a backward-compatible fashion, that sounds fine; but it sounds tricky. Even the simple {:object...} change broke so many things that we're still using the old stuff at eGroups. > Who should I submit these to? Sam? > Should the medusa public CVS form the basis? Yup, yup. -Sam From tim_one at email.msn.com Thu Mar 16 08:06:23 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 16 Mar 2000 02:06:23 -0500 Subject: [Python-Dev] Finalizers considered questionable ;) In-Reply-To: <38CF91B4.A36C8C5@digicool.com> Message-ID: <000201bf8f16$237e5e80$662d153f@tim> [Jim Fulton] > ... > There is no reason for __del__ to fail unless it depends on > cyclically-related objects, which should be viewed as a design > mistake. > > OTOH, __del__ should never fail because module globals go away.
> IMO, the current circular references involving module globals are > unnecessary, but that's a different topic. ;) IOW, you view "the current circular references involving module globals" as "a design mistake" . And perhaps they are! I wouldn't call it a different topic, though: so long as people are *viewing* shutdown __del__ problems as just another instance of finalizers in cyclic trash, it makes the latter *seem* inescapably "normal", and so something that has to be catered to. If you have a way to take the shutdown problems out of the discussion, it would help clarify both topics, at the very least by deconflating them. it's-a-mailing-list-so-no-need-to-stay-on-topic-ly y'rs - tim From gstein at lyra.org Thu Mar 16 13:01:36 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 16 Mar 2000 04:01:36 -0800 (PST) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38CF88A0.CF876A74@tismer.com> Message-ID: On Wed, 15 Mar 2000, Christian Tismer wrote: >... > Would it be possible to make the Unicode support configurable? This might be interesting from the standpoint of those guys who are doing the tiny Python interpreter thingy for embedded systems. > My problem is that patches in the CVS are of different kinds. > Some are error corrections and enhancements which I would > definitely like to use. > Others are brand new features like the Unicode support. > Absolutely great stuff! But this will most probably change > a number of times again, and I think it is a bad idea when > I include it into my Stackless distribution. > > I'd appreciate it very much if I could use the same CVS tree > for testing new stuff, and to build my distribution, with > new features switched off. Please :-) But! I find this reason completely off the mark. In essence, you're arguing that we should not put *any* new feature into the CVS repository because it might mess up what *you* are doing. Sorry, but that just irks me. If you want a stable Python, then don't use the CVS version.
Or base it off a specific tag in CVS. Or something. Just don't ask for development to be stopped. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Thu Mar 16 13:08:43 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 16 Mar 2000 04:08:43 -0800 (PST) Subject: [Python-Dev] const data (was: Unicode patches checked in) In-Reply-To: <200003151627.RAA32543@python.inrialpes.fr> Message-ID: On Wed, 15 Mar 2000, Vladimir Marangozov wrote: > > [me] > > > > > > Perhaps it would make sense to move the Unicode database on the > > > Python side (write it in Python)? Or init the database dynamically > > > in the unicodedata module on import? It's quite big, so if it's > > > possible to avoid the static declaration (and if the unicodedata module > > > is enabled by default), I'd vote for a dynamic initialization of the > > > database from reference (Python ?) file(s). > > [Marc-Andre] > > > > The unicodedatabase module contains the Unicode database > > as static C data - this makes it shareable among (Python) > > processes. > > The static data is shared if the module is a shared object (.so). > If unicodedata is not a .so, then you'll have a separate copy of the > database in each process. Nope. A shared module means that multiple executables can share the code. Whether the const data resides in an executable or a .so, the OS will map it into readonly memory and share it across all processes. > > Python modules don't provide this feature: instead a dictionary > > would have to be built on import which would increase the heap > > size considerably. Those dicts would *not* be shareable. > > I haven't mentioned dicts, have I? I suggested that the entries in the > C version of the database be rewritten in Python (or a text file) > The unicodedata module would, in its init function, allocate memory > for the database and would populate it before returning "import okay" > to Python -- this is one way to init the db dynamically, among others.
This would place all that data into the per-process heap. Definitely not shared, and definitely a big hit for each Python process. > As to sharing the database among different processes, this is a classic > IPC pb, which has nothing to do with the static C declaration of the db. > Or, hmmm, one of us is royally confused . This isn't IPC. It is sharing of some constant data. The most effective way to manage this is through const C data. The OS will properly manage it. And sorry, David, but mmap'ing a file will simply add complexity. As jcw mentioned, the OS is pretty much doing this anyhow when it deals with a const data segment in your executable. I don't believe this is Linux specific. This kind of stuff has been done for a *long* time on the platforms, too. Side note: the most effective way of exposing this const data up to Python (without shoving it onto the heap) is through buffers created via: PyBuffer_FromMemory(ptr, size) This allows the data to reside in const, shared memory while it is also exposed up to Python. Cheers, -g -- Greg Stein, http://www.lyra.org/ From marangoz at python.inrialpes.fr Thu Mar 16 13:39:42 2000 From: marangoz at python.inrialpes.fr (Vladimir Marangozov) Date: Thu, 16 Mar 2000 13:39:42 +0100 (CET) Subject: [Python-Dev] const data (was: Unicode patches checked in) In-Reply-To: from "Greg Stein" at Mar 16, 2000 04:08:43 AM Message-ID: <200003161239.NAA01671@python.inrialpes.fr> Greg Stein wrote: > > [me] > > The static data is shared if the module is a shared object (.so). > > If unicodedata is not a .so, then you'll have a separate copy of the > > database in each process. > > Nope. A shared module means that multiple executables can share the code. > Whether the const data resides in an executable or a .so, the OS will map > it into readonly memory and share it across all processes. I must have been drunk yesterday. You're right. > I don't believe this is Linux specific.
This kind of stuff has been done for a *long* time on the platforms, too. Yes. > Side note: the most effective way of exposing this const data up to Python > (without shoving it onto the heap) is through buffers created via: > PyBuffer_FromMemory(ptr, size) > This allows the data to reside in const, shared memory while it is also > exposed up to Python. And to avoid the size increase of the Python library, perhaps unicodedata needs to be commented out by default in Setup.in (for the release, not now). As M-A pointed out, the module isn't necessary for the normal operation of the interpreter. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gstein at lyra.org Thu Mar 16 13:56:21 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 16 Mar 2000 04:56:21 -0800 (PST) Subject: [Python-Dev] Finalizers considered questionable ;) In-Reply-To: <000201bf8f16$237e5e80$662d153f@tim> Message-ID: On Thu, 16 Mar 2000, Tim Peters wrote: >... > IOW, you view "the current circular references involving module globals" as > "a design mistake" . And perhaps they are! I wouldn't call it a > different topic, though: so long as people are *viewing* shutdown __del__ > problems as just another instance of finalizers in cyclic trash, it makes > the latter *seem* inescapably "normal", and so something that has to be > catered to. If you have a way to take the shutdown problems out of the > discussion, it would help clarify both topics, at the very least by > deconflating them. Bah. Module globals are easy. My tp_clean suggestion handles them quite easily at shutdown. No more special-code in import.c.
Cheers, -g -- Greg Stein, http://www.lyra.org/ From tismer at tismer.com Thu Mar 16 13:53:46 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 16 Mar 2000 13:53:46 +0100 Subject: [Python-Dev] Unicode patches checked in References: Message-ID: <38D0D95A.B13EC17E@tismer.com> Greg Stein wrote: > > On Wed, 15 Mar 2000, Christian Tismer wrote: > >... > > Would it be possible to make the Unicode support configurable? > > This might be interesting from the standpoint of those guys who are doing > the tiny Python interpreter thingy for embedded systems. > > > My problem is that patches in the CVS are of different kinds. > > > Some are error corrections and enhancements which I would > > > definitely like to use. > > > Others are brand new features like the Unicode support. > > > Absolutely great stuff! But this will most probably change > > > a number of times again, and I think it is a bad idea when > > > I include it into my Stackless distribution. > > > > > > I'd appreciate it very much if I could use the same CVS tree > > > for testing new stuff, and to build my distribution, with > > > new features switched off. Please :-) > > But! I find this reason completely off the mark. In essence, you're > arguing that we should not put *any* new feature into the CVS repository > because it might mess up what *you* are doing. No, this is your interpretation, and a reduction which I can't follow. There are improvements and features in the CVS version which I need. I prefer to build against it, instead of the old 1.5.2. What's wrong with that? I want to find a way that gives me the least trouble in doing so. > Sorry, but that just irks me. If you want a stable Python, then don't use > the CVS version. Or base it off a specific tag in CVS. Or something. Just > don't ask for development to be stopped. No, I ask for development to be stopped. Code freeze until Y3k :-) Why are you trying to put such nonsense into my mouth? You know that I know that you know better.
ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From tismer at tismer.com Thu Mar 16 14:25:48 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 16 Mar 2000 14:25:48 +0100 Subject: [Python-Dev] const data (was: Unicode patches checked in) References: <200003161239.NAA01671@python.inrialpes.fr> Message-ID: <38D0E0DC.B997F836@tismer.com> Vladimir Marangozov wrote: > > Greg Stein wrote: > > Side note: the most effective way of exposing this const data up to Python > > (without shoving it onto the heap) is through buffers created via: > > PyBuffer_FromMemory(ptr, size) > > This allows the data to reside in const, shared memory while it is also > > exposed up to Python. > > And to avoid the size increase of the Python library, perhaps unicodedata > needs to be uncommented by default in Setup.in (for the release, not now). > As M-A pointed out, the module isn't isn't necessary for the normal > operation of the interpreter. Sounds like a familiar idea. :-) BTW., yesterday evening I wrote an analysis script, to see how far this data is compactable without going into real compression, just redundancy folding and byte/short indexing was used. If I'm not wrong, this reduces the size of the database to less than 25kb. That small amount of extra data would make the uncommenting feature quite unimportant, except for the issue of building tiny Pythons. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From gstein at lyra.org Thu Mar 16 14:06:46 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 16 Mar 2000 05:06:46 -0800 (PST) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38D0D95A.B13EC17E@tismer.com> Message-ID: On Thu, 16 Mar 2000, Christian Tismer wrote: > Greg Stein wrote: >... > > Sorry, but that just irks me. If you want a stable Python, then don't use > > the CVS version. Or base it off a specific tag in CVS. Or something. Just > > don't ask for development to be stopped. > > No, I ask for development to be stopped. Code freeze until Y3k :-) > Why are you trying to put such a nonsense into my mouth? > You know that I know that you know better. Simply because that is what it sounds like on this side of my monitor :-) I'm seeing your request as asking for people to make special considerations in their patches for your custom distribution. While I don't have a problem with making Python more flexible to distro maintainers, it seemed like you were approaching it from the "wrong" angle. Like I said, making Unicode optional for the embedded space makes sense; making it optional so it doesn't bloat your distro didn't :-) Not a big deal... it is mostly a perception on my part. I also tend to dislike things that hold development back. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Fri Mar 17 19:53:39 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 17 Mar 2000 19:53:39 +0100 Subject: [Python-Dev] Unicode Update 2000-03-17 Message-ID: <38D27F33.4055A942@lemburg.com> Attached you find an update of the Unicode implementation. The patch is against the current CVS version. I would appreciate if someone with CVS checkin permissions could check the changes in. 
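[The "redundancy folding plus byte/short indexing" Christian describes a little earlier in the thread — keep one copy of each distinct property record and index code points with small integers — can be illustrated with a toy version. This is a sketch of the general technique only, assuming a made-up record format, not the actual unicodedata layout:]

```python
# Toy redundancy folding: many code points share the same property record,
# so store each distinct record once and keep a small integer index per entry.
records = {}   # distinct record -> small index
index = []     # one small index per code point

def add(record):
    index.append(records.setdefault(record, len(records)))

# Fake "database": alternating records for 1000 code points.
for cp in range(1000):
    add(("Lu", 0) if cp % 2 else ("Ll", 0))

print(len(records), len(index))   # -> 2 1000: the fold leaves 2 records
```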
The patch contains all bug fixes and patches sent this week and also fixes a leak in the codecs code and a bug in the free list code for Unicode objects (which only shows up when compiling Python with Py_DEBUG; thanks to MarkH for spotting this one). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ -------------- next part -------------- Only in CVS-Python/Doc/tools: anno-api.py diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Include/unicodeobject.h Python+Unicode/Include/unicodeobject.h --- CVS-Python/Include/unicodeobject.h Fri Mar 17 15:24:30 2000 +++ Python+Unicode/Include/unicodeobject.h Tue Mar 14 10:38:08 2000 @@ -1,8 +1,5 @@ #ifndef Py_UNICODEOBJECT_H #define Py_UNICODEOBJECT_H -#ifdef __cplusplus -extern "C" { -#endif /* @@ -109,8 +106,9 @@ /* --- Internal Unicode Operations ---------------------------------------- */ /* If you want Python to use the compiler's wctype.h functions instead - of the ones supplied with Python, define WANT_WCTYPE_FUNCTIONS. - This reduces the interpreter's code size. */ + of the ones supplied with Python, define WANT_WCTYPE_FUNCTIONS or + configure Python using --with-ctype-functions. This reduces the + interpreter's code size. */ #if defined(HAVE_USABLE_WCHAR_T) && defined(WANT_WCTYPE_FUNCTIONS) @@ -169,6 +167,10 @@ (!memcmp((string)->str + (offset), (substring)->str,\ (substring)->length*sizeof(Py_UNICODE))) +#ifdef __cplusplus +extern "C" { +#endif + /* --- Unicode Type ------------------------------------------------------- */ typedef struct { @@ -647,7 +649,7 @@ int direction /* Find direction: +1 forward, -1 backward */ ); -/* Count the number of occurances of substr in str[start:end].
*/ +/* Count the number of occurrences of substr in str[start:end]. */ extern DL_IMPORT(int) PyUnicode_Count( PyObject *str, /* String */ @@ -656,7 +658,7 @@ int end /* Stop index */ ); -/* Replace at most maxcount occurances of substr in str with replstr +/* Replace at most maxcount occurrences of substr in str with replstr and return the resulting Unicode object. */ extern DL_IMPORT(PyObject *) PyUnicode_Replace( diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/codecs.py Python+Unicode/Lib/codecs.py --- CVS-Python/Lib/codecs.py Sat Mar 11 00:20:43 2000 +++ Python+Unicode/Lib/codecs.py Mon Mar 13 14:33:54 2000 @@ -55,7 +55,7 @@ """ def encode(self,input,errors='strict'): - """ Encodes the object intput and returns a tuple (output + """ Encodes the object input and returns a tuple (output object, length consumed). errors defines the error handling to apply. 
It defaults to diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/encodings/__init__.py Python+Unicode/Lib/encodings/__init__.py --- CVS-Python/Lib/encodings/__init__.py Sat Mar 11 00:17:18 2000 +++ Python+Unicode/Lib/encodings/__init__.py Mon Mar 13 14:30:33 2000 @@ -30,13 +30,13 @@ import string,codecs,aliases _cache = {} -_unkown = '--unkown--' +_unknown = '--unknown--' def search_function(encoding): # Cache lookup - entry = _cache.get(encoding,_unkown) - if entry is not _unkown: + entry = _cache.get(encoding,_unknown) + if entry is not _unknown: return entry # Import the module diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_string.py Python+Unicode/Lib/test/test_string.py --- CVS-Python/Lib/test/test_string.py Sat Mar 11 10:52:43 2000 +++ Python+Unicode/Lib/test/test_string.py Mon Mar 13 10:12:46 2000 @@ -143,6 +143,7 @@ test('translate', 'xyz', 'xyz', table) test('replace', 'one!two!three!', 'one at two!three!', '!', '@', 1) +test('replace', 'one!two!three!', 'onetwothree', '!', '') test('replace', 'one!two!three!', 'one at two@three!', '!', '@', 2) test('replace', 'one!two!three!', 'one at two@three@', '!', '@', 3) test('replace', 'one!two!three!', 'one at two@three@', '!', '@', 4) diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_unicode.py Python+Unicode/Lib/test/test_unicode.py --- CVS-Python/Lib/test/test_unicode.py Fri Mar 17 15:24:31 2000 +++ 
Python+Unicode/Lib/test/test_unicode.py Mon Mar 13 10:13:05 2000 @@ -108,6 +108,7 @@ test('translate', u'xyz', u'xyz', table) test('replace', u'one!two!three!', u'one at two!three!', u'!', u'@', 1) +test('replace', u'one!two!three!', u'onetwothree', '!', '') test('replace', u'one!two!three!', u'one at two@three!', u'!', u'@', 2) test('replace', u'one!two!three!', u'one at two@three@', u'!', u'@', 3) test('replace', u'one!two!three!', u'one at two@three@', u'!', u'@', 4) diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Misc/unicode.txt Python+Unicode/Misc/unicode.txt --- CVS-Python/Misc/unicode.txt Sat Mar 11 00:14:11 2000 +++ Python+Unicode/Misc/unicode.txt Fri Mar 17 16:55:11 2000 @@ -743,8 +743,9 @@ stream codecs as available through the codecs module should be used. -XXX There should be a short-cut open(filename,mode,encoding) available which - also assures that mode contains the 'b' character when needed. +The codecs module should provide a short-cut open(filename,mode,encoding) +available which also assures that mode contains the 'b' character when +needed. File/Stream Input: @@ -810,6 +811,10 @@ Introduction to Unicode (a little outdated by still nice to read): http://www.nada.kth.se/i18n/ucs/unicode-iso10646-oview.html +For comparison: + Introducing Unicode to ECMAScript -- + http://www-4.ibm.com/software/developer/library/internationalization-support.html + Encodings: Overview: @@ -832,7 +837,7 @@ History of this Proposal: ------------------------- -1.2: +1.2: Removed POD about codecs.open() 1.1: Added note about comparisons and hash values. Added note about case mapping algorithms. 
Changed stream codecs .read() and .write() method to match the standard file-like object methods diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Modules/stropmodule.c Python+Unicode/Modules/stropmodule.c --- CVS-Python/Modules/stropmodule.c Wed Mar 1 10:22:53 2000 +++ Python+Unicode/Modules/stropmodule.c Mon Mar 13 14:33:23 2000 @@ -1054,7 +1054,7 @@ strstr replacement for arbitrary blocks of memory. - Locates the first occurance in the memory pointed to by MEM of the + Locates the first occurrence in the memory pointed to by MEM of the contents of memory pointed to by PAT. Returns the index into MEM if found, or -1 if not found. If len of PAT is greater than length of MEM, the function returns -1. diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Objects/stringobject.c Python+Unicode/Objects/stringobject.c --- CVS-Python/Objects/stringobject.c Tue Mar 14 00:14:17 2000 +++ Python+Unicode/Objects/stringobject.c Mon Mar 13 14:33:24 2000 @@ -1395,7 +1395,7 @@ strstr replacement for arbitrary blocks of memory. - Locates the first occurance in the memory pointed to by MEM of the + Locates the first occurrence in the memory pointed to by MEM of the contents of memory pointed to by PAT. Returns the index into MEM if found, or -1 if not found. If len of PAT is greater than length of MEM, the function returns -1. 
@@ -1578,7 +1578,7 @@ return NULL; if (sub_len <= 0) { - PyErr_SetString(PyExc_ValueError, "empty replacement string"); + PyErr_SetString(PyExc_ValueError, "empty pattern string"); return NULL; } new_s = mymemreplace(str,len,sub,sub_len,repl,repl_len,count,&out_len); Only in CVS-Python/Objects: stringobject.c.orig diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Objects/unicodeobject.c Python+Unicode/Objects/unicodeobject.c --- CVS-Python/Objects/unicodeobject.c Tue Mar 14 00:14:17 2000 +++ Python+Unicode/Objects/unicodeobject.c Wed Mar 15 10:49:19 2000 @@ -83,7 +83,7 @@ all objects on the free list having a size less than this limit. This reduces malloc() overhead for small Unicode objects. - At worse this will result in MAX_UNICODE_FREELIST_SIZE * + At worst this will result in MAX_UNICODE_FREELIST_SIZE * (sizeof(PyUnicodeObject) + STAYALIVE_SIZE_LIMIT + malloc()-overhead) bytes of unused garbage. 
@@ -180,7 +180,7 @@ unicode_freelist = *(PyUnicodeObject **)unicode_freelist; unicode_freelist_size--; unicode->ob_type = &PyUnicode_Type; - _Py_NewReference(unicode); + _Py_NewReference((PyObject *)unicode); if (unicode->str) { if (unicode->length < length && _PyUnicode_Resize(unicode, length)) { @@ -199,16 +199,19 @@ unicode->str = PyMem_NEW(Py_UNICODE, length + 1); } - if (!unicode->str) { - PyMem_DEL(unicode); - PyErr_NoMemory(); - return NULL; - } + if (!unicode->str) + goto onError; unicode->str[length] = 0; unicode->length = length; unicode->hash = -1; unicode->utf8str = NULL; return unicode; + + onError: + _Py_ForgetReference((PyObject *)unicode); + PyMem_DEL(unicode); + PyErr_NoMemory(); + return NULL; } static @@ -224,7 +227,6 @@ *(PyUnicodeObject **)unicode = unicode_freelist; unicode_freelist = unicode; unicode_freelist_size++; - _Py_ForgetReference(unicode); } else { free(unicode->str); @@ -489,7 +491,7 @@ } else { PyErr_Format(PyExc_ValueError, - "UTF-8 decoding error; unkown error handling code: %s", + "UTF-8 decoding error; unknown error handling code: %s", errors); return -1; } @@ -611,7 +613,7 @@ else { PyErr_Format(PyExc_ValueError, "UTF-8 encoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -733,7 +735,7 @@ } else { PyErr_Format(PyExc_ValueError, - "UTF-16 decoding error; unkown error handling code: %s", + "UTF-16 decoding error; unknown error handling code: %s", errors); return -1; } @@ -921,7 +923,7 @@ else { PyErr_Format(PyExc_ValueError, "Unicode-Escape decoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1051,6 +1053,10 @@ */ +static const Py_UNICODE *findchar(const Py_UNICODE *s, + int size, + Py_UNICODE ch); + static PyObject *unicodeescape_string(const Py_UNICODE *s, int size, @@ -1069,9 +1075,6 @@ p = q = PyString_AS_STRING(repr); if (quotes) { - static const Py_UNICODE *findchar(const Py_UNICODE *s, - int size, 
- Py_UNICODE ch); *p++ = 'u'; *p++ = (findchar(s, size, '\'') && !findchar(s, size, '"')) ? '"' : '\''; @@ -1298,7 +1301,7 @@ else { PyErr_Format(PyExc_ValueError, "Latin-1 encoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1369,7 +1372,7 @@ else { PyErr_Format(PyExc_ValueError, "ASCII decoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1431,7 +1434,7 @@ else { PyErr_Format(PyExc_ValueError, "ASCII encoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1502,7 +1505,7 @@ else { PyErr_Format(PyExc_ValueError, "charmap decoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1618,7 +1621,7 @@ else { PyErr_Format(PyExc_ValueError, "charmap encoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1750,7 +1753,7 @@ else { PyErr_Format(PyExc_ValueError, "translate error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Python/codecs.c Python+Unicode/Python/codecs.c --- CVS-Python/Python/codecs.c Fri Mar 10 23:57:27 2000 +++ Python+Unicode/Python/codecs.c Wed Mar 15 11:27:54 2000 @@ -93,9 +93,14 @@ PyObject *_PyCodec_Lookup(const char *encoding) { - PyObject *result, *args = NULL, *v; + PyObject *result, *args = NULL, *v = NULL; int i, len; + if (_PyCodec_SearchCache == NULL || _PyCodec_SearchPath == NULL) { + PyErr_SetString(PyExc_SystemError, + "codec module not properly initialized"); + goto onError; + } if (!import_encodings_called) import_encodings(); @@ -109,6 +114,7 @@ 
result = PyDict_GetItem(_PyCodec_SearchCache, v); if (result != NULL) { Py_INCREF(result); + Py_DECREF(v); return result; } @@ -121,6 +127,7 @@ if (args == NULL) goto onError; PyTuple_SET_ITEM(args,0,v); + v = NULL; for (i = 0; i < len; i++) { PyObject *func; @@ -146,7 +153,7 @@ if (i == len) { /* XXX Perhaps we should cache misses too ? */ PyErr_SetString(PyExc_LookupError, - "unkown encoding"); + "unknown encoding"); goto onError; } @@ -156,6 +163,7 @@ return result; onError: + Py_XDECREF(v); Py_XDECREF(args); return NULL; } @@ -378,5 +386,7 @@ void _PyCodecRegistry_Fini() { Py_XDECREF(_PyCodec_SearchPath); + _PyCodec_SearchPath = NULL; Py_XDECREF(_PyCodec_SearchCache); + _PyCodec_SearchCache = NULL; } From bwarsaw at cnri.reston.va.us Fri Mar 17 20:16:02 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 17 Mar 2000 14:16:02 -0500 (EST) Subject: [Python-Dev] Unicode Update 2000-03-17 References: <38D27F33.4055A942@lemburg.com> Message-ID: <14546.33906.771022.916209@anthem.cnri.reston.va.us> >>>>> "M" == M writes: M> The patch is against the current CVS version. I would M> appreciate if someone with CVS checkin permissions could check M> the changes in. Hi MAL, I just tried to apply your patch against the tree, however patch complains that the Lib/codecs.py patch is reversed. I haven't looked closely at it, but do you have any ideas? Or why don't you just send me Lib/codecs.py and I'll drop it in place. Everything else patched cleanly. -Barry From ping at lfw.org Fri Mar 17 15:06:13 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 08:06:13 -0600 (CST) Subject: [Python-Dev] Boolean type for Py3K? Message-ID: I wondered to myself today while reading through the Python tutorial whether it would be a good idea to have a separate boolean type in Py3K. Would this help catch common mistakes? 
I won't presume to truly understand the new-to-Python experience, but one might *guess* that

    >>> 5 > 3
    true

would make a little more sense to a beginner than

    >>> 5 > 3
    1

Of course this means introducing "true" and "false" as keywords (or built-in values like None -- perhaps they should be spelled True and False?) and completely changing the way a lot of code runs by introducing a bunch of type checking, so it may be too radical a change, but -- and i don't know if it's already been discussed a lot, but -- I thought it wouldn't hurt just to raise the question.

-- ?!ng

From ping at lfw.org Fri Mar 17 15:06:55 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 08:06:55 -0600 (CST) Subject: [Python-Dev] Should None be a keyword? Message-ID:

Related to my last message: should None become a keyword in Py3K?

-- ?!ng

From bwarsaw at cnri.reston.va.us Fri Mar 17 21:49:24 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 17 Mar 2000 15:49:24 -0500 (EST) Subject: [Python-Dev] Boolean type for Py3K? References: Message-ID: <14546.39508.312796.221069@anthem.cnri.reston.va.us>

>>>>> "KY" == Ka-Ping Yee writes:

  KY> I wondered to myself today while reading through the Python
  KY> tutorial whether it would be a good idea to have a separate
  KY> boolean type in Py3K. Would this help catch common mistakes?

Almost a year ago, I mused about a boolean type in c.l.py, and came up with this prototype in Python.
-------------------- snip snip --------------------
class Boolean:
    def __init__(self, flag=0):
        self.__flag = not not flag

    def __str__(self):
        return self.__flag and 'true' or 'false'

    def __repr__(self):
        return self.__str__()

    def __nonzero__(self):
        return self.__flag == 1

    def __cmp__(self, other):
        if (self.__flag and other) or (not self.__flag and not other):
            return 0
        else:
            return 1

    def __rcmp__(self, other):
        return -self.__cmp__(other)

true = Boolean(1)
false = Boolean()
-------------------- snip snip --------------------

I think it makes sense to augment Python's current truth rules with a built-in boolean type and True and False values. But unless it's tied in more deeply (e.g. comparisons return one of these instead of integers -- and what are the implications of that?) then it's pretty much just syntactic sugar <0.75 lick>.

-Barry

From bwarsaw at cnri.reston.va.us Fri Mar 17 21:50:00 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 17 Mar 2000 15:50:00 -0500 (EST) Subject: [Python-Dev] Should None be a keyword? References: Message-ID: <14546.39544.673335.378797@anthem.cnri.reston.va.us>

>>>>> "KY" == Ka-Ping Yee writes:

  KY> Related to my last message: should None become a keyword in
  KY> Py3K?

Why? Just to reserve it?

-Barry

From moshez at math.huji.ac.il Fri Mar 17 21:52:29 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 17 Mar 2000 22:52:29 +0200 (IST) Subject: [Python-Dev] Boolean type for Py3K? In-Reply-To: <14546.39508.312796.221069@anthem.cnri.reston.va.us> Message-ID:

On Fri, 17 Mar 2000, Barry A. Warsaw wrote:
> Almost a year ago, I mused about a boolean type in c.l.py, and came up
> with this prototype in Python.

Cool prototype! However, I think I have a problem with the proposed semantics:

>     def __cmp__(self, other):
>         if (self.__flag and other) or (not self.__flag and not other):
>             return 0
>         else:
>             return 1

This means:

    true == 1
    true == 2

But 1 != 2. I have some difficulty with == not being an equivalence relation...
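[Editor's note: Moshe's transitivity point is easy to check concretely. Below is a sketch of Barry's prototype ported to a modern Python interpreter -- `__eq__` and `__bool__` stand in for the original `__cmp__`/`__nonzero__`, but the truth-value comparison semantics are the same ones Moshe objects to.]

```python
# Sketch of Barry's Boolean prototype in modern Python. __eq__ compares
# by truth value, reproducing the __cmp__ behaviour under discussion.
class Boolean:
    def __init__(self, flag=0):
        self.__flag = not not flag

    def __repr__(self):
        return 'true' if self.__flag else 'false'

    def __bool__(self):
        return self.__flag

    def __eq__(self, other):
        # equal whenever both sides have the same truth value
        return self.__flag == bool(other)

true = Boolean(1)
false = Boolean()

# == is not transitive under these semantics:
print(true == 1)   # True
print(true == 2)   # True
print(1 == 2)      # False
```

Both `true == 1` and `true == 2` hold, yet `1 == 2` does not -- exactly the broken equivalence relation described above.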
> I think it makes sense to augment Python's current truth rules with a
> built-in boolean type and True and False values.

Right on! Except for the built-in... why not have it like exceptions.py, Python code necessary for the interpreter? Languages which compile themselves are not unheard of.

> But unless it's tied
> in more deeply (e.g. comparisons return one of these instead of
> integers -- and what are the implications of that?)

Breaking loads of horrible code. Unacceptable for the 1.x series, but perfectly fine in Py3K.

-- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From effbot at telia.com Fri Mar 17 22:12:15 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 17 Mar 2000 22:12:15 +0100 Subject: [Python-Dev] Should None be a keyword? References: <14546.39544.673335.378797@anthem.cnri.reston.va.us> Message-ID: <004e01bf9055$79012000$34aab5d4@hagrid>

Barry A. Warsaw wrote:
> >>>>> "KY" == Ka-Ping Yee writes:
>
> KY> Related to my last message: should None become a keyword in
> KY> Py3K?
>
> Why? Just to reserve it?

to avoid errors like:

    def foo():
        result = None
        # two screenfuls of code
        None, a, b = mytuple # perlish unpacking

which gives an interesting error on the first line, instead of a syntax error on the last.

From guido at python.org Fri Mar 17 22:20:05 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 17 Mar 2000 16:20:05 -0500 Subject: [Python-Dev] Should None be a keyword? In-Reply-To: Your message of "Fri, 17 Mar 2000 08:06:55 CST." References: Message-ID: <200003172120.QAA09045@eric.cnri.reston.va.us>

Yes.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Fri Mar 17 22:20:36 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 17 Mar 2000 16:20:36 -0500 Subject: [Python-Dev] Boolean type for Py3K? In-Reply-To: Your message of "Fri, 17 Mar 2000 08:06:13 CST."
References: Message-ID: <200003172120.QAA09115@eric.cnri.reston.va.us>

Yes. True and False make sense.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From pf at artcom-gmbh.de Fri Mar 17 22:17:06 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Fri, 17 Mar 2000 22:17:06 +0100 (MET) Subject: [Python-Dev] Should None be a keyword? In-Reply-To: <14546.39544.673335.378797@anthem.cnri.reston.va.us> from "Barry A. Warsaw" at "Mar 17, 2000 3:50: 0 pm" Message-ID:

> >>>>> "KY" == Ka-Ping Yee writes:
>
> KY> Related to my last message: should None become a keyword in
> KY> Py3K?

Barry A. Warsaw schrieb:
> Why? Just to reserve it?

This is related to the general type checking discussion. IMO the suggested

    >>> 1 > 0
    True

wouldn't buy us much, as long as the following behaviour stays in Py3K:

    >>> a = '2' ; b = 3
    >>> a < b
    0
    >>> a > b
    1

This is irritating to newcomers (at least from my rather short experience as a member of python-help)! And this is especially irritating, since you can't do

    >>> c = a + b
    Traceback (innermost last):
      File "", line 1, in ?
    TypeError: illegal argument type for built-in operation

IMO this difference is far more difficult to catch for newcomers than the far more often discussed 5/3 == 1 behaviour.

Have a nice weekend and don't forget to hunt for remaining bugs in Fred's upcoming 1.5.2p2 docs ;-),

Peter.
-- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen)

From ping at lfw.org Fri Mar 17 16:53:38 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 09:53:38 -0600 (CST) Subject: [Python-Dev] list.shift() Message-ID:

Has list.shift() been proposed?

    # pretend lists are implemented in Python and 'self' is a list
    def shift(self):
        item = self[0]
        del self[:1]
        return item

This would make queues read nicely... use "append" and "pop" for a stack, "append" and "shift" for a queue.
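[Editor's note: the proposal is easy to try out with `shift` written as a plain function over an ordinary list (a sketch -- no such list method exists):]

```python
def shift(lst):
    """Remove and return the first element -- the proposed list.shift()."""
    item = lst[0]
    del lst[:1]
    return item

# queue: append at the back, shift from the front
q = []
q.append('a')
q.append('b')
print(shift(q))   # a
print(q)          # ['b']

# stack: append at the back, pop from the back
s = []
s.append('a')
s.append('b')
print(s.pop())    # b
```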
(This is while on the thought-train of "making built-in types do more, rather than introducing more special types", as you'll see in my next message.)

-- ?!ng

From gvanrossum at beopen.com Fri Mar 17 23:00:18 2000 From: gvanrossum at beopen.com (Guido van Rossum) Date: Fri, 17 Mar 2000 17:00:18 -0500 Subject: [Python-Dev] list.shift() References: Message-ID: <38D2AAF2.CFBF3A2@beopen.com>

Ka-Ping Yee wrote:
>
> Has list.shift() been proposed?
>
>     # pretend lists are implemented in Python and 'self' is a list
>     def shift(self):
>         item = self[0]
>         del self[:1]
>         return item
>
> This would make queues read nicely... use "append" and "pop" for
> a stack, "append" and "shift" for a queue.
>
> (This is while on the thought-train of "making built-in types do
> more, rather than introducing more special types", as you'll see
> in my next message.)

You can do this using list.pop(0). I don't think the name "shift" is very intuitive (smells of sh and Perl :-). Do we need a new function?

--Guido

From ping at lfw.org Fri Mar 17 17:08:37 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 10:08:37 -0600 (CST) Subject: [Python-Dev] Using lists as sets Message-ID:

A different way to provide sets in Python, which occurred to me on Wednesday at Guido's talk in Mountain View (hi Guido!), is to just make lists work better.

Someone asked Guido a question about the ugliness of using dicts in a certain way, and it was clear that what he wanted was a real set. Guido's objection to introducing more core data types is that it makes it more difficult to choose which data type to use, and opens the possibility of using entirely the wrong one -- a very well-taken point, i thought. (That recently-mentioned study of scripting vs.
system language performance seems relevant here: a few of the C programs submitted were much *slower* than the ones in Python or Perl just because people had to choose and implement their own data structures, and so they were able to completely shoot themselves in both feet and lose a leg or two in the process.)

So...

Hypothesis: The only real reason people might want a separate set type, or have to use dicts as sets, is that linear search on a list is too slow.

Therefore: All we have to do is speed up "in" on lists, and now we have a set type that is nice to read and write, and already has nice spellings for set semantics like "in".

Implementation possibilities:

+ Whip up a hash table behind the scenes if "in" gets used a lot on a particular list and all its members are hashable. This makes "in" no longer O(n), which is most of the battle. remove() can also be cheap -- though you have to do a little more bookkeeping to take care of multiple copies of elements.

+ Or, add a couple of methods, e.g. take() appends an item to a list if it's not there already, drop() removes all copies of an item from a list. These tip us off: the first time one of these methods gets used, we make the hash table then.

I think the semantics would be pretty understandable and simple to explain, which is the main thing.

Any thoughts?

-- ?!ng

From ping at lfw.org Fri Mar 17 17:12:22 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 10:12:22 -0600 (CST) Subject: [Python-Dev] list.shift() In-Reply-To: <38D2AAF2.CFBF3A2@beopen.com> Message-ID:

On Fri, 17 Mar 2000, Guido van Rossum wrote:
> You can do this using list.pop(0). I don't think the name "shift" is very
> intuitive (smells of sh and Perl :-). Do we need a new function?

Oh -- sorry, that's my ignorance showing. I didn't know pop() took an argument (of course it would -- duh...). No need to add anything more, then, i think.

Sorry! Fred et al.
on doc-sig: it would be really good for the tutorial to show a queue example and a stack example in the section where list methods are introduced.

-- ?!ng

From ping at lfw.org Fri Mar 17 17:13:44 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 10:13:44 -0600 (CST) Subject: [Python-Dev] Boolean type for Py3K? In-Reply-To: <200003172120.QAA09115@eric.cnri.reston.va.us> Message-ID:

Guido: (re None being a keyword)
> Yes.

Guido: (re booleans)
> Yes. True and False make sense.

Astounding. I don't think i've ever seen such quick agreement on anything! And twice in one day!

I think i'm going to go lie down. :) :)

-- ?!ng

From DavidA at ActiveState.com Fri Mar 17 23:23:53 2000 From: DavidA at ActiveState.com (David Ascher) Date: Fri, 17 Mar 2000 14:23:53 -0800 Subject: [Python-Dev] Using lists as sets In-Reply-To: Message-ID:

> I think the semantics would be pretty understandable and simple to
> explain, which is the main thing.
>
> Any thoughts?

Would

    (a,b) in Set

return true if (a,b) was a subset of Set, or if (a,b) was an element of Set?

--david

From mal at lemburg.com Fri Mar 17 23:41:46 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 17 Mar 2000 23:41:46 +0100 Subject: [Python-Dev] Boolean type for Py3K? References: <200003172120.QAA09115@eric.cnri.reston.va.us> Message-ID: <38D2B4AA.2EE933BD@lemburg.com>

Guido van Rossum wrote:
>
> Yes. True and False make sense.

mx.Tools defines these as new builtins... and they correspond to the C level singletons Py_True and Py_False.

    # Truth constants
    True = (1==1)
    False = (1==0)

I'm not sure whether breaking the idiom of True == 1 and False == 0 (or in other words: truth values are integers) would be such a good idea. Nothing against adding name bindings in __builtins__ though...
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

From ping at lfw.org Fri Mar 17 17:53:12 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 10:53:12 -0600 (CST) Subject: [Python-Dev] Boolean type for Py3K? In-Reply-To: <14546.39508.312796.221069@anthem.cnri.reston.va.us> Message-ID:

On Fri, 17 Mar 2000, Barry A. Warsaw wrote:
> Almost a year ago, I mused about a boolean type in c.l.py, and came up
> with this prototype in Python.
>
> -------------------- snip snip --------------------
> class Boolean: [...]
>
> I think it makes sense to augment Python's current truth rules with a
> built-in boolean type and True and False values. But unless it's tied
> in more deeply (e.g. comparisons return one of these instead of
> integers -- and what are the implications of that?) then it's pretty
> much just syntactic sugar <0.75 lick>.

Yeah, and the whole point *is* the change in semantics, not the syntactic sugar. I'm hoping we can gain some safety from the type checking... though i can't seem to think of a good example off the top of my head. It's easier to think of examples if things like 'if', 'and', 'or', etc. only accept booleans as conditional arguments -- but i can't imagine going that far, as that would just be really annoying.

Let's see. Specifically, the following would probably return booleans:

    magnitude comparisons:       <, >, <=, >=  (and __cmp__)
    value equality comparisons:  ==, !=
    identity comparisons:        is, is not
    containment tests:           in, not in    (and __contains__)

... and booleans would be different from integers in that arithmetic would be illegal... but that's about it. (?) Booleans are still storable immutable values; they could be keys to dicts but not lists; i don't know what else.

Maybe this wouldn't actually buy us anything except for the nicer spelling of "True" and "False", which might not be worth it.

... Hmm.
Can anyone think of common cases where this could help?

-- n!?g

From ping at lfw.org Fri Mar 17 17:59:17 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 10:59:17 -0600 (CST) Subject: [Python-Dev] Using lists as sets In-Reply-To: Message-ID:

On Fri, 17 Mar 2000, David Ascher wrote:
> > I think the semantics would be pretty understandable and simple to
> > explain, which is the main thing.
> >
> > Any thoughts?
>
> Would
>
>     (a,b) in Set
>
> return true if (a,b) was a subset of Set, or if (a,b) was an element of Set?

This would return true if (a, b) was an element of the set -- exactly the same semantics as we currently have for lists.

Ideally it would also be kind of nice to use < > <= >= as subset/superset operators, but that requires revising the way we do comparisons, and you know, it might not really be used all that often anyway.

-, |, and & could operate on lists sensibly when we use them as sets -- just define a few simple rules for ordering and you should be fine. e.g.

    c = a - b    is equivalent to    c = a
                                     for item in b: c.drop(item)

    c = a | b    is equivalent to    c = a
                                     for item in b: c.take(item)

    c = a & b    is equivalent to    c = []
                                     for item in a:
                                         if item in b: c.take(item)

where

    c.take(item)    is equivalent to    if item not in c: c.append(item)
    c.drop(item)    is equivalent to    while item in c: c.remove(item)

The above is all just semantics, of course, to make the point that the semantics can be simple. The implementation could do different things that are much faster when there's a hash table helping out.

-- ?!ng

From gvwilson at nevex.com Sat Mar 18 00:28:05 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Fri, 17 Mar 2000 18:28:05 -0500 (EST) Subject: [Python-Dev] Boolean type for Py3K? In-Reply-To: Message-ID:

> Guido: (re None being a keyword)
> > Yes.
> Guido: (re booleans)
> > Yes. True and False make sense.
> Ka-Ping:
> Astounding. I don't think i've ever seen such quick agreement on
> anything! And twice in one day!
> I think i'm going to go lie down.

No, no, keep going --- you're on a roll.

Greg

From ping at lfw.org Fri Mar 17 18:49:18 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 11:49:18 -0600 (CST) Subject: [Python-Dev] Using lists as sets In-Reply-To: Message-ID:

On Fri, 17 Mar 2000, Ka-Ping Yee wrote:
>
>     c.take(item)    is equivalent to    if item not in c: c.append(item)
>
>     c.drop(item)    is equivalent to    while item in c: c.remove(item)

I think i've decided that i like the verb "include" much better than the rather vague word "take". Perhaps this also suggests "exclude" instead of "drop".

-- ?!ng

From klm at digicool.com Sat Mar 18 01:32:56 2000 From: klm at digicool.com (Ken Manheimer) Date: Fri, 17 Mar 2000 19:32:56 -0500 (EST) Subject: [Python-Dev] Using lists as sets In-Reply-To: Message-ID:

On Fri, 17 Mar 2000, Ka-Ping Yee wrote:
> On Fri, 17 Mar 2000, David Ascher wrote:
> > > I think the semantics would be pretty understandable and simple to
> > > explain, which is the main thing.
> > >
> > > Any thoughts?
> >
> > Would
> >
> >     (a,b) in Set
> >
> > return true if (a,b) was a subset of Set, or if (a,b) was an element of Set?
>
> This would return true if (a, b) was an element of the set --
> exactly the same semantics as we currently have for lists.

I really like the idea of using dynamically-tuned lists to provide set functionality! I often wind up needing something like set functionality, and implementing little convenience routines (unique, difference, etc) repeatedly. I don't mind that so much, but the frequency signifies that i, at least, would benefit from built-in support for sets...

I guess the question is whether it's practical to come up with a reasonably adequate, reasonably general dynamic optimization strategy. Seems like an interesting challenge - is there prior art?

As ping says, maintaining the existing list semantics handily answers challenges like david's question.
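[Editor's note: the take()/drop() pair discussed in this thread (include()/exclude(), per the proposed rename) is easy to prototype as plain functions over an ordinary list -- a sketch only, with no behind-the-scenes hash table, so membership tests stay O(n):]

```python
def take(lst, item):
    """Add item only if absent -- set-style insert."""
    if item not in lst:
        lst.append(item)

def drop(lst, item):
    """Remove every copy of item -- set-style delete."""
    while item in lst:
        lst.remove(item)

s = []
for x in ['a', 'b', 'a', 'c']:
    take(s, x)
print(s)          # ['a', 'b', 'c'] -- duplicates collapsed

drop(s, 'b')
print(s)          # ['a', 'c']

# intersection, following the a & b equivalence given in the thread
a, b = ['a', 'c'], ['c', 'd']
c = []
for item in a:
    if item in b:
        take(c, item)
print(c)          # ['c']
```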
New methods, like [].subset('a', 'b'), could provide the desired additional functionality - and contribute to biasing the object towards set optimization, etc. Neato! Ken klm at digicool.com From ping at lfw.org Fri Mar 17 20:02:13 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 13:02:13 -0600 (CST) Subject: [Python-Dev] Using lists as sets In-Reply-To: Message-ID: On Fri, 17 Mar 2000, Ken Manheimer wrote: > > I really like the idea of using dynamically-tuned lists provide set > functionality! I often wind up needing something like set functionality, > and implementing little convenience routines (unique, difference, etc) > repeatedly. I don't mind that so much, but the frequency signifies that > i, at least, would benefit from built-in support for sets... Greg asked about how to ensure that a given item only appears once in each list when used as a set, and whether i would flag the list as "i'm now operating as a set". My answer is no -- i don't want there to be any visible state on the list. (It can internally decide to optimize its behaviour for a particular purpose, but in no event should this decision ever affect the semantics of its manifested behaviour.) Externally visible state puts us back right where we started -- now the user has to decide what type of thing she wants to use, and that's more decisions and loaded guns pointing at feet that we were trying to avoid in the first place. There's something very nice about there being just two mutable container types in Python. As Guido said, the first two types you learn are lists and dicts, and it's pretty obvious which one to pick for your purposes, and you can't really go wrong. I'd like to copy my reply to Greg here because it exposes some of the philosophy i'm attempting with this proposal: You'd trust the client to use take() (or should i say include()) instead of append(). But, in the end, this wouldn't make any difference to the result of "in". 
In fact, you could do multisets since lists already have count(). What i'm trying to do is to put together a few very simple pieces to get all the behaviour necessary to work with sets, if you want it. I don't want the object itself to have any state that manifests itself as "now i'm a set", or "now i'm a list". You just pick the methods you want to use. It's just like stacks and queues. There's no state on the list that says "now i'm a stack, so read from the end" or "now i'm a queue, so read from the front". You decide where you want to read items by picking the appropriate method, and this lets you get the best of both worlds -- flexibility and simplicity. Back to Ken: > I guess the question is whether it's practical to come up with a > reasonably adequate, reasonably general dynamic optimization strategy. > Seems like an interesting challenge - is there prior art? I'd be quite happy with just turning on set optimization when include() and exclude() get used (nice and predictable). Maybe you could provide a set() built-in that would construct you a list with set optimization turned on, but i'm not too sure if we really want to expose it that way. -- ?!ng From moshez at math.huji.ac.il Sat Mar 18 06:27:13 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 18 Mar 2000 07:27:13 +0200 (IST) Subject: [Python-Dev] list.shift() In-Reply-To: Message-ID: On Fri, 17 Mar 2000, Ka-Ping Yee wrote: > > Has list.shift() been proposed? > > # pretend lists are implemented in Python and 'self' is a list > def shift(self): > item = self[0] > del self[:1] > return item > > This would make queues read nicely... use "append" and "pop" for > a stack, "append" and "shift" for a queue. Actually, I once thought about writing a Deque in Python for a couple of hours (I later wrote it, and then threw it away because I had nothing to do with it, but that isn't my point). So I did write "shift" (though I'm certain I didn't call it that). 
It's not as easy to write a maintainable yet efficient "shift": I got stuck with a pointer to the beginning of the "real list" which I incremented on a "shift", and a complex heuristic for when lists de- and re-allocate. I think the tradeoffs are shaky enough that it is better to write it in pure Python rather than having more functions in C (whether in an old builtin type or a new one). Anyone needing to treat a list as a Deque would just construct one:

    l = Deque(l)

built-in-functions:-just-say-no-ly y'rs, Z.
-- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From artcom0!pf at artcom-gmbh.de Fri Mar 17 23:43:35 2000 From: artcom0!pf at artcom-gmbh.de (artcom0!pf at artcom-gmbh.de) Date: Fri, 17 Mar 2000 23:43:35 +0100 (MET) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: <38D2AAF2.CFBF3A2@beopen.com> from Guido van Rossum at "Mar 17, 2000 5: 0:18 pm" Message-ID:

Ka-Ping Yee wrote:
[...]
> >     # pretend lists are implemented in Python and 'self' is a list
> >     def shift(self):
> >         item = self[0]
> >         del self[:1]
> >         return item
[...]

Guido van Rossum:
> You can do this using list.pop(0). I don't think the name "shift" is very
> intuitive (smells of sh and Perl :-). Do we need a new function?

I think no. But what about this one?:

    # pretend self and dict are dictionaries:
    def supplement(self, dict):
        for k, v in dict.items():
            if not self.data.has_key(k):
                self.data[k] = v

Note the similarities to {}.update(dict), but update replaces existing entries in self, which is sometimes not desired. I know that supplement can also be simulated with:

    tmp = dict.copy()
    tmp.update(self)
    self.data = d

But this is still a little ugly. IMO a builtin method to supplement (complete?) a dictionary with default values from another dictionary would sometimes be a useful tool.
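[Editor's note: Peter's supplement() can be tried as a standalone function on plain dictionaries -- a sketch in which `k not in target` replaces the 1.5-era has_key() test:]

```python
def supplement(target, defaults):
    """Copy entries from defaults into target, but never overwrite."""
    for k, v in defaults.items():
        if k not in target:
            target[k] = v

config = {'color': 'red'}
supplement(config, {'color': 'blue', 'size': 10})
print(config)   # {'color': 'red', 'size': 10} -- existing entry wins
```

Unlike update(), the existing 'color' entry survives; only the missing 'size' key is filled in from the defaults.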
Regards, Peter
-- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen)

From ping at lfw.org Sat Mar 18 19:48:10 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sat, 18 Mar 2000 10:48:10 -0800 (PST) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: Message-ID:

On Fri, 17 Mar 2000 artcom0!pf at artcom-gmbh.de wrote:
>
> I think no. But what about this one?:
>
>     # pretend self and dict are dictionaries:
>     def supplement(self, dict):
>         for k, v in dict.items():
>             if not self.data.has_key(k):
>                 self.data[k] = v

I'd go for that. It would be nice to have a non-overwriting update(). The only issue is the choice of verb; "supplement" sounds pretty reasonable to me.

-- ?!ng

"If I have not seen as far as others, it is because giants were standing on my shoulders." -- Hal Abelson

From pf at artcom-gmbh.de Sat Mar 18 20:23:37 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Sat, 18 Mar 2000 20:23:37 +0100 (MET) Subject: [Python-Dev] dict.supplement() In-Reply-To: from Ka-Ping Yee at "Mar 18, 2000 10:48:10 am" Message-ID:

Hi!

> >     # pretend self and dict are dictionaries:
> >     def supplement(self, dict):
> >         for k, v in dict.items():
> >             if not self.data.has_key(k):
> >                 self.data[k] = v

Ka-Ping Yee schrieb:
> I'd go for that. It would be nice to have a non-overwriting update().
> The only issue is the choice of verb; "supplement" sounds pretty
> reasonable to me.

In German we have the verb "ergänzen" which translates either into "supplement" or "complete" (from my dictionary). "supplement" has the disadvantage of being rather long for the name of a builtin method. Nevertheless I've used this in my class derived from UserDict.UserDict.

Now let's switch topics to the recent discussion about the Set type: you all certainly know that something similar has been done before by Aaron Watters?
see:

Regards, Peter
-- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen)

From gvwilson at nevex.com Mon Mar 20 15:52:12 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Mon, 20 Mar 2000 09:52:12 -0500 (EST) Subject: [Python-Dev] re: Using lists as sets Message-ID:

[After discussion with Ping, and weekend thought]

I would like to vote against using lists as sets:

1. It blurs Python's categorization of containers. The rest of the world thinks of sets as unordered, associative, and binary-valued (a term I just made up to mean "containing 0 or 1 instance of X"). Lists, on the other hand, are ordered, positionally-indexed, and multi-valued. While a list is always a legal queue or stack (although lists permit state transitions that are illegal for queues or stacks), most lists are not legal sets.

2. Python has, in dictionaries, a much more logical starting point for sets. A set is exactly a dictionary whose keys matter, and whose values don't. Adding operations to dictionaries to insert keys, etc., without having to supply a value, naively appears no harder than adding operations to lists, and would probably be much easier to explain when teaching a class.

3. (Long-term speculation) Even if P3K isn't written in C++, many modules for it will be. It would therefore seem sensible to design P3K in a C++-friendly way --- in particular, to align Python's container hierarchy with that used in the Standard Template Library. Using lists as a basis for sets would give Python a very different container type hierarchy than the STL, which could make it difficult for automatic tools like SWIG to map STL-based things to Python and vice versa. Using dictionaries as a basis for sets would seem to be less problematic.
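[Editor's note: point 2 can be illustrated in a few lines -- a dictionary whose values are ignored already behaves like a set, with hashed membership tests (a sketch):]

```python
s = {}
for item in ['a', 'b', 'a', 'c']:
    s[item] = 1        # the value is irrelevant; only the key matters

print('a' in s)        # True  -- membership is a hash lookup, not a scan
print('z' in s)        # False
print(len(s))          # 3     -- duplicates collapse automatically

del s['b']             # set-style removal
print(sorted(s))       # ['a', 'c']
```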
(Note that if Wadler et al's Generic Java proposal becomes part of that language, an STL clone will almost certainly become part of that language, and require JPython interfacing.) On a semi-related note, can someone explain why programs are not allowed to iterate directly through the elements of a dictionary: for (key, value) in dict: ...body... Thanks, Greg "No XML entities were harmed in the production of this message." From moshez at math.huji.ac.il Mon Mar 20 16:03:47 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Mon, 20 Mar 2000 17:03:47 +0200 (IST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: Message-ID: On Mon, 20 Mar 2000 gvwilson at nevex.com wrote: > [After discussion with Ping, and weekend thought] > > I would like to vote against using lists as sets: I'd like to object too, but for slightly different reasons: 20-something lines of Python can implement a set (I just checked it) with the new __contains__. We can just supply it in the standard library (Set module?) and be over and done with. Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From jcw at equi4.com Mon Mar 20 16:37:19 2000 From: jcw at equi4.com (Jean-Claude Wippler) Date: Mon, 20 Mar 2000 16:37:19 +0100 Subject: [Python-Dev] re: Using lists as sets References: Message-ID: <38D645AF.661CA335@equi4.com> gvwilson at nevex.com wrote: > > [After discussion with Ping, and weekend thought] [good stuff] Allow me to offer yet another perspective on this. I'll keep it short. Python has sequences (indexable collections) and maps (associative collections). C++'s STL has vectors, sets, multi-sets, maps, and multi-maps.
I find the distinction between these puzzling, and hereby offer another, somewhat relational-database minded, categorization as food for thought: - collections consist of objects, each of them with attributes - the first N attributes form the "key", the rest is the "residue" - there is also an implicit position attribute, which I'll call "#" - so an object consists of attributes: (K1,K2,...KN,#,R1,R2,...,RM) - one more bit of specification is needed: whether # is part of the key Let me mark the position between key attributes and residue with ":", so everything before the colon marks the uniquely identifying attributes. A vector (sequence) is: #:R1,R2,...,RM A set is: K1,K2,...KN: A multi-set is: K1,K2,...KN,#: A map is: K1,K2,...KN:#,R1,R2,...,RM A multi-map is: K1,K2,...KN,#:R1,R2,...,RM And a somewhat esoteric member of this classification: A singleton is: :R1,R2,...,RM I have no idea what this means for Python, but merely wanted to show how a relational, eh, "view" on all this might perhaps simplify the issues. -jcw From fdrake at acm.org Mon Mar 20 17:55:59 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 20 Mar 2000 11:55:59 -0500 (EST) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: References: <38D2AAF2.CFBF3A2@beopen.com> Message-ID: <14550.22559.550660.403909@weyr.cnri.reston.va.us> artcom0!pf at artcom-gmbh.de writes: > Note the similarities to {}.update(dict), but update replaces existing > entries in self, which is sometimes not desired. I know, that supplement > can also simulated with: Peter, I like this! > tmp = dict.copy() > tmp.update(self) > self.data = d I presume you mean "self.data = tmp"; "self.data.update(tmp)" would be just a little more robust, at the cost of an additional update. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From tismer at tismer.com Mon Mar 20 18:10:34 2000 From: tismer at tismer.com (Christian Tismer) Date: Mon, 20 Mar 2000 18:10:34 +0100 Subject: [Python-Dev] re: Using lists as sets References: <38D645AF.661CA335@equi4.com> Message-ID: <38D65B8A.50B81D08@tismer.com> Jean-Claude Wippler wrote: [relational notation] > A vector (sequence) is: #:R1,R2,...,RM > A set is: K1,K2,...KN: > A multi-set is: K1,K2,...KN,#: > A map is: K1,K2,...KN:#,R1,R2,...,RM > A multi-map is: K1,K2,...KN,#:R1,R2,...,RM This is a nice classification! To my understanding, why not A map is: K1,K2,...KN:R1,R2,...,RM Where is a # in a map? And what do you mean by N and M? Is K1..KN one key, made up of N sub keys, or do you mean the whole set of keys, where each one is mapped somehow. I guess not, the notation looks like I should think of tuples. No, that would imply that N and M were fixed, but they are not. But you say "- collections consist of objects, each of them with attributes". Ok, N and M seem to be individual for each object, right? But when defining a map for instance, and we're talking of the objects, then the map is the set of these objects, and I have to think of K[0]..K(N(o)):R[0]..R(M(o)) where N and M are functions of the individual object o, right? Isn't it then better to think differently of these objects, saying they can produce some key object and some value object of any shape, and a position, where each of these can be missing? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From jeremy at cnri.reston.va.us Mon Mar 20 18:28:28 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Mon, 20 Mar 2000 12:28:28 -0500 (EST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: References: Message-ID: <14550.24508.341533.908941@goon.cnri.reston.va.us> >>>>> "GVW" == gvwilson writes: GVW> On a semi-related note, can someone explain why programs are GVW> not allowed to iterate directly through the elements of a GVW> dictionary: GVW> for (key, value) in dict: ...body... Pythonic design rules #2: Explicit is better than implicit. There are at least three "natural" ways to interpret "for ... in dict:" In addition to the version that strikes you as most natural, some people also imagine that a for loop should iterate over the keys or the values. Instead of guessing, Python provides explicit methods for each possibility: items, keys, values. Yet another possibility, implemented in early versions of JPython and later removed, was to treat a dictionary exactly like a list: Call __getitem__(0), then 1, ..., until a KeyError was raised. In other words, a dictionary could behave like a list provided that it had integer keys. Jeremy From jcw at equi4.com Mon Mar 20 18:56:44 2000 From: jcw at equi4.com (Jean-Claude Wippler) Date: Mon, 20 Mar 2000 18:56:44 +0100 Subject: [Python-Dev] re: Using lists as sets References: <38D645AF.661CA335@equi4.com> <38D65B8A.50B81D08@tismer.com> Message-ID: <38D6665C.ECDE09DE@equi4.com> Christian, > A map is: K1,K2,...KN:R1,R2,...,RM Yes, my list was inconsistent. > Is K1..KN one key, made up of N sub keys, or do you mean the > whole set of keys, where each one is mapped somehow. [...] > Ok, N and M seem to be individual for each object, right? [...] 
> Isn't it then better to think different of these objects, saying > they can produce some key object and some value object of any > shape, and a position, where each of these can be missing? Depends on your perspective. In the relational world, the (K1,...,KN) attributes identify the object, but they are not themselves considered an object. In OO-land, (K1,...,KN) is an object, and a map takes such an object as input and delivers (R1,...,RM) as result. This tension shows the boundary of both relational and OO models, IMO. I wish it'd be possible to unify them, but I haven't figured it out. -jcw, concept maverick / fool on the hill - pick one :) From pf at artcom-gmbh.de Mon Mar 20 19:28:17 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Mon, 20 Mar 2000 19:28:17 +0100 (MET) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: <14550.22559.550660.403909@weyr.cnri.reston.va.us> from "Fred L. Drake, Jr." at "Mar 20, 2000 11:55:59 am" Message-ID: I wrote: > > Note the similarities to {}.update(dict), but update replaces existing > > entries in self, which is sometimes not desired. I know, that supplement > > can also simulated with: > Fred L. Drake, Jr.: > Peter, > I like this! > > > tmp = dict.copy() > > tmp.update(self) > > self.data = d > > I presume you mean "self.data = tmp"; "self.data.update(tmp)" would > be just a little more robust, at the cost of an additional update. Ouppss... I should have tested this before posting. But currently I use the more explicit (and probably slower version) in my code: class ConfigDict(UserDict.UserDict): def supplement(self, defaults): for k, v in defaults.items(): if not self.data.has_key(k): self.data[k] = v Works fine so far, although it usually requires an additional copy operation. Consider another example, where arbitrary instance attributes should be specified as keyword arguments to the constructor: >>> class Example: ... _defaults = {'a': 1, 'b': 2} ... _config = _defaults ...
def __init__(self, **kw): ... if kw: ... self._config = self._defaults.copy() ... self._config.update(kw) ... >>> A = Example(a=12345) >>> A._config {'b': 2, 'a': 12345} >>> B = Example(c=3) >>> B._config {'b': 2, 'c': 3, 'a': 1} If 'supplement' were a dictionary builtin method, this would become simply: kw.supplement(self._defaults) self._config = kw Unfortunately this can't be achieved using a wrapper class like UserDict, since the **kw argument is always a builtin dictionary object. Regards, Peter -- Peter Funk, Oldenburger Str.86, 27777 Ganderkesee, Tel: 04222 9502 70, Fax: -60 From ping at lfw.org Mon Mar 20 13:36:34 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 20 Mar 2000 06:36:34 -0600 (CST) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: Message-ID: On Mon, 20 Mar 2000, Peter Funk wrote: > Consider another example, where arbitrary instance attributes should be > specified as keyword arguments to the constructor: > > >>> class Example: > ... _defaults = {'a': 1, 'b': 2} > ... _config = _defaults > ... def __init__(self, **kw): > ... if kw: > ... self._config = self._defaults.copy() > ... self._config.update(kw) Yes! I do this all the time. I wrote a user-interface module to take care of exactly this kind of hassle when creating lots of UI components. When you're making UI, you can easily drown in keyword arguments and default values if you're not careful. -- ?!ng From fdrake at acm.org Mon Mar 20 20:02:48 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 20 Mar 2000 14:02:48 -0500 (EST) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: References: <14550.22559.550660.403909@weyr.cnri.reston.va.us> Message-ID: <14550.30168.129259.356581@weyr.cnri.reston.va.us> Peter Funk writes: > Ouppss... I should have tested this before posting. 
> But currently I use > the more explicit (and probably slower version) in my code: The performance is based entirely on the size of each; in the (probably typical) case of smallish dictionaries (<50 entries), it's probably cheaper to use a temporary dict and do the update. For large dicts (on the defaults side), it may make more sense to reduce the number of objects that need to be created: target = ... has_key = target.has_key for key in defaults.keys(): if not has_key(key): target[key] = defaults[key] This saves the construction of len(defaults) 2-tuples. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From moshez at math.huji.ac.il Mon Mar 20 20:23:01 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Mon, 20 Mar 2000 21:23:01 +0200 (IST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: <14550.24508.341533.908941@goon.cnri.reston.va.us> Message-ID: On Mon, 20 Mar 2000, Jeremy Hylton wrote: > Yet another possibility, implemented in early versions of JPython and > later removed, was to treat a dictionary exactly like a list: Call > __getitem__(0), then 1, ..., until a KeyError was raised. In other > words, a dictionary could behave like a list provided that it had > integer keys. Two remarks: Jeremy meant "consecutive natural keys starting with 0", (yes, I've managed to learn mind-reading from the timbot) and that (the following is considered a misfeature): import UserDict a = UserDict.UserDict() a[0]="hello" a[1]="world" for word in a: print word Will print "hello", "world", and then die with KeyError. I realize why this is happening, and realize it could only be fixed in Py3K. However, a temporary (though not 100% backwards compatible) fix is that "for" will catch LookupError, rather than IndexError. Any comments? -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mhammond at skippinet.com.au Mon Mar 20 20:39:31 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon, 20 Mar 2000 11:39:31 -0800 Subject: [Python-Dev] Unicode and Windows Message-ID: I would like to discuss Unicode on the Windows platform, and how it relates to MBCS that Windows uses. My main goal here is to ensure that Unicode on Windows can make a round-trip to and from native Unicode stores. As an example, let's take the registry - a Windows user should be able to read a Unicode value from the registry then write it back. The value written back should be _identical_ to the value read. Ditto for the file system: If the filesystem is Unicode, then I would expect the following code: for fname in os.listdir(): f = open(fname + ".tmp", "w") To create filenames on the filesystem with the exact base name even when the basename contains non-ascii characters. However, the Unicode patches do not appear to make this possible. open() uses PyArg_ParseTuple(args, "s..."); PyArg_ParseTuple() will automatically convert a Unicode object to UTF-8, so we end up passing a UTF-8 encoded string to the C runtime fopen function. The end result of all this is that we end up with UTF-8 encoded names in the registry/on the file system. It does not seem possible to get a true Unicode string onto either the file system or in the registry. Unfortunately, I'm not experienced enough to know the full ramifications, but it _appears_ that on Windows the default "unicode to string" translation should be done via the WideCharToMultiByte() API. This will then pass an MBCS encoded ascii string to Windows, and the "right thing" should magically happen. Unfortunately, MBCS encoding is dependent on the current locale (ie, one MBCS sequence will mean completely different things depending on the locale).
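The round-trip failure Mark describes can be sketched without any Win32 calls at all. In this illustration, cp1252 stands in for whichever code page the locale's MBCS happens to select, and the codec names are modern spellings used purely to show the mismatch, not the API under discussion:

```python
# If Python hands the OS UTF-8 bytes but the OS interprets them in a
# locale code page, the name the user sees is not the name written.
name = "caf\u00e9"                      # a filename with a non-ASCII character
as_utf8 = name.encode("utf-8")          # what the automatic UTF-8 conversion produces
seen_by_os = as_utf8.decode("cp1252")   # how an MBCS-minded OS reads those bytes
assert seen_by_os != name               # the round trip fails
```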
I don't see a portability issue here, as the documentation could state that "Unicode->ASCII conversions use the most appropriate conversion for the platform. If the platform is not Unicode aware, then UTF-8 will be used." This issue is the final one before I release the win32reg module. It seems _critical_ to me that if Python supports Unicode and the platform supports Unicode, then Python unicode values must be capable of being passed to the platform. For the win32reg module I could quite possibly hack around the problem, but the more general problem (characterized by the open() example above) still remains... Any thoughts? Mark. From jeremy at cnri.reston.va.us Mon Mar 20 20:51:28 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Mon, 20 Mar 2000 14:51:28 -0500 (EST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: References: <14550.24508.341533.908941@goon.cnri.reston.va.us> Message-ID: <14550.33088.110785.78631@goon.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: MZ> On Mon, 20 Mar 2000, Jeremy Hylton wrote: >> Yet another possibility, implemented in early versions of JPython >> and later removed, was to treat a dictionary exactly like a list: >> Call __getitem__(0), then 1, ..., until a KeyError was raised. >> In other words, a dictionary could behave like a list provided >> that it had integer keys. MZ> Two remarks: Jeremy meant "consecutive natural keys starting MZ> with 0", (yes, I've managed to learn mind-reading from the MZ> timbot) I suppose I meant that (perhaps you can read my mind as well as I can); I also meant using values of Python's integer datatype :-). MZ> and that (the following is considered a misfeature): MZ> import UserDict MZ> a = UserDict.UserDict() MZ> a[0]="hello" MZ> a[1]="world" MZ> for word in a: print word MZ> Will print "hello", "world", and then die with KeyError. I MZ> realize why this is happening, and realize it could only be MZ> fixed in Py3K.
However, a temporary (though not 100% backwards MZ> compatible) fix is that "for" will catch LookupError, rather MZ> then IndexError. I'm not sure what you mean by "fix." (Please read your mind for me .) I think by fix you mean, "allow the broken code above to execute without raising an exception." Yuck! As far as I can tell, the problem is caused by the special way that a for loop uses the __getitem__ protocol. There are two related issues that lead to confusion. In cases other than for loops, __getitem__ is invoked when the syntactic construct x[i] is used. This means either lookup in a list or in a dict depending on the type of x. If it is a list, the index must be an integer and IndexError can be raised. If it is a dict, the index can be anything (even an unhashable type; TypeError is only raised by insertion for this case) and KeyError can be raised. In a for loop, the same protocol (__getitem__) is used, but with the special convention that the object should be a sequence. Python will detect when you try to use a builtin type that is not a sequence, e.g. a dictionary. If the for loop iterates over an instance type rather than a builtin type, there is no way to check whether the __getitem__ protocol is being implemented by a sequence or a mapping. The right solution, I think, is to allow a means for stating explicitly whether a class with an __getitem__ method is a sequence or a mapping (or both?). Then UserDict can declare itself to be a mapping and using it in a for loop will raise the TypeError, "loop over non-sequence" (which has a standard meaning defined in Skip's catalog <0.8 wink>). I believe this is where types-vs.-classes meets subtyping-vs.-inheritance. I suspect that the right solution, circa Py3K, is that classes must explicitly state what types they are subtypes of or what interfaces they implement. 
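The protocol Jeremy walks through can be reproduced in a few lines. This sketch emulates the old for-loop behaviour by hand (later Pythons replaced this protocol with iterators, so the emulation is for illustration only), and also shows the effect of Moshe's catch-LookupError suggestion:

```python
# The old for loop calls __getitem__ with 0, 1, 2, ... and stops only
# on IndexError, so a mapping with keys 0 and 1 iterates by accident
# and then dies with KeyError.
def old_for(container):
    result, i = [], 0
    while True:
        try:
            result.append(container[i])
        except IndexError:          # the only "stop" signal the loop knows
            return result
        i += 1

assert old_for(["hello", "world"]) == ["hello", "world"]

d = {0: "hello", 1: "world"}
try:
    old_for(d)                      # d[2] raises KeyError, not IndexError
    died = False
except KeyError:
    died = True                     # the misfeature, reproduced
assert died

# Moshe's patch -- catch LookupError, the common base of IndexError and
# KeyError -- would make the dict loop stop cleanly instead:
def patched_for(container):
    result, i = [], 0
    while True:
        try:
            result.append(container[i])
        except LookupError:
            return result
        i += 1

assert patched_for(d) == ["hello", "world"]
```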
Jeremy From moshez at math.huji.ac.il Mon Mar 20 21:13:20 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Mon, 20 Mar 2000 22:13:20 +0200 (IST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: <14550.33088.110785.78631@goon.cnri.reston.va.us> Message-ID: On Mon, 20 Mar 2000, Jeremy Hylton wrote: > I'm not sure what you mean by "fix." I mean any sane behaviour -- either failing on TypeError at the beginning, like "for" does, or executing without raising an exception. Raising an exception in the middle which is imminent is definitely (for the right values of definitely) a surprising behaviour (I know it surprised me!). > I think by fix you mean, "allow the broken code above to > execute without raising an exception." Yuck! I agree it is yucky -- it is all a weird echo of the yuckiness of the type/class dichotomy. What I suggested is a temporary patch... > As far as I can tell, the problem is caused by the special > way that a for loop uses the __getitem__ protocol. Well, my take is that it is caused by the fact __getitem__ is used both for the sequence protocol and the mapping protocol (well, I'm cheating through my teeth here, but you understand what I mean ) Agreed though, that the whole iteration protocol should be revisited -- but that is a subject for another post. > The right solution, I think, is to allow a means for stating > explicitly whether a class with an __getitem__ method is a sequence or > a mapping (or both?). And this is the fix I wanted for Py3K (details to be debated, still). See? You read my mind perfectly. > I suspect that the right solution, circa > Py3K, is that classes must explicitly state what types they are > subtypes of or what interfaces they implement. Exactly. And have subclassable built-in classes in the same fell swoop. getting-all-excited-for-py3k-ly y'rs, Z. -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping at lfw.org Mon Mar 20 15:34:12 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 20 Mar 2000 08:34:12 -0600 (CST) Subject: [Python-Dev] Set options Message-ID: I think that at this point the possibilities for doing sets come down to four options: 1. use lists visible changes: new methods l.include, l.exclude invisible changes: faster 'in' usage: s = [1, 2], s.include(3), s.exclude(3), if item in s, for item in s 2. use dicts visible changes: for/if x in dict means keys accept dicts without values (e.g. {1, 2}) new special non-printing value ": Present" new method d.insert(x) means d[x] = Present invisible changes: none usage: s = {1, 2}, s.insert(3), del s[3], if item in s, for item in s 3. new type visible changes: set() built-in new with methods .insert, .remove invisible changes: none usage: s = set(1, 2), s.insert(3), s.remove(3) if item in s, for item in s 4. do nothing visible changes: none invisible changes: none usage: s = {1: 1, 2: 1}, s[3] = 1, del s[3], if s.has_key(item), for item in s.keys() Let me say a couple of things about #1 and #2. I'm happy with both. I quite like the idea of using dicts this way (#2), in fact -- i think it was the first idea i remember chatting about. If i remember correctly, Guido's objection to #2 was that "in" on a dictionary would work on the keys, which isn't consistent with the fact that "in" on a list works on the values. However, this doesn't really bother me at all. It's a very simple rule, especially when you think of how people understand dictionaries. If you hand someone a *real* dictionary, and ask them Is the word "python" in the dictionary? they'll go look up "python" in the *keys* of the dictionary (the words), not the values (the definitions). So i'm quite all right with saying for x in dict: and having that loop over the keys, or saying if x in dict: and having that check whether x is a valid key. 
It makes perfect sense to me. My main issue with #2 was that sets would print like {"Alice": 1, "Bob": 1, "Ted": 1} and this would look weird. However, as Greg explained to me, it would be possible to introduce a default value to go with set members that just says "i'm here", such as 'Present' (read as: "Alice" is present in the set) or 'Member' or even 'None', and this value wouldn't print out -- thus s = {"Bob"} s.include("Alice") print s would produce {"Alice", "Bob"} representing a dictionary that actually contained {"Alice": Present, "Bob": Present} You'd construct set constants like this too: {2, 4, 7} Using dicts this way (rather than having a separate set type that just happened to be spelled with {}) avoids the parsing issue: no need for look-ahead; you just toss in "Present" when the text doesn't supply a colon, and move on. I'd be okay with this, though i'm not sure everyone would; and together with Guido's initial objection, that's what motivated me to propose the lists-as-sets thing: fewer changes all around, no ambiguities introduced -- just two new methods, and we're done. Hmm. I know someone who's just learning Python. I will attempt to ask some questions about what she would find natural, and see if that reveals anything interesting. -- ?!ng From bwarsaw at cnri.reston.va.us Mon Mar 20 23:01:00 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Mon, 20 Mar 2000 17:01:00 -0500 (EST) Subject: [Python-Dev] re: Using lists as sets References: <14550.24508.341533.908941@goon.cnri.reston.va.us> <14550.33088.110785.78631@goon.cnri.reston.va.us> Message-ID: <14550.40860.72418.648591@anthem.cnri.reston.va.us> >>>>> "JH" == Jeremy Hylton writes: JH> As far as I can tell, the problem is caused by the special way JH> that a for loop uses the __getitem__ protocol. There are two JH> related issues that lead to confusion. 
>>>>> "MZ" == Moshe Zadka writes: MZ> Well, my look is that it is caused by the fact __getitem__ is MZ> used both for the sequence protocol and the mapping protocol Right. MZ> Agreed though, that the whole iteration protocol should be MZ> revisited -- but that is a subject for another post. Yup. JH> The right solution, I think, is to allow a means for stating JH> explicitly whether a class with an __getitem__ method is a JH> sequence or a mapping (or both?). Or should the two protocols use different method names (code breakage!). JH> I believe this is where types-vs.-classes meets JH> subtyping-vs.-inheritance. meets protocols-vs.-interfaces. From moshez at math.huji.ac.il Tue Mar 21 06:16:00 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 21 Mar 2000 07:16:00 +0200 (IST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: <14550.40860.72418.648591@anthem.cnri.reston.va.us> Message-ID: On Mon, 20 Mar 2000, Barry A. Warsaw wrote: > MZ> Agreed though, that the whole iteration protocol should be > MZ> revisited -- but that is a subject for another post. > > Yup. (Go Stackless, go!?) > JH> I believe this is where types-vs.-classes meets > JH> subtyping-vs.-inheritance. > > meets protocols-vs.-interfaces. It took me 5 minutes of intensive thinking just to understand what Barry meant. Just wait until we introduce Sather-like "supertypes" (which are pretty Pythonic, IMHO) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Tue Mar 21 06:21:24 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 21 Mar 2000 07:21:24 +0200 (IST) Subject: [Python-Dev] Set options In-Reply-To: Message-ID: On Mon, 20 Mar 2000, Ka-Ping Yee wrote: > I think that at this point the possibilities for doing sets > come down to four options: > > > 1. use lists > 2. use dicts > 3. new type > 4. do nothing 5.
new Python module with a class "Set" (The issues are similar to #3, but this has the advantage of not changing the interpreter) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal at lemburg.com Tue Mar 21 01:25:09 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 21 Mar 2000 01:25:09 +0100 Subject: [Python-Dev] Unicode and Windows References: Message-ID: <38D6C165.EEF58232@lemburg.com> Mark Hammond wrote: > > I would like to discuss Unicode on the Windows platform, and how it relates > to MBCS that Windows uses. > > My main goal here is to ensure that Unicode on Windows can make a round-trip > to and from native Unicode stores. As an example, let's take the registry - > a Windows user should be able to read a Unicode value from the registry then > write it back. The value written back should be _identical_ to the value > read. Ditto for the file system: If the filesystem is Unicode, then I would > expect the following code: > for fname in os.listdir(): > f = open(fname + ".tmp", "w") > > To create filenames on the filesystem with the exact base name even when the > basename contains non-ascii characters. > > However, the Unicode patches do not appear to make this possible. open() > uses PyArg_ParseTuple(args, "s..."); PyArg_ParseTuple() will automatically > convert a Unicode object to UTF-8, so we end up passing a UTF-8 encoded > string to the C runtime fopen function. Right. The idea with open() was to write a special version (using #ifdefs) for use on Windows platforms which does all the needed magic to convert Unicode to whatever the native format and locale is... Using parser markers for this is obviously *not* the right way to get to the core of the problem. Basically, you will have to write a helper which takes a string, Unicode or some other "t" compatible object as name object and then converts it to the system's view of things. 
I think we had a private discussion about this a few months ago: there was some way to convert Unicode to a platform independent format which then got converted to MBCS -- don't remember the details though. > The end result of all this is that we end up with UTF-8 encoded names in the > registry/on the file system. It does not seem possible to get a true > Unicode string onto either the file system or in the registry. > > Unfortunately, Im not experienced enough to know the full ramifications, but > it _appears_ that on Windows the default "unicode to string" translation > should be done via the WideCharToMultiByte() API. This will then pass an > MBCS encoded ascii string to Windows, and the "right thing" should magically > happen. Unfortunately, MBCS encoding is dependant on the current locale > (ie, one MBCS sequence will mean completely different things depending on > the locale). I dont see a portability issue here, as the documentation > could state that "Unicode->ASCII conversions use the most appropriate > conversion for the platform. If the platform is not Unicode aware, then > UTF-8 will be used." No, no, no... :-) The default should be (and is) UTF-8 on all platforms -- whether the platform supports Unicode or not. If a platform uses a different encoding, an encoder should be used which applies the needed transformation. > This issue is the final one before I release the win32reg module. It seems > _critical_ to me that if Python supports Unicode and the platform supports > Unicode, then Python unicode values must be capable of being passed to the > platform. For the win32reg module I could quite possibly hack around the > problem, but the more general problem (categorized by the open() example > above) still remains... > > Any thoughts? Can't you use the wchar_t interfaces for the task (see the unicodeobject.h file for details) ? Perhaps you can first transfer Unicode to wchar_t and then on to MBCS using a win32 API ?! 
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack at oratrix.nl Tue Mar 21 12:54:30 2000 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 21 Mar 2000 12:54:30 +0100 Subject: [Python-Dev] Unicode and Windows In-Reply-To: Message by "M.-A. Lemburg" , Tue, 21 Mar 2000 01:25:09 +0100 , <38D6C165.EEF58232@lemburg.com> Message-ID: <20000321115430.88A11370CF2@snelboot.oratrix.nl> I guess we need another format specifier than "s" here. "s" does the conversion to standard-python-utf8 for wide strings, and we'd need another format for conversion to current-local-os-convention-8-bit-encoding-of-unicode-strings. I assume that that would also come in handy for MacOS, where we'll have the same problem (filenames are in Apple's proprietary 8bit encoding).
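The helper being circled around here can be paraphrased at the Python level. This is only a sketch of the policy Jack and Marc-Andre describe, not the proposed C API: the codec names are modern stand-ins chosen for illustration, and the fall-back-to-UTF-8 rule is the documented default mentioned earlier in the thread.

```python
import sys

def os_filename(name, platform=None):
    """Encode a unicode filename in the OS-native convention, else UTF-8."""
    platform = platform or sys.platform
    if platform.startswith("win"):
        native = "mbcs"       # Windows: WideCharToMultiByte, via the mbcs codec
    elif platform == "mac":
        native = "mac-roman"  # classic MacOS: Apple's proprietary 8-bit encoding
    else:
        native = "utf-8"      # no native convention: fall back to UTF-8
    try:
        return name.encode(native)
    except LookupError:       # codec not available on this build
        return name.encode("utf-8")

assert os_filename("caf\u00e9", platform="linux") == b"caf\xc3\xa9"
assert os_filename("caf\u00e9", platform="mac") == b"caf\x8e"
```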
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal at lemburg.com Tue Mar 21 13:14:54 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 21 Mar 2000 13:14:54 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000321115430.88A11370CF2@snelboot.oratrix.nl> Message-ID: <38D767BE.C45F8286@lemburg.com> Jack Jansen wrote: > > I guess we need another format specifier than "s" here. "s" does the > conversion to standard-python-utf8 for wide strings, Actually, "t" does the UTF-8 conversion... "s" will give you the raw internal UTF-16 representation in platform byte order. > and we'd need another > format for conversion to current-local-os-convention-8-bit-encoding-of-unicode-strings. I'd suggest adding some kind of generic PyOS_FilenameFromObject(PyObject *v, void *buffer, int buffer_len) API for the conversion of strings, Unicode and text buffers to an OS dependent filename buffer. And/or perhaps specific APIs for each OS... e.g. PyOS_MBCSFromObject() (only on WinXX) PyOS_AppleFromObject() (only on Mac ;) > I assume that that would also come in handy for MacOS, where we'll have the > same problem (filenames are in Apple's proprietary 8bit encoding). Is that encoding already supported by the encodings package ? If not, could you point me to a map file for the encoding ?
Date: Tue, 21 Mar 2000 09:56:47 -0500 (EST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: <38D767BE.C45F8286@lemburg.com> References: <20000321115430.88A11370CF2@snelboot.oratrix.nl> <38D767BE.C45F8286@lemburg.com> Message-ID: <14551.36271.33825.841965@weyr.cnri.reston.va.us> M.-A. Lemburg writes: > And/or perhaps sepcific APIs for each OS... e.g. > > PyOS_MBCSFromObject() (only on WinXX) > PyOS_AppleFromObject() (only on Mac ;) Another approach may be to add some format modifiers: te -- text in an encoding specified by a C string (somewhat similar to O&) tE -- text, encoding specified by a Python object (probably a string passed as a parameter or stored from some other call) (I'd prefer the [eE] before the t, but the O modifiers follow, so consistency requires this ugly construct.) This brings up the issue of using a hidden conversion function which may create a new object that needs the same lifetime guarantees as the real parameters; we discussed this issue a month or two ago. Somewhere, there's a call context that includes the actual parameter tuple. PyArg_ParseTuple() could have access to a "scratch" area where it could place objects constructed during parameter parsing. This area could just be a hidden tuple. When the C call returns, the scratch area can be discarded. The difficulty is in giving PyArg_ParseTuple() access to the scratch area, but I don't know how hard that would be off the top of my head. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From jeremy at cnri.reston.va.us Tue Mar 21 18:14:07 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Tue, 21 Mar 2000 12:14:07 -0500 (EST) Subject: [Python-Dev] Set options In-Reply-To: <38D7409C.169B0C42@lemburg.com> References: <38D7409C.169B0C42@lemburg.com> Message-ID: <14551.44511.805860.808811@goon.cnri.reston.va.us> >>>>> "MAL" == M -A Lemburg writes: MAL> Perhaps someone could take Aaron's kjbuckets and write a Python MAL> emulation for it (I think he's even already done something like MAL> this for gadfly). Then the emulation could go into the core and MAL> if people want speed they can install his extension (the MAL> emulation would have to detect this and use the real thing MAL> then). I've been waiting for Tim Peters to say something about sets, but I'll chime in with what I recall him saying last time a discussion like this came up on c.l.py. (I may misremember, in which case I'll at least draw him into the discussion in order to correct me <0.5 wink>.) The problem with a set module is that there are a number of different ways to implement them -- in C using kjbuckets is one example. Each approach is appropriate for some applications, but not for every one. A set is pretty simple to build from a list or a dictionary, so we leave it to application writers to write the one that is appropriate for their application. Jeremy From skip at mojam.com Tue Mar 21 18:25:57 2000 From: skip at mojam.com (Skip Montanaro) Date: Tue, 21 Mar 2000 11:25:57 -0600 (CST) Subject: [Python-Dev] Set options In-Reply-To: <38D7409C.169B0C42@lemburg.com> References: <38D7409C.169B0C42@lemburg.com> Message-ID: <14551.45221.447838.534003@beluga.mojam.com> Marc> Perhaps someone could take Aaron's kjbuckets and write a Python Marc> emulation for it ... Any reason why kjbuckets and friends have never been placed in the core? 
If, as it seems from the discussion, a set type is a good thing to add to the core, it seems to me that Aaron's code would be a good candidate implementation/foundation. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From bwarsaw at cnri.reston.va.us Tue Mar 21 18:47:49 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 21 Mar 2000 12:47:49 -0500 (EST) Subject: [Python-Dev] Set options References: <38D7409C.169B0C42@lemburg.com> <14551.45221.447838.534003@beluga.mojam.com> Message-ID: <14551.46533.918688.13801@anthem.cnri.reston.va.us> >>>>> "SM" == Skip Montanaro writes: SM> Any reason why kjbuckets and friends have never been placed in SM> the core? If, as it seems from the discussion, a set type is SM> a good thing to add to the core, it seems to me that Aaron's SM> code would be a good candidate implementation/foundation. It would seem to me that distutils is a better way to go for kjbuckets. The core already has basic sets (via dictionaries). We're pretty much just quibbling about efficiency, API, and syntax, aren't we? -Barry From mhammond at skippinet.com.au Tue Mar 21 18:48:06 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 21 Mar 2000 09:48:06 -0800 Subject: [Python-Dev] Unicode and Windows In-Reply-To: <38D6C165.EEF58232@lemburg.com> Message-ID: > > Right. The idea with open() was to write a special version (using > #ifdefs) for use on Windows platforms which does all the needed > magic to convert Unicode to whatever the native format and locale > is... That works for open() - but what about other extension modules? This seems to imply that any Python extension on Windows that wants to pass a Unicode string to an external function can not use PyArg_ParseTuple() with anything other than "O", and perform the magic themselves. This just seems a little back-to-front to me. Platforms that have _no_ native Unicode support have useful utilities for working with Unicode. 
Platforms that _do_ have native Unicode support can not make use of these utilities. Is this by design, or simply a sad side-effect of the design? So - it is trivial to use Unicode on platforms that don't support it, but quite difficult on platforms that do. > Using parser markers for this is obviously *not* the right way > to get to the core of the problem. Basically, you will have to > write a helper which takes a string, Unicode or some other > "t" compatible object as name object and then converts it to > the system's view of things. Why "obviously"? What on earth does the existing mechanism buy me on Windows, other than grief that I can not use it? > I think we had a private discussion about this a few months ago: > there was some way to convert Unicode to a platform independent > format which then got converted to MBCS -- don't remember the details > though. There is a Win32 API function for this. However, as you succinctly pointed out, not many people are going to be aware of its name, or how to use the multitude of flags offered by these conversion functions, or know how to deal with the memory management, etc. > Can't you use the wchar_t interfaces for the task (see > the unicodeobject.h file for details) ? Perhaps you can > first transfer Unicode to wchar_t and then on to MBCS > using a win32 API ?! Sure - I can. But can everyone who writes interfaces to Unicode functions? You wrote the Python Unicode support but don't know its name - pity the poor Joe Average trying to write an extension. It seems to me that, on Windows, the Python Unicode support as it stands is really internal. I can not think of a single time that an extension writer on Windows would ever want to use the "t" markers - am I missing something? I don't believe that a single Unicode-aware function in the Windows extensions (of which there are _many_) could be changed to use the "t"
It still seems to me that the Unicode support works well on platforms with no Unicode support, and is fairly useless on platforms with the support. I dont believe that any extension on Windows would want to use the "t" marker - so, as Fred suggested, how about providing something for us that can help us interface to the platform's Unicode? This is getting too hard for me - I will release my windows registry module without Unicode support, and hope that in the future someone cares enough to address it, and to add a large number of LOC that will be needed simply to get Unicode talking to Unicode... Mark. From skip at mojam.com Tue Mar 21 19:04:11 2000 From: skip at mojam.com (Skip Montanaro) Date: Tue, 21 Mar 2000 12:04:11 -0600 (CST) Subject: [Python-Dev] Set options In-Reply-To: <14551.46533.918688.13801@anthem.cnri.reston.va.us> References: <38D7409C.169B0C42@lemburg.com> <14551.45221.447838.534003@beluga.mojam.com> <14551.46533.918688.13801@anthem.cnri.reston.va.us> Message-ID: <14551.47515.648064.969034@beluga.mojam.com> BAW> It would seem to me that distutils is a better way to go for BAW> kjbuckets. The core already has basic sets (via dictionaries). BAW> We're pretty much just quibbling about efficiency, API, and syntax, BAW> aren't we? Yes (though I would quibble with your use of the word "quibbling" ;-). If new syntax is in the offing as some have proposed, why not go for a more efficient implementation at the same time? I believe Aaron has maintained that kjbuckets is generally more efficient than Python's dictionary object. Skip From mal at lemburg.com Tue Mar 21 18:44:11 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 21 Mar 2000 18:44:11 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000321115430.88A11370CF2@snelboot.oratrix.nl> <38D767BE.C45F8286@lemburg.com> <14551.36271.33825.841965@weyr.cnri.reston.va.us> Message-ID: <38D7B4EB.66DAEBF3@lemburg.com> "Fred L. Drake, Jr." wrote: > > M.-A. 
Lemburg writes: > > And/or perhaps sepcific APIs for each OS... e.g. > > > > PyOS_MBCSFromObject() (only on WinXX) > > PyOS_AppleFromObject() (only on Mac ;) > > Another approach may be to add some format modifiers: > > te -- text in an encoding specified by a C string (somewhat > similar to O&) > tE -- text, encoding specified by a Python object (probably a > string passed as a parameter or stored from some other > call) > > (I'd prefer the [eE] before the t, but the O modifiers follow, so > consistency requires this ugly construct.) > This brings up the issue of using a hidden conversion function which > may create a new object that needs the same lifetime guarantees as the > real parameters; we discussed this issue a month or two ago. > Somewhere, there's a call context that includes the actual parameter > tuple. PyArg_ParseTuple() could have access to a "scratch" area where > it could place objects constructed during parameter parsing. This > area could just be a hidden tuple. When the C call returns, the > scratch area can be discarded. > The difficulty is in giving PyArg_ParseTuple() access to the scratch > area, but I don't know how hard that would be off the top of my head. Some time ago, I considered adding "U+" with builtin auto-conversion to the tuple parser... after some discussion about the error handling issues involved with this I quickly dropped that idea again and used the standard "O" approach plus a call to a helper function which then applied the conversion. (Note the "+" behind "U": this was intended to indicate that the returned object has had the refcount incremented and that the caller must take care of decrementing it again.) The "O" + helper approach is a little clumsy, but works just fine. Plus it doesn't add any more overhead to the already convoluted PyArg_ParseTuple(). BTW, what other external char formats are we talking about ? E.g. how do you handle MBCS or DBCS under WinXX ? 
Are there routines to have wchar_t buffers converted into the two ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gmcm at hypernet.com Tue Mar 21 19:25:43 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Tue, 21 Mar 2000 13:25:43 -0500 Subject: [Python-Dev] Set options In-Reply-To: <14551.44511.805860.808811@goon.cnri.reston.va.us> References: <38D7409C.169B0C42@lemburg.com> Message-ID: <1258459347-36172889@hypernet.com> Jeremy wrote: > The problem with a set module is that there are a number of different > ways to implement them -- in C using kjbuckets is one example. Nah. Sets are pretty unambiguous. They're also easy, and boring. The interesting stuff is graphs and operations like composition, closure and transpositions. That's also where stuff gets ambiguous. E.g., what's the right behavior when you invert {'a':1,'b':1}? Hint: any answer you give will be met by the wrath of God. I would love this stuff, and as a faithful worshipper of Our Lady of Corrugated Ironism, I could probably live with whatever rules are arrived at; but I'm afraid I would have to considerably enlarge my kill file. - Gordon From gstein at lyra.org Tue Mar 21 19:40:20 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 21 Mar 2000 10:40:20 -0800 (PST) Subject: [Python-Dev] Set options In-Reply-To: <14551.44511.805860.808811@goon.cnri.reston.va.us> Message-ID: On Tue, 21 Mar 2000, Jeremy Hylton wrote: > >>>>> "MAL" == M -A Lemburg writes: > MAL> Perhaps someone could take Aaron's kjbuckets and write a Python > MAL> emulation for it (I think he's even already done something like > MAL> this for gadfly). Then the emulation could go into the core and > MAL> if people want speed they can install his extension (the > MAL> emulation would have to detect this and use the real thing > MAL> then). 
> > I've been waiting for Tim Peters to say something about sets, but I'll > chime in with what I recall him saying last time a discussion like > this came up on c.l.py. (I may misremember, in which case I'll at > least draw him into the discussion in order to correct me <0.5 wink>.) > > The problem with a set module is that there are a number of different > ways to implement them -- in C using kjbuckets is one example. Each > approach is appropriate for some applications, but not for every one. > A set is pretty simple to build from a list or a dictionary, so we > leave it to application writers to write the one that is appropriate > for their application. Yah... +1 on what Jeremy said. Leave them out of the distro since we can't do them Right for all people. Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Tue Mar 21 19:34:56 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 21 Mar 2000 20:34:56 +0200 (IST) Subject: [Python-Dev] Set options In-Reply-To: <14551.47515.648064.969034@beluga.mojam.com> Message-ID: On Tue, 21 Mar 2000, Skip Montanaro wrote: > BAW> It would seem to me that distutils is a better way to go for > BAW> kjbuckets. The core already has basic sets (via dictionaries). > BAW> We're pretty much just quibbling about efficiency, API, and syntax, > BAW> aren't we? > > If new syntax is in the offing as some have proposed, FWIW, I'm against new syntax. The core-language has changed quite a lot between 1.5.2 and 1.6 -- * strings have grown methods * there are unicode strings * "in" operator overloadable The second change even includes a syntax change (u"some string") whose variants I'm still not familiar enough to comment on (ru"some\string"? ur"some\string"? Both legal?). 
I feel too many changes destabilize the language (this might seem a bit extreme, considering I pushed towards one of the changes), and we should try to improve on things other than the core -- one of these is a more hierarchical standard library, and a standard distribution mechanism, to rival CPAN -- then anyone could import data.sets.kjbuckets with only a trivial >>> import dist >>> dist.install("data.sets.kjbuckets") > why not go for a more efficient implementation at the same time? Because Python dicts are "pretty efficient", and it is not a trivial question to check optimality in this area: tests can be rigged to prove almost anything with the right test-cases, and there's no promise we'll choose the "right ones". -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Tue Mar 21 19:38:02 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 21 Mar 2000 20:38:02 +0200 (IST) Subject: [Python-Dev] Set options In-Reply-To: <1258459347-36172889@hypernet.com> Message-ID: On Tue, 21 Mar 2000, Gordon McMillan wrote: > E.g., what's the right behavior when you > invert {'a':1,'b':1}? Hint: any answer you give will be met by the > wrath of God. Isn't "wrath of God", translated into Python, "an exception"? raise ValueError("dictionary is not 1-1") seems fine to me. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From skip at mojam.com Tue Mar 21 19:42:55 2000 From: skip at mojam.com (Skip Montanaro) Date: Tue, 21 Mar 2000 12:42:55 -0600 (CST) Subject: [Python-Dev] Set options In-Reply-To: References: <14551.47515.648064.969034@beluga.mojam.com> Message-ID: <14551.49839.377385.99637@beluga.mojam.com> Skip> If new syntax is in the offing as some have proposed, Moshe> FWIW, I'm against new syntax.
The core-language has changed quite Moshe> a lot between 1.5.2 and 1.6 -- I thought we were talking about Py3K, where syntax changes are somewhat more expected. Just to make things clear, the syntax change I was referring to was the value-less dict syntax that someone proposed a few days ago: myset = {"a", "b", "c"} Note that I wasn't necessarily supporting the proposal, only acknowledging that it had been made. In general, I think we need to keep straight where people feel various proposals are going to fit. When a thread goes for more than a few messages it's easy to forget. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From ping at lfw.org Tue Mar 21 14:07:51 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 21 Mar 2000 07:07:51 -0600 (CST) Subject: [Python-Dev] Set options In-Reply-To: <14551.46533.918688.13801@anthem.cnri.reston.va.us> Message-ID: Jeremy Hylton wrote: > The problem with a set module is that there are a number of different > ways to implement them -- in C using kjbuckets is one example. Each > approach is appropriate for some applications, but not for every one. For me, anyway, this is not about trying to engineer a universally perfect solution into Python -- it's about providing some simple, basic, easy-to-understand functionality that takes care of the common case. For example, dictionaries are simple, their workings are easy enough to understand, and they aren't written to efficiently support things like inversion and composition because most of the time no one needs to do these things. The same holds true for sets. All i would want is something i can put things into, and take things out of, and ask about what's inside. Barry Warsaw wrote: > It would seem to me that distutils is a better way to go for > kjbuckets. The core already has basic sets (via dictionaries). We're > pretty much just quibbling about efficiency, API, and syntax, aren't we? 
Efficiency: Hashtables have proven quite adequate for dicts, so i think they're quite adequate for sets. API and syntax: I believe the goal is obvious, because Python already has very nice notation ("in", "not in") -- it just doesn't work quite the way one would want. It works semantically right on lists, but they're a little slow. It doesn't work on dicts, but we can make it so. Here is where my "explanation metric" comes into play. How much additional explaining do you have to do in each case to answer the question "what do i do when i need a set"? 1. Use lists. Explain that "include()" means "append if not already present", and "exclude()" means "remove if present". You are done. 2. Use dicts. Explain that "for x in dict" iterates over the keys, and "if x in dict" looks for a key. Explain what happens when you write "{1, 2, 3}", and the special non-printing value constant. Explain how to add elements to a set and remove elements from a set. 3. Create a new type. Explain that there exists another type "set" with methods "insert" and "remove". Explain how to construct sets. Explain how "in" and "not in" work, where this type fits in with the other types, and when to choose this type over other types. 4. Do nothing. Explain that dictionaries can be used as sets if you assign keys a dummy value, use "del" to remove keys, iterate over "dict.keys()", and use "dict.has_key()" to test membership. This is what motivated my proposal for using lists: it requires by far the least explanation. This is no surprise because a lot of things about lists have been explained already. My preference in terms of elegance is about equal for 1, 2, 3, with 4 distinctly behind; but my subjective ranking of "explanation complexity" (as in "how to get there from here") is 1 < 4 < 3 < 2. 
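To make the comparison concrete, option 1 can be sketched in a few lines; the class and names below are only illustrative of the include/exclude semantics described above, not a concrete proposal:

```python
class ListSet:
    # Option 1 sketched: a "set" backed by a plain list.
    def __init__(self, items=()):
        self._items = []
        for x in items:
            self.include(x)

    def include(self, x):
        # Append only if not already present.
        if x not in self._items:
            self._items.append(x)

    def exclude(self, x):
        # Remove if present; do nothing otherwise.
        if x in self._items:
            self._items.remove(x)

    def __contains__(self, x):
        return x in self._items

    def __len__(self):
        return len(self._items)

s = ListSet(["a", "b", "a"])   # duplicate "a" is dropped
s.include("c")
s.exclude("b")
```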
-- ?!ng From tismer at tismer.com Tue Mar 21 21:13:38 2000 From: tismer at tismer.com (Christian Tismer) Date: Tue, 21 Mar 2000 21:13:38 +0100 Subject: [Python-Dev] Unicode Database Compression Message-ID: <38D7D7F2.14A2FBB5@tismer.com> Hi, I have spent the last four days on compressing the Unicode database. With little decoding effort, I can bring the data down to 25kb. This would still be very fast, since codes are randomly accessible, although there are some simple shifts and masks. With a bit more effort, this can be squeezed down to 15kb by some more aggressive techniques like common prefix elimination. Speed would be *slightly* worse, since a small loop (average 8 cycles) is performed to obtain a character from a packed nybble. This is just all the data which is in Marc's unicodedatabase.c file. I checked efficiency by creating a delimited file like the original database text file with only these columns and ran PkZip over it. The result was 40kb. This says that I found a lot of correlations which automatic compressors cannot see. Now, before generating the final C code, I'd like to ask some questions: What is more desirable: Low compression and blinding speed? Or high compression and less speed, since we always want to unpack a whole code page? Then, what about the other database columns? There are a couple of extra attributes which I find coded as switch statements elsewhere. Should I try to pack these codes into my squeezy database, too? And last: There are also two quite elaborated columns with textual descriptions of the codes (the uppercase blah version of character x). Do we want these at all? And if so, should I try to compress them as well? Should these perhaps go into a different source file as a dynamic module, since they will not be used so often? waiting for directives - ly y'rs - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From moshez at math.huji.ac.il Wed Mar 22 06:44:00 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 22 Mar 2000 07:44:00 +0200 (IST) Subject: [1.x] Re: [Python-Dev] Set options In-Reply-To: <14551.49839.377385.99637@beluga.mojam.com> Message-ID: On Tue, 21 Mar 2000, Skip Montanaro wrote: > Skip> If new syntax is in the offing as some have proposed, > > Moshe> FWIW, I'm against new syntax. The core-language has changed quite > Moshe> a lot between 1.5.2 and 1.6 -- > > I thought we were talking about Py3K My argument was strictly a 1.x argument. I'm hoping to get sets in 1.7 or 1.8. > In general, I think we need to keep straight where people feel various > proposals are going to fit. You're right. I'll start prefixing my posts accordingly. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal at lemburg.com Wed Mar 22 11:11:25 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 22 Mar 2000 11:11:25 +0100 Subject: [Python-Dev] Re: Unicode Database Compression References: <38D7D7F2.14A2FBB5@tismer.com> Message-ID: <38D89C4D.370C19D@lemburg.com> Christian Tismer wrote: > > Hi, > > I have spent the last four days on compressing the > Unicode database. Cool :-) > With little decoding effort, I can bring the data down to 25kb. > This would still be very fast, since codes are randomly > accessible, although there are some simple shifts and masks. > > With a bit more effort, this can be squeezed down to 15kb > by some more aggressive techniques like common prefix > elimination. Speed would be *slightly* worse, since a small > loop (average 8 cycles) is performed to obtain a character > from a packed nybble.
> > This is just all the data which is in Marc's unicodedatabase.c > file. I checked efficiency by creating a delimited file like > the original database text file with only these columns and > ran PkZip over it. The result was 40kb. This says that I found > a lot of correlations which automatic compressors cannot see. Not bad ;-) > Now, before generating the final C code, I'd like to ask some > questions: > > What is more desirable: Low compression and blinding speed? > Or high compression and less speed, since we always want to > unpack a whole code page? I'd say high speed and less compression. The reason is that the Asian codecs will need fast access to the database. With their large mapping table sizes, the few more kB don't hurt, I guess. > Then, what about the other database columns? > There are a couple of extra attributes which I find coded > as switch statements elsewhere. Should I try to pack these > codes into my squeezy database, too? You basically only need to provide the APIs (and columns) defined in the unicodedata Python API, e.g. the character description column is not needed. > And last: There are also two quite elaborated columns with > textual descriptions of the codes (the uppercase blah version > of character x). Do we want these at all? And if so, should > I try to compress them as well? Should these perhaps go > into a different source file as a dynamic module, since they > will not be used so often? I guess you are talking about the "Unicode 1.0 Name" and the "10646 comment field" -- see above, there's no need to include these descriptions in the database... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Mar 22 12:04:32 2000 From: mal at lemburg.com (M.-A.
Lemburg) Date: Wed, 22 Mar 2000 12:04:32 +0100 Subject: [Python-Dev] Unicode and Windows References: Message-ID: <38D8A8C0.66123F2C@lemburg.com> Mark Hammond wrote: > > > > > Right. The idea with open() was to write a special version (using > > #ifdefs) for use on Windows platforms which does all the needed > > magic to convert Unicode to whatever the native format and locale > > is... > > That works for open() - but what about other extension modules? > > This seems to imply that any Python extension on Windows that wants to pass > a Unicode string to an external function can not use PyArg_ParseTuple() with > anything other than "O", and perform the magic themselves. > > This just seems a little back-to-front to me. Platforms that have _no_ > native Unicode support have useful utilities for working with Unicode. > Platforms that _do_ have native Unicode support can not make use of these > utilities. Is this by design, or simply a sad side-effect of the design? > > So - it is trivial to use Unicode on platforms that dont support it, but > quite difficult on platforms that do. The problem is that Windows seems to use a completely different internal Unicode format than most of the rest of the world. As I've commented on in a different post, the only way to have PyArg_ParseTuple() perform auto-conversion is by allowing it to return objects which are garbage collected by the caller. The problem with this is error handling, since PyArg_ParseTuple() will have to keep track of all objects it created until the call returns successfully. An alternative approach is sketched below. Note that *all* platforms will have to use this approach... not only Windows or other platforms with Unicode support. > > Using parser markers for this is obviously *not* the right way > > to get to the core of the problem. 
Basically, you will have to > > write a helper which takes a string, Unicode or some other > > "t" compatible object as name object and then converts it to > > the system's view of things. > > Why "obviously"? What on earth does the existing mechanism buy me on > Windows, other than grief that I can not use it? Sure, you can :-) Just fetch the object, coerce it to Unicode and then encode it according to your platform needs (PyUnicode_FromObject() takes care of the coercion part for you). > > I think we had a private discussion about this a few months ago: > > there was some way to convert Unicode to a platform independent > > format which then got converted to MBCS -- don't remember the details > > though. > > There is a Win32 API function for this. However, as you succinctly pointed > out, not many people are going to be aware of its name, or how to use the > multitude of flags offered by these conversion functions, or know how to > deal with the memory management, etc. > > > Can't you use the wchar_t interfaces for the task (see > > the unicodeobject.h file for details) ? Perhaps you can > > first transfer Unicode to wchar_t and then on to MBCS > > using a win32 API ?! > > Sure - I can. But can everyone who writes interfaces to Unicode functions? > You wrote the Python Unicode support but don't know its name - pity the poor > Joe Average trying to write an extension. Hey, Mark... I'm not a Windows geek. How can I know which APIs are available and which of them to use ? And that's my point: add conversion APIs and codecs for the different OSes which make the extension writer's life easier. > It seems to me that, on Windows, the Python Unicode support as it stands is > really internal. I can not think of a single time that an extension writer > on Windows would ever want to use the "t" markers - am I missing something?
> I don't believe that a single Unicode-aware function in the Windows > extensions (of which there are _many_) could be changed to use the "t" > markers. "t" is intended to return a text representation of a buffer interface aware type... this happens to be UTF-8 for Unicode objects -- what other encoding would you have expected ? > It still seems to me that the Unicode support works well on platforms with > no Unicode support, and is fairly useless on platforms with the support. I > don't believe that any extension on Windows would want to use the "t" > marker - so, as Fred suggested, how about providing something for us that > can help us interface to the platform's Unicode? That's exactly what I'm talking about all the time... there currently are PyUnicode_AsWideChar() and PyUnicode_FromWideChar() to interface to the compiler's wchar_t type. I have no problem adding more of these APIs for the various OSes -- but they would have to be coded by someone with Unicode skills on each of those platforms, e.g. PyUnicode_AsMBCS() and PyUnicode_FromMBCS() on Windows. > This is getting too hard for me - I will release my Windows registry module > without Unicode support, and hope that in the future someone cares enough to > address it, and to add a large number of LOC that will be needed simply to > get Unicode talking to Unicode... I think you're getting this wrong: I'm not arguing against adding better support for Windows. The only way I can think of using parser markers in this context would be by having PyArg_ParseTuple() *copy* data into a given data buffer rather than only passing a reference to it. This would enable PyArg_ParseTuple() to apply whatever conversion is needed while still keeping the temporary objects internal.
Hmm, sketching a little: "es#",&encoding,&buffer,&buffer_len -- could mean: coerce the object to Unicode, then encode it using the given encoding and then copy at most buffer_len bytes of data into buffer and update buffer_len to the number of bytes copied. This costs some cycles for copying data, but gets rid of the problems involved in cleaning up after errors. The caller will have to ensure that the buffer is large enough and that the encoding fits the application's needs. Error handling will be poor since the caller can't take any action other than to pass on the error generated by PyArg_ParseTuple(). Thoughts ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Mar 22 14:40:23 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 22 Mar 2000 14:40:23 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000322113129.5E67C370CF2@snelboot.oratrix.nl> Message-ID: <38D8CD47.E573A246@lemburg.com> Jack Jansen wrote: > > > "es#",&encoding,&buffer,&buffer_len > > -- could mean: coerce the object to Unicode, then > > encode it using the given encoding and then > > copy at most buffer_len bytes of data into > > buffer and update buffer_len to the number of bytes > > copied > > This is a possible solution, but I think I would really prefer to also have > "eS", &encoding, &buffer_ptr > -- coerce the object to Unicode, then encode it using the given > encoding, malloc() a buffer to put the result in and return that. > > I don't mind doing something like > > { > char *filenamebuffer = NULL; > > if ( PyArg_ParseTuple(args, "eS", &macencoding, &filenamebuffer) > ... > open(filenamebuffer, ....); > PyMem_XDEL(filenamebuffer); > ... > } > > I think this would be much less error-prone than having fixed-length buffers > all over the place.
PyArg_ParseTuple() should probably raise an error in case the data doesn't fit into the buffer.

> And if this is indeed going to be used mainly in open()
> calls and such the cost of the extra malloc()/free() is going to be dwarfed by
> what the underlying OS call is going to use.

Good point. You'll still need the buffer_len output parameter though -- otherwise you wouldn't be able to tell the size of the allocated buffer (the returned data may not be terminated).

How about this:

    "es#", &encoding, &buffer, &buffer_len
    -- both buffer and buffer_len are in/out parameters
    -- if **buffer is non-NULL, copy the data into it (at most
       buffer_len bytes) and update buffer_len on output;
       truncation produces an error
    -- if **buffer is NULL, malloc() a buffer of size buffer_len
       and return it through *buffer; if buffer_len is -1, the
       allocated buffer should be large enough to hold all data;
       again, truncation is an error
    -- apply coercion and encoding as described above

(could be that I've got the '*'s wrong, but you get the picture...:)

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack at oratrix.nl Wed Mar 22 14:46:50 2000 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 22 Mar 2000 14:46:50 +0100 Subject: [Python-Dev] Unicode and Windows In-Reply-To: Message by "M.-A. Lemburg" , Wed, 22 Mar 2000 14:40:23 +0100 , <38D8CD47.E573A246@lemburg.com> Message-ID: <20000322134650.ED1BD370CF2@snelboot.oratrix.nl>

> > [on the user-supplies-buffer interface]
> > I think this would be much less error-prone than having fixed-length buffers
> > all over the place.
> 
> PyArg_ParseTuple() should probably raise an error in case the
> data doesn't fit into the buffer.

Ah, that's right, that solves most of that problem.

> > [on the malloced interface]
> Good point.
You'll still need the buffer_len output parameter
> though -- otherwise you wouldn't be able to tell the size of the
> allocated buffer (the returned data may not be terminated).

Are you sure? I would expect the "eS" format to be used to obtain 8-bit data in some local encoding, and I would expect that all 8-bit encodings of unicode data would still allow for null-termination. Or are there 8-bit encodings out there where a zero byte is a normal occurrence and where it can't be used as a terminator?

-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal at lemburg.com Wed Mar 22 17:31:26 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 22 Mar 2000 17:31:26 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000322134650.ED1BD370CF2@snelboot.oratrix.nl> Message-ID: <38D8F55E.6E324281@lemburg.com> Jack Jansen wrote:
> 
> > > [on the user-supplies-buffer interface]
> > > I think this would be much less error-prone than having fixed-length buffers
> > > all over the place.
> >
> > PyArg_ParseTuple() should probably raise an error in case the
> > data doesn't fit into the buffer.
> 
> Ah, that's right, that solves most of that problem.
> 
> > > [on the malloced interface]
> > Good point. You'll still need the buffer_len output parameter
> > though -- otherwise you wouldn't be able to tell the size of the
> > allocated buffer (the returned data may not be terminated).
> 
> Are you sure? I would expect the "eS" format to be used to obtain 8-bit data
> in some local encoding, and I would expect that all 8-bit encodings of unicode
> data would still allow for null-termination. Or are there 8-bit encodings out
> there where a zero byte is a normal occurrence and where it can't be used as
> a terminator?

Not sure whether these exist or not, but they are certainly a possibility to keep in mind.
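A quick sketch in present-day Python (not the 1.6 C API under discussion) shows why null-termination is fragile as a length signal: strictly 8-bit charsets rarely use 0x00, but as soon as a wider encoding flows through the same interface, embedded NUL bytes are routine.

```python
# UTF-16 puts a NUL byte inside the encoding of every ASCII character,
# so the encoded result cannot be treated as a C string.
data = "abc".encode("utf-16-le")
print(data)        # b'a\x00b\x00c\x00'
print(0 in data)   # True -- embedded zero bytes are a normal occurrence
```

So an explicit length output stays necessary for any interface that may carry non-8-bit encodings.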
Perhaps adding "es#" and "es" (with 0-byte check) would be ideal ?!

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From pf at artcom-gmbh.de Wed Mar 22 17:54:42 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 22 Mar 2000 17:54:42 +0100 (MET) Subject: [Python-Dev] Nitpicking on UserList implementation Message-ID: Hi!

Please have a look at the following method cited from Lib/UserList.py:

    def __radd__(self, other):
        if isinstance(other, UserList):                  # <-- ?
            return self.__class__(other.data + self.data)  # <-- ?
        elif isinstance(other, type(self.data)):
            return self.__class__(other + self.data)
        else:
            return self.__class__(list(other) + self.data)

The reference manual tells about the __r*__ methods: """These functions are only called if the left operand does not support the corresponding operation.""" So if the left operand is a UserList instance, it should always have a __add__ method, which will be called instead of the right operand's __radd__. So I think the condition 'isinstance(other, UserList)' in __radd__ above will always evaluate to False and so the two lines marked with '# <-- ?' seem to be superfluous. But 'UserList' is so mature: Please tell me what I've overlooked before I make a fool of myself and submit a patch removing these two lines.
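Peter's reading of the manual can be checked with a small stand-in class (a minimal sketch in modern Python; MiniList is just enough of UserList to watch the dispatch):

```python
class MiniList:
    """Minimal stand-in for UserList, enough to observe + dispatch."""
    def __init__(self, data):
        self.data = list(data)
    def __add__(self, other):
        return "__add__ ran"
    def __radd__(self, other):
        return "__radd__ ran"

a, b = MiniList([1]), MiniList([2])
print(a + b)     # left operand is a MiniList: its __add__ wins
print([0] + b)   # left operand is a plain list: falls back to b.__radd__
```

The first print shows "__add__ ran" and the second "__radd__ ran" -- i.e. whenever __radd__ runs, the left operand was *not* an instance of the class, which is exactly why the isinstance branch looks dead.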
Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From gvwilson at nevex.com Thu Mar 23 18:10:16 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 23 Mar 2000 12:10:16 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods Message-ID: [The following passed the Ping test, so I'm posting it here]

If None becomes a keyword, I would like to ask whether it could be used to signal that a method is a class method, as opposed to an instance method:

    class Ping:

        def __init__(self, arg):
            ...as usual...

        def method(self, arg):
            ...no change...

        def classMethod(None, arg):
            ...equivalent of C++ 'static'...

    p = Ping("thinks this is cool")   # as always
    p.method("who am I to argue?")    # as always
    Ping.classMethod("hey, cool!")    # no 'self'
    p.classMethod("hey, cool!")       # also selfless

I'd also like to ask (separately) that assignment to None be defined as a no-op, so that programmers can write:

    year, month, None, None, None, None, weekday, None, None = gmtime(time())

instead of having to create throw-away variables to fill in slots in tuples that they don't care about. I think both behaviors are readable; the first provides genuinely new functionality, while I often found the second handy when I was doing logic programming.

Greg From jim at digicool.com Thu Mar 23 18:18:29 2000 From: jim at digicool.com (Jim Fulton) Date: Thu, 23 Mar 2000 12:18:29 -0500 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA51E5.B39D3E7B@digicool.com> gvwilson at nevex.com wrote:
> 
> [The following passed the Ping test, so I'm posting it here]
> 
> If None becomes a keyword, I would like to ask whether it could be used to
> signal that a method is a class method, as opposed to an instance method:
> 
> class Ping:
> 
>     def __init__(self, arg):
>         ...as usual...
> 
>     def method(self, arg):
>         ...no change...
>     def classMethod(None, arg):
>         ...equivalent of C++ 'static'...

(snip)

As a point of jargon, please let's call this thing a "static method" (or an instance function, or something) rather than a "class method". The distinction between "class methods" and "static methods" has been discussed at length in the types sig (over a year ago). If this proposal goes forward and the name "class method" is used, I'll have to argue strenuously, and I really don't want to do that. :] So, if you can live with the term "static method", you could save us a lot of trouble by just saying "static method".

Jim -- Jim Fulton mailto:jim at digicool.com Technical Director (888) 344-4332 Python Powered! Digital Creations http://www.digicool.com http://www.python.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From gvwilson at nevex.com Thu Mar 23 18:21:48 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 23 Mar 2000 12:21:48 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <38DA51E5.B39D3E7B@digicool.com> Message-ID:

> As a point of jargon, please let's call this thing a "static method"
> (or an instance function, or something) rather than a "class method".

I'd call it a penguin if that was what it took to get something like this implemented... :-)

greg From jim at digicool.com Thu Mar 23 18:28:25 2000 From: jim at digicool.com (Jim Fulton) Date: Thu, 23 Mar 2000 12:28:25 -0500 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA5439.F5FE8FE6@digicool.com> gvwilson at nevex.com wrote:
> 
> > As a point of jargon, please let's call this thing a "static method"
> > (or an instance function, or something) rather than a "class method".
> 
> I'd call it a penguin if that was what it took to get something like this
> implemented... :-)

That's a great name. Let's go with penguin. :)

Jim -- Jim Fulton mailto:jim at digicool.com Technical Director (888) 344-4332 Python Powered! Digital Creations http://www.digicool.com http://www.python.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From mhammond at skippinet.com.au Thu Mar 23 18:29:53 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 23 Mar 2000 09:29:53 -0800 Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: ...

> If None becomes a keyword, I would like to ask whether it could be used to
> signal that a method is a class method, as opposed to an instance method:
> 
> def classMethod(None, arg):
>     ...equivalent of C++ 'static'...

...

> I'd also like to ask (separately) that assignment to None be defined as a
> no-op, so that programmers can write:
> 
> year, month, None, None, None, None, weekday, None, None =
>     gmtime(time())

In the vernacular of a certain Mr Stein...

+2 on both of these :-)

[Although I do believe "static method" is a better name than "penguin" :-]

Mark. From ping at lfw.org Thu Mar 23 18:47:47 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 23 Mar 2000 09:47:47 -0800 (PST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: On Thu, 23 Mar 2000 gvwilson at nevex.com wrote:
> 
> If None becomes a keyword, I would like to ask whether it could be used to
> signal that a method is a class method, as opposed to an instance method:
> 
> class Ping: [...]

Ack! I've been reduced to a class with just three methods. Oh well, i never really considered it such a bad thing to be called "simple-minded".
:) > def classMethod(None, arg): > ...equivalent of C++ 'static'... Yeah, i agree with Jim; you might as well call this a "static method" as opposed to a "class method". I like the way "None" is explicitly stated here, so there's no confusion about what the method does. (Without it, there's the question of whether the first argument will get thrown in, or what...) Hmm... i guess this also means one should ask what def function(None, arg): ... does outside a class definition. I suppose that should simply be illegal. > I'd also like to ask (separately) that assignment to None be defined as a > no-op, so that programmers can write: > > year, month, None, None, None, None, weekday, None, None = gmtime(time()) > > instead of having to create throw-away variables to fill in slots in > tuples that they don't care about. For what it's worth, i sometimes use "_" for this purpose (shades of Prolog!) but i can't make much of an argument for its readability... -- ?!ng I never dreamt that i would get to be The creature that i always meant to be But i thought, in spite of dreams, You'd be sitting somewhere here with me. From fdrake at acm.org Thu Mar 23 19:11:39 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 23 Mar 2000 13:11:39 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: References: Message-ID: <14554.24155.948286.451340@weyr.cnri.reston.va.us> gvwilson at nevex.com writes: > p.classMethod("hey, cool!") # also selfless This is the example that I haven't seen before (I'm not on the types-sig, so it may have been presented there), and I think this is what makes it interesting; a method in a module isn't quite sufficient here, since a subclass can override or extend the penguin this way. (Er, if we *do* go with penguin, does this mean it only works on Linux? ;) -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From pf at artcom-gmbh.de Thu Mar 23 19:25:57 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Thu, 23 Mar 2000 19:25:57 +0100 (MET) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: from "gvwilson@nevex.com" at "Mar 23, 2000 12:10:16 pm" Message-ID: Hi!

gvwilson at nevex.com:
> I'd also like to ask (separately) that assignment to None be defined as a
> no-op, so that programmers can write:
> 
> year, month, None, None, None, None, weekday, None, None = gmtime(time())

You can already do this today with 1.5.2, if you use a 'del None' statement:

    Python 1.5.2 (#1, Jul 23 1999, 06:38:16) [GCC egcs-2.91.66 19990314/Linux (egcs- on linux2
    Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
    >>> from time import time, gmtime
    >>> year, month, None, None, None, None, weekday, None, None = gmtime(time())
    >>> print year, month, None, weekday
    2000 3 0 3
    >>> del None
    >>> print year, month, None, weekday
    2000 3 None 3
    >>>

if None will become a keyword in Py3K this pyidiom should better be written as

    year, month, None, None, None, None, ... = ...
    if sys.version[0] == '1':
        del None

or

    try:
        del None
    except SyntaxError:
        pass # Wow running Py3K here!

I wonder how much existing code the None --> keyword change would break.

Regards, Peter From paul at prescod.net Thu Mar 23 19:47:55 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 10:47:55 -0800 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA66DB.635E8731@prescod.net> gvwilson at nevex.com wrote:
> 
> [The following passed the Ping test, so I'm posting it here]
> 
> If None becomes a keyword, I would like to ask whether it could be used to
> signal that a method is a class method, as opposed to an instance method:

+1

Idea is good, but I'm not really happy with any of the proposed terminology...Python doesn't really have static anything.
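The call patterns from Greg's example can be sketched in later Python, which eventually grew a spelling for exactly this behaviour (staticmethod -- an addition well after this thread, shown here only to illustrate the semantics being voted on):

```python
class Ping:
    def __init__(self, arg):
        self.arg = arg

    def method(self, arg):
        return ("instance", arg)

    @staticmethod
    def class_method(arg):          # the role of 'def classMethod(None, arg)'
        return ("selfless", arg)

p = Ping("thinks this is cool")
print(p.method("who am I to argue?"))    # ('instance', 'who am I to argue?')
print(Ping.class_method("hey, cool!"))   # ('selfless', 'hey, cool!') -- no 'self'
print(p.class_method("hey, cool!"))      # ('selfless', 'hey, cool!') -- also selfless
```

Both the class and an instance can invoke the selfless method, matching all four call sites in the original proposal.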
I would vote at the same time to make self a keyword and signal an error if the first argument is not one of None or self. Even now, one of my most common Python mistakes is in forgetting self. I expect it happens to anyone who shifts between other languages and Python.

Why does None have an upper case "N"? Maybe the keyword version should be lower-case...

-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From bwarsaw at cnri.reston.va.us Thu Mar 23 19:57:00 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 23 Mar 2000 13:57:00 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <14554.26876.514559.320219@anthem.cnri.reston.va.us> >>>>> "gvwilson" == writes:

    gvwilson> If None becomes a keyword, I would like to ask whether
    gvwilson> it could be used to signal that a method is a class
    gvwilson> method, as opposed to an instance method:

It still seems mildly weird that None would be a special kind of keyword, one that has a value and is used in ways that no other keyword is used. Greg gives an example, and here's a few more:

    def baddaboom(x, y, z=None):
        ...

    if z is None:
        ...

try substituting `else' for `None' in these examples. ;)

Putting that issue aside, Greg's suggestion for static method definitions is interesting.

    class Ping:
        # would this be a SyntaxError?
        def __init__(None, arg):
            ...

        def staticMethod(None, arg):
            ...

    p = Ping()

    Ping.staticMethod(p, 7)  # TypeError
    Ping.staticMethod(7)     # This is fine
    p.staticMethod(7)        # So's this
    Ping.staticMethod(p)     # and this !!

-Barry From paul at prescod.net Thu Mar 23 19:52:25 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 10:52:25 -0800 Subject: [Python-Dev] dir() Message-ID: <38DA67E9.AA593B7A@prescod.net> Can someone explain why dir(foo) does not return all of foo's methods?
I know it's documented that way, I just don't know why it *is* that way. I'm also not clear on why instances don't have auto-populated __methods__ and __members__ members. If there isn't a good reason (there probably is) then I would advocate that these functions and members should be more comprehensive.

-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From bwarsaw at cnri.reston.va.us Thu Mar 23 20:00:57 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 23 Mar 2000 14:00:57 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <14554.27113.546575.170565@anthem.cnri.reston.va.us> >>>>> "PF" == Peter Funk writes:

    | try:
    |     del None
    | except SyntaxError:
    |     pass # Wow running Py3K here!

I know how to break your Py3K code: stick None=None somewhere higher up :)

    PF> I wonder how much existing code the None --> keyword change
    PF> would break.

Me too.

-Barry From gvwilson at nevex.com Thu Mar 23 20:01:06 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 23 Mar 2000 14:01:06 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.26876.514559.320219@anthem.cnri.reston.va.us> Message-ID:

> class Ping:
>     # would this be a SyntaxError?
>     def __init__(None, arg):
>         ...

Absolutely a syntax error; ditto any of the other special names (e.g. __add__).

Greg From akuchlin at mems-exchange.org Thu Mar 23 20:06:33 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Thu, 23 Mar 2000 14:06:33 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.27113.546575.170565@anthem.cnri.reston.va.us> References: <14554.27113.546575.170565@anthem.cnri.reston.va.us> Message-ID: <14554.27449.69043.924322@amarok.cnri.reston.va.us> Barry A.
Warsaw writes:

>>>>>> "PF" == Peter Funk writes:
>     PF> I wonder how much existing code the None --> keyword change
>     PF> would break.
> Me too.

I can't conceive of anyone using None as a function name or a variable name, except through a bug or thinking that 'None, useful, None = 1,2,3' works. Even though None isn't a fixed constant, it might as well be. How much C code have you seen lately that starts with int function(void *NULL) ? Being able to do "None = 2" also smacks a bit of those legendary Fortran compilers that let you accidentally change 2 into 4. +1 on this change for Py3K, and I doubt it would cause breakage even if introduced into 1.x.

-- A.M. Kuchling http://starship.python.net/crew/amk/ Principally I played pedants, idiots, old fathers, and drunkards. As you see, I had a narrow escape from becoming a professor. -- Robertson Davies, "Shakespeare over the Port" From paul at prescod.net Thu Mar 23 20:02:33 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 11:02:33 -0800 Subject: [Python-Dev] Unicode character names Message-ID: <38DA6A49.A60E405B@prescod.net> Here's a feature I like from Perl's Unicode support:

"""
Support for interpolating named characters

The new \N escape interpolates named characters within strings. For example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a unicode smiley face at the end.
"""

I get really tired of looking up the Unicode character for "ndash" or "right dagger". Does our Unicode database have enough information to make something like this possible? Obviously using the official (English) name is only really helpful for people who speak English, so we should not remove the numeric option.
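For the record, the Perl feature being quoted did eventually land in Python's Unicode string literals; in a modern interpreter the example works verbatim:

```python
s = "Hi! \N{WHITE SMILING FACE}"
print(s[-1], hex(ord(s[-1])))   # the named character is U+263A
```

The name is resolved from the Unicode database at compile time, which is exactly the dependency Andrew raises below.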
-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From tismer at tismer.com Thu Mar 23 20:27:53 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 23 Mar 2000 20:27:53 +0100 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA7039.B7CDC6FF@tismer.com> Mark Hammond wrote:
> 
> ...
> > If None becomes a keyword, I would like to ask whether it could be used to
> > signal that a method is a class method, as opposed to an instance method:
> >
> > def classMethod(None, arg):
> >     ...equivalent of C++ 'static'...
> ...
> 
> > I'd also like to ask (separately) that assignment to None be defined as a
> > no-op, so that programmers can write:
> >
> > year, month, None, None, None, None, weekday, None, None =
> >     gmtime(time())
> 
> In the vernacular of a certain Mr Stein...
> 
> +2 on both of these :-)

me 2, äh 1.5...

The assignment no-op seems to be ok. Having None as a placeholder for static methods creates the problem that we lose compatibility with ordinary functions. What I would propose instead is: make the parameter name "self" mandatory for methods, and turn everything else into a static method. This does not change function semantics, but just the way the method binding works.

> [Although I do believe "static method" is a better name than "penguin" :-]

pynguin

-- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From gvwilson at nevex.com Thu Mar 23 20:33:47 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 23 Mar 2000 14:33:47 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <38DA7039.B7CDC6FF@tismer.com> Message-ID: Hi, Christian; thanks for your mail. > What I would propose instead is: > make the parameter name "self" mandatory for methods, and turn > everything else into a static method. In my experience, significant omissions (i.e. something being important because it is *not* there) often give beginners trouble. For example, in C++, you can't tell whether: int foo::bar(int bah) { return 0; } belongs to instances, or to the class as a whole, without referring back to the header file [1]. To quote the immortal Jeremy Hylton: Pythonic design rules #2: Explicit is better than implicit. Also, people often ask why 'self' is required as a method argument in Python, when it is not in C++ or Java; this proposal would (retroactively) answer that question... Greg [1] I know this isn't a problem in Java or Python; I'm just using it as an illustration. From skip at mojam.com Thu Mar 23 21:09:00 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 23 Mar 2000 14:09:00 -0600 (CST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.27449.69043.924322@amarok.cnri.reston.va.us> References: <14554.27113.546575.170565@anthem.cnri.reston.va.us> <14554.27449.69043.924322@amarok.cnri.reston.va.us> Message-ID: <14554.31196.387213.472302@beluga.mojam.com> AMK> +1 on this change for Py3K, and I doubt it would cause breakage AMK> even if introduced into 1.x. Or if it did, it's probably code that's marginally broken already... 
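Worth noting alongside the breakage debate: the effect being requested needs no language change at all if a throwaway name is used, as Ping mentioned earlier with "_". A sketch in modern Python:

```python
from time import gmtime, time

# Bind only the slots we care about; "_" is an ordinary name that is
# simply rebound for each slot we want to discard.
year, month, _, _, _, _, weekday, _, _ = gmtime(time())
print(year, month, weekday)
```

Unlike assignment-to-None, this stays within existing semantics and breaks nothing.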
-- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From tismer at tismer.com Thu Mar 23 21:21:09 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 23 Mar 2000 21:21:09 +0100 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA7CB5.87D62E14@tismer.com> Yo,

gvwilson at nevex.com wrote:
> 
> Hi, Christian; thanks for your mail.
> 
> > What I would propose instead is:
> > make the parameter name "self" mandatory for methods, and turn
> > everything else into a static method.
> 
> In my experience, significant omissions (i.e. something being important
> because it is *not* there) often give beginners trouble. For example,
> in C++, you can't tell whether:
> 
> int foo::bar(int bah)
> {
>     return 0;
> }
> 
> belongs to instances, or to the class as a whole, without referring back
> to the header file [1]. To quote the immortal Jeremy Hylton:
> 
>     Pythonic design rules #2:
>         Explicit is better than implicit.

Sure. I am explicitly *not* using self if I want no self. :-)

> Also, people often ask why 'self' is required as a method argument in
> Python, when it is not in C++ or Java; this proposal would (retroactively)
> answer that question...

You prefer to use the explicit keyword None? How would you then deal with

    def outside(None, blah):
        pass # stuff

I believe one answer about the explicit "self" is that it should be simple and compatible with ordinary functions. Guido just had to add the semantics that in methods the first parameter automatically binds to the instance. The None gives me a bit of trouble, but not much.

What I would like to spell is

    ordinary functions                      (as it is now)
    functions which are instance methods    (with the immortal self)
    functions which are static methods      ???
    functions which are class methods       !!!

Static methods can work either with the "1st param==None" rule or with the "1st paramname!=self" rule or whatever.
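Christian's wish list includes class methods that receive their class; later Python spelled exactly this as classmethod() (again, an addition well after this thread -- sketched here only to show the semantics he is asking for):

```python
class Base:
    @classmethod
    def describe(cls, x):
        return (cls.__name__, x)

class Derived(Base):
    pass

print(Base.describe(1))       # ('Base', 1)
print(Derived.describe(2))    # ('Derived', 2) -- the *derived* class is passed in
print(Derived().describe(3))  # ('Derived', 3) -- works on instances too
```

The interesting property is the second call: the class passed in follows the lookup, which is what distinguishes a class method from a static method.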
But how would you do class methods, which IMHO should have their class passed in as first parameter? Do you see a clean syntax for this? I thought of some weirdness like

    def meth(self, ...
    def static(self=None, ...   # eek
    def classm(self=class, ...  # ahem

but this breaks the rule of default argument order.

ciao - chris

-- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From akuchlin at mems-exchange.org Thu Mar 23 21:27:41 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Thu, 23 Mar 2000 15:27:41 -0500 (EST) Subject: [Python-Dev] Unicode character names In-Reply-To: <38DA6A49.A60E405B@prescod.net> References: <38DA6A49.A60E405B@prescod.net> Message-ID: <14554.32317.730574.967165@amarok.cnri.reston.va.us> Paul Prescod writes:

>The new \N escape interpolates named characters within strings. For
>example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a
>unicode smiley face at the end.

Cute idea, and it certainly means you can avoid looking up Unicode numbers. (You can look up names instead. :) ) Note that this means the Unicode database is no longer optional if this is done; it has to be around at code-parsing time. Python could import it automatically, as exceptions.py is imported. Christian's work on compressing unicodedatabase.c is therefore really important. (Is Perl5.6 actually dragging around the Unicode database in the binary, or is it read out of some external file or data structure?)

-- A.M. Kuchling http://starship.python.net/crew/amk/ About ten days later, it being the time of year when the National collected down and outs to walk on and understudy I arrived at the head office of the National Theatre in Aquinas Street in Waterloo.
-- Tom Baker, in his autobiography From bwarsaw at cnri.reston.va.us Thu Mar 23 21:39:43 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 23 Mar 2000 15:39:43 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods References: <38DA7039.B7CDC6FF@tismer.com> Message-ID: <14554.33039.4390.591036@anthem.cnri.reston.va.us> >>>>> "gvwilson" == writes: gvwilson> belongs to instances, or to the class as a whole, gvwilson> without referring back to the header file [1]. To quote gvwilson> the immortal Jeremy Hylton: Not to take anything away from Jeremy, who has contributed some wonderfully Pythonic quotes of his own, but this one is taken from Tim Peters' Zen of Python http://www.python.org/doc/Humor.html#zen timbot-is-the-only-one-who's-gonna-outlive-his-current-chip-set- around-here-ly y'rs, -Barry From jeremy at cnri.reston.va.us Thu Mar 23 21:55:25 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Thu, 23 Mar 2000 15:55:25 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: References: <38DA7039.B7CDC6FF@tismer.com> Message-ID: <14554.33590.844200.145871@walden> >>>>> "GVW" == gvwilson writes: GVW> To quote the immortal Jeremy Hylton: GVW> Pythonic design rules #2: GVW> Explicit is better than implicit. I wish I could take credit for that :-). Tim Peters posted a list of 20 Pythonic theses to comp.lang.python under the title "The Python Way." I'll collect them all here in hopes of future readers mistaking me for Tim again . Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. Complex is better than complicated. Flat is better than nested. Sparse is better than dense. Readability counts. Special cases aren't special enough to break the rules. Although practicality beats purity. Errors should never pass silently. Unless explicitly silenced. In the face of ambiguity, refuse the temptation to guess. 
There should be one-- and preferably only one --obvious way to do it. Although that way may not be obvious at first unless you're Dutch. Now is better than never. Although never is often better than *right* now. If the implementation is hard to explain, it's a bad idea. If the implementation is easy to explain, it may be a good idea. Namespaces are one honking great idea -- let's do more of those! See http://x27.deja.com/getdoc.xp?AN=485548918&CONTEXT=953844380.1254555688&hitnum=9 for the full post. to-be-immortal-i'd-need-to-be-a-bot-ly y'rs Jeremy From jeremy at alum.mit.edu Thu Mar 23 22:01:01 2000 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Thu, 23 Mar 2000 16:01:01 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: References: Message-ID: <14554.34037.232728.670271@walden> >>>>> "GVW" == gvwilson writes:

    GVW> I'd also like to ask (separately) that assignment to None be
    GVW> defined as a no-op, so that programmers can write:

    GVW> year, month, None, None, None, None, weekday, None, None =
    GVW> gmtime(time())

    GVW> instead of having to create throw-away variables to fill in
    GVW> slots in tuples that they don't care about. I think both
    GVW> behaviors are readable; the first provides genuinely new
    GVW> functionality, while I often found the second handy when I was
    GVW> doing logic programming.

-1 on this proposal

Pythonic design rule #8: Special cases aren't special enough to break the rules.

I think it's confusing to have assignment mean pop the top of the stack for the special case that the name is None. If Py3K makes None a keyword, then it would also be the only keyword that can be used in an assignment. Finally, we'd need to explain to the rare newbie who used None as a variable name why they assigned 12 to None but that its value was its name when it was later referenced. (Think 'print None'.)

When I need to ignore some of the return values, I use the name nil.
year, month, nil, nil, nil, nil, weekday, nil, nil = gmtime(time()) I think that's just as clear, only a whisker less efficient, and requires no special cases. Heck, it's even less typing <0.5 wink>. Jeremy From gvwilson at nevex.com Thu Mar 23 21:59:41 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 23 Mar 2000 15:59:41 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.33590.844200.145871@walden> Message-ID: > GVW> To quote the immortal Jeremy Hylton: > GVW> Pythonic design rules #2: > GVW> Explicit is better than implicit. > > I wish I could take credit for that :-). Tim Peters posted a list of > 20 Pythonic theses to comp.lang.python under the title "The Python > Way." Traceback (innermost last): File "", line 1, in ? AttributionError: insight incorrectly ascribed From paul at prescod.net Thu Mar 23 22:26:42 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 13:26:42 -0800 Subject: [Python-Dev] None as a keyword / class methods References: <14554.34037.232728.670271@walden> Message-ID: <38DA8C12.DFFD63D5@prescod.net> Jeremy Hylton wrote: > > ... > year, month, nil, nil, nil, nil, weekday, nil, nil = gmtime(time()) So you're proposing nil as a new keyword? I like it. +2 -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "No, I'm not QUITE that stupid", Paul Prescod From pf at artcom-gmbh.de Thu Mar 23 22:46:49 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Thu, 23 Mar 2000 22:46:49 +0100 (MET) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.27113.546575.170565@anthem.cnri.reston.va.us> from "Barry A. Warsaw" at "Mar 23, 2000 2: 0:57 pm" Message-ID: Hi Barry! > >>>>> "PF" == Peter Funk writes: > > | try: > | del None > | except SyntaxError: > | pass # Wow running Py3K here! Barry A. Warsaw: > I know how to break your Py3K code: stick None=None some where higher > up :) Hmm.... I must admit, that I don't understand your argument. 
In Python <= 1.5.2 'del None' works fine, iff it follows any assignment to None in the same scope, regardless of whether there has been a None=None in the surrounding scope or in the same scope before this. Since something like 'del for' or 'del import' raises a SyntaxError exception in Py152, I expect 'del None' to raise the same exception in Py3K, after None has become a keyword. Right? Regards, Peter From andy at reportlab.com Thu Mar 23 22:54:23 2000 From: andy at reportlab.com (Andy Robinson) Date: Thu, 23 Mar 2000 21:54:23 GMT Subject: [Python-Dev] Unicode Character Names In-Reply-To: <20000323202533.ABDB31CEF8@dinsdale.python.org> References: <20000323202533.ABDB31CEF8@dinsdale.python.org> Message-ID: <38da90b4.756297@post.demon.co.uk> >Message: 20 >From: "Andrew M. Kuchling" >Date: Thu, 23 Mar 2000 15:27:41 -0500 (EST) >To: "python-dev at python.org" >Subject: Re: [Python-Dev] Unicode character names > >Paul Prescod writes: >>The new \N escape interpolates named characters within strings. For >>example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a >>unicode smiley face at the end. > >Cute idea, and it certainly means you can avoid looking up Unicode >numbers. (You can look up names instead. :) ) Note that this means the >Unicode database is no longer optional if this is done; it has to be >around at code-parsing time. Python could import it automatically, as >exceptions.py is imported. Christian's work on compressing >unicodedatabase.c is therefore really important. (Is Perl5.6 actually >dragging around the Unicode database in the binary, or is it read out >of some external file or data structure?) I agree - the names are really useful. If you are doing conversion work, often you want to know what a character is, but don't have a complete Unicode font handy. Being able to get the description for a Unicode character is useful, as well as being able to use the description as a constructor for it.
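The description round trip Andy describes is exactly what the standard unicodedata module later exposed; as a sketch against the released API (not the code under discussion in this thread), it looks like this:

```python
import unicodedata

# Character -> description: useful when you can't display the glyph.
name = unicodedata.name("\u263a")
print(name)  # WHITE SMILING FACE

# Description -> character: the name acts as a "constructor".
assert unicodedata.lookup("WHITE SMILING FACE") == "\u263a"
```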
Also, there are some language specific things that might make it useful to have the full character descriptions in Christian's database. For example, we'll have an (optional, not in the standard library) Japanese module with functions like isHalfWidthKatakana(), isFullWidthKatakana() to help normalize things. Parsing the database and looking for strings in the descriptions is one way to build this - not the only one, but it might be useful. So I'd vote to put names in at first, and give us a few weeks to see how useful they are before a final decision. - Andy Robinson From paul at prescod.net Thu Mar 23 23:09:42 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 14:09:42 -0800 Subject: [Python-Dev] Unicode character names References: <38DA6A49.A60E405B@prescod.net> <14554.32317.730574.967165@amarok.cnri.reston.va.us> Message-ID: <38DA9626.8B62DB77@prescod.net> "Andrew M. Kuchling" wrote: > > Paul Prescod writes: > >The new \N escape interpolates named characters within strings. For > >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > >unicode smiley face at the end. > > Cute idea, and it certainly means you can avoid looking up Unicode > numbers. (You can look up names instead. :) ) More important, though, the code is "self documenting". You never have to go from the number back to the name. > Note that this means the > Unicode database is no longer optional if this is done; it has to be > around at code-parsing time. I don't like the idea enough to exclude support for small machines or anything like that. We should weigh the costs of requiring the Unicode database at compile time. > (Is Perl5.6 actually > dragging around the Unicode database in the binary, or is it read out > of some external file or data structure?) I have no idea.
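Both ideas in this exchange can be sketched with the unicodedata module as it eventually shipped: the \N escape resolves database names inside string literals, and Andy's proposed katakana check can be built by scanning database names (is_halfwidth_katakana below is an illustrative name, not a real API):

```python
import unicodedata

# The \N{...} escape, as it later landed: the parser resolves the
# database name at compile time.
s = "Hi! \N{WHITE SMILING FACE}"
assert s[-1] == "\u263a"

# A name-scanning check in the spirit of Andy's isHalfWidthKatakana():
def is_halfwidth_katakana(ch):
    return "HALFWIDTH KATAKANA" in unicodedata.name(ch, "")

assert is_halfwidth_katakana("\uff71")   # HALFWIDTH KATAKANA LETTER A
assert not is_halfwidth_katakana("a")
```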
-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From pf at artcom-gmbh.de Thu Mar 23 23:12:25 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Thu, 23 Mar 2000 23:12:25 +0100 (MET) Subject: [Python-Dev] Py3K: True and False builtin or keyword? Message-ID: Regarding the discussion about None becoming a keyword in Py3K: Recently the truth values True and False have been mentioned. Should they become builtin values --like None is now-- or should they become keywords? Nevertheless: for the time being I came up with the following weird idea: If you put this in front of the main module of a Python app:

#!/usr/bin/env python
if __name__ == "__main__":
    import sys
    if sys.version[0] <= '1':
        __builtins__.True = 1
        __builtins__.False = 0
    del sys
# --- continue with your app from here: ---
import foo, bar, ...
....

Now you can start to use False and True in any imported module as if they were already builtins. Of course this is no surprise here and Python is really fun, Peter. From mal at lemburg.com Thu Mar 23 22:07:35 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 23 Mar 2000 22:07:35 +0100 Subject: [Python-Dev] Unicode character names References: <38DA6A49.A60E405B@prescod.net> <14554.32317.730574.967165@amarok.cnri.reston.va.us> Message-ID: <38DA8797.F16301E4@lemburg.com> "Andrew M. Kuchling" wrote: > > Paul Prescod writes: > >The new \N escape interpolates named characters within strings. For > >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > >unicode smiley face at the end. > > Cute idea, and it certainly means you can avoid looking up Unicode > numbers. (You can look up names instead. :) ) Note that this means the > Unicode database is no longer optional if this is done; it has to be > around at code-parsing time. Python could import it automatically, as > exceptions.py is imported.
Christian's work on compressing > unicodedatabase.c is therefore really important. (Is Perl5.6 actually > dragging around the Unicode database in the binary, or is it read out > of some external file or data structure?) Sorry to disappoint you guys, but the Unicode name and comments are *not* included in the unicodedatabase.c file Christian is currently working on. The reason is simple: it would add huge amounts of string data to the file. So this is a no-no for the core distribution... Still, the above is easily possible by inventing a new encoding, say unicode-with-smileys, which then reads in a file containing the Unicode names and applies the necessary magic to decode/encode data as Paul described above. Would probably make a cool fun-project for someone who wants to dive into writing codecs. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From bwarsaw at cnri.reston.va.us Fri Mar 24 00:02:06 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Thu, 23 Mar 2000 18:02:06 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods References: <14554.27113.546575.170565@anthem.cnri.reston.va.us> Message-ID: <14554.41582.688247.569547@anthem.cnri.reston.va.us> Hi Peter! >>>>> "PF" == Peter Funk writes: PF> Since something like 'del for' or 'del import' raises a PF> SyntaxError exception in Py152, I expect 'del None' to raise PF> the same exception in Py3K, after None has become a keyword. PF> Right? I misread your example the first time through, but it still doesn't quite parse on my second read. 
-------------------- snip snip --------------------
pyvers = '2k'
try:
    del import
except SyntaxError:
    pyvers = '3k'
-------------------- snip snip --------------------
% python /tmp/foo.py
  File "/tmp/foo.py", line 3
    del import
            ^
SyntaxError: invalid syntax
-------------------- snip snip --------------------

See, you can't catch that SyntaxError because it doesn't happen at run-time. Maybe you meant to wrap the try suite in an exec? Here's a code sample that ought to work with 1.5.2 and the mythical Py3K-with-a-None-keyword.

-------------------- snip snip --------------------
pyvers = '2k'
try:
    exec "del None"
except SyntaxError:
    pyvers = '3k'
except NameError:
    pass
print pyvers
-------------------- snip snip --------------------

Cheers, -Barry From klm at digicool.com Fri Mar 24 00:05:08 2000 From: klm at digicool.com (Ken Manheimer) Date: Thu, 23 Mar 2000 18:05:08 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: On Thu, 23 Mar 2000 pf at artcom-gmbh.de wrote: > Hi Barry!
>
> > >>>>> "PF" == Peter Funk writes:
> >
> > | try:
> > |     del None
> > | except SyntaxError:
> > |     pass # Wow running Py3K here!
>
> Barry A. Warsaw:
> > I know how to break your Py3K code: stick None=None somewhere higher
> > up :)

Huh. Does anyone really think we're going to catch SyntaxError at runtime, ever? Seems like the code fragment above wouldn't work in the first place. But i suppose, with most of a millennium to emerge, py3k could have more fundamental changes than i could even imagine...-) Ken klm at digicool.com From pf at artcom-gmbh.de Thu Mar 23 23:53:34 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Thu, 23 Mar 2000 23:53:34 +0100 (MET) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.27449.69043.924322@amarok.cnri.reston.va.us> from "Andrew M. Kuchling" at "Mar 23, 2000 2: 6:33 pm" Message-ID: Hi! > Barry A. Warsaw writes: > >>>>>> "PF" == Peter Funk writes: > > PF> I wonder, how much existing code the None --> keyword change > > PF> would break. > >Me too. Andrew M. Kuchling: > I can't conceive of anyone using None as a function name or a variable > name, except through a bug or thinking that 'None, useful, None = > 1,2,3' works. Even though None isn't a fixed constant, it might as > well be. How much C code have you seen lately that starts with int > function(void *NULL) ? I agree. urban legend: Once upon a time someone found the following neat snippet of C source hidden in some header file of a very, very huge piece of software, after he had spent some nights trying to figure out why some simple edits he made in order to make the code more readable broke the system:

#ifdef TRUE
/* eat this: you arrogant Quiche Eaters */
#undef TRUE
#undef FALSE
#define TRUE (0)
#define FALSE (1)
#endif

Obviously the poor guy would have found this particular small piece of evil code much earlier, if he had simply 'grep'ed for comments... there were not so many in this system. ;-) > Being able to do "None = 2" also smacks a bit of those legendary > Fortran compilers that let you accidentally change 2 into 4. +1 on > this change for Py3K, and I doubt it would cause breakage even if > introduced into 1.x. We'll see: those "Real Programmers" never die. Fortunately they prefer Perl over Python. <0.5 grin> Regards, Peter From klm at digicool.com Fri Mar 24 00:15:42 2000 From: klm at digicool.com (Ken Manheimer) Date: Thu, 23 Mar 2000 18:15:42 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.41582.688247.569547@anthem.cnri.reston.va.us> Message-ID: On Thu, 23 Mar 2000 bwarsaw at cnri.reston.va.us wrote: > See, you can't catch that SyntaxError because it doesn't happen at > run-time. Maybe you meant to wrap the try suite in an exec? Here's a Huh.
Guess i should have read barry's re-response before i posted mine: Desperately desiring to redeem myself, and contribute something to the discussion, i'll settle the class/static method naming quandary with the obvious alternative:

> > p.classMethod("hey, cool!") # also selfless

These should be called buddha methods - no self, samadhi, one with everything, etc. There, now i feel better. :-) Ken klm at digicool.com A Zen monk walks up to a hotdog vendor and says "make me one with everything." Ha. But that's not all. He gets the hot dog and pays with a ten. After several moments waiting, he says to the vendor, "i was expecting change", and the vendor says, "you of all people should know, change comes from inside." That's all. From bwarsaw at cnri.reston.va.us Fri Mar 24 00:19:28 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 23 Mar 2000 18:19:28 -0500 (EST) Subject: [Python-Dev] Py3K: True and False builtin or keyword? References: Message-ID: <14554.42624.213027.854942@anthem.cnri.reston.va.us> >>>>> "PF" == Peter Funk writes: PF> Now you can start to use False and True in any imported PF> module as if they were already builtins. Of course this is no PF> surprise here and Python is really fun, Peter. You /can/ do this, but that doesn't mean you /should/ :) Mucking with builtins is fun the way huffing dry erase markers is fun. Things are very pretty at first, but eventually the brain cell lossage will more than outweigh that cheap thrill. I've seen a few legitimate uses for hacking builtins. In Zope, I believe Jim hacks get_transaction() or somesuch into builtins because that way it's easy to get at without passing it through the call tree. And in Zope it makes sense since this is a fancy database application and your current transaction is a central concept. I've occasionally wrapped an existing builtin because I needed to extend its functionality while keeping its semantics and API unchanged.
An example of this was my pre-Python-1.5.2 open_ex() in Mailman's CGI driver script. Before builtin open() would print the failing file name, my open_ex() -- shown below -- would hack that into the exception object. But one of the things about Python that I /really/ like is that YOU KNOW WHERE THINGS COME FROM. If I suddenly start seeing True and False in your code, I'm going to look for function locals and args, then module globals, then from ... import *'s. If I don't see it in any of those, I'm going to put down my dry erase markers, look again, and then utter a loud "huh?" :) -Barry

realopen = open

def open_ex(filename, mode='r', bufsize=-1, realopen=realopen):
    from Mailman.Utils import reraise
    try:
        return realopen(filename, mode, bufsize)
    except IOError, e:
        strerror = e.strerror + ': ' + filename
        e.strerror = strerror
        e.filename = filename
        e.args = (e.args[0], strerror)
        reraise(e)

import __builtin__
__builtin__.__dict__['open'] = open_ex

From pf at artcom-gmbh.de Fri Mar 24 00:23:57 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Fri, 24 Mar 2000 00:23:57 +0100 (MET) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: from Ken Manheimer at "Mar 23, 2000 6: 5: 8 pm" Message-ID: Hi!

> > > | try:
> > > |     del None
> > > | except SyntaxError:
> > > |     pass # Wow running Py3K here!
> >
> > Barry A. Warsaw:
> > > I know how to break your Py3K code: stick None=None somewhere higher
> > > up :)
> Ken Manheimer:
> Huh. Does anyone really think we're going to catch SyntaxError at
> runtime, ever? Seems like the code fragment above wouldn't work in the
> first place.

Ouuppps... Unfortunately I had no chance to test this with Py3K before making a fool of myself by posting this silly example. Now I understand what Barry meant. So if None really becomes a keyword in Py3K we can be sure to catch all those imaginary 'del None' statements very quickly.
Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From billtut at microsoft.com Fri Mar 24 03:46:06 2000 From: billtut at microsoft.com (Bill Tutt) Date: Thu, 23 Mar 2000 18:46:06 -0800 Subject: [Python-Dev] Re: Unicode character names Message-ID: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> MAL wrote: >"Andrew M. Kuchling" wrote: >> >> Paul Prescod writes: >>>The new \N escape interpolates named characters within strings. For >>>example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a >>>unicode smiley face at the end. >> >> Cute idea, and it certainly means you can avoid looking up Unicode >> numbers. (You can look up names instead. :) ) Note that this means the >> Unicode database is no longer optional if this is done; it has to be >> around at code-parsing time. Python could import it automatically, as >> exceptions.py is imported. Christian's work on compressing >> unicodedatabase.c is therefore really important. (Is Perl5.6 actually >> dragging around the Unicode database in the binary, or is it read out >> of some external file or data structure?) > > Sorry to disappoint you guys, but the Unicode name and comments > are *not* included in the unicodedatabase.c file Christian > is currently working on. The reason is simple: it would add > huge amounts of string data to the file. So this is a no-no > for the core distribution... > Ok, now you're just being silly. It's possible to put the character names in a separate structure so that they don't automatically get paged in with the normal unicode character property data. If you never use it, it won't get paged in, it's that simple.... Looking up the Unicode code value from the Unicode character name smells like a good time to use gperf to generate a perfect hash function for the character names. Esp. for the Unicode 3.0 character namespace.
Then you can just store the hashkey -> Unicode character mapping, and hardly ever need to page in the actual full character name string itself. I haven't looked at what the comment field contains, so I have no idea how useful that info is. *waits while gperf crunches through the ~10,550 Unicode characters where this would be useful* Bill From akuchlin at mems-exchange.org Fri Mar 24 03:51:25 2000 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 23 Mar 2000 21:51:25 -0500 (EST) Subject: [Python-Dev] 1.6 job list Message-ID: <200003240251.VAA19921@newcnri.cnri.reston.va.us> I've written up a list of things that need to get done before 1.6 is finished. This is my vision of what needs to be done, and doesn't have an official stamp of approval from GvR or anyone else. So it's very probably wrong. http://starship.python.net/crew/amk/python/1.6-jobs.html Here's the list formatted as text. The major outstanding things at the moment seem to be sre and Distutils; once they go in, you could probably release an alpha, because the other items are relatively minor.

Still to do

* XXX Revamped import hooks (or is this a post-1.6 thing?)
* Update the documentation to match 1.6 changes.
* Document more undocumented modules
* Unicode: Add Unicode support for open() on Windows
* Unicode: Compress the size of unicodedatabase
* Unicode: Write \N{SMILEY} codec for Unicode
* Unicode: the various XXX items in Misc/unicode.txt
* Add module: Distutils
* Add module: Jim Ahlstrom's zipfile.py
* Add module: PyExpat interface
* Add module: mmapfile
* Add module: sre
* Drop cursesmodule and package it separately. (Any other obsolete modules that should go?)
* Delete obsolete subdirectories in Demo/ directory
* Refurbish Demo subdirectories to be properly documented, match modern coding style, etc.
* Support Unicode strings in PyExpat interface
* Fix ./ld_so_aix installation problem on AIX
* Make test.regrtest.py more usable outside of the Python test suite
* Conservative garbage collection of cycles (maybe?)
* Write friendly "What's New in 1.6" document/article

Done

Nothing at the moment.

After 1.7

* Rich comparisons
* Revised coercions
* Parallel for loop (for i in L; j in M: ...),
* Extended slicing for all sequences.
* GvR: "I've also been thinking about making classes be types (not as huge a change as you think, if you don't allow subclassing built-in types), and adding a built-in array type suitable for use by NumPy."

--amk From esr at thyrsus.com Fri Mar 24 04:30:53 2000 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 23 Mar 2000 22:30:53 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003240251.VAA19921@newcnri.cnri.reston.va.us>; from Andrew Kuchling on Thu, Mar 23, 2000 at 09:51:25PM -0500 References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> Message-ID: <20000323223053.J28880@thyrsus.com> Andrew Kuchling : > * Drop cursesmodule and package it separately. (Any other obsolete > modules that should go?) Annoyingly enough, I may need this to stay in, for use by the new Linux-kernel configuration system I'm writing. Why is it on the hit list? -- Eric S. Raymond Still, if you will not fight for the right when you can easily win without bloodshed, if you will not fight when your victory will be sure and not so costly, you may come to the moment when you will have to fight with all the odds against you and only a precarious chance for survival. There may be a worse case. You may have to fight when there is no chance of victory, because it is better to perish than to live as slaves. --Winston Churchill From dan at cgsoftware.com Fri Mar 24 04:52:54 2000 From: dan at cgsoftware.com (Daniel Berlin+list.python-dev) Date: 23 Mar 2000 22:52:54 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: "Eric S.
Raymond"'s message of "Thu, 23 Mar 2000 22:30:53 -0500" References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> Message-ID: <4s9x6n3d.fsf@dan.resnet.rochester.edu> "Eric S. Raymond" writes: > Andrew Kuchling : > > * Drop cursesmodule and package it separately. (Any other obsolete > > modules that should go?) > > Annoyingly enough, I may need this to stay in, for use by the new Linux-kernel > configuration system I'm writing. Why is it on the hit list? IIRC, it's because nobody really maintains it, and those that care about it, use a different one (either ncurses module, or a newer cursesmodule). So from what i understand, you get complaints, but no real advantage to having it there. I'm just trying to summarize, not fall on either side (some people get touchy about issues like this). --Dan From esr at thyrsus.com Fri Mar 24 05:11:37 2000 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 23 Mar 2000 23:11:37 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <4s9x6n3d.fsf@dan.resnet.rochester.edu>; from Daniel Berlin+list.python-dev on Thu, Mar 23, 2000 at 10:52:54PM -0500 References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> <4s9x6n3d.fsf@dan.resnet.rochester.edu> Message-ID: <20000323231137.U28880@thyrsus.com> Daniel Berlin+list.python-dev : > > Andrew Kuchling : > > > * Drop cursesmodule and package it separately. (Any other obsolete > > > modules that should go?) > > > > Annoyingly enough, I may need this to stay in, for use by the new Linux-kernel > > configuration system I'm writing. Why is it on the hit list? > > IIRC, it's because nobody really maintains it, and those that care > about it, use a different one (either ncurses module, or a newer cursesmodule). > So from what i understand, you get complaints, but no real advantage > to having it there. OK. 
Then what I guess I'd like is for a maintained equivalent of this to join the core -- the ncurses module you referred to, for choice. I'm not being random. I'm trying to replace the mess that currently constitutes the kbuild system -- but I'll need to support an equivalent of menuconfig. -- Eric S. Raymond "The state calls its own violence `law', but that of the individual `crime'" -- Max Stirner From akuchlin at mems-exchange.org Fri Mar 24 05:33:24 2000 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 23 Mar 2000 23:33:24 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <20000323231137.U28880@thyrsus.com> References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> <4s9x6n3d.fsf@dan.resnet.rochester.edu> <20000323231137.U28880@thyrsus.com> Message-ID: <14554.61460.311650.599253@newcnri.cnri.reston.va.us> Eric S. Raymond writes: >OK. Then what I guess I'd like is for a maintained equivalent of this >to join the core -- the ncurses module you referred to, for choice. See the "Whither cursesmodule" thread in the python-dev archives: http://www.python.org/pipermail/python-dev/2000-February/003796.html One possibility was to blow off backward compatibility; are there any systems that only have BSD curses, not SysV curses / ncurses? Given that Pavel Curtis announced he was dropping BSD curses maintenance some years ago, I expect even the *BSDs use ncurses these days. However, Oliver Andrich doesn't seem interested in maintaining his ncurses module, and someone just started a SWIG-generated interface (http://pyncurses.sourceforge.net), so it's not obvious which one you'd use. (I *would* be willing to take over maintaining Andrich's code; maintaining the BSD curses version just seems pointless these days.)
--amk From dan at cgsoftware.com Fri Mar 24 05:43:51 2000 From: dan at cgsoftware.com (Daniel Berlin+list.python-dev) Date: 23 Mar 2000 23:43:51 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Andrew Kuchling's message of "Thu, 23 Mar 2000 23:33:24 -0500 (EST)" References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> <4s9x6n3d.fsf@dan.resnet.rochester.edu> <20000323231137.U28880@thyrsus.com> <14554.61460.311650.599253@newcnri.cnri.reston.va.us> Message-ID: Andrew Kuchling writes: > Eric S. Raymond writes: > >OK. Then what I guess I'd like is for a maintained equivalent of this > >to join the core -- the ncurses module you referred to, for choice. > > See the "Whither cursesmodule" thread in the python-dev archives: > http://www.python.org/pipermail/python-dev/2000-February/003796.html > > One possibility was to blow off backward compatibility; are there any > systems that only have BSD curses, not SysV curses / ncurses? Given > that Pavel Curtis announced he was dropping BSD curses maintenance > some years ago, I expect even the *BSDs use ncurses these days. Yes, they do.

ls /usr/src/lib/libncurses/
Makefile    ncurses_cfg.h    pathnames.h    termcap.c

grep 5\.0 /usr/src/contrib/ncurses/*

At least, this is FreeBSD. So there is no need for BSD curses anymore, on FreeBSD's account. > --amk > From esr at thyrsus.com Fri Mar 24 05:47:56 2000 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 23 Mar 2000 23:47:56 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <14554.61460.311650.599253@newcnri.cnri.reston.va.us>; from Andrew Kuchling on Thu, Mar 23, 2000 at 11:33:24PM -0500 References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> <4s9x6n3d.fsf@dan.resnet.rochester.edu> <20000323231137.U28880@thyrsus.com> <14554.61460.311650.599253@newcnri.cnri.reston.va.us> Message-ID: <20000323234756.A29775@thyrsus.com> Andrew Kuchling : > Eric S. Raymond writes: > >OK.
Then what I guess I'd like is for a maintained equivalent of this > >to join the core -- the ncurses module you referred to, for choice. > > See the "Whither cursesmodule" thread in the python-dev archives: > http://www.python.org/pipermail/python-dev/2000-February/003796.html > > One possibility was to blow off backward compatibility; are there any > systems that only have BSD curses, not SysV curses / ncurses? Given > that Pavel Curtis announced he was dropping BSD curses maintenance > some years ago, I expect even the *BSDs use ncurses these days. BSD curses was officially declared dead by its maintainer, Keith Bostic, in early 1995. Keith and I conspired to kill it off in favor of ncurses :-). -- Eric S. Raymond If gun laws in fact worked, the sponsors of this type of legislation should have no difficulty drawing upon long lists of examples of criminal acts reduced by such legislation. That they cannot do so after a century and a half of trying -- that they must sweep under the rug the southern attempts at gun control in the 1870-1910 period, the northeastern attempts in the 1920-1939 period, the attempts at both Federal and State levels in 1965-1976 -- establishes the repeated, complete and inevitable failure of gun laws to control serious crime. -- Senator Orrin Hatch, in a 1982 Senate Report From andy at reportlab.com Fri Mar 24 11:14:44 2000 From: andy at reportlab.com (Andy Robinson) Date: Fri, 24 Mar 2000 10:14:44 GMT Subject: [Python-Dev] Unicode character names In-Reply-To: <20000324024913.B8C3A1CF22@dinsdale.python.org> References: <20000324024913.B8C3A1CF22@dinsdale.python.org> Message-ID: <38db3fc6.7370137@post.demon.co.uk> On Thu, 23 Mar 2000 21:49:13 -0500 (EST), you wrote: >Sorry to disappoint you guys, but the Unicode name and comments >are *not* included in the unicodedatabase.c file Christian >is currently working on. The reason is simple: it would add >huge amounts of string data to the file. So this is a no-no >for the core distribution...
You're right about what is compiled into the core. I have to keep reminding myself to distinguish three places functionality can live:

1. What is compiled into the Python core
2. What is in the standard Python library relating to encodings.
3. Completely separate add-on packages, maintained outside of Python, to provide extra functionality for (e.g.) Asian encodings.

It is clear that both the Unicode database, and the mapping tables and other files at unicode.org, are a great resource; but they could be placed in (2) or (3) easily, along with scripts to unpack them. It probably makes sense for the i18n-sig to kick off a separate 'CodecKit' project for now, and we can see what good emerges from it before thinking about what should go into the library. >Still, the above is easily possible by inventing a new >encoding, say unicode-with-smileys, which then reads in >a file containing the Unicode names and applies the necessary >magic to decode/encode data as Paul described above. >Would probably make a cool fun-project for someone who wants >to dive into writing codecs. Yup. Prime candidate for CodecKit. - Andy From mal at lemburg.com Fri Mar 24 09:52:36 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 09:52:36 +0100 Subject: [Python-Dev] Re: Unicode character names References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> Message-ID: <38DB2CD4.CAD9F0E2@lemburg.com> Bill Tutt wrote: > > MAL wrote: > > >"Andrew M. Kuchling" wrote: > >> > >> Paul Prescod writes: > >>>The new \N escape interpolates named characters within strings. For > >>>example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > >>>unicode smiley face at the end. > >> > >> Cute idea, and it certainly means you can avoid looking up Unicode > >> numbers. (You can look up names instead. :) ) Note that this means the > >> Unicode database is no longer optional if this is done; it has to be > >> around at code-parsing time.
Python could import it automatically, as > >> exceptions.py is imported. Christian's work on compressing > >> unicodedatabase.c is therefore really important. (Is Perl5.6 actually > >> dragging around the Unicode database in the binary, or is it read out > >> of some external file or data structure?) > > > > Sorry to disappoint you guys, but the Unicode name and comments > > are *not* included in the unicodedatabase.c file Christian > > is currently working on. The reason is simple: it would add > > huge amounts of string data to the file. So this is a no-no > > for the core distribution... > > > > Ok, now you're just being silly. Its possible to put the character names in > a separate structure so that they don't automatically get paged in with the > normal unicode character property data. If you never use it, it won't get > paged in, its that simple.... Sure, but it would still cause the interpreter binary or DLL to increase in size considerably... that caused some major noise a few days ago due to the fact that the unicodedata module adds some 600kB to the interpreter -- even though it would only get swapped in when needed (the interpreter itself doesn't use it). > Looking up the Unicode code value from the Unicode character name smells > like a good time to use gperf to generate a perfect hash function for the > character names. Esp. for the Unicode 3.0 character namespace. Then you can > just store the hashkey -> Unicode character mapping, and hardly ever need to > page in the actual full character name string itself. Great idea, but why not put this into separate codec module ? > I haven't looked at what the comment field contains, so I have no idea how > useful that info is. Probably not worth looking at... 
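The "separate codec module" MAL suggests and the gperf table Bill describes both boil down to a name -> character mapping kept out of the core. A pure-Python stand-in for the perfect hash, using a plain dict (the 0x3000 range is an arbitrary demo cutoff, not the real table size):

```python
import unicodedata

# Unicode character name -> character. gperf would generate a perfect
# hash for this; a dict is the plain Python equivalent for demonstration.
table = {}
for code in range(0x3000):          # demo range only
    name = unicodedata.name(chr(code), None)
    if name is not None:
        table[name] = chr(code)

assert table["WHITE SMILING FACE"] == "\u263a"
```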
> *waits while gperf crunches through the ~10,550 Unicode characters where > this would be useful* -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Fri Mar 24 11:37:53 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 11:37:53 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000322134650.ED1BD370CF2@snelboot.oratrix.nl> <38D8F55E.6E324281@lemburg.com> Message-ID: <38DB4581.EB5315E0@lemburg.com> Ok, I've just added two new parser markers to PyArg_ParseTuple() which will hopefully make life a little easier for extension writers. The new code will be in the next patch set which I will release early next week. Here are the docs: Internal Argument Parsing: -------------------------- These markers are used by the PyArg_ParseTuple() APIs: "U": Check for Unicode object and return a pointer to it "s": For Unicode objects: auto convert them to UTF-8 and return a pointer to the object's buffer. "s#": Access to the Unicode object via the bf_getreadbuf buffer interface (see Buffer Interface); note that the length relates to the buffer length, not the Unicode string length (this may be different depending on the Internal Format). "t#": Access to the Unicode object via the bf_getcharbuf buffer interface (see Buffer Interface); note that the length relates to the buffer length, not necessarily to the Unicode string length (this may be different depending on the Internal Format). "es": Takes two parameters: encoding (const char **) and buffer (char **). The input object is first coerced to Unicode in the usual way and then encoded into a string using the given encoding. On output, a buffer of the needed size is allocated and returned through *buffer as a NULL-terminated string. The encoded string may not contain embedded NULL characters. The caller is responsible for free()ing the allocated *buffer after usage.
"es#": Takes three parameters: encoding (const char **), buffer (char **) and buffer_len (int *). The input object is first coerced to Unicode in the usual way and then encoded into a string using the given encoding. If *buffer is non-NULL, *buffer_len must be set to sizeof(buffer) on input. Output is then copied to *buffer. If *buffer is NULL, a buffer of the needed size is allocated and output copied into it. *buffer is then updated to point to the allocated memory area. The caller is responsible for free()ing *buffer after usage. In both cases *buffer_len is updated to the number of characters written (excluding the trailing NULL-byte). The output buffer is assured to be NULL-terminated. Examples: Using "es#" with auto-allocation: static PyObject * test_parser(PyObject *self, PyObject *args) { PyObject *str; const char *encoding = "latin-1"; char *buffer = NULL; int buffer_len = 0; if (!PyArg_ParseTuple(args, "es#:test_parser", &encoding, &buffer, &buffer_len)) return NULL; if (!buffer) { PyErr_SetString(PyExc_SystemError, "buffer is NULL"); return NULL; } str = PyString_FromStringAndSize(buffer, buffer_len); free(buffer); return str; } Using "es" with auto-allocation returning a NULL-terminated string: static PyObject * test_parser(PyObject *self, PyObject *args) { PyObject *str; const char *encoding = "latin-1"; char *buffer = NULL; if (!PyArg_ParseTuple(args, "es:test_parser", &encoding, &buffer)) return NULL; if (!buffer) { PyErr_SetString(PyExc_SystemError, "buffer is NULL"); return NULL; } str = PyString_FromString(buffer); free(buffer); return str; } Using "es#" with a pre-allocated buffer: static PyObject * test_parser(PyObject *self, PyObject *args) { PyObject *str; const char *encoding = "latin-1"; char _buffer[10]; char *buffer = _buffer; int buffer_len = sizeof(_buffer); if (!PyArg_ParseTuple(args, "es#:test_parser", &encoding, &buffer, &buffer_len)) return NULL; if (!buffer) { PyErr_SetString(PyExc_SystemError, "buffer is NULL"); return NULL; } str 
= PyString_FromStringAndSize(buffer, buffer_len); return str; } -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Fri Mar 24 11:54:02 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 02:54:02 -0800 (PST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: <38DB4581.EB5315E0@lemburg.com> Message-ID: On Fri, 24 Mar 2000, M.-A. Lemburg wrote: >... > "s": For Unicode objects: auto convert them to UTF-8 > and return a pointer to the object's buffer. Guess that I didn't notice this before, but it seems weird that "s" and "s#" return different encodings. Why? > "es": > Takes two parameters: encoding (const char **) and > buffer (char **). >... > "es#": > Takes three parameters: encoding (const char **), > buffer (char **) and buffer_len (int *). I see no reason to make the encoding (const char **) rather than (const char *). We are never returning a value, so this just makes it harder to pass the encoding into ParseTuple. There is precedent for passing in single-ref pointers. For example: PyArg_ParseTuple(args, "O!", &s, PyString_Type) I would recommend using just one pointer level for the encoding. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Fri Mar 24 12:29:12 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 12:29:12 +0100 Subject: [Python-Dev] Unicode and Windows References: Message-ID: <38DB5188.AA580652@lemburg.com> Greg Stein wrote: > > On Fri, 24 Mar 2000, M.-A. Lemburg wrote: > >... > > "s": For Unicode objects: auto convert them to UTF-8 > > and return a pointer to the object's buffer. > > Guess that I didn't notice this before, but it seems weird that "s" and > "s#" return different encodings. > > Why? This is due to the buffer interface being used for "s#". Since "s#" refers to the getreadbuf slot, it returns raw data.
In this case this is UTF-16 in platform dependent byte order. "s" relies on NULL-terminated strings and doesn't use the buffer interface at all. Thus "s" returns NULL-terminated UTF-8 (UTF-16 is full of NULLs). "t#" uses the getcharbuf slot and thus should return character data. UTF-8 is the right encoding here. > > "es": > > Takes two parameters: encoding (const char **) and > > buffer (char **). > >... > > "es#": > > Takes three parameters: encoding (const char **), > > buffer (char **) and buffer_len (int *). > > I see no reason to make the encoding (const char **) rather than > (const char *). We are never returning a value, so this just makes it > harder to pass the encoding into ParseTuple. > > There is precedent for passing in single-ref pointers. For example: > > PyArg_ParseTuple(args, "O!", &s, PyString_Type) > > I would recommend using just one pointer level for the encoding. You have a point there... even though it breaks the notion of prepending all parameters with an '&' (ok, except the type check one). OTOH, it would allow passing the encoding right with the PyArg_ParseTuple() call which probably makes more sense in this context. I'll change it... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tismer at tismer.com Fri Mar 24 14:13:02 2000 From: tismer at tismer.com (Christian Tismer) Date: Fri, 24 Mar 2000 14:13:02 +0100 Subject: [Python-Dev] Unicode character names References: <38DA6A49.A60E405B@prescod.net> <14554.32317.730574.967165@amarok.cnri.reston.va.us> <38DA8797.F16301E4@lemburg.com> Message-ID: <38DB69DE.6D04B084@tismer.com> "M.-A. Lemburg" wrote: > > "Andrew M. Kuchling" wrote: > > > > Paul Prescod writes: > > >The new \N escape interpolates named characters within strings. For > > >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > > >unicode smiley face at the end. 
> > > > Cute idea, and it certainly means you can avoid looking up Unicode > > numbers. (You can look up names instead. :) ) Note that this means the > > Unicode database is no longer optional if this is done; it has to be > > around at code-parsing time. Python could import it automatically, as > > exceptions.py is imported. Christian's work on compressing > > unicodedatabase.c is therefore really important. (Is Perl5.6 actually > > dragging around the Unicode database in the binary, or is it read out > > of some external file or data structure?) > > Sorry to disappoint you guys, but the Unicode name and comments > are *not* included in the unicodedatabase.c file Christian > is currently working on. The reason is simple: it would add > huge amounts of string data to the file. So this is a no-no > for the core distribution... This is not settled, still an open question. What I have for non-textual data: 25 kb with dumb compression 15 kb with enhanced compression What amounts of data am I talking about? - The whole unicode database text file has size 632 kb. - With PkZip this goes down to 96 kb. Now, I produced another text file with just the currently used data in it, and this sounds so: - the stripped unicode text file has size 216 kb. - PkZip melts this down to 40 kb. Please compare that to my results above: I can do at least twice as good. I hope I can compete for the text sections as well (since this is something where zip is *good* at), but just let me try. Let's target 60 kb for the whole crap, and I'd be very pleased. Then, there is still the question where to put the data. Having one file in the dll and another externally would be an option. I could also imagine to use a binary external file all the time, with maximum possible compression. By loading this structure, this would be partially expanded to make it fast. An advantage is that the compressed Unicode database could become a stand-alone product. 
The size is in fact so crazy small, that I'd like to make this available to any other language. > Still, the above is easily possible by inventing a new > encoding, say unicode-with-smileys, which then reads in > a file containing the Unicode names and applies the necessary > magic to decode/encode data as Paul described above. That sounds reasonable. Compression makes sense as well here, since the expanded stuff makes quite an amount of kb, compared to what it is "worth", compared to, say, the Python dll. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From mal at lemburg.com Fri Mar 24 14:41:27 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 14:41:27 +0100 Subject: [Python-Dev] Unicode character names References: <38DA6A49.A60E405B@prescod.net> <14554.32317.730574.967165@amarok.cnri.reston.va.us> <38DA8797.F16301E4@lemburg.com> <38DB69DE.6D04B084@tismer.com> Message-ID: <38DB7087.1B105AC7@lemburg.com> Christian Tismer wrote: > > "M.-A. Lemburg" wrote: > > > > "Andrew M. Kuchling" wrote: > > > > > > Paul Prescod writes: > > > >The new \N escape interpolates named characters within strings. For > > > >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > > > >unicode smiley face at the end. > > > > > > Cute idea, and it certainly means you can avoid looking up Unicode > > > numbers. (You can look up names instead. :) ) Note that this means the > > > Unicode database is no longer optional if this is done; it has to be > > > around at code-parsing time. Python could import it automatically, as > > > exceptions.py is imported. Christian's work on compressing > > > unicodedatabase.c is therefore really important. 
(Is Perl5.6 actually > > > dragging around the Unicode database in the binary, or is it read out > > > of some external file or data structure?) > > > > Sorry to disappoint you guys, but the Unicode name and comments > > are *not* included in the unicodedatabase.c file Christian > > is currently working on. The reason is simple: it would add > > huge amounts of string data to the file. So this is a no-no > > for the core distribution... > > This is not settled, still an open question. Well, ok, depends on how much you can sqeeze out of the text columns ;-) I still think that its better to leave these gimmicks out of the core and put them into some add-on, though. > What I have for non-textual data: > 25 kb with dumb compression > 15 kb with enhanced compression Looks good :-) With these sizes I think we could even integrate the unicodedatabase.c + API into the core interpreter and only have the unicodedata module to access the database from within Python. > What amounts of data am I talking about? > - The whole unicode database text file has size > 632 kb. > - With PkZip this goes down to > 96 kb. > > Now, I produced another text file with just the currently > used data in it, and this sounds so: > - the stripped unicode text file has size > 216 kb. > - PkZip melts this down to > 40 kb. > > Please compare that to my results above: I can do at least > twice as good. I hope I can compete for the text sections > as well (since this is something where zip is *good* at), > but just let me try. > Let's target 60 kb for the whole crap, and I'd be very pleased. > > Then, there is still the question where to put the data. > Having one file in the dll and another externally would > be an option. I could also imagine to use a binary external > file all the time, with maximum possible compression. > By loading this structure, this would be partially expanded > to make it fast. > An advantage is that the compressed Unicode database > could become a stand-alone product. 
The size is in fact > so crazy small, that I'd like to make this available > to any other language. You could take the unicodedatabase.c file (+ header file) and use it everywhere... I don't think it needs to contain any Python specific code. The API names would have to follow the Python naming schemes though. > > Still, the above is easily possible by inventing a new > > encoding, say unicode-with-smileys, which then reads in > > a file containing the Unicode names and applies the necessary > > magic to decode/encode data as Paul described above. > > That sounds reasonable. Compression makes sense as well here, > since the expanded stuff makes quite an amount of kb, compared > to what it is "worth", compared to, say, the Python dll. With 25kB for the non-text columns, I'd suggest simply adding the file to the core. Text columns could then go into a separate module. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Fri Mar 24 15:14:51 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 09:14:51 -0500 Subject: [Python-Dev] Hi -- I'm back! Message-ID: <200003241414.JAA11740@eric.cnri.reston.va.us> I'm back from ten days on the road. I'll try to dig through the various mailing list archives over the next few days, but it would be more efficient if you are waiting for me to take action or express an opinion on a particular issue (in *any* Python-related mailing list) to mail me a summary or at least a pointer. --Guido van Rossum (home page: http://www.python.org/~guido/) From jack at oratrix.nl Fri Mar 24 16:01:25 2000 From: jack at oratrix.nl (Jack Jansen) Date: Fri, 24 Mar 2000 16:01:25 +0100 Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message by Ka-Ping Yee , Thu, 23 Mar 2000 09:47:47 -0800 (PST) , Message-ID: <20000324150125.7144A370CF2@snelboot.oratrix.nl> > Hmm... 
i guess this also means one should ask what > > def function(None, arg): > ... > > does outside a class definition. I suppose that should simply > be illegal. No, it forces you to call the function with keyword arguments! (initially meant jokingly, but thinking about it for a couple of seconds there might actually be cases where this is useful) -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From skip at mojam.com Fri Mar 24 16:14:11 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 24 Mar 2000 09:14:11 -0600 (CST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003240251.VAA19921@newcnri.cnri.reston.va.us> References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> Message-ID: <14555.34371.749039.946891@beluga.mojam.com> AMK> I've written up a list of things that need to get done before 1.6 AMK> is finished. This is my vision of what needs to be done, and AMK> doesn't have an official stamp of approval from GvR or anyone else. AMK> So it's very probably wrong. Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules of general usefulness (this is at least generally useful for anyone writing web spiders ;-) shouldn't live in Tools, because it's not always available and users need to do extra work to make them available. I'd be happy to write up some documentation for it and twiddle the module to include doc strings. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From fdrake at acm.org Fri Mar 24 16:20:03 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Fri, 24 Mar 2000 10:20:03 -0500 (EST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: References: <38DB4581.EB5315E0@lemburg.com> Message-ID: <14555.34723.841426.504538@weyr.cnri.reston.va.us> Greg Stein writes: > There is precedent for passing in single-ref pointers. For example: > > PyArg_ParseTuple(args, "O!", &s, PyString_Type) ^^^^^^^^^^^^^^^^^ Feeling ok? I *suspect* these are reversed. :) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Fri Mar 24 16:24:13 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 10:24:13 -0500 (EST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: <38DB5188.AA580652@lemburg.com> References: <38DB5188.AA580652@lemburg.com> Message-ID: <14555.34973.303273.716146@weyr.cnri.reston.va.us> M.-A. Lemburg writes: > You have a point there... even though it breaks the notion > of prepending all parameters with an '&' (ok, except the I've never heard of this notion; I hope I didn't just miss it in the docs! The O& also doesn't require a & in front of the name of the conversion function, you just pass the right value. So there are at least two cases where you *typically* don't use a &. (Other cases in the 1.5.2 API are probably just plain weird if they don't!) Changing it to avoid the extra machinery is the Right Thing; you get to feel good today. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal at lemburg.com Fri Mar 24 17:38:06 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 17:38:06 +0100 Subject: [Python-Dev] Unicode and Windows References: <38DB5188.AA580652@lemburg.com> <14555.34973.303273.716146@weyr.cnri.reston.va.us> Message-ID: <38DB99EE.F5949889@lemburg.com> "Fred L. Drake, Jr." wrote: > > M.-A. Lemburg writes: > > You have a point there... 
even though it breaks the notion > > of prepending all parameters with an '&' (ok, except the > > I've never heard of this notion; I hope I didn't just miss it in the > docs! If you scan the parameters list in getargs.c you'll come to this conclusion and thus my notion: I've been programming like this for years now :-) > The O& also doesn't require a & in front of the name of the > conversion function, you just pass the right value. So there are at > least two cases where you *typically* don't use a &. (Other cases in > the 1.5.2 API are probably just plain weird if they don't!) > Changing it to avoid the extra machinery is the Right Thing; you get > to feel good today. ;) Ok, feeling good now ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Fri Mar 24 21:44:02 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 15:44:02 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 09:14:11 CST." <14555.34371.749039.946891@beluga.mojam.com> References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <14555.34371.749039.946891@beluga.mojam.com> Message-ID: <200003242044.PAA00677@eric.cnri.reston.va.us> > Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules > of general usefulness (this is at least generally useful for anyone writing > web spiders ;-) shouldn't live in Tools, because it's not always available > and users need to do extra work to make them available. > > I'd be happy to write up some documentation for it and twiddle the module to > include doc strings. Deal. Soon as we get the docs we'll move it to Lib. 
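[Editorial note: robotparser did make the move Skip proposes -- today it lives in the standard library as urllib.robotparser. A small sketch of what it does; parse() accepts the lines of a robots.txt directly, so no network access is needed:]

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.modified()  # record a read time, so can_fetch() trusts the parsed data
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# Paths outside the disallowed prefix are fetchable; the rest are not.
print(rp.can_fetch("*", "http://example.com/public/page.html"))   # True
print(rp.can_fetch("*", "http://example.com/private/page.html"))  # False
```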
--Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Fri Mar 24 21:50:43 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 12:50:43 -0800 (PST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: <14555.34723.841426.504538@weyr.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Fred L. Drake, Jr. wrote: > Greg Stein writes: > > There is precedent for passing in single-ref pointers. For example: > > > > PyArg_ParseTuple(args, "O!", &s, PyString_Type) > ^^^^^^^^^^^^^^^^^ > > Feeling ok? I *suspect* these are reversed. :) I just checked the code to ensure that it took a single pointer rather than a double-pointer. I guess that I didn't verify the order :-) Concept is valid, tho... the params do not necessarily require an ampersand. oop! Actually... this does require an ampersand: PyArg_ParseTuple(args, "O!", &PyString_Type, &s) Don't want to pass the whole structure... Well, regardless: I would much prefer to see the encoding passed as a constant string, rather than having to shove the sucker into a variable first, just so that I can insert a useless address-of operator. Cheers, -g -- Greg Stein, http://www.lyra.org/ From akuchlin at mems-exchange.org Fri Mar 24 21:51:56 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Fri, 24 Mar 2000 15:51:56 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242044.PAA00677@eric.cnri.reston.va.us> References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <14555.34371.749039.946891@beluga.mojam.com> <200003242044.PAA00677@eric.cnri.reston.va.us> Message-ID: <14555.54636.811100.254309@amarok.cnri.reston.va.us> Guido van Rossum writes: >> Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules >Deal. Soon as we get the docs we'll move it to Lib. What about putting it in a package like 'www' or 'web'? Packagizing the existing library is hard because of backward compatibility, but there's no such constraint for new modules. 
-- A.M. Kuchling http://starship.python.net/crew/amk/ One need not be a chamber to be haunted; / One need not be a house; / The brain has corridors surpassing / Material place. -- Emily Dickinson, "Time and Eternity" From gstein at lyra.org Fri Mar 24 22:00:25 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 13:00:25 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.54636.811100.254309@amarok.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Andrew M. Kuchling wrote: > Guido van Rossum writes: > >> Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules > >Deal. Soon as we get the docs we'll move it to Lib. > > What about putting it in a package like 'www' or 'web'? Packagizing > the existing library is hard because of backward compatibility, but > there's no such constraint for new modules. Or in the "network" package that was suggested a month ago? And why *can't* we start on repackaging old module? I think the only reason that somebody came up with to NOT do it was "well, if we don't repackage the whole thing, then we should repackage nothing." Which, IMO, is totally bogus. We'll never get anywhere operating under that principle. Cheers, -g -- Greg Stein, http://www.lyra.org/ From fdrake at acm.org Fri Mar 24 22:00:19 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 16:00:19 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: References: <14555.54636.811100.254309@amarok.cnri.reston.va.us> Message-ID: <14555.55139.484135.602894@weyr.cnri.reston.va.us> Greg Stein writes: > Or in the "network" package that was suggested a month ago? +1 > And why *can't* we start on repackaging old module? I think the only > reason that somebody came up with to NOT do it was "well, if we don't > repackage the whole thing, then we should repackage nothing." Which, IMO, > is totally bogus. We'll never get anywhere operating under that principle. 
That doesn't bother me, but I tend to be a little conservative (though usually not as conservative as Guido on such matters). I *would* like to decide that 1.7 will be fully packagized, and not wait until 2.0. As long as 1.7 is a "testing the evolutionary path" release, I think that's the right thing to do. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Fri Mar 24 22:03:54 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:03:54 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead Message-ID: <200003242103.QAA03288@eric.cnri.reston.va.us> Someone noticed that socket.connect() and a few related functions (connect_ex() and bind()) take either a single (host, port) tuple or two separate arguments, but that only the tuple is documented. Similar to append(), I'd like to close this gap, and I've made the necessary changes. This will probably break lots of code. Similar to append(), I'd like people to fix their code rather than whine -- two-arg connect() has never been documented, although it's found in much code (even the socket module test code :-( ). Similar to append(), I may revert the change if it is shown to cause too much pain during beta testing... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Mar 24 22:05:57 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:05:57 -0500 Subject: [Python-Dev] Unicode and Windows In-Reply-To: Your message of "Fri, 24 Mar 2000 12:50:43 PST." References: Message-ID: <200003242105.QAA03543@eric.cnri.reston.va.us> > Well, regardless: I would much prefer to see the encoding passed as a > constant string, rather than having to shove the sucker into a variable > first, just so that I can insert a useless address-of operator. Of course. Use & for output args, not as a matter of principle.
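[Editorial note: at the Python level, what the "es" marker discussed in this thread does to its argument amounts to: coerce to Unicode, encode with the requested codec, and reject embedded NULs. A rough sketch of that behavior -- the function name and details are illustrative, not MAL's actual C code:]

```python
def es_convert(obj, encoding):
    """Rough Python-level analogue of the 'es' parser marker."""
    s = str(obj)               # coerce to Unicode "in the usual way"
    data = s.encode(encoding)  # encode into a byte string
    if b"\x00" in data:
        # The C version hands back a NUL-terminated buffer, so the
        # encoded result may not contain embedded NUL characters.
        raise ValueError("encoded string may not contain embedded NULs")
    return data

print(es_convert("caf\xe9", "latin-1"))  # b'caf\xe9'
print(es_convert("caf\xe9", "utf-8"))    # b'caf\xc3\xa9'
```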
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Mar 24 22:11:25 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:11:25 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 13:00:25 PST." References: Message-ID: <200003242111.QAA04208@eric.cnri.reston.va.us> [Greg] > And why *can't* we start on repackaging old module? I think the only > reason that somebody came up with to NOT do it was "well, if we don't > repackage the whole thing, then we should repackage nothing." Which, IMO, > is totally bogus. We'll never get anywhere operating under that principle. The reason is backwards compatibility. Assume we create a package "web" and move all web related modules into it: httplib, urllib, htmllib, etc. Now for backwards compatibility, we add the web directory to sys.path, so one can write either "import web.urllib" or "import urllib". But that loads the same code twice! And in this (carefully chosen :-) example, urllib actually has some state which shouldn't be replicated. Plus, it's too much work -- I'd rather focus on getting 1.6 out of the door, and there's a lot of other stuff I need to do besides moving modules around. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Mar 24 22:15:00 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:15:00 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 16:00:19 EST." <14555.55139.484135.602894@weyr.cnri.reston.va.us> References: <14555.54636.811100.254309@amarok.cnri.reston.va.us> <14555.55139.484135.602894@weyr.cnri.reston.va.us> Message-ID: <200003242115.QAA04648@eric.cnri.reston.va.us> > Greg Stein writes: > > Or in the "network" package that was suggested a month ago? [Fred] > +1 Which reminds me of another reason to wait: coming up with the right package hierarchy is hard. (E.g. 
I find network too long; plus, does htmllib belong there?) > That doesn't bother me, but I tend to be a little conservative > (though usually not as conservative as Guido on such matters). I > *would* like to decided theat 1.7 will be fully packagized, and not > wait until 2.0. As long as 1.7 is a "testing the evolutionary path" > release, I think that's the right thing to do. Agreed. At the SD conference I gave a talk about the future of Python, and there was (again) a good suggestion about forwards compatibility. Starting with 1.7 (if not sooner), several Python 3000 features that necessarily have to be incompatible (like 1/2 yielding 0.5 instead of 0) could issue warnings when (or unless?) Python is invoked with a compatibility flag. --Guido van Rossum (home page: http://www.python.org/~guido/) From bwarsaw at cnri.reston.va.us Fri Mar 24 22:21:54 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 24 Mar 2000 16:21:54 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead References: <200003242103.QAA03288@eric.cnri.reston.va.us> Message-ID: <14555.56434.974884.832078@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> Someone noticed that socket.connect() and a few related GvR> functions (connect_ex() and bind()) take either a single GvR> (host, port) tuple or two separate arguments, but that only GvR> the tuple is documented. GvR> Similar to append(), I'd like to close this gap, and I've GvR> made the necessary changes. This will probably break lots of GvR> code. I don't agree that socket.connect() and friends need this fix. Yes, obviously append() needed fixing because of the application of Tim's Twelfth Enlightenment to the semantic ambiguity. But socket.connect() has no such ambiguity; you may spell it differently, but you know exactly what you mean. My suggestion would be to not break any code, but extend connect's interface to allow an optional second argument. 
Thus all of these calls would be legal: sock.connect(addr) sock.connect(addr, port) sock.connect((addr, port)) One nit on the documentation of the socket module. The second entry says: bind (address) Bind the socket to address. The socket must not already be bound. (The format of address depends on the address family -- see above.) Huh? What "above" part should I see? Note that I'm reading this doc off the web! -Barry From gstein at lyra.org Fri Mar 24 22:27:57 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 13:27:57 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242111.QAA04208@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: > [Greg] > > And why *can't* we start on repackaging old module? I think the only > > reason that somebody came up with to NOT do it was "well, if we don't > > repackage the whole thing, then we should repackage nothing." Which, IMO, > > is totally bogus. We'll never get anywhere operating under that principle. > > The reason is backwards compatibility. Assume we create a package > "web" and move all web related modules into it: httplib, urllib, > htmllib, etc. Now for backwards compatibility, we add the web > directory to sys.path, so one can write either "import web.urllib" or > "import urllib". But that loads the same code twice! And in this > (carefully chosen :-) example, urllib actually has some state which > shouldn't be replicated. We don't add it to the path. Instead, we create new modules that look like: ---- httplib.py ---- from web.httplib import * ---- The only backwards-compat issue with this approach is that people who poke values into the module will have problems. I don't believe that any of the modules were designed for that, anyhow, so it would seem acceptable to (effectively) disallow that behavior.
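[Editorial note: Greg's shim idea can be sketched concretely. Here the 'web' package and its HTTP_PORT attribute are faked in sys.modules purely for illustration -- no such package existed; a real shim would simply be a top-level httplib.py containing the one `from web.httplib import *` line:]

```python
import sys
import types

# Simulate the proposed layout: a 'web' package containing 'httplib'.
web = types.ModuleType("web")
web_httplib = types.ModuleType("web.httplib")
web_httplib.HTTP_PORT = 80  # hypothetical attribute, for illustration only
web.httplib = web_httplib
sys.modules["web"] = web
sys.modules["web.httplib"] = web_httplib

# The backwards-compatibility shim: a top-level httplib module whose
# entire body is "from web.httplib import *".
shim = types.ModuleType("httplib")
exec("from web.httplib import *", shim.__dict__)
sys.modules["httplib"] = shim

# Old code keeps working, and only one copy of the real module exists.
import httplib
print(httplib.HTTP_PORT)  # 80
```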
> Plus, it's too much work -- I'd rather focus on getting 1.6 out of the > door, and there's a lot of other stuff I need to do besides moving > modules around. Stuff that *you* need to do, sure. But there *are* a lot of us who can help here, and some who desire to spend their time moving modules. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Mar 24 22:32:14 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 13:32:14 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242115.QAA04648@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: > > Greg Stein writes: > > > Or in the "network" package that was suggested a month ago? > > [Fred] > > +1 > > Which reminds me of another reason to wait: coming up with the right > package hierarchy is hard. (E.g. I find network too long; plus, does > htmllib belong there?) htmllib does not go there. Where does it go? Dunno. Leave it unless/until somebody comes up with a place for it. We package up obvious ones. We don't have to design a complete hierarchy. There seemed to be a general "good feeling" around some kind of network (protocol) package. Call it "net" if "network" is too long. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Fri Mar 24 22:27:51 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:27:51 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: Your message of "Fri, 24 Mar 2000 16:21:54 EST." 
<14555.56434.974884.832078@anthem.cnri.reston.va.us> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> Message-ID: <200003242127.QAA06269@eric.cnri.reston.va.us> > >>>>> "GvR" == Guido van Rossum writes: > > GvR> Someone noticed that socket.connect() and a few related > GvR> functions (connect_ex() and bind()) take either a single > GvR> (host, port) tuple or two separate arguments, but that only > GvR> the tuple is documented. > > GvR> Similar to append(), I'd like to close this gap, and I've > GvR> made the necessary changes. This will probably break lots of > GvR> code. > > I don't agree that socket.connect() and friends need this fix. Yes, > obviously append() needed fixing because of the application of Tim's > Twelfth Enlightenment to the semantic ambiguity. But socket.connect() > has no such ambiguity; you may spell it differently, but you know > exactly what you mean. > > My suggestion would be to not break any code, but extend connect's > interface to allow an optional second argument. Thus all of these > calls would be legal: > > sock.connect(addr) > sock.connect(addr, port) > sock.connect((addr, port)) You probably meant: sock.connect(addr) sock.connect(host, port) sock.connect((host, port)) since (host, port) is equivalent to (addr). > One nit on the documentation of the socket module. The second entry > says: > > bind (address) > Bind the socket to address. The socket must not already be > bound. (The format of address depends on the address family -- > see above.) > > Huh? What "above" part should I see? Note that I'm reading this doc > off the web! Fred typically directs latex2html to break all sections apart. 
It's in the previous section: Socket addresses are represented as a single string for the AF_UNIX address family and as a pair (host, port) for the AF_INET address family, where host is a string representing either a hostname in Internet domain notation like 'daring.cwi.nl' or an IP address like '100.50.200.5', and port is an integral port number. Other address families are currently not supported. The address format required by a particular socket object is automatically selected based on the address family specified when the socket object was created. This also explains the reason for requiring a single argument: when using AF_UNIX, the second argument makes no sense! Frankly, I'm not sure what to do here -- it's more correct to require a single address argument always, but it's more convenient to allow two sometimes. Note that sendto(data, addr) only accepts the tuple form: you cannot write sendto(data, host, port). --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Fri Mar 24 22:28:32 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 16:28:32 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: References: <200003242111.QAA04208@eric.cnri.reston.va.us> Message-ID: <14555.56832.336242.378838@weyr.cnri.reston.va.us> Greg Stein writes: > Stuff that *you* need to do, sure. But there *are* a lot of us who can > help here, and some who desire to spend their time moving modules. Would it make sense for one of these people with time on their hands to propose a specific mapping from old->new names? I think that would be a good first step, regardless of the implementation timing. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Fri Mar 24 22:29:44 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:29:44 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 13:27:57 PST."
References: Message-ID: <200003242129.QAA06510@eric.cnri.reston.va.us> > We don't add it to the path. Instead, we create new modules that look > like: > > ---- httplib.py ---- > from web.httplib import * > ---- > > The only backwards-compat issue with this approach is that people who poke > values into the module will have problems. I don't believe that any of the > modules were designed for that, anyhow, so it would seem acceptable to > (effectively) disallow that behavior. OK, that's reasonable. I'll have to invent a different reason why I don't want this -- because I really don't! > > Plus, it's too much work -- I'd rather focus on getting 1.6 out of the > > door, and there's a lot of other stuff I need to do besides moving > > modules around. > > Stuff that *you* need to do, sure. But there *are* a lot of us who can > help here, and some who desire to spend their time moving modules. Hm. Moving modules requires painful and arcane CVS manipulations that can only be done by the few of us here at CNRI -- and I'm the only one left who's full time on Python. I'm still not convinced that it's a good plan. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Fri Mar 24 22:32:39 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 16:32:39 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <14555.56434.974884.832078@anthem.cnri.reston.va.us> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> Message-ID: <14555.57079.187670.916002@weyr.cnri.reston.va.us> Barry A. Warsaw writes: > I don't agree that socket.connect() and friends need this fix. Yes, > obviously append() needed fixing because of the application of Tim's > Twelfth Enlightenment to the semantic ambiguity. But socket.connect() > has no such ambiguity; you may spell it differently, but you know > exactly what you mean. Crock.
The address representations have been fairly well defined for quite a while. Be explicit. > sock.connect(addr) This is the only legal signature. (host, port) is simply the form of addr for a particular address family. > One nit on the documentation of the socket module. The second entry > says: > > bind (address) > Bind the socket to address. The socket must not already be > bound. (The format of address depends on the address family -- > see above.) > > Huh? What "above" part should I see? Note that I'm reading this doc > off the web! Definitely written for the paper document! Remind me about this again in a month and I'll fix it, but I don't want to play games with this little stuff until the 1.5.2p2 and 1.6 trees have been merged. Harrumph. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From gstein at lyra.org Fri Mar 24 22:37:41 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 13:37:41 -0800 (PST) Subject: [Python-Dev] delegating (was: 1.6 job list) In-Reply-To: Message-ID: On Fri, 24 Mar 2000, Greg Stein wrote: >... > > Plus, it's too much work -- I'd rather focus on getting 1.6 out of the > > door, and there's a lot of other stuff I need to do besides moving > > modules around. > > Stuff that *you* need to do, sure. But there *are* a lot of us who can > help here, and some who desire to spend their time moving modules. I just want to emphasize this point some more. Python 1.6 has a defined timeline, with a defined set of minimal requirements. However! I don't believe that a corollary of that says we MUST ignore everything else. If those other options fit within the required timeline, then why not? (assuming we have adequate testing and doc to go with the changes) There are ample people who have time and inclination to contribute. If those contributions add positive benefit, then I see no reason to exclude them (other than on pure merit, of course). Note that some of the problems stem from CVS access.
Much Guido-time could be saved by a commit-then-review model, rather than a review-then-Guido-commits model. Fred does this very well with the Doc/ area. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Mar 24 22:38:48 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 13:38:48 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: >... > > We don't add it to the path. Instead, we create new modules that look > > like: > > > > ---- httplib.py ---- > > from web.httplib import * > > ---- > > > > The only backwards-compat issue with this approach is that people who poke > > values into the module will have problems. I don't believe that any of the > > modules were designed for that, anyhow, so it would seem acceptable to > > (effectively) disallow that behavior. > > OK, that's reasonable. I'll have to invent a different reason why I > don't want this -- because I really don't! Fair enough. > > > Plus, it's too much work -- I'd rather focus on getting 1.6 out of the > > > door, and there's a lot of other stuff I need to do besides moving > > > modules around. > > > > Stuff that *you* need to do, sure. But there *are* a lot of us who can > > help here, and some who desire to spend their time moving modules. > > Hm. Moving modules requires painful and arcane CVS manipulations that > can only be done by the few of us here at CNRI -- and I'm the only one > left who's full time on Python. I'm still not convinced that it's a > good plan. There are a number of ways to do this, and I'm familiar with all of them. It is a continuing point of strife in the Apache CVS repositories :-) But... it is premised on accepting the desire to move them, of course.
Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Fri Mar 24 22:38:51 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:38:51 -0500 Subject: [Python-Dev] delegating (was: 1.6 job list) In-Reply-To: Your message of "Fri, 24 Mar 2000 13:37:41 PST." References: Message-ID: <200003242138.QAA07621@eric.cnri.reston.va.us> > Note that some of the problems stem from CVS access. Much Guido-time could > be saved by a commit-then-review model, rather than review-then-Guido- > commits model. Fred does this very well with the Doc/ area. Actually, I'm experimenting with this already: Unicode, list.append() and socket.connect() are done in this way! For renames it is really painful though, even if someone else at CNRI can do it. I'd like to see a draft package hierarchy please? Also, if you have some time, please review the bugs in the bugs list. Patches submitted with a corresponding PR# will be treated with priority! --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Fri Mar 24 22:40:48 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 22:40:48 +0100 Subject: [Python-Dev] Unicode Patch Set 2000-03-24 Message-ID: <38DBE0E0.76A298FE@lemburg.com> Attached you find the latest update of the Unicode implementation. The patch is against the current CVS version. It includes the fix I posted yesterday for the core dump problem in codecs.c (was introduced by my previous patch set -- sorry), adds more tests for the codecs and two new parser markers "es" and "es#". 
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ -------------- next part -------------- Only in CVS-Python/Doc/tools: anno-api.py diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/codecs.py Python+Unicode/Lib/codecs.py --- CVS-Python/Lib/codecs.py Thu Mar 23 23:58:41 2000 +++ Python+Unicode/Lib/codecs.py Fri Mar 17 23:51:01 2000 @@ -46,7 +46,7 @@ handling schemes by providing the errors argument. These string values are defined: - 'strict' - raise an error (or a subclass) + 'strict' - raise a ValueError error (or a subclass) 'ignore' - ignore the character and continue with the next 'replace' - replace with a suitable replacement character; Python will use the official U+FFFD REPLACEMENT diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/output/test_unicode Python+Unicode/Lib/test/output/test_unicode --- CVS-Python/Lib/test/output/test_unicode Fri Mar 24 22:21:26 2000 +++ Python+Unicode/Lib/test/output/test_unicode Sat Mar 11 00:23:21 2000 @@ -1,5 +1,4 @@ test_unicode Testing Unicode comparisons... done. -Testing Unicode contains method... done. Testing Unicode formatting strings... done. Testing unicodedata module... done. 
diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_unicode.py Python+Unicode/Lib/test/test_unicode.py --- CVS-Python/Lib/test/test_unicode.py Thu Mar 23 23:58:47 2000 +++ Python+Unicode/Lib/test/test_unicode.py Fri Mar 24 00:29:43 2000 @@ -293,3 +293,33 @@ assert unicodedata.combining(u'\u20e1') == 230 print 'done.' + +# Test builtin codecs +print 'Testing builtin codecs...', + +assert unicode('hello','ascii') == u'hello' +assert unicode('hello','utf-8') == u'hello' +assert unicode('hello','utf8') == u'hello' +assert unicode('hello','latin-1') == u'hello' + +assert u'hello'.encode('ascii') == 'hello' +assert u'hello'.encode('utf-8') == 'hello' +assert u'hello'.encode('utf8') == 'hello' +assert u'hello'.encode('utf-16-le') == 'h\000e\000l\000l\000o\000' +assert u'hello'.encode('utf-16-be') == '\000h\000e\000l\000l\000o' +assert u'hello'.encode('latin-1') == 'hello' + +u = u''.join(map(unichr, range(1024))) +for encoding in ('utf-8', 'utf-16', 'utf-16-le', 'utf-16-be', + 'raw_unicode_escape', 'unicode_escape', 'unicode_internal'): + assert unicode(u.encode(encoding),encoding) == u + +u = u''.join(map(unichr, range(256))) +for encoding in ('latin-1',): + assert unicode(u.encode(encoding),encoding) == u + +u = u''.join(map(unichr, range(128))) +for encoding in ('ascii',): + assert unicode(u.encode(encoding),encoding) == u + +print 'done.' 
diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Misc/unicode.txt Python+Unicode/Misc/unicode.txt --- CVS-Python/Misc/unicode.txt Thu Mar 23 23:58:48 2000 +++ Python+Unicode/Misc/unicode.txt Fri Mar 24 22:29:35 2000 @@ -715,21 +715,126 @@ These markers are used by the PyArg_ParseTuple() APIs: - 'U': Check for Unicode object and return a pointer to it + "U": Check for Unicode object and return a pointer to it - 's': For Unicode objects: auto convert them to the + "s": For Unicode objects: auto convert them to the and return a pointer to the object's buffer. - 's#': Access to the Unicode object via the bf_getreadbuf buffer interface + "s#": Access to the Unicode object via the bf_getreadbuf buffer interface (see Buffer Interface); note that the length relates to the buffer length, not the Unicode string length (this may be different depending on the Internal Format). - 't#': Access to the Unicode object via the bf_getcharbuf buffer interface + "t#": Access to the Unicode object via the bf_getcharbuf buffer interface (see Buffer Interface); note that the length relates to the buffer length, not necessarily to the Unicode string length (this may be different depending on the ). + "es": + Takes two parameters: encoding (const char *) and + buffer (char **). + + The input object is first coerced to Unicode in the usual way + and then encoded into a string using the given encoding. + + On output, a buffer of the needed size is allocated and + returned through *buffer as NULL-terminated string. + The encoded may not contain embedded NULL characters. + The caller is responsible for free()ing the allocated *buffer + after usage. + + "es#": + Takes three parameters: encoding (const char *), + buffer (char **) and buffer_len (int *). 
+ + The input object is first coerced to Unicode in the usual way + and then encoded into a string using the given encoding. + + If *buffer is non-NULL, *buffer_len must be set to sizeof(buffer) + on input. Output is then copied to *buffer. + + If *buffer is NULL, a buffer of the needed size is + allocated and output copied into it. *buffer is then + updated to point to the allocated memory area. The caller + is responsible for free()ing *buffer after usage. + + In both cases *buffer_len is updated to the number of + characters written (excluding the trailing NULL-byte). + The output buffer is assured to be NULL-terminated. + +Examples: + +Using "es#" with auto-allocation: + + static PyObject * + test_parser(PyObject *self, + PyObject *args) + { + PyObject *str; + const char *encoding = "latin-1"; + char *buffer = NULL; + int buffer_len = 0; + + if (!PyArg_ParseTuple(args, "es#:test_parser", + encoding, &buffer, &buffer_len)) + return NULL; + if (!buffer) { + PyErr_SetString(PyExc_SystemError, + "buffer is NULL"); + return NULL; + } + str = PyString_FromStringAndSize(buffer, buffer_len); + free(buffer); + return str; + } + +Using "es" with auto-allocation returning a NULL-terminated string: + + static PyObject * + test_parser(PyObject *self, + PyObject *args) + { + PyObject *str; + const char *encoding = "latin-1"; + char *buffer = NULL; + + if (!PyArg_ParseTuple(args, "es:test_parser", + encoding, &buffer)) + return NULL; + if (!buffer) { + PyErr_SetString(PyExc_SystemError, + "buffer is NULL"); + return NULL; + } + str = PyString_FromString(buffer); + free(buffer); + return str; + } + +Using "es#" with a pre-allocated buffer: + + static PyObject * + test_parser(PyObject *self, + PyObject *args) + { + PyObject *str; + const char *encoding = "latin-1"; + char _buffer[10]; + char *buffer = _buffer; + int buffer_len = sizeof(_buffer); + + if (!PyArg_ParseTuple(args, "es#:test_parser", + encoding, &buffer, &buffer_len)) + return NULL; + if (!buffer) { + 
PyErr_SetString(PyExc_SystemError, + "buffer is NULL"); + return NULL; + } + str = PyString_FromStringAndSize(buffer, buffer_len); + return str; + } + File/Stream Output: ------------------- @@ -837,6 +942,7 @@ History of this Proposal: ------------------------- +1.3: Added new "es" and "es#" parser markers 1.2: Removed POD about codecs.open() 1.1: Added note about comparisons and hash values. Added note about case mapping algorithms. Changed stream codecs .read() and Only in CVS-Python/Objects: .#stringobject.c.2.59 Only in CVS-Python/Objects: stringobject.c.orig diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Python/getargs.c Python+Unicode/Python/getargs.c --- CVS-Python/Python/getargs.c Sat Mar 11 10:55:21 2000 +++ Python+Unicode/Python/getargs.c Fri Mar 24 20:22:26 2000 @@ -178,6 +178,8 @@ } else if (level != 0) ; /* Pass */ + else if (c == 'e') + ; /* Pass */ else if (isalpha(c)) max++; else if (c == '|') @@ -654,6 +656,122 @@ break; } + case 'e': /* encoded string */ + { + char **buffer; + const char *encoding; + PyObject *u, *s; + int size; + + /* Get 'e' parameter: the encoding name */ + encoding = (const char *)va_arg(*p_va, const char *); + if (encoding == NULL) + return "(encoding is NULL)"; + + /* Get 's' parameter: the output buffer to use */ + if (*format != 's') + return "(unkown parser marker combination)"; + buffer = (char **)va_arg(*p_va, char **); + format++; + if (buffer == NULL) + return "(buffer is NULL)"; + + /* Convert object to Unicode */ + u = PyUnicode_FromObject(arg); + if (u == NULL) + return "string, unicode or text buffer"; + + /* Encode object; use default error handling */ + s = PyUnicode_AsEncodedString(u, + encoding, + NULL); + Py_DECREF(u); + if (s == NULL) + return "(encoding failed)"; + if (!PyString_Check(s)) { + Py_DECREF(s); + return 
"(encoder failed to return a string)"; + } + size = PyString_GET_SIZE(s); + + /* Write output; output is guaranteed to be + 0-terminated */ + if (*format == '#') { + /* Using buffer length parameter '#': + + - if *buffer is NULL, a new buffer + of the needed size is allocated and + the data copied into it; *buffer is + updated to point to the new buffer; + the caller is responsible for + free()ing it after usage + + - if *buffer is not NULL, the data + is copied to *buffer; *buffer_len + has to be set to the size of the + buffer on input; buffer overflow is + signalled with an error; buffer has + to provide enough room for the + encoded string plus the trailing + 0-byte + + - in both cases, *buffer_len is + updated to the size of the buffer + /excluding/ the trailing 0-byte + + */ + int *buffer_len = va_arg(*p_va, int *); + + format++; + if (buffer_len == NULL) + return "(buffer_len is NULL)"; + if (*buffer == NULL) { + *buffer = PyMem_NEW(char, size + 1); + if (*buffer == NULL) { + Py_DECREF(s); + return "(memory error)"; + } + } else { + if (size + 1 > *buffer_len) { + Py_DECREF(s); + return "(buffer overflow)"; + } + } + memcpy(*buffer, + PyString_AS_STRING(s), + size + 1); + *buffer_len = size; + } else { + /* Using a 0-terminated buffer: + + - the encoded string has to be + 0-terminated for this variant to + work; if it is not, an error raised + + - a new buffer of the needed size + is allocated and the data copied + into it; *buffer is updated to + point to the new buffer; the caller + is responsible for free()ing it + after usage + + */ + if (strlen(PyString_AS_STRING(s)) != size) + return "(encoded string without "\ + "NULL bytes)"; + *buffer = PyMem_NEW(char, size + 1); + if (*buffer == NULL) { + Py_DECREF(s); + return "(memory error)"; + } + memcpy(*buffer, + PyString_AS_STRING(s), + size + 1); + } + Py_DECREF(s); + break; + } + case 'S': /* string object */ { PyObject **p = va_arg(*p_va, PyObject **); From fdrake at acm.org Fri Mar 24 22:40:38 2000 From: 
fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 16:40:38 -0500 (EST) Subject: [Python-Dev] delegating (was: 1.6 job list) In-Reply-To: References: Message-ID: <14555.57558.939236.363358@weyr.cnri.reston.va.us> Greg Stein writes: > Note that some of the problems stem from CVS access. Much Guido-time could > be saved by a commit-then-review model, rather than review-then-Guido- This is a non-problem; I'm willing to do the arcane CVS manipulations if the issue is Guido's time. What I will *not* do is do it piecemeal without a cohesive plan that Guido approves of at least 95%, and I'll be really careful to do that last 5% when he's not in the office. ;) > commits model. Fred does this very well with the Doc/ area. Thanks for the vote of confidence! The model that I use for the Doc/ area is more like "Fred reviews, Fred commits, and Guido can read it on python.org like everyone else." Works for me! ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From bwarsaw at cnri.reston.va.us Fri Mar 24 22:45:38 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 24 Mar 2000 16:45:38 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <200003242115.QAA04648@eric.cnri.reston.va.us> Message-ID: <14555.57858.824301.693390@anthem.cnri.reston.va.us> One thing you can definitely do now which breaks no code: propose a package hierarchy for the standard library. From akuchlin at mems-exchange.org Fri Mar 24 22:46:28 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Fri, 24 Mar 2000 16:46:28 -0500 (EST) Subject: [Python-Dev] Unicode charnames impl. In-Reply-To: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> Message-ID: <14555.57908.151946.182639@amarok.cnri.reston.va.us> Here's a strawman codec for doing the \N{NULL} thing. Questions: 0) Is the code below correct? 1) What the heck would this encoding be called? 2) What does .encode() do? 
(Right now it escapes \N as \N{BACKSLASH}N.) 3) How can we store all those names? The resulting dictionary makes a 361K .py file; Python dumps core trying to parse it. (Another bug...) 4) What do you do with the error \N{...... no closing right bracket. Right now it stops at that point, and never advances any farther. Maybe it should assume it's an error if there's no } within the next 200 chars or some similar limit? 5) Do we need StreamReader/Writer classes, too? I've also added a script that parses the names out of the NameList.txt file at ftp://ftp.unicode.org/Public/UNIDATA/. --amk namecodec.py: ============= import codecs #from _namedict import namedict namedict = {'NULL': 0, 'START OF HEADING' : 1, 'BACKSLASH':ord('\\')} class NameCodec(codecs.Codec): def encode(self,input,errors='strict'): # XXX what should this do? Escape the # sequence \N as '\N{BACKSLASH}N'? return input.replace( '\\N', '\\N{BACKSLASH}N' ) def decode(self,input,errors='strict'): output = unicode("") last = 0 index = input.find( u'\\N{' ) while index != -1: output = output + unicode( input[last:index] ) used = index r_bracket = input.find( '}', index) if r_bracket == -1: # No closing bracket; bail out... break name = input[index + 3 : r_bracket] code = namedict.get( name ) if code is not None: output = output + unichr(code) elif errors == 'strict': raise ValueError, 'Unknown character name %s' % repr(name) elif errors == 'ignore': pass elif errors == 'replace': output = output + unichr( 0xFFFD ) last = r_bracket + 1 index = input.find( '\\N{', last) else: # Finally failed gently, no longer finding a \N{...
output = output + unicode( input[last:] ) return len(input), output # Otherwise, we hit the break for an unterminated \N{...} return index, output if __name__ == '__main__': c = NameCodec() for s in [ r'b\lah blah \N{NULL} asdf', r'b\l\N{START OF HEADING}\N{NU' ]: used, s2 = c.decode(s) print repr( s2 ) s3 = c.encode(s) _, s4 = c.decode(s3) print repr(s3) assert s4 == s print repr( c.decode(r'blah blah \N{NULLsadf} asdf' , errors='replace' )) print repr( c.decode(r'blah blah \N{NULLsadf} asdf' , errors='ignore' )) makenamelist.py =============== # Hack to extract character names from NamesList.txt # Output the repr() of the resulting dictionary import re, sys, string namedict = {} while 1: L = sys.stdin.readline() if L == "": break m = re.match('([0-9a-fA-F]){4}(?:\t(.*)\s*)', L) if m is not None: last_char = int(m.group(1), 16) if m.group(2) is not None: name = string.upper( m.group(2) ) if name not in ['', '']: namedict[ name ] = last_char # print name, last_char m = re.match('\t=\s*(.*)\s*(;.*)?', L) if m is not None: name = string.upper( m.group(1) ) names = string.split(name, ',') names = map(string.strip, names) for n in names: namedict[ n ] = last_char # print n, last_char # XXX and do what with this dictionary? print namedict From mal at lemburg.com Fri Mar 24 22:50:19 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 22:50:19 +0100 Subject: [Python-Dev] Unicode Patch Set 2000-03-24 References: <38DBE0E0.76A298FE@lemburg.com> Message-ID: <38DBE31B.BCB342CA@lemburg.com> Oops, sorry, the patch file wasn't supposed to go to python-dev. 
Anyway, Greg's wish is included in there and MarkH should be happy now -- at least I hope he is ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Jasbahr at origin.EA.com Fri Mar 24 22:49:35 2000 From: Jasbahr at origin.EA.com (Asbahr, Jason) Date: Fri, 24 Mar 2000 15:49:35 -0600 Subject: [Python-Dev] Memory Management Message-ID: <11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> Greetings! We're working on integrating our own memory manager into our project and the current challenge is figuring out how to make it play nice with Python (and SWIG). The approach we're currently taking is to patch 1.5.2 and augment the PyMem* macros to call external memory allocation functions that we provide. The idea is to easily allow the addition of third party memory management facilities to Python. Assuming 1) we get it working :-), and 2) we sync to the latest Python CVS and patch that, would this be a useful patch to give back to the community? Has anyone run up against this before? Thanks, Jason Asbahr Origin Systems, Inc. jasbahr at origin.ea.com From bwarsaw at cnri.reston.va.us Fri Mar 24 22:53:01 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 24 Mar 2000 16:53:01 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> <200003242127.QAA06269@eric.cnri.reston.va.us> Message-ID: <14555.58301.790774.159381@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> You probably meant: | sock.connect(addr) | sock.connect(host, port) | sock.connect((host, port)) GvR> since (host, port) is equivalent to (addr). Doh, yes. :) GvR> Fred typically directs latex2html to break all sections GvR> apart.
It's in the previous section: I know, I was being purposefully dense for effect :) Fred, is there some way to make the html contain a link to the previous section for the "see above" text? That would solve the problem I think. GvR> This also explains the reason for requiring a single GvR> argument: when using AF_UNIX, the second argument makes no GvR> sense! GvR> Frankly, I'm not sure what do here -- it's more correct to GvR> require a single address argument always, but it's more GvR> convenient to allow two sometimes. GvR> Note that sendto(data, addr) only accepts the tuple form: you GvR> cannot write sendto(data, host, port). Hmm, that /does/ complicate things -- it makes explaining the API more difficult. Still, in this case I think I'd lean toward liberal acceptance of input parameters. :) -Barry From bwarsaw at cnri.reston.va.us Fri Mar 24 22:57:01 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 24 Mar 2000 16:57:01 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: <14555.58541.207868.496747@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> OK, that's reasonable. I'll have to invent a different GvR> reason why I don't want this -- because I really don't! Tim's Fifth Enlightenment is all the reason you'd need, /if/ you can't be persuaded to change your mind :) -Barry From fdrake at acm.org Fri Mar 24 23:10:41 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Fri, 24 Mar 2000 17:10:41 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <14555.58301.790774.159381@anthem.cnri.reston.va.us> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> <200003242127.QAA06269@eric.cnri.reston.va.us> <14555.58301.790774.159381@anthem.cnri.reston.va.us> Message-ID: <14555.59361.460705.258859@weyr.cnri.reston.va.us> bwarsaw at cnri.reston.va.us writes: > I know, I was being purposefully dense for effect :) Fred, is there > some way to make the html contain a link to the previous section for > the "see above" text? That would solve the problem I think. No. I expect this to no longer be a problem when we push to SGML/XML, so I won't waste any time hacking around it. On the other hand, lots of places in the documentation refer to "above" and "below" in the traditional sense used in paper documents, and that doesn't work well for hypertext, even in the strongly traditional book-derivation way the Python manuals are done. As soon as it's not in the same HTML file, "above" and "below" break for a lot of people. So it still should be adjusted at an appropriate time. > Hmm, that /does/ complicate things -- it makes explaining the API more > difficult. Still, in this case I think I'd lean toward liberal > acceptance of input parameters. :) No -- all the more reason to be strict and keep the descriptions as simple as reasonable. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Fri Mar 24 23:10:32 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 17:10:32 -0500 Subject: [Python-Dev] Memory Management In-Reply-To: Your message of "Fri, 24 Mar 2000 15:49:35 CST." 
<11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> References: <11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> Message-ID: <200003242210.RAA11434@eric.cnri.reston.va.us> > We're working on integrating our own memory manager into our project > and the current challenge is figuring out how to make it play nice > with Python (and SWIG). The approach we're currently taking is to > patch 1.5.2 and augment the PyMem* macros to call external memory > allocation functions that we provide. The idea is to easily allow > the addition of third party memory management facilities to Python. > Assuming 1) we get it working :-), and 2) we sync to the latest Python > CVS and patch that, would this be a useful patch to give back to the > community? Has anyone run up against this before? Check out the archives for patches at python.org looking for posts by Vladimir Marangozov. Vladimir has produced several rounds of patches with a very similar goal in mind. We're still working out some details -- but it shouldn't be too long, and I hope that his patches are also suitable for you. If not, discussion is required! --Guido van Rossum (home page: http://www.python.org/~guido/) From bwarsaw at cnri.reston.va.us Fri Mar 24 23:12:35 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 24 Mar 2000 17:12:35 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> <200003242127.QAA06269@eric.cnri.reston.va.us> <14555.58301.790774.159381@anthem.cnri.reston.va.us> <14555.59361.460705.258859@weyr.cnri.reston.va.us> Message-ID: <14555.59475.802130.434345@anthem.cnri.reston.va.us> >>>>> "Fred" == Fred L Drake, Jr writes: Fred> No -- all the more reason to be strict and keep the Fred> descriptions as simple as reasonable. At the expense of (IMO unnecessarily) breaking existing code? 
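To make the two calling conventions in this thread concrete, here is a minimal, hypothetical sketch (not the real socket module -- the helper name is made up for illustration) of how a method can accept both the documented tuple form and the undocumented two-argument form:

```python
# Hypothetical helper, for illustration only -- not the real socket API.
# It accepts both connect((host, port)) and connect(host, port), the
# liberal behaviour under debate in this thread.
def normalize_address(*args):
    """Return a (host, port) pair from either calling convention."""
    if len(args) == 1:
        host, port = args[0]   # documented form: a single (host, port) tuple
    elif len(args) == 2:
        host, port = args      # undocumented form: two separate arguments
    else:
        raise TypeError("expected (host, port) tuple or host, port")
    return host, port
```

Dropping the two-argument branch is exactly the breakage being proposed: code that calls connect(host, port) would then raise an exception instead of connecting.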
From mal at lemburg.com Fri Mar 24 23:13:04 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 23:13:04 +0100 Subject: [Python-Dev] Unicode charnames impl. References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> Message-ID: <38DBE870.D88915B5@lemburg.com> "Andrew M. Kuchling" wrote: > > Here's a strawman codec for doing the \N{NULL} thing. Questions: > > 0) Is the code below correct? Some comments below. > 1) What the heck would this encoding be called? Ehm, 'unicode-with-smileys' I guess... after all that's what motivated the thread ;-) Seriously, I'd go with 'unicode-named'. You can then stack it on top of 'unicode-escape' and get the best of both worlds... > 2) What does .encode() do? (Right now it escapes \N as > \N{BACKSLASH}N.) .encode() should translate Unicode to a string. Since the named char thing is probably only useful on input, I'd say: don't do anything, except maybe return input.encode('unicode-escape'). > 3) How can we store all those names? The resulting dictionary makes a > 361K .py file; Python dumps core trying to parse it. (Another bug...) I've made the same experience with the large Unicode mapping tables... the trick is to split the dictionary definition in chunks and then use dict.update() to paste them together again. > 4) What do you with the error \N{...... no closing right bracket. > Right now it stops at that point, and never advances any farther. > Maybe it should assume it's an error if there's no } within the > next 200 chars or some similar limit? I'd suggest to take the upper bound of all Unicode name lengths as limit. > 5) Do we need StreamReader/Writer classes, too? If you plan to have it registered with a codec search function, yes. 
No big deal though, because you can use the Codec class as basis for them: class StreamWriter(Codec,codecs.StreamWriter): pass class StreamReader(Codec,codecs.StreamReader): pass ### encodings module API def getregentry(): return (Codec().encode,Codec().decode,StreamReader,StreamWriter) Then you can drop the scripts into the encodings package dir and it should be useable via unicode(r'\N{SMILEY}','unicode-named') and u":-)".encode('unicode-named'). > I've also add a script that parses the names out of the NameList.txt > file at ftp://ftp.unicode.org/Public/UNIDATA/. > > --amk > > namecodec.py: > ============= > > import codecs > > #from _namedict import namedict > namedict = {'NULL': 0, 'START OF HEADING' : 1, > 'BACKSLASH':ord('\\')} > > class NameCodec(codecs.Codec): > def encode(self,input,errors='strict'): > # XXX what should this do? Escape the > # sequence \N as '\N{BACKSLASH}N'? > return input.replace( '\\N', '\\N{BACKSLASH}N' ) You should return a string on output... input will be a Unicode object and the return value too if you don't add e.g. an .encode('unicode-escape'). > def decode(self,input,errors='strict'): > output = unicode("") > last = 0 > index = input.find( u'\\N{' ) > while index != -1: > output = output + unicode( input[last:index] ) > used = index > r_bracket = input.find( '}', index) > if r_bracket == -1: > # No closing bracket; bail out... > break > > name = input[index + 3 : r_bracket] > code = namedict.get( name ) > if code is not None: > output = output + unichr(code) > elif errors == 'strict': > raise ValueError, 'Unknown character name %s' % repr(name) This could also be UnicodeError (it's a subclass of ValueError). > elif errors == 'ignore': pass > elif errors == 'replace': > output = output + unichr( 0xFFFD ) '\uFFFD' would save a call. > last = r_bracket + 1 > index = input.find( '\\N{', last) > else: > # Finally failed gently, no longer finding a \N{...
> output = output + unicode( input[last:] ) > return len(input), output > > # Otherwise, we hit the break for an unterminated \N{...} > return index, output Note that .decode() must only return the decoded data. The "bytes read" integer was removed in order to make the Codec APIs compatible with the standard file object APIs. > if __name__ == '__main__': > c = NameCodec() > for s in [ r'b\lah blah \N{NULL} asdf', > r'b\l\N{START OF HEADING}\N{NU' ]: > used, s2 = c.decode(s) > print repr( s2 ) > > s3 = c.encode(s) > _, s4 = c.decode(s3) > print repr(s3) > assert s4 == s > > print repr( c.decode(r'blah blah \N{NULLsadf} asdf' , errors='replace' )) > print repr( c.decode(r'blah blah \N{NULLsadf} asdf' , errors='ignore' )) > > makenamelist.py > =============== > > # Hack to extract character names from NamesList.txt > # Output the repr() of the resulting dictionary > > import re, sys, string > > namedict = {} > > while 1: > L = sys.stdin.readline() > if L == "": break > > m = re.match('([0-9a-fA-F]){4}(?:\t(.*)\s*)', L) > if m is not None: > last_char = int(m.group(1), 16) > if m.group(2) is not None: > name = string.upper( m.group(2) ) > if name not in ['', > '']: > namedict[ name ] = last_char > # print name, last_char > > m = re.match('\t=\s*(.*)\s*(;.*)?', L) > if m is not None: > name = string.upper( m.group(1) ) > names = string.split(name, ',') > names = map(string.strip, names) > for n in names: > namedict[ n ] = last_char > # print n, last_char > > # XXX and do what with this dictionary? > print namedict > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://www.python.org/mailman/listinfo/python-dev -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake at acm.org Fri Mar 24 23:12:42 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Fri, 24 Mar 2000 17:12:42 -0500 (EST) Subject: [Python-Dev] Memory Management In-Reply-To: <11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> References: <11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> Message-ID: <14555.59482.61317.992089@weyr.cnri.reston.va.us> Asbahr, Jason writes: > community? Has anyone run up against this before? You should talk to Vladimir Marangozov; he's done a fair bit of work dealing with memory management in Python. You probably want to read the chapter he contributed to the Python/C API document for the release earlier this week. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From skip at mojam.com Fri Mar 24 23:19:50 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 24 Mar 2000 16:19:50 -0600 (CST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242115.QAA04648@eric.cnri.reston.va.us> References: <14555.54636.811100.254309@amarok.cnri.reston.va.us> <14555.55139.484135.602894@weyr.cnri.reston.va.us> <200003242115.QAA04648@eric.cnri.reston.va.us> Message-ID: <14555.59910.631130.241930@beluga.mojam.com> Guido> Which reminds me of another reason to wait: coming up with the Guido> right package hierarchy is hard. (E.g. I find network too long; Guido> plus, does htmllib belong there?) Ah, another topic for python-dev. Even if we can't do the packaging right away, we should be able to hash out the structure. Skip From guido at python.org Fri Mar 24 23:25:01 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 17:25:01 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: Your message of "Fri, 24 Mar 2000 17:10:41 EST." 
<14555.59361.460705.258859@weyr.cnri.reston.va.us> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> <200003242127.QAA06269@eric.cnri.reston.va.us> <14555.58301.790774.159381@anthem.cnri.reston.va.us> <14555.59361.460705.258859@weyr.cnri.reston.va.us> Message-ID: <200003242225.RAA13408@eric.cnri.reston.va.us> > bwarsaw at cnri.reston.va.us writes: > > I know, I was being purposefully dense for effect :) Fred, is there > > some way to make the html contain a link to the previous section for > > the "see above" text? That would solve the problem I think. [Fred] > No. I expect this to no longer be a problem when we push to > SGML/XML, so I won't waste any time hacking around it. > On the other hand, lots of places in the documentation refer to > "above" and "below" in the traditional sense used in paper documents, > and that doesn't work well for hypertext, even in the strongly > traditional book-derivation way the Python manuals are done. As soon > as it's not in the same HTML file, "above" and "below" break for a lot > of people. So it still should be adjusted at an appropriate time. My approach to this: put more stuff on the same page! I personally favor putting an entire chapter on one page; even if you split the top-level subsections this wouldn't have happened. --Guido van Rossum (home page: http://www.python.org/~guido/) From klm at digicool.com Fri Mar 24 23:40:54 2000 From: klm at digicool.com (Ken Manheimer) Date: Fri, 24 Mar 2000 17:40:54 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: Guido wrote: > OK, that's reasonable. I'll have to invent a different reason why I > don't want this -- because I really don't! I'm glad this organize-the-library-in-packages initiative seems to be moving towards concentrating on the organization, rather than just starting to put obvious things in the obvious places. 
Personally, i *crave* sensible, discoverable organization. The only thing i like less than complicated disorganization is complicated misorganization - and i think that just diving in and doing the "obvious" placements would have the terrible effect of making it harder, not easier, to move eventually to the right arrangement. Ken klm at digicool.com From akuchlin at mems-exchange.org Fri Mar 24 23:45:20 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Fri, 24 Mar 2000 17:45:20 -0500 (EST) Subject: [Python-Dev] Unicode charnames impl. In-Reply-To: <38DBE870.D88915B5@lemburg.com> References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> <38DBE870.D88915B5@lemburg.com> Message-ID: <14555.61440.613940.50492@amarok.cnri.reston.va.us> M.-A. Lemburg writes: >.encode() should translate Unicode to a string. Since the >named char thing is probably only useful on input, I'd say: >don't do anything, except maybe return input.encode('unicode-escape'). Wait... then you can't stack it on top of unicode-escape, because it would already be Unicode escaped. >> 4) What do you with the error \N{...... no closing right bracket. >I'd suggest to take the upper bound of all Unicode name >lengths as limit. Seems like a hack. >Note that .decode() must only return the decoded data. >The "bytes read" integer was removed in order to make >the Codec APIs compatible with the standard file object >APIs. Huh? Why does Misc/unicode.txt describe decode() as "Decodes the object input and returns a tuple (output object, length consumed)"? Or are you talking about a different .decode() method? -- A.M. Kuchling http://starship.python.net/crew/amk/ "Ruby's dead?" "Yes." "Ah me. That's the trouble with mortals. They do that. Not to worry, eh?" 
-- Dream and Pharamond, in SANDMAN #46: "Brief Lives:6" From gmcm at hypernet.com Fri Mar 24 23:50:12 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 24 Mar 2000 17:50:12 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <200003242103.QAA03288@eric.cnri.reston.va.us> Message-ID: <1258184279-6957124@hypernet.com> [Guido] > Someone noticed that socket.connect() and a few related functions > (connect_ex() and bind()) take either a single (host, port) tuple or > two separate arguments, but that only the tuple is documented. > > Similar to append(), I'd like to close this gap, and I've made the > necessary changes. This will probably break lots of code. This will indeed cause great wailing and gnashing of teeth. I've been criticized for using the tuple form in the Sockets HOWTO (in fact I foolishly changed it to demonstrate both forms). > Similar to append(), I'd like people to fix their code rather than > whine -- two-arg connect() has never been documented, although it's > found in much code (even the socket module test code :-( ). > > Similar to append(), I may revert the change if it is shown to cause > too much pain during beta testing... I say give 'em something to whine about. put-sand-in-the-vaseline-ly y'rs - Gordon From klm at digicool.com Fri Mar 24 23:55:43 2000 From: klm at digicool.com (Ken Manheimer) Date: Fri, 24 Mar 2000 17:55:43 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.58541.207868.496747@anthem.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Barry A. Warsaw wrote: > > >>>>> "GvR" == Guido van Rossum writes: > > GvR> OK, that's reasonable. I'll have to invent a different > GvR> reason why I don't want this -- because I really don't! 
> > Tim's Fifth Enlightenment is all the reason you'd need, /if/ you can't > be persuaded to change your mind :) Maybe i'm just a slave to my organization mania, but i'd suggest the following order change of 5 and 6, plus an addition; from: 5 now: Flat is better than nested. 6 now: Sparse is better than dense. to: 5 Sparse is better than dense. 6 Flat is better than nested 6.5 until it gets too dense. or-is-it-me-that-gets-too-dense'ly yrs, ken klm at digicool.com (And couldn't the humor page get hooked up a bit better? That was definitely a fun part of maintaining python.org...) From gstein at lyra.org Sat Mar 25 02:19:18 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 17:19:18 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.57858.824301.693390@anthem.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Barry A. Warsaw wrote: > One thing you can definitely do now which breaks no code: propose a > package hierarchy for the standard library. I already did! http://www.python.org/pipermail/python-dev/2000-February/003761.html *grumble* -g -- Greg Stein, http://www.lyra.org/ From tim_one at email.msn.com Sat Mar 25 05:19:33 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 24 Mar 2000 23:19:33 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: <001001bf9611$52e960a0$752d153f@tim> [GregS proposes a partial packaging of std modules for 1.6, Guido objects on spurious grounds, GregS refutes that, Guido agrees] > I'll have to invent a different reason why I don't want this -- because > I really don't! This one's easy! It's why I left the 20th of the 20 Pythonic Theses for you to fill in . All you have to do now is come up with a pithy way to say "if it's something Guido is so interested in that he wants to be deeply involved in it himself, but it comes at a time when he's buried under prior commitments, then tough tulips, it waits". 
shades-of-the-great-renaming-ly y'rs - tim From tim_one at email.msn.com Sat Mar 25 05:19:36 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 24 Mar 2000 23:19:36 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.58541.207868.496747@anthem.cnri.reston.va.us> Message-ID: <001101bf9611$544239e0$752d153f@tim> [Guido] > OK, that's reasonable. I'll have to invent a different > reason why I don't want this -- because I really don't! [Barry] > Tim's Fifth Enlightenment is all the reason you'd need, /if/ you can't > be persuaded to change your mind :) No no no no no: "namespaces are one honking great idea ..." is the controlling one here: Guido really *does* want this! It's a question of timing, in the sense of "never is often better than *right* now", but to be eventually modified by "now is better than never". These were carefully designed to support any position whatsoever, you know . although-in-any-particular-case-there's-only-one-true-interpretation-ly y'rs - tim From guido at python.org Sat Mar 25 05:19:41 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 23:19:41 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 17:19:18 PST." References: Message-ID: <200003250419.XAA25751@eric.cnri.reston.va.us> > > One thing you can definitely do now which breaks no code: propose a > > package hierarchy for the standard library. > > I already did! > > http://www.python.org/pipermail/python-dev/2000-February/003761.html > > *grumble* You've got to be kidding. That's not a package hierarchy proposal, it's just one package (network). Without a comprehensive proposal I'm against a partial reorganization: without a destination we can't start marching. Naming things is very contentious -- everybody has an opinion. To pick the right names you must see things in perspective. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at math.huji.ac.il Sat Mar 25 09:45:28 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 25 Mar 2000 10:45:28 +0200 (IST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: On Thu, 23 Mar 2000 gvwilson at nevex.com wrote: > If None becomes a keyword, I would like to ask whether it could be used to > signal that a method is a class method, as opposed to an instance method: I'd like to know what you mean by "class" method. (I do know C++ and Java, so I have some idea...). Specifically, my question is: how does a class method access class variables? They can't be totally unqualified (because that's very unpythonic). If they are qualified by the class's name, I see it as a very mild improvement on the current situation. You could suggest, for example, to qualify class variables by "class" (so you'd do things like: class.x = 1), but I'm not sure I like it. On the whole, I think the much bigger issue is how we denote class methods. Also, one slight problem with your method of denoting class methods: currently, it is possible to add an instance method at run time to a class by something like class C: pass def foo(self): pass C.foo = foo In your suggestion, how do you view the possibility of adding class methods to a class? (Note that "foo", above, is also perfectly usable as a plain function). I want to note that Edward suggested denotation by a separate namespace: C.foo = foo # foo is an instance method C.__methods__.foo = foo # foo is a class method The biggest problem with that suggestion is that it doesn't address the common case of defining it textually inside the class definition.
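The runtime attachment described above can be shown concretely; this is a small sketch of the existing behaviour (class and function names are made up for illustration):

```python
# Existing behaviour: a plain function assigned onto a class at run
# time becomes an ordinary instance method.  Nothing in this mechanism
# distinguishes a would-be "class method".
class C:
    pass

def foo(self):
    return self.__class__.__name__

C.foo = foo   # attached at run time; foo is now an instance method of C
```

Any new syntax for class methods would need an equally spellable runtime form, which is the gap being pointed out here.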
> I'd also like to ask (separately) that assignment to None be defined as a > no-op, so that programmers can write: > > year, month, None, None, None, None, weekday, None, None = gmtime(time()) > > instead of having to create throw-away variables to fill in slots in > tuples that they don't care about. Currently, I use "_" for that purpose, after I heard the idea from Fredrik Lundh. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gstein at lyra.org Sat Mar 25 10:26:23 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 01:26:23 -0800 (PST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: <200003250419.XAA25751@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: > > > One thing you can definitely do now which breaks no code: propose a > > > package hierarchy for the standard library. > > > > I already did! > > > > http://www.python.org/pipermail/python-dev/2000-February/003761.html > > > > *grumble* > > You've got to be kidding. That's not a package hierarchy proposal, > it's just one package (network). > > Without a comprehensive proposal I'm against a partial reorganization: > without a destination we can't start marching. Not kidding at all. I said before that I don't think we can do everything all at once. I *do* think this is solvable with a greedy algorithm rather than waiting for some nebulous completion point. > Naming things is very contentious -- everybody has an opinion. To > pick the right names you must see things in perspective. Sure. And those diverse opinions are why I don't believe it is possible to do all at once. The task is simply too large to tackle in one shot. IMO, it must be solved incrementally. I'm not even going to attempt to try to define a hierarchy for all those modules. I count 137 on my local system. Let's say that I *do* try... 
some are going to end up "forced" rather than obeying some obvious grouping. If you do it a chunk at a time, then you get the obvious, intuitive groupings. Try for more, and you just bung it all up. For discussion's sake: can you provide a rationale for doing it all at once? In the current scenario, modules just appear at some point. After a partial reorg, some modules appear at a different point. "No big whoop." Just because module A is in a package doesn't imply that module B must also be in a package. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sat Mar 25 10:35:39 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 01:35:39 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <001001bf9611$52e960a0$752d153f@tim> Message-ID: On Fri, 24 Mar 2000, Tim Peters wrote: > [GregS proposes a partial packaging of std modules for 1.6, Guido objects on > spurious grounds, GregS refutes that, Guido agrees] > > > I'll have to invent a different reason why I don't want this -- because > > I really don't! > > This one's easy! It's why I left the 20th of the 20 Pythonic Theses for you > to fill in . All you have to do now is come up with a pithy way to > say "if it's something Guido is so interested in that he wants to be deeply > involved in it himself, but it comes at a time when he's buried under prior > commitments, then tough tulips, it waits". No need for Pythonic Theses. I don't see anybody disagreeing with the end goal. The issue comes up with *how* to get there. I say "do it incrementally" while others say "do it all at once." Personally, I don't think it is possible to do all at once. As a corollary, if you can't do it all at once, but you *require* that it be done all at once, then you have effectively deferred the problem. To put it another way, Guido has already invented a reason to not do it: he just requires that it be done all at once. Result: it won't be done. [ not saying this was Guido's intent or desire... 
but this is how I read the result :-) ] Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Sat Mar 25 10:55:12 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 25 Mar 2000 11:55:12 +0200 (IST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.34371.749039.946891@beluga.mojam.com> Message-ID: On Fri, 24 Mar 2000, Skip Montanaro wrote: > Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules > of general usefulness (this is at least generally useful for anyone writing > web spiders ;-) shouldn't live in Tools, because it's not always available > and users need to do extra work to make them available. You're right, but I'd like this to be a 1.7 change. It's just that I plan to suggest a great-renaming-fest for 1.7 modules, and then the namespace wouldn't be cluttered when you don't need it. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Sat Mar 25 11:16:23 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 25 Mar 2000 12:16:23 +0200 (IST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: > OK, that's reasonable. I'll have to invent a different reason why I > don't want this -- because I really don't! Here's a reason: there shouldn't be changes we'll retract later -- we need to come up with the (more or less) right hierarchy the first time, or we'll do a lot of work for nothing. > Hm. Moving modules requires painful and arcane CVS manipulations that > can only be done by the few of us here at CNRI -- and I'm the only one > left who's full time on Python. Hmmmmm....this is a big problem. Maybe we need to have more people with access to the CVS? -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal at lemburg.com Sat Mar 25 11:47:30 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 25 Mar 2000 11:47:30 +0100 Subject: [Python-Dev] Unicode charnames impl. References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> <38DBE870.D88915B5@lemburg.com> <14555.61440.613940.50492@amarok.cnri.reston.va.us> Message-ID: <38DC9942.3C4E4B92@lemburg.com> "Andrew M. Kuchling" wrote: > > M.-A. Lemburg writes: > >.encode() should translate Unicode to a string. Since the > >named char thing is probably only useful on input, I'd say: > >don't do anything, except maybe return input.encode('unicode-escape'). > > Wait... then you can't stack it on top of unicode-escape, because it > would already be Unicode escaped. Sorry for the mixup (I guess yesterday wasn't my day...). I had stream codecs in mind: these are stackable, meaning that you can wrap one codec around another. And its also their interface API that was changed -- not the basic stateless encoder/decoder ones. Stacking of .encode()/.decode() must be done "by hand" in e.g. the way I described above. Another approach would be subclassing the unicode-escape Codec and then calling the base class method. > >> 4) What do you with the error \N{...... no closing right bracket. > >I'd suggest to take the upper bound of all Unicode name > >lengths as limit. > > Seems like a hack. It is... but what other way would there be ? > >Note that .decode() must only return the decoded data. > >The "bytes read" integer was removed in order to make > >the Codec APIs compatible with the standard file object > >APIs. > > Huh? Why does Misc/unicode.txt describe decode() as "Decodes the > object input and returns a tuple (output object, length consumed)"? > Or are you talking about a different .decode() method? You're right... I was thinking about .read() and .write(). 
.decode() should return a tuple, just as documented in unicode.txt. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mhammond at skippinet.com.au Sat Mar 25 14:20:59 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun, 26 Mar 2000 00:20:59 +1100 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: [Greg writes] > I'm not even going to attempt to try to > define a hierarchy for all those modules. I count 137 on my local system. > Let's say that I *do* try... some are going to end up "forced" rather than > obeying some obvious grouping. If you do it a chunk at a time, then you > get the obvious, intuitive groupings. Try for more, and you just bung it > all up. ... > Just because module A is in a package doesn't imply that module B must > also be in a package. I agree with Greg - every module will not fit into a package. But I also agree with Guido - we _should_ attempt to go through the 137 modules and put the ones that fit into logical groupings. Greg is probably correct with his selection for "net", but a general evaluation is still a good thing. A view of the bigger picture will help to quell debates over the structure, and only leave us with the squabbles over the exact spelling :-) +2 on ... err .... -1 on ... errr - awww - screw that--ly, Mark. From tismer at tismer.com Sat Mar 25 14:35:50 2000 From: tismer at tismer.com (Christian Tismer) Date: Sat, 25 Mar 2000 14:35:50 +0100 Subject: [Python-Dev] Unicode charnames impl. References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> Message-ID: <38DCC0B6.2A7D0EF1@tismer.com> "Andrew M. Kuchling" wrote: ... > 3) How can we store all those names? The resulting dictionary makes a > 361K .py file; Python dumps core trying to parse it. (Another bug...)
This is simply not the place to use a dictionary. You don't need fast lookup from names to codes, but something that supports incremental search. This would enable PythonWin to show a pop-up list after you typed the first letters. I'm working on a common substring analysis that makes each entry into 3 to 5 small integers. You then encode these in an order-preserving way. That means, the resulting code table is still lexically ordered, and access to the sentences is done via bisection. Takes me some more time to get that, but it will not be larger than 60k, or I drop it. Also note that all the names use uppercase letters and space only. An opportunity to use simple context encoding and use just 4 bits most of the time. ... > I've also add a script that parses the names out of the NameList.txt > file at ftp://ftp.unicode.org/Public/UNIDATA/. Is there any reason why you didn't use the UnicodeData.txt file, I mean do I cover everything if I continue to use that? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From Vladimir.Marangozov at inrialpes.fr Sat Mar 25 15:59:55 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sat, 25 Mar 2000 15:59:55 +0100 (CET) Subject: [Python-Dev] Windows and PyObject_NEW Message-ID: <200003251459.PAA09181@python.inrialpes.fr> For MarkH, Guido and the Windows experienced: I've been reading Jeffrey Richter's "Advanced Windows" last night in order to try to understand better why PyObject_NEW is implemented differently for Windows. Again, I feel uncomfortable with this, especially now, when I'm dealing with the memory aspect of Python's object constructors/destructors.
Some time ago, Guido elaborated on why PyObject_NEW uses malloc() on the user's side, before calling _PyObject_New (on Windows, cf. objimpl.h): [Guido] > I can explain the MS_COREDLL business: > > This is defined on Windows because the core is in a DLL. Since the > caller may be in another DLL, and each DLL (potentially) has a > different default allocator, and (in pre-Vladimir times) the > type-specific deallocator typically calls free(), we (Mark & I) > decided that the allocation should be done in the type-specific > allocator. We changed the PyObject_NEW() macro to call malloc() and > pass that into _PyObject_New() as a second argument. While I agree with this, from reading chapters 5-9 of (a French copy of) the book (translated backwards here): 5. Win32 Memory Architecture 6. Exploring Virtual Memory 7. Using Virtual Memory in Your Applications 8. Memory Mapped Files 9. Heaps I can't find any radical Windows specificities for memory management. On Windows, like the rest of the OSes, the (virtual & physical) memory allocated for a process is common and seems to be accessible from all DLLs involved in an executable. Things like page sharing, copy-on-write, private process mem, etc. are conceptually all the same on Windows and Unix. Now, the backwards binary compatibility argument aside (assuming that extensions get recompiled when a new Python version comes out), my concern is that with the introduction of PyObject_NEW *and* PyObject_DEL, there's no point in having separate implementations for Windows and Unix any more (or I'm really missing something and I fail to see what it is). User objects would be allocated *and* freed by the core DLL (at least the object headers). Even if several DLLs use different allocators, this shouldn't be a problem if what's obtained via PyObject_NEW is freed via PyObject_DEL. This Python memory would be allocated from Python's core DLL regions/pages/heaps.
And I believe that the memory allocated by the core DLL is accessible from the other DLL's of the process. (I haven't seen evidence on the opposite, but tell me if this is not true) I thought that maybe Windows malloc() uses different heaps for the different DLLs, but that's fine too, as long as the _NEW/_DEL symmetry is respected and all heaps are accessible from all DLLs (which seems to be the case...), but: In the beginning of Chapter 9, Heaps, I read the following: """ ...About Win32 heaps (compared to Win16 heaps)... * There is only one kind of heap (it doesn't have any particular name, like "local" or "global" on Win16, because it's unique) * Heaps are always local to a process. The contents of a process heap is not accessible from the threads of another process. A large number of Win16 applications use the global heap as a way of sharing data between processes; this change in the Win32 heaps is often a source of problems for porting Win16 applications to Win32. * One process can create several heaps in its addressing space and can manipulate them all. * A DLL does not have its own heap. It uses the heaps as part of the addressing space of the process. However, a DLL can create a heap in the addressing space of a process and reserve it for its own use. Since several 16-bit DLLs share data between processes by using the local heap of a DLL, this change is a source of problems when porting Win16 apps to Win32... """ This last paragraph confuses me. On one hand, it's stated that all heaps can be manipulated by the process, and OTOH, a DLL can reserve a heap for personal use within that process (implying the heap is r/w protected for the other DLLs ?!?). The rest of this chapter does not explain how this "private reservation" is or can be done, so some of you would probably want to chime in and explain this to me. 
Going back to PyObject_NEW, if it turns out that all heaps are accessible from all DLLs involved in the process, I would probably lobby for unifying the implementation of _PyObject_NEW/_New and _PyObject_DEL/_Del for Windows and Unix. Actually on Windows, object allocation does not depend on a central, Python core memory allocator. Therefore, with the patches I'm working on, changing the core allocator would work (would be changed for real) only for platforms other than Windows. Next, if it's possible to unify the implementation, it would also be possible to expose and officialize in the C API a new function set: PyObject_New() and PyObject_Del() (without leading underscores) For now, due to the implementation difference on Windows, we're forced to use the macro versions PyObject_NEW/DEL. Clearly, please tell me what would be wrong on Windows if a) & b) & c): a) we have PyObject_New(), PyObject_Del() b) their implementation is platform independent (no MS_COREDLL diffs, we retain the non-Windows variant) c) they're both used systematically for all object types -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gmcm at hypernet.com Sat Mar 25 16:46:01 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Sat, 25 Mar 2000 10:46:01 -0500 Subject: [Python-Dev] Windows and PyObject_NEW In-Reply-To: <200003251459.PAA09181@python.inrialpes.fr> Message-ID: <1258123323-10623548@hypernet.com> Vladimir Marangozov > ... And I believe that the memory allocated > by the core DLL is accessible from the other DLL's of the process. > (I haven't seen evidence on the opposite, but tell me if this is not true) This is true. Or, I should say, it all boils down to HeapAlloc( heap, flags, bytes) and malloc is going to use the _crtheap. > In the beginning of Chapter 9, Heaps, I read the following: > > """ > ...About Win32 heaps (compared to Win16 heaps)...
> > * There is only one kind of heap (it doesn't have any particular name, > like "local" or "global" on Win16, because it's unique) > > * Heaps are always local to a process. The contents of a process heap is > not accessible from the threads of another process. A large number of > Win16 applications use the global heap as a way of sharing data between > processes; this change in the Win32 heaps is often a source of problems > for porting Win16 applications to Win32. > > * One process can create several heaps in its addressing space and can > manipulate them all. > > * A DLL does not have its own heap. It uses the heaps as part of the > addressing space of the process. However, a DLL can create a heap in > the addressing space of a process and reserve it for its own use. > Since several 16-bit DLLs share data between processes by using the > local heap of a DLL, this change is a source of problems when porting > Win16 apps to Win32... > """ > > This last paragraph confuses me. On one hand, it's stated that all heaps > can be manipulated by the process, and OTOH, a DLL can reserve a heap for > personal use within that process (implying the heap is r/w protected for > the other DLLs ?!?). At any time, you can create a new Heap handle HeapCreate(options, initsize, maxsize) Nothing special about the "dll" context here. On Win9x, only someone who knows about the handle can manipulate the heap. (On NT, you can enumerate the handles in the process.) I doubt very much that you would break anybody's code by removing the Windows specific behavior. But it seems to me that unless Python always uses the default malloc, those of us who write C++ extensions will have to override operator new? I'm not sure. I've used placement new to allocate objects in a memory mapped file, but I've never tried to muck with the global memory policy of a C++ program.
- Gordon From akuchlin at mems-exchange.org Sat Mar 25 18:58:56 2000 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Sat, 25 Mar 2000 12:58:56 -0500 (EST) Subject: [Python-Dev] Unicode charnames impl. In-Reply-To: <38DCC0B6.2A7D0EF1@tismer.com> References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> <38DCC0B6.2A7D0EF1@tismer.com> Message-ID: <14556.65120.22727.524616@newcnri.cnri.reston.va.us> Christian Tismer writes: >This is simply not the place to use a dictionary. >You don't need fast lookup from names to codes, >but something that supports incremental search. >This would enable PythonWin to show a pop-up list after >you typed the first letters. Hmm... one could argue that PythonWin or IDLE should provide their own database for incremental searching; I was planning on following Bill Tutt's suggestion of generating a perfect minimal hash for the names. gperf isn't up to the job, but I found an algorithm that should be OK. Just got to implement it now... But, if your approach pays off it'll be superior to a perfect hash. >Is there any reason why you didn't use the UnicodeData.txt file, >I mean do I cover everything if I continue to use that? Oops; I saw the NameList file and just went for it; maybe it should use the full UnicodeData.txt. --amk From moshez at math.huji.ac.il Sat Mar 25 19:10:44 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 25 Mar 2000 20:10:44 +0200 (IST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Mark Hammond wrote: > But I also agree with Guido - we _should_ attempt to go through the 137 Where did you come up with that number? I counted much more -- not quite sure, but certainly more. Well, here's a tentative suggestion I worked out today. This is just to have something to quibble about.
In the interest of rushing it out of the door, there are a few modules (explicitly mentioned) which I have said nothing about.

net
    httplib
    ftplib
    urllib
    cgi
    gopherlib
    imaplib
    poplib
    nntplib
    smtplib
    urlparse
    telnetlib
    server
        BaseHTTPServer
        CGIHTTPServer
        SimpleHTTPServer
        SocketServer
        asynchat
        asyncore
text
    sgmllib
    htmllib
    htmlentitydefs
    xml
        whatever the xml-sig puts here
    mail
        rfc822
        mime
            MimeWriter
            mimetools
            mimify
            mailcap
            mimetypes
            base64
            quopri
        mailbox
        mhlib
    binhex
parse
    string
    re
    regex
    reconvert
    regex_syntax
    regsub
    shlex
    ConfigParser
    linecache
    multifile
    netrc
bin
    gzip
    zlib
    aifc
    chunk
    image
        imghdr
        colorsys
        imageop
        imgfile
        rgbimg
        yuvconvert
    sound
        sndhdr
        toaiff
        audiodev
        sunau
        sunaudio
        wave
        audioop
        sunaudiodev
db
    anydbm
    whichdb
    bsddb
    dbm
    dbhash
    dumbdbm
    gdbm
math
    bisect
    fpformat
    random
    whrandom
    cmath
    math
    crypt
    fpectl
    fpetest
    array
    md5
    mpz
    rotor
    sha
time
    calendar
    time
    tzparse
    sched
    timing
interpreter
    new
    py_compile
    code
    codeop
    compileall
    keyword
    token
    tokenize
    parser
    dis
    bdb
    pdb
    profile
    pyclbr
    tabnanny
    symbol
    pstats
    traceback
    rlcompleter
security
    Bastion
    rexec
    ihooks
file
    dircache
    path -- a virtual module which would do a from path import *
        dospath
        posixpath
        macpath
        nturl2path
        ntpath
        macurl2path
    filecmp
    fileinput
    StringIO
    cStringIO
    glob
    fnmatch
    posixfile
    stat
    statcache
    statvfs
    tempfile
    shutil
    pipes
    popen2
    commands
    dl
    fcntl
serialize
    pickle
    cPickle
    shelve
    xdrlib
    copy
    copy_reg
threads
    thread
    threading
    Queue
    mutex
ui
    curses
    Tkinter
    cmd
    getpass
internal
    _codecs
    _locale
    _tkinter
    pcre
    strop
    posix
users
    pwd
    grp
    nis
exceptions
os
types
UserDict
UserList
user
site
locale
sgi
    al
    cd
    cl
    fl
    fm
    gl
    misc (what used to be sgimodule.c)
    sv
unicode
    codecs
    unicodedata
    unicodedatabase

========== Modules not handled ============
formatter
getopt
pprint
pty
repr
tty
errno
operator
pure
readline
resource
select
signal
socket
struct
syslog
termios

Well, if you got this far, you certainly deserve... congratulations-ly y'rs, Z. -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From DavidA at ActiveState.com Sat Mar 25 19:28:30 2000 From: DavidA at ActiveState.com (David Ascher) Date: Sat, 25 Mar 2000 10:28:30 -0800 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: > db > anydbm > whichdb > bsddb > dbm > dbhash > dumbdbm > gdbm This made me think of one issue which is worth considering -- is there a mechanism for third-party packages to hook into the standard naming hierarchy? It'd be weird not to have the oracle and sybase modules within the db toplevel package, for example. --david ascher From moshez at math.huji.ac.il Sat Mar 25 19:30:26 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 25 Mar 2000 20:30:26 +0200 (IST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: On Sat, 25 Mar 2000, David Ascher wrote: > This made me think of one issue which is worth considering -- is there a > mechanism for third-party packages to hook into the standard naming > hierarchy? It'd be weird not to have the oracle and sybase modules within > the db toplevel package, for example. My position is that any 3rd party module decides for itself where it wants to live -- once we formalized the framework. Consider PyGTK/PyGnome, PyQT/PyKDE -- they should live in the UI package too... From DavidA at ActiveState.com Sat Mar 25 19:50:14 2000 From: DavidA at ActiveState.com (David Ascher) Date: Sat, 25 Mar 2000 10:50:14 -0800 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: > On Sat, 25 Mar 2000, David Ascher wrote: > > > This made me think of one issue which is worth considering -- is there a > > mechanism for third-party packages to hook into the standard naming > > hierarchy? It'd be weird not to have the oracle and sybase > modules within > > the db toplevel package, for example. 
> > My position is that any 3rd party module decides for itself where it wants > to live -- once we formalized the framework. Consider PyGTK/PyGnome, > PyQT/PyKDE -- they should live in the UI package too... That sounds good in theory, but I can see possible problems down the line: 1) The current mapping between package names and directory structure means that installing a third party package hierarchy in a different place on disk than the standard library requires some work on the import mechanisms (this may have been discussed already) and a significant amount of user education. 2) We either need a 'registration' mechanism whereby people can claim a name in the standard hierarchy or expect conflicts. As far as I can gather, in the Perl world registration occurs by submission to CPAN. Correct? One alternative is to go the Java route, which would then mean, I think, that some core modules are placed very high in the hierarchy (the equivalent of the java. subtree), and some others are deprecated to lower subtree (the equivalent of com.sun). Anyway, I agree with Guido on this one -- naming is a contentious issue fraught with long-term implications. Let's not rush into a decision just yet. --david From guido at python.org Sat Mar 25 19:56:20 2000 From: guido at python.org (Guido van Rossum) Date: Sat, 25 Mar 2000 13:56:20 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Sat, 25 Mar 2000 01:35:39 PST." References: Message-ID: <200003251856.NAA09636@eric.cnri.reston.va.us> > I say "do it incrementally" while others say "do it all at once." > Personally, I don't think it is possible to do all at once. As a > corollary, if you can't do it all at once, but you *require* that it be > done all at once, then you have effectively deferred the problem. To put > it another way, Guido has already invented a reason to not do it: he just > requires that it be done all at once. Result: it won't be done. Bullshit, Greg.
(I don't normally like to use such strong words, but since you're being confrontational here...) I'm all for doing it incrementally -- but I want the plan for how to do it made up front. That doesn't require all the details to be worked out -- but it requires a general idea about what kind of things we will have in the namespace and what kinds of names they get. An organizing principle, if you like. If we were to decide later that we go for a Java-like deep hierarchy, the network package would have to be moved around again -- what a waste. --Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at math.huji.ac.il Sat Mar 25 20:35:37 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 25 Mar 2000 21:35:37 +0200 (IST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: On Sat, 25 Mar 2000, David Ascher wrote: > > My position is that any 3rd party module decides for itself where it wants > > to live -- once we formalized the framework. Consider PyGTK/PyGnome, > > PyQT/PyKDE -- they should live in the UI package too... > > That sounds good in theory, but I can see possible problems down the line: > > 1) The current mapping between package names and directory structure means > that installing a third party package hierarchy in a different place on disk > than the standard library requires some work on the import mechanisms (this > may have been discussed already) and a significant amount of user education. Ummmm.... 1.a) If the work of the import-sig produces something (which I suspect it will), it's more complicated -- you could have JAR-like files with hierarchies inside. 1.b) Installation is the domain of the distutils-sig. I seem to remember Greg Ward saying something about installing packages. > 2) We either need a 'registration' mechanism whereby people can claim a name > in the standard hierarchy or expect conflicts. As far as I can gather, in > the Perl world registration occurs by submission to CPAN. 
Correct? Yes. But this is no worse than the current situation, where people pick a toplevel name . I agree a registration mechanism would be helpful. > One alternative is to go the Java route, which would then mean, I think, > that some core modules are placed very high in the hierarchy (the equivalent > of the java. subtree), and some others are deprecated to lower subtree (the > equivalent of com.sun). Personally, I *hate* the Java mechanism -- see Stallman's position on why GNU Java packages use gnu.* rather than org.gnu.* for some of the reasons. I really, really, like the Perl mechanism, and I think we would do well to think if something like that wouldn't suit us, with minor modifications. (Remember that lwall copied the Pythonic module mechanism, so Perl and Python modules are quite similar) > Anyway, I agree with Guido on this one -- naming is a contentious issue > fraught with long-term implications. Let's not rush into a decision just > yet. I agree. That's why I pushed out the straw-man proposal. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From bwarsaw at cnri.reston.va.us Sat Mar 25 21:07:27 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Sat, 25 Mar 2000 15:07:27 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <14555.57858.824301.693390@anthem.cnri.reston.va.us> Message-ID: <14557.7295.451011.36533@anthem.cnri.reston.va.us> I guess I was making a request for a more comprehensive list. People are asking to packagize the entire directory, so I'd like to know what organization they'd propose for all the modules. -Barry From bwarsaw at cnri.reston.va.us Sat Mar 25 21:20:09 2000 From: bwarsaw at cnri.reston.va.us (Barry A.
Warsaw) Date: Sat, 25 Mar 2000 15:20:09 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: <14557.8057.896921.693908@anthem.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: MZ> Hmmmmm....this is a big problem. Maybe we need to have more MZ> people with access to the CVS? To make changes like this, you don't just need write access to CVS, you need physical access to the repository filesystem. It's not possible to provide this access to non-CNRI'ers. -Barry From gstein at lyra.org Sat Mar 25 21:40:59 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 12:40:59 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14557.8057.896921.693908@anthem.cnri.reston.va.us> Message-ID: On Sat, 25 Mar 2000, Barry A. Warsaw wrote: > >>>>> "MZ" == Moshe Zadka writes: > > MZ> Hmmmmm....this is a big problem. Maybe we need to have more > MZ> people with access to the CVS? > > To make changes like this, you don't just need write access to CVS, > you need physical access to the repository filesystem. It's not > possible to provide this access to non-CNRI'ers. Unless the CVS repository was moved to, say, SourceForge. :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From bwarsaw at cnri.reston.va.us Sat Mar 25 22:00:39 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Sat, 25 Mar 2000 16:00:39 -0500 (EST) Subject: [Python-Dev] module reorg (was: 1.6 job list) References: Message-ID: <14557.10487.736544.336550@anthem.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: MZ> Personally, I *hate* the Java mechanism -- see Stallman's MZ> position on why GNU Java packages use gnu.* rather then MZ> org.gnu.* for some of the reasons. Actually, it's Per Bothner's position: http://www.gnu.org/software/java/why-gnu-packages.txt and I agree with him. I kind of wished that JimH had chosen simply `python' as JPython's top level package hierarchy, but that's too late to change now.
-Barry From bwarsaw at cnri.reston.va.us Sat Mar 25 22:03:08 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Sat, 25 Mar 2000 16:03:08 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <14557.8057.896921.693908@anthem.cnri.reston.va.us> Message-ID: <14557.10636.504088.517078@anthem.cnri.reston.va.us> >>>>> "GS" == Greg Stein writes: GS> Unless the CVS repository was moved to, say, SourceForge. I didn't want to rehash that, but yes, you're absolutely right! -Barry From gstein at lyra.org Sat Mar 25 22:13:00 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 13:13:00 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14557.10636.504088.517078@anthem.cnri.reston.va.us> Message-ID: On Sat, 25 Mar 2000 bwarsaw at cnri.reston.va.us wrote: > >>>>> "GS" == Greg Stein writes: > > GS> Unless the CVS repository was moved to, say, SourceForge. > > I didn't want to rehash that, but yes, you're absolutely right! Me neither, ergo the smiley :-) Just felt inclined to mention it, and I think the conversation stopped last time at that point; not sure it ever was "hashed" :-). But it is only a discussion to raise if checkins-via-CNRI-guys becomes a true bottleneck. Which it hasn't and doesn't look to be. Constrained? Yes. Bottleneck? No. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jeremy at cnri.reston.va.us Sat Mar 25 22:22:09 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Sat, 25 Mar 2000 16:22:09 -0500 (EST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: References: Message-ID: <14557.4689.858620.578102@walden> >>>>> "MH" == Mark Hammond writes: MH> [Greg writes] >> I'm not even going to attempt to try to define a hierarchy for >> all those modules. I count 137 on my local system. Let's say >> that I *do* try... some are going to end up "forced" rather than >> obeying some obvious grouping. If you do it a chunk at a time, >> then you get the obvious, intuitive groupings. 
Try for more, and >> you just bung it all up. MH> I agree with Greg - every module will not fit into a package. Sure. No one is arguing with that :-). Where I disagree with Greg is that we shouldn't approach this piecemeal. A greedy algorithm can lead to a locally optimal solution that isn't right for the whole library. A name or grouping might make sense on its own, but isn't sufficiently clear when taking all 137-odd modules into account. MH> But I also agree with Guido - we _should_ attempt to go through MH> the 137 modules and put the ones that fit into logical MH> groupings. Greg is probably correct with his selection for MH> "net", but a general evaluation is still a good thing. A view MH> of the bigger picture will help to quell debates over the MH> structure, and only leave us with the squabbles over the exact MH> spelling :-) x1.5 on this. I'm not sure which direction you ended up thinking this was (+ or -), but whichever direction it was I like it. Jeremy From gstein at lyra.org Sat Mar 25 22:40:48 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 13:40:48 -0800 (PST) Subject: [Python-Dev] voting numbers Message-ID: Hey... just thought I'd drop off a description of the "formal" mechanism that the ASF uses for voting since it has been seen here and there on this group :-)

+1 "I'm all for it. Do it!"
+0 "Seems cool and acceptable, but I can also live without it"
-0 "Not sure this is the best thing to do, but I'm not against it."
-1 "Veto. And here is my reasoning."

Strictly speaking, there is no vetoing here, other than by Guido. For changes to Apache (as opposed to bug fixes), it depends on where the development is. Early stages, it is reasonably open and people work straight against CVS (except for really big design changes). Late stage, it requires three +1 votes during discussion of a patch before it goes in. Here on python-dev, it would seem that the votes are a good way to quickly let Guido know people's feelings about topic X or Y.
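The scheme Greg describes is mechanical enough to sketch in a few lines of Python. This is purely illustrative: the function name, the numeric weights, and the veto rule as coded here are my own reading of the description above, not a tool python-dev actually used.

```python
# Illustrative tally of ASF-style votes as described above.
# The integer weights (+0 and -0 both count as zero) are an assumption.
WEIGHTS = {"+1": 1, "+0": 0, "-0": 0, "-1": -1}

def tally(votes):
    """Return (score, vetoed) for a list of votes like ["+1", "-0"].

    A -1 acts as a veto: it blocks the change regardless of the
    numeric total, until the voter lifts it or the patch is reworked.
    """
    score = sum(WEIGHTS[v] for v in votes)
    vetoed = "-1" in votes
    return score, vetoed

# Three +1 votes, no veto: under the late-stage Apache rule, good to go.
assert tally(["+1", "+1", "+1"]) == (3, False)
# One -1 blocks the patch even with positive support.
assert tally(["+1", "+1", "-1"]) == (1, True)
```

The point of the two return values is exactly the distinction Greg draws: the score summarizes sentiment for Guido, while the veto flag is a hard gate on committing.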
On the patches mailing list, the voting could actually be quite a useful measure for the people with CVS commit access. If a patch gets -1, then its commit should wait until reason X has been resolved. Note that it can be resolved in two ways: the person lifts their veto (after some amount of persuasion or explanation), or the patch is updated to address the concerns (well, unless the veto is against the concept of the patch entirely :-). If a patch gets a few +1 votes, then it can probably go straight in. Note that the Apache guys sometimes say things like "+1 on concept" meaning they like the idea, but haven't reviewed the code. Do we formalize on using these? Not really suggesting that. But if myself (and others) drop these things into mail notes, then we may as well have a description of just what the heck is going on :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Sun Mar 26 00:27:18 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 01:27:18 +0200 (IST) Subject: [Python-Dev] Q: repr.py vs. pprint.py Message-ID: Is there any reason to keep two separate modules with simple-formatting functions? I think pprint is somewhat more sophisticated, but in the worst case, we can just dump them both in the same file (the only thing would be that pprint would export "repr", in addition to "saferepr" (among others)). (Just bumped into this in my reorg suggestion) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Sun Mar 26 00:32:38 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 01:32:38 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 Message-ID: Here's a second version of the straw man proposal for the reorganization of modules in packages. Note that I'm treating it as a strictly 1.7 proposal, so I don't care a "lot" about backwards compatibility.
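On the backwards-compatibility point: keeping an old top-level module importable under a new package name costs only a few lines. A minimal sketch in today's terms -- the "parse" package name is borrowed from the straw man purely as an example, and the shim itself is hypothetical, not part of any proposal:

```python
# Hypothetical back-compat shim: expose an existing top-level module
# under a new package name, so both import styles keep working while
# code migrates.  "parse" stands in for a proposed package.
import sys
import types
import shlex  # an existing stdlib module slated for "parse" in the straw man

parse = types.ModuleType("parse")
parse.shlex = shlex
sys.modules["parse"] = parse  # register the package stand-in

# Old-style and new-style imports now yield the very same module object.
from parse import shlex as new_shlex
assert new_shlex is shlex
```

Because both names resolve to one module object in sys.modules, state (compiled patterns, module-level caches) stays shared, which is what makes this kind of aliasing safe during a transition.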
I'm down to 4 unhandled modules, which means that if no one objects (and I'm sure someone will ), this can be a plan of action. So get your objections ready guys!

net
    httplib
    ftplib
    urllib
    cgi
    gopherlib
    imaplib
    poplib
    nntplib
    smtplib
    urlparse
    telnetlib
    server
        BaseHTTPServer
        CGIHTTPServer
        SimpleHTTPServer
        SocketServer
        asynchat
        asyncore
text
    sgmllib
    htmllib
    htmlentitydefs
    xml
        whatever the xml-sig puts here
    mail
        rfc822
        mime
            MimeWriter
            mimetools
            mimify
            mailcap
            mimetypes
            base64
            quopri
        mailbox
        mhlib
    binhex
parse
    string
    re
    regex
    reconvert
    regex_syntax
    regsub
    shlex
    ConfigParser
    linecache
    multifile
    netrc
bin
    gzip
    zlib
    aifc
    chunk
    image
        imghdr
        colorsys
        imageop
        imgfile
        rgbimg
        yuvconvert
    sound
        sndhdr
        toaiff
        audiodev
        sunau
        sunaudio
        wave
        audioop
        sunaudiodev
db
    anydbm
    whichdb
    bsddb
    dbm
    dbhash
    dumbdbm
    gdbm
math
    bisect
    fpformat
    random
    whrandom
    cmath
    math
    crypt
    fpectl
    fpetest
    array
    md5
    mpz
    rotor
    sha
time
    calendar
    time
    tzparse
    sched
    timing
interpreter
    new
    py_compile
    code
    codeop
    compileall
    keyword
    token
    tokenize
    parser
    dis
    bdb
    pdb
    profile
    pyclbr
    tabnanny
    symbol
    pstats
    traceback
    rlcompleter
security
    Bastion
    rexec
    ihooks
file
    dircache
    path -- a virtual module which would do a from path import *
        dospath
        posixpath
        macpath
        nturl2path
        ntpath
        macurl2path
    filecmp
    fileinput
    StringIO
    cStringIO
    glob
    fnmatch
    posixfile
    stat
    statcache
    statvfs
    tempfile
    shutil
    pipes
    popen2
    commands
    dl
    fcntl
    lowlevel
        socket
        select
    terminal
        termios
        pty
        tty
        readline
    syslog
serialize
    pickle
    cPickle
    shelve
    xdrlib
    copy
    copy_reg
threads
    thread
    threading
    Queue
    mutex
ui
    curses
    Tkinter
    cmd
    getpass
internal
    _codecs
    _locale
    _tkinter
    pcre
    strop
    posix
users
    pwd
    grp
    nis
sgi
    al
    cd
    cl
    fl
    fm
    gl
    misc (what used to be sgimodule.c)
    sv
unicode
    codecs
    unicodedata
    unicodedatabase
exceptions
os
types
UserDict
UserList
user
site
locale
pure
formatter
getopt
signal
pprint

========== Modules not handled ============
errno
resource
operator
struct

-- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From DavidA at ActiveState.com Sun Mar 26 00:39:51 2000 From: DavidA at ActiveState.com (David Ascher) Date: Sat, 25 Mar 2000 15:39:51 -0800 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: > I really, really, like the Perl mechanism, and I think we would do well > to think if something like that wouldn't suit us, with minor > modifications. The biggest modification which I think is needed to a Perl-like organization is that IMO there is value in knowing what packages are 'blessed' by Guido. In other words, some sort of Q/A mechanism would be good, if it can be kept simple. [Alternatively, let's not put a Q/A mechanism in place and my employer can make money selling that information, the way they do for Perl! =)] > (Remember that lwall copied the Pythonic module mechanism, > so Perl and Python modules are quite similar) That's stretching things a bit (the part after the 'so' doesn't follow from the part before), as there is a lot more to the nature of module systems, but the point is well taken. --david From moshez at math.huji.ac.il Sun Mar 26 06:44:02 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 06:44:02 +0200 (IST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: On Sat, 25 Mar 2000, David Ascher wrote: > The biggest modification which I think is needed to a Perl-like organization > is that IMO there is value in knowing what packages are 'blessed' by Guido. > In other words, some sort of Q/A mechanism would be good, if it can be kept > simple. You got a point. Anyone knows how the perl-porters decide what modules to put in source.tar.gz? -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping at lfw.org Sun Mar 26 07:01:58 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 21:01:58 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Moshe Zadka wrote: > Here's a second version of the straw man proposal for the reorganization > of modules in packages. Note that I'm treating it as a strictly 1.7 > proposal, so I don't care a "lot" about backwards compatiblity. Hey, this looks pretty good. For the most part i agree with your layout. Here are a few notes... > net [...] > server [...] Good. > text [...] > xml > whatever the xml-sig puts here > mail > rfc822 > mime > MimeWriter > mimetools > mimify > mailcap > mimetypes > base64 > quopri > mailbox > mhlib > binhex I'm not convinced "mime" needs a separate branch here. (This is the deepest part of the tree, and at three levels small alarm bells went off in my head.) For example, why text.binhex but text.mail.mime.base64? > parse > string > re > regex > reconvert > regex_syntax > regsub > shlex > ConfigParser > linecache > multifile > netrc The "re" module, in particular, will get used a lot, and it's not clear why these all belong under "parse". I suggest dropping "parse" and moving these up. What's "multifile" doing here instead of with the rest of the mail/mime stuff? > bin [...] I like this. Good idea. > gzip > zlib > aifc Shouldn't "aifc" be under "sound"? > image [...] > sound [...] > db [...] Yup. > math [...] > time [...] Looks good. > interpreter [...] How about just "interp"? > security [...] > file [...] > lowlevel > socket > select Why the separate "lowlevel" branch? Why doesn't "socket" go under "net"? > terminal > termios > pty > tty > readline Why does "terminal" belong under "file"? Maybe it could go under "ui"? Hmm... "pty" doesn't really belong. > syslog Hmm... 
> serialize > pickle > cPickle > shelve > xdrlib > copy > copy_reg "copy" doesn't really fit here under "serialize", and "serialize" is kind of a long name. How about a "data types" package? We could then put "struct", "UserDict", "UserList", "pprint", and "repr" here. data copy copy_reg pickle cPickle shelve xdrlib struct UserDict UserList pprint repr On second thought, maybe "struct" fits better under "bin". > threads [...] > ui [...] Uh huh. > internal > _codecs > _locale > _tkinter > pcre > strop > posix Not sure this is a good idea. It means the Unicode work lives under both "unicode" and "internal._codecs", Tk is split between "ui" and "internal._tkinter", regular expressions are split between "text.re" and "internal.pcre". I can see your motivation for getting "posix" out of the way, but i suspect this is likely to confuse people. > users > pwd > grp > nis Hmm. Yes, i suppose so. > sgi [...] > unicode [...] Indeed. > os > UserDict > UserList > exceptions > types > operator > user > site Yeah, these are all top-level (except maybe UserDict and UserList, see above). > locale I think "locale" belongs under "math" with "fpformat" and the others. It's for numeric formatting. > pure What the heck is "pure"? > formatter This probably goes under "text". > struct See above under "data". I can't decide whether "struct" should be part of "data" or "bin". Hmm... probably "bin" -- since, unlike the serializers under "data", "struct" does not actually specify a serialization format, it only provides fairly low-level operations. Well, this leaves a few system-like modules that didn't really fit elsewhere for me: pty tty termios syslog select getopt signal errno resource They all seem to be Unix-related. How about putting these in a "unix" or "system" package? -- ?!ng "I'm not trying not to answer the question; i'm just not answering it." 
-- Lenore Snell From moshez at math.huji.ac.il Sun Mar 26 07:58:34 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 07:58:34 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Ka-Ping Yee wrote: > I'm not convinced "mime" needs a separate branch here. > (This is the deepest part of the tree, and at three levels > small alarm bells went off in my head.) I've had my problems with that too, but it seemed too many modules were mime specific. > For example, why text.binhex but text.mail.mime.base64? Actually, I thought about this (this isn't random at all): base64 encoding is part of the mime standard, together with quoted-printable. Binhex isn't. I don't know if you find it reason enough, and it may be smarter just having a text.encode.{quopri,uu,base64,binhex} > > parse > > string > > re > > regex > > reconvert > > regex_syntax > > regsub > > shlex > > ConfigParser > > linecache > > multifile > > netrc > > The "re" module, in particular, will get used a lot, and "from text.parse import re" doesn't seem too painful. > and it's not clear why these all belong under "parse". These are all used for parsing data (which does not have some pre-written parser). I had problems with the name too... > What's "multifile" doing here instead of with the rest > of the mail/mime stuff? It's also useful generally. > Shouldn't "aifc" be under "sound"? You're right. > > interpreter > [...] > > How about just "interp"? I've no *strong* feelings, just a vague "don't abbrev." hunch. > Why the separate "lowlevel" branch? Because it is -- most Python code will use one of the higher level modules. > Why doesn't "socket" go under "net"? What about UNIX domain sockets? Again, no *strong* opinion, though. > > terminal > > termios > > pty > > tty > > readline > > Why does "terminal" belong under "file"?
Because it is (a special kind of file). > > serialize > > > pickle > > cPickle > > shelve > > xdrlib > > copy > > copy_reg > > "copy" doesn't really fit here under "serialize", and > "serialize" is kind of a long name. I beg to disagree -- "copy" is frequently close to serialization, both in the model (serializing to a "data structure") and in real life (that's the way people copy stuff in Java, and UNIX too: think tar cvf - | tar xvf -). What's more, copy_reg is used both for copy and for pickle. I do like the idea of a "data-types" package, but it needs to be ironed out a bit. > > internal > > _codecs > > _locale > > _tkinter > > pcre > > strop > > posix > > Not sure this is a good idea. It means the Unicode > work lives under both "unicode" and "internal._codecs", > Tk is split between "ui" and "internal._tkinter", > regular expressions are split between "text.re" and > "internal.pcre". I can see your motivation for getting > "posix" out of the way, but i suspect this is likely to > confuse people. You mistook my motivation -- I just want unadvertised modules (AKA internal use modules) to live in a carefully segregated section of the namespace. How would this confuse people? No one imports _tkinter or pcre, so no one would notice the change. > > locale > > I think "locale" belongs under "math" with "fpformat" and > the others. It's for numeric formatting. Only? And anyway, I doubt many people will think like that. > > pure > > What the heck is "pure"? A module that helps work with Purify. > > formatter > > This probably goes under "text". You're right. > Well, this leaves a few system-like modules that didn't > really fit elsewhere for me: > > pty > tty > termios > syslog > select > getopt > signal > errno > resource > > They all seem to be Unix-related. How about putting these > in a "unix" or "system" package? "select", "signal" aren't UNIX specific. "getopt" is used for generic argument processing, so it isn't really UNIX specific.
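To make that concrete, here is a quick sketch (spelled with today's flat module names) that runs unchanged on UNIX and on Win32 -- it sticks to sockets, since sockets are all that select() handles on Windows anyway:

```python
# select() across platforms: a listening socket becomes readable when a
# connection is pending, and a freshly connected socket is writable.
import select
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))          # pick any free port
server.listen(1)

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server.getsockname())   # now a connection is pending

readable, writable, _ = select.select([server], [client], [], 5.0)
print(server in readable, client in writable)   # True True

client.close()
server.close()
```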
And I don't like the name "system" either. But I have no constructive proposals about those either. so-i'll-just-shut-up-now-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From dan at cgsoftware.com Sun Mar 26 08:05:44 2000 From: dan at cgsoftware.com (Daniel Berlin) Date: Sat, 25 Mar 2000 22:05:44 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: > "select", "signal" aren't UNIX specific. Huh? How not? Can you name a non-UNIX that is providing them? (BeOS wouldn't count, select is broken, and nobody uses signals.) and if you can, is it providing them for something other than "UNIX/POSIX compatibility"? > "getopt" is used for generic argument processing, so it isn't really UNIX specific. It's a POSIX.2 function. I consider that UNIX. > And I don't like the name "system" either. But I have no > constructive proposals about those either. > > so-i'll-just-shut-up-now-ly y'rs, Z. > -- just-picking-nits-ly y'rs, Dan From moshez at math.huji.ac.il Sun Mar 26 08:32:33 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 08:32:33 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Daniel Berlin wrote: > > > "select", "signal" aren't UNIX specific. > Huh? > How not? > Can you name a non-UNIX that is providing them? Win32. Both of them. I've even used select there. > and if you can, is it providing them for something other than "UNIX/POSIX > compatibility"? I don't know what it provides them for, but I've *used* *select* on *WinNT*. I don't see why Python should make me feel bad when I'm doing that. > > "getopt" is used for generic argument processing, so it isn't really UNIX > > specific. > > It's a POSIX.2 function. > I consider that UNIX. Well, the argument style it processes is not unheard of in other OSes, and it's nice to have command line apps that have a common ui.
That's it! "getopt" belongs in the ui package! -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping at lfw.org Sun Mar 26 09:23:45 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 23:23:45 -0800 (PST) Subject: [Python-Dev] cPickle and cStringIO Message-ID: Are there any objections to including

    try:
        from cPickle import *
    except:
        pass

in pickle and

    try:
        from cStringIO import *
    except:
        pass

in StringIO? -- ?!ng "I'm not trying not to answer the question; i'm just not answering it." -- Lenore Snell From moshez at math.huji.ac.il Sun Mar 26 09:14:10 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 09:14:10 +0200 (IST) Subject: [Python-Dev] cPickle and cStringIO In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Ka-Ping Yee wrote: > Are there any objections to including > > try: > from cPickle import * > except: > pass > > in pickle and > > try: > from cStringIO import * > except: > pass > > in StringIO? Yes, until Python types are subclassable. Currently, one can inherit from pickle.Pickler/Unpickler and StringIO.StringIO. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping at lfw.org Sun Mar 26 09:37:11 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 23:37:11 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: Okay, here's another shot at it.
Notice a few things: - no text.mime package - encoders moved to text.encode - Unix stuff moved to unix package (no file.lowlevel, file.terminal) - aifc moved to bin.sound package - struct moved to bin package - locale moved to math package - linecache moved to interp package - data-type stuff moved to data package - modules in internal package moved to live with their friends Modules that are deprecated or not really intended to be imported are listed in parentheses (to give a better idea of the "real" size of each package). cStringIO and cPickle are parenthesized in hopeful anticipation of agreement on my last message... net urlparse urllib ftplib gopherlib imaplib poplib nntplib smtplib telnetlib httplib cgi server BaseHTTPServer CGIHTTPServer SimpleHTTPServer SocketServer asynchat asyncore text re # general-purpose parsing sgmllib htmllib htmlentitydefs xml whatever the xml-sig puts here mail rfc822 mailbox mhlib encode # i'm also ok with moving text.encode.* to text.* binhex uu base64 quopri MimeWriter mimify mimetools mimetypes multifile mailcap # special-purpose file parsing shlex ConfigParser netrc formatter (string, strop, pcre, reconvert, regex, regex_syntax, regsub) bin gzip zlib chunk struct image imghdr colorsys # a bit unsure, but doesn't go anywhere else imageop imgfile rgbimg yuvconvert sound aifc sndhdr toaiff audiodev sunau sunaudio wave audioop sunaudiodev db anydbm whichdb bsddb dbm dbhash dumbdbm gdbm math math # library functions cmath fpectl # type-related fpetest array mpz fpformat # formatting locale bisect # algorithm: also unsure, but doesn't go anywhere else random # randomness whrandom crypt # cryptography md5 rotor sha time calendar time tzparse sched timing interp new linecache # handling .py files py_compile code # manipulating internal objects codeop dis traceback compileall keyword # interpreter constants token symbol tokenize # parsing parser bdb # development pdb profile pyclbr tabnanny pstats rlcompleter # this might go in 
"ui"... security Bastion rexec ihooks file dircache path -- a virtual module which would do a from path import * nturl2path macurl2path filecmp fileinput StringIO glob fnmatch stat statcache statvfs tempfile shutil pipes popen2 commands dl (dospath, posixpath, macpath, ntpath, cStringIO) data pickle shelve xdrlib copy copy_reg UserDict UserList pprint repr (cPickle) threads thread threading Queue mutex ui _tkinter curses Tkinter cmd getpass getopt readline users pwd grp nis sgi al cd cl fl fm gl misc (what used to be sgimodule.c) sv unicode _codecs codecs unicodedata unicodedatabase unix errno resource signal posix posixfile socket select syslog fcntl termios pty tty _locale exceptions sys os types user site pure operator -- ?!ng "I'm not trying not to answer the question; i'm just not answering it." -- Lenore Snell From ping at lfw.org Sun Mar 26 09:40:27 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 23:40:27 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: Hey, while we're at it... as long as we're renaming modules, what do you all think of getting rid of that "lib" suffix? As in: > net > urlparse > url > ftp > gopher > imap > pop > nntp > smtp > telnet > http > cgi > server [...] > text > re # general-purpose parsing > sgml > html > htmlentitydefs [...] "import net.ftp" seems nicer to me than "import ftplib". We could also just stick htmlentitydefs.entitydefs in html and deprecate htmlentitydefs. -- ?!ng "I'm not trying not to answer the question; i'm just not answering it." -- Lenore Snell From ping at lfw.org Sun Mar 26 09:53:06 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 23:53:06 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Moshe Zadka wrote: > > For example, why text.binhex but text.mail.mime.base64? 
> > Actually, I thought about this (this isn't random at all): base64 encoding > is part of the mime standard, together with quoted-printable. Binhex > isn't. I don't know if you find it reason enough, and it may be smarter > just having a text.encode.{quopri,uu,base64,binhex} I think i'd like that better, yes. > > and it's not clear why these all belong under "parse". > > These are all used for parsing data (which does not have some pre-written > parser). I had problems with the name too... And parsing is what the "text" package is about anyway. I say move them up. (See the layout in my other message. Notice most of the regular-expression stuff is deprecated anyway, so it's not like there are really that many.) > > Why doesn't "socket" go under "net"? > > What about UNIX domain sockets? Again, no *strong* opinion, though. Bleck, you're right. Well, i think we just have to pick one or the other here, and i think most people would guess "net" first. (You can think of it as IPC, and file IPC-related things under then "net" category...?) > > Why does "terminal" belong under "file"? > > Because it is (a special kind of file) Only in Unix. It's Unix that likes to think of all things, including terminals, as files. > I do like the idea of "data-types" package, but it needs to be ironed > out a bit. See my other message for a possible suggested hierarchy... > > > internal [...] > You mistook my motivation -- I just want unadvertised modules (AKA > internal use modules) to live in a carefully segregate section of the > namespace. How would this confuse people? No one imports _tkinter or pcre, > so no one would notice the change. I think it makes more sense to classify modules by their topic rather than their exposure. (For example, you wouldn't move deprecated modules to a "deprecated" package.) Keep in mind that (well, at least to me) the main point of any naming hierarchy is to avoid name collisions. "internal" doesn't really help that purpose. 
You also want to be sure (or as sure as you can) that modules will be obvious to find in the hierarchy. An "internal" package creates a distinction orthogonal to the topic-matter distinction we're using for the rest of the packages, which *potentially* introduces the question "well... is this module internal or not?" for every other module. Yes, admittedly this is only "potentially", but i hope you see the abstract point i'm trying to make... > > > locale > > > > I think "locale" belongs under "math" with "fpformat" and > > the others. It's for numeric formatting. > > Only? And anyway, I doubt many people will think like that. Yeah, it is pretty much only for numeric formatting. The more generic locale stuff seems to be in _locale. > > They all seem to be Unix-related. How about putting these > > in a "unix" or "system" package? > > "select", "signal" aren't UNIX specific. Yes, but when they're available on other systems they're an attempt to emulate Unix or Posix functionality, aren't they? > Well, the argument style it processes is not unheard of in other OSes, and > it's nice to have command line apps that have a common ui. That's it! > "getopt" belongs in the ui package! I like ui.getopt. It's a pretty good idea. -- ?!ng "I'm not trying not to answer the question; i'm just not answering it." -- Lenore Snell From moshez at math.huji.ac.il Sun Mar 26 10:05:49 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 10:05:49 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: +1. I've had minor nits, but nothing is perfect, and this is definitely "good enough". Now we'll just have to wait until the BDFL says something... -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Sun Mar 26 10:06:59 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 10:06:59 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Ka-Ping Yee wrote: > Hey, while we're at it... as long as we're renaming modules, > what do you all think of getting rid of that "lib" suffix? +0 -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Sun Mar 26 10:19:34 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 10:19:34 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Ka-Ping Yee wrote: > > "select", "signal" aren't UNIX specific. > > Yes, but when they're available on other systems they're an > attempt to emulate Unix or Posix functionality, aren't they? I think "signal" is ANSI C, but I'm not sure. no-other-comments-ly y'rs, Z. From gstein at lyra.org Sun Mar 26 13:52:53 2000 From: gstein at lyra.org (Greg Stein) Date: Sun, 26 Mar 2000 03:52:53 -0800 (PST) Subject: [Python-Dev] Windows and PyObject_NEW In-Reply-To: <1258123323-10623548@hypernet.com> Message-ID: On Sat, 25 Mar 2000, Gordon McMillan wrote: >... > I doubt very much that you would break anybody's code by > removing the Windows specific behavior. > > But it seems to me that unless Python always uses the > default malloc, those of us who write C++ extensions will have > to override operator new? I'm not sure. I've used placement > new to allocate objects in a memory mapped file, but I've never > tried to muck with the global memory policy of a C++ program. Actually, the big problem arises when you have debug vs. non-debug DLLs. malloc() uses different heaps based on the debug setting.
As a result, it is a bad idea to call malloc() from a debug DLL and free() it from a non-debug DLL. If the allocation pattern is fixed, then things may be okay. IF. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sun Mar 26 14:02:40 2000 From: gstein at lyra.org (Greg Stein) Date: Sun, 26 Mar 2000 04:02:40 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Moshe Zadka wrote: >... > [ tree ] This is a great start. I have two comments: 1) keep it *very* shallow. depth just makes it conceptually difficult. 2) you're pushing too hard. modules do not *have* to go into a package. there are some placements that you've made which are very questionable... it appears they are done for movement's sake rather than for being "right" I'm off to sleep, but will look into specific comments tomorrow or so. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sun Mar 26 14:14:32 2000 From: gstein at lyra.org (Greg Stein) Date: Sun, 26 Mar 2000 04:14:32 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003251856.NAA09636@eric.cnri.reston.va.us> Message-ID: On Sat, 25 Mar 2000, Guido van Rossum wrote: > > I say "do it incrementally" while others say "do it all at once." > > Personally, I don't think it is possible to do all at once. As a > > corollary, if you can't do it all at once, but you *require* that it be > > done all at once, then you have effectively deferred the problem. To put > > it another way, Guido has already invented a reason to not do it: he just > > requires that it be done all at once. Result: it won't be done. > > Bullshit, Greg. (I don't normally like to use such strong words, but > since you're being confrontational here...) Fair enough, and point accepted. Sorry. I will say, tho, that you've taken this slightly out of context. The next paragraph explicitly stated that I don't believe you had this intent. 
I just felt that coming up with a complete plan before doing anything would be prone to failure. You asked to invent a new reason :-), so I said you had one already :-) Confrontational? Yes, guilty as charged. I was a bit frustrated. > I'm all for doing it incrementally -- but I want the plan for how to > do it made up front. That doesn't require all the details to be > worked out -- but it requires a general idea about what kind of things > we will have in the namespace and what kinds of names they get. An > organizing principle, if you like. If we were to decide later that we > go for a Java-like deep hierarchy, the network package would have to > be moved around again -- what a waste. All righty. So I think there is probably a single question that I have here: Moshe posted a large breakdown of how things could be packaged. He and Ping traded a number of comments, and more will be coming as soon as people wake up :-) However, if you are only looking for a "general idea", then should python-dev'ers nit pick the individual modules, or just examine the general breakdown and hierarchy? thx, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Sun Mar 26 14:09:02 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 14:09:02 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Greg Stein wrote: > This is a great start. I have two comments: > > 1) keep it *very* shallow. depth just makes it conceptually difficult. I tried, and Ping shallowed it even more. BTW: Anyone who cares to comment, please comment on Ping's last suggestion. I pretty much agree with the changes he made. > 2) you're pushing too hard. modules do not *have* to go into a package. > there are some placements that you've made which are very > questionable... 
it appears they are done for movement's sake rather > than for being "right" Well, I'm certainly sorry I gave that impression -- the reason I wasn't "right" wasn't that, it was more my desire to be "fast" -- I wanted to have some proposal out the door, since it is harder to argue about something concrete. The biggest proof that we all agree is that no one seriously objected to anything -- there were just some minor nits to pick. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Sun Mar 26 14:11:10 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 14:11:10 +0200 (IST) Subject: [Python-Dev] 1.6 job list In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Greg Stein wrote: > Moshe posted a large breakdown of how things could be packaged. He and > Ping traded a number of comments, and more will be coming as soon as > people wake up :-) Just a general comment -- it's so much fun to live in a different time zone than all of you guys. just-wasting-time-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gstein at lyra.org Sun Mar 26 14:23:57 2000 From: gstein at lyra.org (Greg Stein) Date: Sun, 26 Mar 2000 04:23:57 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Moshe Zadka wrote: > On Sun, 26 Mar 2000, Greg Stein wrote: >... > > 2) you're pushing too hard. modules do not *have* to go into a package. > > there are some placements that you've made which are very > > questionable... it appears they are done for movement's sake rather > > than for being "right" > > Well, I'm certainly sorry I gave that impression -- the reason I wasn't > "right" wasn't that, it was more my desire to be "fast" -- I wanted to > have some proposal out the door, since it is harder to argue about > something concrete.
> The biggest proof that we all agree is that no one seriously objected to anything -- there were just some minor nits to pick. Not something to apologize for! :-) Well, the indicator was the line in your original post about "unhandled modules" and the conversation between you and Ping with statements along the lines of "wasn't sure where to put this." I say just leave it then :-) If a module does not make *obvious* sense to be in a package, then it should not be there. For example: locale. That is not about numbers or about text. It has general utility. If there was an i18n package, then it would go there. Otherwise, don't force it somewhere else. Other packages are similar, so don't single out my comment about locale. Cheers, -g -- Greg Stein, http://www.lyra.org/ From DavidA at ActiveState.com Sun Mar 26 20:09:15 2000 From: DavidA at ActiveState.com (David Ascher) Date: Sun, 26 Mar 2000 10:09:15 -0800 Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: > If a module does not make *obvious* sense to be in a package, then it > should not be there. For example: locale. That is not about numbers or > about text. It has general utility. If there was an i18n package, then it > would go there. Otherwise, don't force it somewhere else. Other packages > are similar, so don't single out my comment about locale. I maintain that a general principle regarding the aim of this reorg is needed before the partitioning of the space can make sense. What Moshe and Ping have is a good stab at partitioning of a subspace of the total space of Python modules and packages, i.e., the standard library. If we limit the aim of the reorg to cover just that subspace, then that's fine and Ping's proposal seems grossly fine to me.
If we want to have a Perl-like packaging, then we _need_ to take into account all known Python modules of general utility, such as the database modules, the various GUI packages, the mx* packages, Aaron's work, PIL, etc., etc. Ignoring those means that the dataset used to decide the partitioning function is highly biased. Given the larger dataset, locale might very well fit in a not-toplevel location. I know that any organizational scheme is going to be optimal at best at its inception, and that as history happens, it will become suboptimal. However, it's important to know what the space being partitioned is supposed to look like. A final comment: there's a history and science to this kind of organization, which is part of library science. I suspect there is quite a bit of knowledge available as to organizing principles to do it right. It would be nice if someone could research it a bit and summarize the basic principles to the rest of us. I agree with Greg that we need high-level input from Guido on this. --david 'academic today' ascher From ping at lfw.org Sun Mar 26 22:34:11 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 26 Mar 2000 12:34:11 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Greg Stein wrote: > > If a module does not make *obvious* sense to be in a package, then it > should not be there. For example: locale. That is not about numbers or > about text. It has general utility. If there was an i18n package, then it > would go there. Otherwise, don't force it somewhere else. Other packages > are similar, so don't single out my comment about locale. I goofed. I apologize. Moshe and Greg are right: locale isn't just about numbers. I just read the comment at the top of locale.py: "Support for number formatting using the current locale settings" and didn't notice the from _locale import * a couple of lines down. 
"import locale; dir(locale)" didn't work for me because for some reason there's no _locale built-in on my system (Red Hat 6.1, python-1.5.1-10). So i looked for 'def's and they all looked like they had to do with numeric formatting. My mistake. "locale", at least, belongs at the top level. Other candidates for top-level: bisect # algorithm struct # more general than "bin" or "data" colorsys # not really just for image file formats yuvconvert # not really just for image file formats rlcompleter # not really part of the interpreter dl # not really just about files Alternatively, we could have: ui.rlcompleter, unix.dl (It would be nice, by the way, to replace "bisect" with an "algorithm" module containing some nice pedagogical implementations of things like bisect, quicksort, heapsort, Dijkstra's algorithm etc.) The following also could be left at the top-level, since they seem like applications (i.e. they probably won't get imported by code, only interactively). No strong opinion on this. bdb pdb pyclbr tabnanny profile pstats Also... i was avoiding calling the "unix" package "posix" because we already have a "posix" module. But wait... the proposed tree already contains "math" and "time" packages. If there is no conflict (is there a conflict?) then the "unix" package should probably be named "posix". -- ?!ng "In the sciences, we are now uniquely privileged to sit side by side with the giants on whose shoulders we stand." -- Gerald Holton From moshez at math.huji.ac.il Mon Mar 27 07:35:23 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Mon, 27 Mar 2000 07:35:23 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Ka-Ping Yee wrote: > The following also could be left at the top-level, since > they seem like applications (i.e. they probably won't > get imported by code, only interactively). No strong > opinion on this. 
> > bdb > pdb > pyclbr > tabnanny > profile > pstats Let me just state my feelings about the interpreter package: since Python programs are probably the most suited to reasoning about Python programs (among other things, thanks to the strong introspection capabilities of Python), many Python modules were written to supply a convenient interface to that introspection. These modules are *only* needed by programs dealing with Python programs, and hence should live in a well defined part of the namespace. I regret calling it "interpreter" though: "Python" is a better name (something like the java.lang package). > Also... i was avoiding calling the "unix" package "posix" > because we already have a "posix" module. But wait... the > proposed tree already contains "math" and "time" packages. Yes. That was a hard decision I made, and I'm sort of waiting for Guido to veto it: it would negate the easy backwards compatible path of providing a toplevel module for each module which is moved somewhere else which does "from <new location> import *". > If there is no conflict (is there a conflict?) then the > "unix" package should probably be named "posix". I hardly agree. "dl", for example, is a common function on unices, but it is not part of the POSIX standard. I think the "posix" module should have POSIX functions, and the "unix" package should deal with functionality available on real-life unices. standards-are-fun-aren't-they-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From pf at artcom-gmbh.de Mon Mar 27 08:52:25 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Mon, 27 Mar 2000 08:52:25 +0200 (MEST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: from Moshe Zadka at "Mar 27, 2000 7:35:23 am" Message-ID: Hi! Moshe Zadka wrote: > Yes.
> That was a hard decision I made, and I'm sort of waiting for Guido to > veto it: it would negate the easy backwards compatible path of providing > a toplevel module for each module which is moved somewhere else which does > "from <new location> import *". If the result of this renaming initiative will be that I can't use

    import sys, os, time, re, struct, cPickle, parser
    import Tkinter; Tk=Tkinter; del Tkinter

anymore in Python 1.x and instead I have to change this into (for example):

    from posix import time
    from text import re
    from bin import struct
    from Python import parser
    from ui import Tkinter; ...

I would really really *HATE* this change! [side note: The 'from MODULE import ...' form is evil and I have abandoned its use in favor of the 'import MODULE' form in 1987 or so, as our Modula-2 programs got bigger and bigger. With 20+ software developers working on a ~1,000,000 LOC Modula-2 software system, this decision proved itself well. The situation with Python is comparable. Avoiding 'from ... import' rewards itself later, when your software has grown bigger and when it comes to maintenance by people not familiar with the used modules. ] Maybe I didn't understand what this new subdivision of the standard library should achieve. The library documentation provides an existing logical subdivision into chapters, which group the library into several kinds of services. IMO this subdivision could be discussed and possibly revised. But at the moment I got the impression that it was simply ignored. Why? What's so bad with it? Why is a subdivision on the documentation level not sufficient? Why should modules be moved into packages? I don't get it.
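For what it's worth, the "backwards compatible path" quoted above would, as I understand it, leave behind one stub per moved module. Here is a toy sketch of the idea -- the package name "text" and the one-function stand-in API are invented, since nothing has been decided:

```python
# Build a pretend new-style package in memory: "text.re" stands in for an
# invented new home of the re module, with a fake compile() as its API.
import sys
import types

pkg = types.ModuleType("text")
new_home = types.ModuleType("text.re")
new_home.compile = lambda pattern: ("compiled", pattern)  # stand-in, not the real API
pkg.re = new_home
sys.modules["text"] = pkg
sys.modules["text.re"] = new_home

# The real compatibility stub would be a one-line file re.py containing
# just "from text.re import *"; this loop imitates that by hand.
stub = types.ModuleType("re")
for name, value in vars(new_home).items():
    if not name.startswith("_"):
        setattr(stub, name, value)
sys.modules["re"] = stub

import re                          # the old spelling keeps working
print(re.compile("a*") == ("compiled", "a*"))   # True
```

Multiply that by every module in the proposed tree and you see how many stub files would be needed.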
Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From moshez at math.huji.ac.il Mon Mar 27 09:09:18 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Mon, 27 Mar 2000 09:09:18 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: Message-ID: On Mon, 27 Mar 2000, Peter Funk wrote: > If the result of this renaming initiative will be that I can't use > import sys, os, time, re, struct, cPickle, parser > import Tkinter; Tk=Tkinter; del Tkinter > anymore in Python 1.x and instead I have to change this into (for example): > from posix import time > from text import re > from bin import struct > from Python import parser > from ui import Tkinter; ... Yes. > I would really really *HATE* this change! Well, I'm sorry to hear that -- I've been waiting for this change to happen for a long time. > [side note: > The 'from MODULE import ...' form is evil and I have abandoned its use > in favor of the 'import MODULE' form in 1987 or so, as our Modula-2 > programs got bigger and bigger. With 20+ software developers working > on a ~1,000,000 LOC Modula-2 software system, this decision > proved itself well. Well, yes. Though syntactically equivalent, from package import module is the recommended way to use packages, unless there is a specific need. > Maybe I didn't understand what this new subdivision of the standard > library should achieve. Namespace cleanup. Too many toplevel names seem evil to some of us. > Why is a subdivision on the documentation level not sufficient? > Why should modules be moved into packages? I don't get it. To allow a greater number of modules to live without worrying about namespace collision. -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping at lfw.org Mon Mar 27 10:08:57 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 27 Mar 2000 00:08:57 -0800 (PST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: Message-ID: Hi, Peter. Your question as to the purpose of module reorganization is well worth asking, and perhaps we should stand back for a while and try to really answer it well first. I think that my answers for your question would be: 1. To alleviate potential namespace collision. 2. To permit talking about packages as a unit. I hereby solicit other reasons from the rest of the group... Reason #1 is not a serious problem yet, but i think i've seen a few cases where it might start to be an issue. Reason #2 has to do with things like assigning people responsibility for taking care of a particular package, or making commitments about which packages will be available with which distributions or platforms. Hence, for example, the idea of the "unix" package. Neither of these reasons necessitates a deep and holy hierarchy, so we certainly want to keep it shallow and simple if we're going to do this at all. > If the result of this renaming initiative will be that I can't use > import sys, os, time, re, struct, cPickle, parser > import Tkinter; Tk=Tkinter; del Tkinter > anymore in Python 1.x and instead I have to change this into (for example): > from posix import time > from text import re > from bin import struct > from Python import parser > from ui import Tkinter; ... Won't import sys, os, time.time, text.re, bin.struct, data.pickle, python.parser also work? ...i hope? > The library documentation provides an existing logical subdivision into > chapters, which group the library into several kinds of services. > IMO this subdivision could be discussed and possibly revised. > But at the moment I get the impression that it was simply ignored. > Why? What's so bad with it?
I did look at the documentation for some guidance in arranging the modules, though admittedly it didn't direct me much. -- ?!ng "In the sciences, we are now uniquely privileged to sit side by side with the giants on whose shoulders we stand." -- Gerald Holton From pf at artcom-gmbh.de Mon Mar 27 10:35:50 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Mon, 27 Mar 2000 10:35:50 +0200 (MEST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: from Ka-Ping Yee at "Mar 27, 2000 0: 8:57 am" Message-ID: Hi! > > import sys, os, time, re, struct, cPickle, parser [...] Ka-Ping Yee: > Won't > > import sys, os, time.time, text.re, bin.struct, data.pickle, python.parser > > also work? ...i hope? That is even worse: not only would the 'import' sections, which I usually keep at the top of my modules, have to be changed -- 're.compile(...' would also have to become 'text.re.compile(...' all over the place, possibly breaking the 'Maximum Line Length' style guide rule. Regards, Peter From pf at artcom-gmbh.de Mon Mar 27 12:16:48 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Mon, 27 Mar 2000 12:16:48 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? Message-ID: String objects have grown methods since 1.5.2. So it makes sense to provide a class 'UserString' similar to 'UserList' and 'UserDict', so that there is a standard base class to inherit from, if someone has the desire to extend the string methods. What do you think? Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From fdrake at acm.org Mon Mar 27 17:12:55 2000 From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Mon, 27 Mar 2000 10:12:55 -0500 (EST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: References: Message-ID: <14559.31351.783771.472320@weyr.cnri.reston.va.us> Moshe Zadka writes: > Well, I'm certainly sorry I gave that impression -- the reason I wasn't > "right" wasn't that, it was more my desire to be "fast" -- I wanted to > have some proposal out the door, since it is harder to argue about > something concrete. The biggest proof of concept that we all agree is that > no one seriously objected to anything -- there were just some minor > nits to pick. It's *really easy* to argue about something concrete. ;) It's just harder to misunderstand the specifics of the proposal. It's too early to say what people think; not enough people have had time to look at the proposals yet. On the other hand, I think it's great that we have a proposal to discuss. I'll make my comments once I've had time to read through the last version posted. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Mon Mar 27 18:20:43 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 27 Mar 2000 11:20:43 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: Message-ID: <14559.35419.793906.868645@weyr.cnri.reston.va.us> Peter Funk said: > The library documentation provides an existing logical subdivision into > chapters, which group the library into several kinds of services. > IMO this subdivision could be discussed and possibly revised. > But at the moment I get the impression that it was simply ignored. > Why? What's so bad with it? Ka-Ping Yee writes: > I did look at the documentation for some guidance in arranging > the modules, though admittedly it didn't direct me much. The library reference is pretty well disorganized at this point. I want to improve that for the 1.6 docs.
I received a suggestion a few months back, but haven't had a chance to dig into it, or even respond to the email. ;( -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From jeremy at cnri.reston.va.us Mon Mar 27 19:14:46 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Mon, 27 Mar 2000 12:14:46 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: Message-ID: <14559.38662.835289.499610@goon.cnri.reston.va.us> >>>>> "PF" == Peter Funk writes: PF> That is even worse. So not only the 'import' sections, which I PF> usually keep at the top of my modules, have to be changed: This PF> way for example 're.compile(...' has to be changed into PF> 'text.re.compile(...' all over the place possibly breaking the PF> 'Maximum Line Length' styleguide rule. There is nothing wrong with changing only the import statement: from text import re The only problematic use of from ... import ... is from text.re import * which adds an unspecified set of names to the current namespace. Jeremy From moshez at math.huji.ac.il Mon Mar 27 19:59:34 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Mon, 27 Mar 2000 19:59:34 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14559.35419.793906.868645@weyr.cnri.reston.va.us> Message-ID: Peter Funk said: > The library documentation provides a existing logical subdivision into > chapters, which group the library into several kinds of services. > IMO this subdivision could be discussed and possibly revised. > But at the moment I got the impression, that it was simply ignored. > Why? What's so bad with it? Ka-Ping Yee writes: > I did look at the documentation for some guidance in arranging > the modules, though admittedly it didn't direct me much. Fred L. Drake, Jr. writes: > The library reference is pretty well disorganized at this point. I > want to improve that for the 1.6 docs. 
Let me just mention where my inspirations came from: shame of shames, it came from Perl. It's hard to use Perl's organization as-is, because Perl doesn't view itself as a general-purpose language: so things like CGI.pm are toplevel, and regexes are part of the syntax. However, there are a lot of good hints there. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From klm at digicool.com Mon Mar 27 20:31:01 2000 From: klm at digicool.com (Ken Manheimer) Date: Mon, 27 Mar 2000 13:31:01 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14559.38662.835289.499610@goon.cnri.reston.va.us> Message-ID: On Mon, 27 Mar 2000, Jeremy Hylton wrote: > >>>>> "PF" == Peter Funk writes: > > PF> That is even worse. So not only the 'import' sections, which I > PF> usually keep at the top of my modules, have to be changed: This > PF> way for example 're.compile(...' has to be changed into > PF> 'text.re.compile(...' all over the place possibly breaking the > PF> 'Maximum Line Length' style guide rule. > > There is nothing wrong with changing only the import statement: > from text import re > > The only problematic use of from ... import ... is > from text.re import * > which adds an unspecified set of names to the current namespace. Actually, i think there's another important gotcha with from .. import which may be contributing to Peter's sense of concern, but which i don't think needs to in this case. I also thought we had discussed providing transparency in general, at least for the 1.x series. ? The other gotcha i mean applies when the thing you're importing is a terminal, ie a non-module. Then, changes to the assignments of the names in the original module aren't reflected in the names you've imported - they're decoupled from the namespace of the original module.
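Ken's "terminal" gotcha is easy to demonstrate in a few lines. The module object below is synthetic (built with types.ModuleType purely for illustration), but the binding behaviour is the same for any real module:

```python
import types

mod = types.ModuleType("mod")   # stands in for some real module
mod.value = 1

value = mod.value               # effectively what "from mod import value" does
mod.value = 2                   # the original module later rebinds the name

print(value)                    # 1 -- the imported name is decoupled
print(mod.value)                # 2 -- attribute access tracks the rebind
```

The imported name is an independent binding to the old object; only attribute access through the module sees subsequent reassignments.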
When the thing you're importing is, itself, a module, the same kind of thing *can* happen, but you're more generally concerned with tracking revisions to the contents of those modules, which is tracked ok in the thing you "from .. import"ed. I thought the other problem Peter was objecting to, having to change the import sections in the first place, was going to be avoided in the 1.x series (if we do this kind of thing) by inherently extending the import path to include all the packages, so people need not change their code? Seems like most of this would be fairly transparent w.r.t. the operation of existing applications. Have i lost track of the discussion? Ken klm at digicool.com From moshez at math.huji.ac.il Mon Mar 27 20:55:35 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Mon, 27 Mar 2000 20:55:35 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: Message-ID: On Mon, 27 Mar 2000, Ken Manheimer wrote: > I also thought we had discussed providing > transparency in general, at least for the 1.x series. ? Yes, but it would be clearly marked as deprecated in 1.7, print out error messages in 1.8 and won't work at all in 3000. (That's my view on the point, but I got the feeling this is where the wind is blowing.) So the transparency mechanism is intended only to be "something backwards compatible"...it's not supposed to be a reason why things are ugly (I don't think they are, though). BTW: the transparency mechanism I suggested was not pushing things into the import path, but rather having toplevel modules which "from import *" from the modules that were moved. E.g., re.py would contain # Deprecated: don't import re, it won't work in future releases from text.re import * -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From skip at mojam.com Mon Mar 27 21:34:39 2000 From: skip at mojam.com (Skip Montanaro) Date: Mon, 27 Mar 2000 13:34:39 -0600 (CST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: Message-ID: <14559.47055.604042.381126@beluga.mojam.com> Peter> The library documentation provides a existing logical subdivision Peter> into chapters, which group the library into several kinds of Peter> services. Perhaps it makes sense to revise the library reference manual's documentation to reflect the proposed package hierarchy once it becomes concrete. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From skip at mojam.com Mon Mar 27 21:52:08 2000 From: skip at mojam.com (Skip Montanaro) Date: Mon, 27 Mar 2000 13:52:08 -0600 (CST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: References: Message-ID: <14559.48104.34263.680278@beluga.mojam.com> Responding to an early item in this thread and trying to adapt to later items... Ping wrote: I'm not convinced "mime" needs a separate branch here. (This is the deepest part of the tree, and at three levels small alarm bells went off in my head.) It's not clear that mime should be beneath text/mail. Moshe moved it up a level, but not the way I would have done it. I think the mime stuff still belongs in a separate mime package. I wouldn't just sprinkle the modules under text. I see two possibilities: text>mime net>mime I prefer net>mime, because MIME and its artifacts are used heavily in networked applications where the content being transferred isn't text. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From fdrake at acm.org Mon Mar 27 22:05:32 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 27 Mar 2000 15:05:32 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? 
In-Reply-To: <14559.47055.604042.381126@beluga.mojam.com> References: <14559.47055.604042.381126@beluga.mojam.com> Message-ID: <14559.48908.354425.313775@weyr.cnri.reston.va.us> Skip Montanaro writes: > Perhaps it makes sense to revise the library reference manual's > documentation to reflect the proposed package hierarchy once it becomes > concrete. I'd go for this. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Mon Mar 27 22:43:06 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 27 Mar 2000 15:43:06 -0500 Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? Message-ID: <200003272043.PAA18445@eric.cnri.reston.va.us> The _tkinter.c source code is littered with #ifdefs that mostly center around distinguishing between Tcl/Tk 8.0 and older versions. The two pre-8.0 version supported seem to be 7.5/4.1 and 7.6/4.2. Would it be reasonable to assume that everybody is using at least Tcl/Tk version 8.0? This would simplify the code somewhat. Or should I ask this in a larger forum? --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Mon Mar 27 22:59:04 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 27 Mar 2000 15:59:04 -0500 (EST) Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: <200003272043.PAA18445@eric.cnri.reston.va.us> References: <200003272043.PAA18445@eric.cnri.reston.va.us> Message-ID: <14559.52120.633384.651377@weyr.cnri.reston.va.us> Guido van Rossum writes: > The _tkinter.c source code is littered with #ifdefs that mostly center > around distinguishing between Tcl/Tk 8.0 and older versions. The > two pre-8.0 version supported seem to be 7.5/4.1 and 7.6/4.2. > > Would it be reasonable to assume that everybody is using at least > Tcl/Tk version 8.0? This would simplify the code somewhat. Simplify! It's more important that the latest versions are supported than pre-8.0 versions. -Fred -- Fred L. 
Drake, Jr. Corporation for National Research Initiatives From gstein at lyra.org Mon Mar 27 23:31:30 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 27 Mar 2000 13:31:30 -0800 (PST) Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: <14559.52120.633384.651377@weyr.cnri.reston.va.us> Message-ID: On Mon, 27 Mar 2000, Fred L. Drake, Jr. wrote: > Guido van Rossum writes: > > The _tkinter.c source code is littered with #ifdefs that mostly center > > around distinguishing between Tcl/Tk 8.0 and older versions. The > > two pre-8.0 version supported seem to be 7.5/4.1 and 7.6/4.2. > > > > Would it be reasonable to assume that everybody is using at least > > Tcl/Tk version 8.0? This would simplify the code somewhat. > > Simplify! It's more important that the latest versions are > supported than pre-8.0 versions. I strongly agree. My motto is, "if the latest Python version doesn't work for you, then don't upgrade!" This is also Open Source -- they can easily get the source to the old _Tkinter if they want new Python + 7.x support. If you ask in a larger forum, then you are certain to get somebody to say, "yes... I need that support." Then you have yourself a quandary :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From effbot at telia.com Mon Mar 27 23:46:50 2000 From: effbot at telia.com (Fredrik Lundh) Date: Mon, 27 Mar 2000 23:46:50 +0200 Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? References: <200003272043.PAA18445@eric.cnri.reston.va.us> Message-ID: <009801bf9835$f85b87e0$34aab5d4@hagrid> Guido van Rossum wrote: > The _tkinter.c source code is littered with #ifdefs that mostly center > around distinguishing between Tcl/Tk 8.0 and older versions. The > two pre-8.0 version supported seem to be 7.5/4.1 and 7.6/4.2. > > Would it be reasonable to assume that everybody is using at least > Tcl/Tk version 8.0? This would simplify the code somewhat. yes. 
if people are using older versions, they can always use the version shipped with 1.5.2. (has anyone actually tested that one with pre-8.0 versions, btw?) > Or should I ask this in a larger forum? maybe. maybe not. From jack at oratrix.nl Mon Mar 27 23:58:56 2000 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 27 Mar 2000 23:58:56 +0200 Subject: [Python-Dev] 1.6 job list In-Reply-To: Message by Moshe Zadka , Sat, 25 Mar 2000 12:16:23 +0200 (IST) , Message-ID: <20000327215901.ABA08F58C1@oratrix.oratrix.nl> Recently, Moshe Zadka said: > Here's a reason: there shouldn't be changes we'll retract later -- we > need to come up with the (more or less) right hierarchy the first time, > or we'll do a lot of work for nothing. I think I disagree here (hmm, it's probably better to say that I agree, but I agree on a tangent:-). I think we can be 100% sure that we're wrong the first time around, and we should plan for that. One of the reasons why we're wrong is because the world is moving on. A module that at this point in time will reside at some level in the hierarchy may in a few years (or sooner) be one of a large family and be better off elsewhere in the hierarchy. It would be silly if it had to stay where it was because of backward compatibility. If we plan for being wrong we can make the mistakes less painful. I think that a simple scheme where a module can say "I'm expecting the Python 1.6 namespace layout" would make transition to a completely different Python 1.7 namespace layout a lot less painful, because some agent could do the mapping. This can either happen at runtime (through a namespace, or through an import hook, or probably through other tricks as well) or optionally by a script that would do the translations. Of course this doesn't mean we should go off and hack in a couple of namespaces (hence my "agreeing on a tangent"), but it does mean that I think Greg's idea of not wanting to change everything at once has merit.
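The runtime "agent" Jack describes could be as small as a table of aliases installed into sys.modules. Everything named below (re_old, text.re, the compile stub) is invented for the sketch, standing in for a module that was relocated into a package:

```python
import sys
import types

# Stand-in for a module that moved to a new package location.
text = types.ModuleType("text")
text.re = types.ModuleType("text.re")
text.re.compile = lambda pattern: ("compiled", pattern)
sys.modules["text"] = text
sys.modules["text.re"] = text.re

# The mapping agent: old top-level names alias their new homes.
RENAMES = {"re_old": "text.re"}
for old, new in RENAMES.items():
    sys.modules[old] = sys.modules[new]

import re_old                   # old-style import keeps working
print(re_old.compile("x"))      # ('compiled', 'x')
```

Because import consults sys.modules first, the alias makes the old name and the new name refer to the very same module object, so old code runs untouched against the new layout.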
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From pf at artcom-gmbh.de Tue Mar 28 00:11:39 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Tue, 28 Mar 2000 00:11:39 +0200 (MEST) Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: <200003272043.PAA18445@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 27, 2000 3:43: 6 pm" Message-ID: Guido van Rossum: > Or should I ask this in a larger forum? Don't ask. Simply tell the people on comp.lang.python that support for the ancient Tcl/Tk versions < 8.0 will be dropped in Python 1.6. Period. ;-) Regards, Peter From guido at python.org Tue Mar 28 00:17:33 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 27 Mar 2000 17:17:33 -0500 Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: Your message of "Tue, 28 Mar 2000 00:11:39 +0200." References: Message-ID: <200003272217.RAA28910@eric.cnri.reston.va.us> > Don't ask. Simply tell the people on comp.lang.python that support > for the ancient Tcl/Tk versions < 8.0 will be dropped in Python 1.6. > Period. ;-) OK, I'm convinced. We will drop pre-8.0 support. Could someone submit a set of patches? It would make sense to use #error if a pre-8.0 version is detected at compile-time!
--Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond at skippinet.com.au Tue Mar 28 01:02:21 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 28 Mar 2000 09:02:21 +1000 Subject: [Python-Dev] Windows and PyObject_NEW In-Reply-To: <200003251459.PAA09181@python.inrialpes.fr> Message-ID: Sorry for the delay, but Gordon's reply was accurate so should have kept you going ;-) > I've been reading Jeffrey Richter's "Advanced Windows" last night in order > to try understanding better why PyObject_NEW is implemented > differently for > Windows. So that is where the heaps discussion came from :-) The problem is simply "too many heaps are available". > Again, I feel uncomfortable with this, especially now, when > I'm dealing with the memory aspect of Python's object > constructors/destructors. It is for this exact reason that it was added in the first place. I believe this code predates the "_d" convention on Windows. AFAIK, this could be removed today and everything should work (but see below why it probably won't). MSVC allows you to choose from a number of CRT versions. Only in one of these versions is the CRTL completely shared between the .EXE and all the various .DLLs in the application. What was happening is that this macro ended up causing the "malloc" for a new object to occur in Python15.dll, but the Python type system meant that tp_dealloc() (to clean up the object) was called in the DLL implementing the new type. Unless Python15.dll and our extension DLL shared the same CRTL (and hence the same malloc heap, fileno table etc) things would die. The DLL version of "free()" would complain, as it had never seen the pointer before. This change meant the malloc() and the free() were both implemented in the same DLL/EXE. This was particularly true with Debug builds. MSVC's debug CRTL implementations have some very nice debugging features (guard-blocks, block validity checks with debugger breakpoints when things go wrong, leak tracking, etc).
However, this means they use yet another heap. Mixing debug builds with release builds in Python is a recipe for disaster. Theoretically, the problem has largely gone away now that a) we have separate "_d" versions and b) the "official" position is to use the same CRTL as Python15.dll. However, it is still a minor FAQ on comp.lang.python why PyRun_ExecFile (or whatever) fails with mysterious errors - the reason is exactly the same - they are using a different CRTL, so the CRTL can't map the file pointers correctly, and we get unexplained IO errors. But now that this macro hides the malloc problem, there may be plenty of "home grown" extensions out there that do use a different CRTL and don't see any problems - mainly because they aren't throwing file handles around! Finally getting to the point of all this: We now also have the PyMem_* functions. This problem also doesn't exist if extension modules use these functions instead of malloc()/free(). We only ask them to change the PyObject allocations and deallocations, not the rest of their code, so it is no real burden. IMO, we should adopt these functions for most internal object allocations and the extension samples/docs. Also, we should consider adding relevant PyFile_fopen(), PyFile_fclose() type functions, that are simply a thin layer over the fopen/fclose functions. If extension writers used these instead of fopen/fclose we would gain a few fairly intangible things - lose the minor FAQ, platforms that don't have fopen at all (eg, CE) would love you, etc. Mark. From mhammond at skippinet.com.au Tue Mar 28 03:04:11 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 28 Mar 2000 11:04:11 +1000 Subject: [Python-Dev] Windows and PyObject_NEW In-Reply-To: Message-ID: [I wrote] > Also, we should consider adding relevant PyFile_fopen(), PyFile_fclose() Maybe I had something like PyFile_FromString in mind!! That-damn-time-machine-again-ly, Mark.
From moshez at math.huji.ac.il Tue Mar 28 07:36:59 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 28 Mar 2000 07:36:59 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: <14559.48104.34263.680278@beluga.mojam.com> Message-ID: On Mon, 27 Mar 2000, Skip Montanaro wrote: > Responding to an early item in this thread and trying to adapt to later > items... > > Ping wrote: > > I'm not convinced "mime" needs a separate branch here. (This is the > deepest part of the tree, and at three levels small alarm bells went off > in my head.) > > It's not clear that mime should be beneath text/mail. Moshe moved it up a > level, Actually, Ping moved it up a level. I only decided to agree with him retroactively... > I think the mime stuff still > belongs in a separate mime package. I wouldn't just sprinkle the modules > under text. I see two possibilities: > > text>mime > net>mime > > I prefer net>mime, I don't. MIME is not a "wire protocol" like all the other things in net -- it's used inside another wire protocol, like RFC822 or HTTP. If at all, I'd go for having a net/ mail/ mime/ Package, but Ping would yell at me again for nesting 3 levels. I could live with text/mime, because the mime format basically *is* text. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Tue Mar 28 07:47:13 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 28 Mar 2000 07:47:13 +0200 (IST) Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: <200003272043.PAA18445@eric.cnri.reston.va.us> Message-ID: On Mon, 27 Mar 2000, Guido van Rossum wrote: > The _tkinter.c source code is littered with #ifdefs that mostly center > around distinguishing between Tcl/Tk 8.0 and older versions. The > two pre-8.0 version supported seem to be 7.5/4.1 and 7.6/4.2. 
> > Would it be reasonable to assume that everybody is using at least > Tcl/Tk version 8.0? This would simplify the code somewhat. I want to ask a different question: when is Python going to officially support Tcl/Tk v8.2/8.3? I'd really like for this to happen, as I hate having several libraries of Tcl/Tk on my machine. (I assume you know the joke about Jews always answering a question with a question ) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From jack at oratrix.nl Tue Mar 28 10:55:56 2000 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 28 Mar 2000 10:55:56 +0200 Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message by Ka-Ping Yee , Sat, 25 Mar 2000 23:37:11 -0800 (PST) , Message-ID: <20000328085556.CFEAC370CF2@snelboot.oratrix.nl> > Okay, here's another shot at it. Notice a few things: > ... > bin > ... > image ... > sound > ... These I don't like, I think image and sound should be either at toplevel, or otherwise in a separate package (mm?). I know images and sounds are customarily stored in binary files, but so are databases and other things. Hmm, the bin group in general seems to be a bit of a catch-all. gzip, zlib and chunk definitely belong together, but struct is a wholly different beast. 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jack at oratrix.nl Tue Mar 28 11:01:51 2000 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 28 Mar 2000 11:01:51 +0200 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message by Moshe Zadka , Sat, 25 Mar 2000 20:30:26 +0200 (IST) , Message-ID: <20000328090151.86B59370CF2@snelboot.oratrix.nl> > On Sat, 25 Mar 2000, David Ascher wrote: > > > This made me think of one issue which is worth considering -- is there a > > > mechanism for third-party packages to hook into the standard naming > > > hierarchy? It'd be weird not to have the oracle and sybase modules within > > > the db toplevel package, for example. > > > > My position is that any 3rd party module decides for itself where it wants > > to live -- once we formalized the framework. Consider PyGTK/PyGnome, > > PyQT/PyKDE -- they should live in the UI package too... For separate modules, yes. For packages this is different. As a case in point, think of MacPython: it could stuff all mac-specific packages under the toplevel "mac", but it would probably be nicer if it could extend the existing namespace. It is a bit silly if mac users have to do "from mac.text.encoding import macbinary" but "from text.encoding import binhex", just because BinHex support happens to live in the core (purely for historical reasons). But maybe this holds only for the platform distributions, then it shouldn't be as much of a problem as there aren't that many.
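The namespace extension Jack wants is possible because a package's __path__ is just a list a distribution can append to, so imports from two install locations all resolve through one package. Everything below (the temp directories, the text_demo package, the module names) is fabricated for the sketch:

```python
import importlib
import os
import sys
import tempfile

core = tempfile.mkdtemp()    # pretend: the core distribution
third = tempfile.mkdtemp()   # pretend: a platform add-on (e.g. MacPython)

# Give each location a "text_demo" package directory with one module.
for base, name, body in [(core, "binhex_demo", "NAME = 'core'"),
                         (third, "macbinary_demo", "NAME = 'third'")]:
    pkg = os.path.join(base, "text_demo")
    os.makedirs(pkg, exist_ok=True)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, name + ".py"), "w") as f:
        f.write(body)

sys.path.insert(0, core)
importlib.invalidate_caches()
import text_demo

# The graft: the add-on's directory joins the package's search path.
text_demo.__path__.append(os.path.join(third, "text_demo"))

from text_demo import binhex_demo, macbinary_demo
print(binhex_demo.NAME, macbinary_demo.NAME)   # core third
```

With this, platform modules import as text_demo.macbinary_demo rather than from a parallel "mac" hierarchy, which is the transparency Jack is after.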
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From moshez at math.huji.ac.il Tue Mar 28 11:24:14 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 28 Mar 2000 11:24:14 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: <20000328085556.CFEAC370CF2@snelboot.oratrix.nl> Message-ID: On Tue, 28 Mar 2000, Jack Jansen wrote: > These I don't like, I think image and sound should be either at toplevel, or > otherwise in a separate package (mm?). I know images and sounds are > customarily stored in binary files, but so are databases and other things. Hmmm...I think of "bin" as "interface to binary files". Agreed that I don't have a good reason for seperating gdbm from zlib. > Hmm, the bin group in general seems to be a bit of a catch-all. gzip, zlib and > chunk definitely belong together, but struct is a wholly different beast. I think Ping and I decided to move struct to toplevel. Ping, would you like to take your last proposal and fold into it the consensual changes,, or should I? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From effbot at telia.com Tue Mar 28 11:44:14 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 28 Mar 2000 11:44:14 +0200 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead References: <200003242103.QAA03288@eric.cnri.reston.va.us> Message-ID: <02c101bf989a$2ee35860$34aab5d4@hagrid> Guido van Rossum wrote: > Similar to append(), I'd like to close this gap, and I've made the > necessary changes. This will probably break lots of code. > > Similar to append(), I'd like people to fix their code rather than > whine -- two-arg connect() has never been documented, although it's > found in much code (even the socket module test code :-( ). 
> > Similar to append(), I may revert the change if it is shown to cause > too much pain during beta testing... proposal: if anyone changes the API for a fundamental module, and fails to update the standard library, the change is automatically "minus one'd" for each major module that no longer works :-) (in this case, that would be -5 or so...) From effbot at telia.com Tue Mar 28 11:55:19 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 28 Mar 2000 11:55:19 +0200 Subject: [Python-Dev] Great Renaming? What is the goal? References: Message-ID: <02c901bf989b$be203d80$34aab5d4@hagrid> Peter Funk wrote: > Why should modules be moved into packages? I don't get it. fwiw, neither do I... I'm not so sure that Python really needs a simple reorganization of the existing set of standard library modules. just moving the modules around won't solve the real problems with the 1.5.2 std library... > IMO this subdivision could be discussed and possibly revised. here's one proposal: http://www.pythonware.com/people/fredrik/librarybook-contents.htm From gstein at lyra.org Tue Mar 28 12:09:44 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 28 Mar 2000 02:09:44 -0800 (PST) Subject: [Python-Dev] 3rd parties in the hierarchy (was: module reorg) In-Reply-To: <20000328090151.86B59370CF2@snelboot.oratrix.nl> Message-ID: On Tue, 28 Mar 2000, Jack Jansen wrote: > > On Sat, 25 Mar 2000, David Ascher wrote: > > > This made me think of one issue which is worth considering -- is there a > > > mechanism for third-party packages to hook into the standard naming > > > hierarchy? It'd be weird not to have the oracle and sybase modules within > > > the db toplevel package, for example. > > > > My position is that any 3rd party module decides for itself where it wants > > to live -- once we formalized the framework. Consider PyGTK/PyGnome, > > PyQT/PyKDE -- they should live in the UI package too... > > For separate modules, yes. For packages this is different. 
As a point in case > think of MacPython: it could stuff all mac-specific packages under the > toplevel "mac", but it would probably be nicer if it could extend the existing > namespace. It is a bit silly if mac users have to do "from mac.text.encoding > import macbinary" but "from text.encoding import binhex", just because BinHex > support happens to live in the core (purely for historical reasons). > > But maybe this holds only for the platform distributions, then it shouldn't be > as much of a problem as there aren't that many. Assuming that you use an archive like those found in my "small" distro or Gordon's distro, then this is no problem. The archive simply recognizes and maps "text.encoding.macbinary" to its own module. Another way to say it: stop thinking in terms of the filesystem as the sole mechanism for determining placement in the package hierarchy. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Tue Mar 28 15:38:12 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 08:38:12 -0500 Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: Your message of "Tue, 28 Mar 2000 07:47:13 +0200." References: Message-ID: <200003281338.IAA29532@eric.cnri.reston.va.us> > I want to ask a different question: when is Python going to officially > support Tcl/Tk v8.2/8.3? I'd really like for this to happen, as I hate > having several libraries of Tcl/Tk on my machine. This is already in the CVS tree, except for the Windows installer. Python 1.6 will not install a separate complete Tcl installation; instead, it will install the needed Tcl/Tk files (Tcl/Tk 8.3 or newer) in the Python tree, so it won't affect existing Tcl/Tk installations. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Mar 28 15:57:02 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 08:57:02 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: Your message of "Tue, 28 Mar 2000 11:44:14 +0200." <02c101bf989a$2ee35860$34aab5d4@hagrid> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <02c101bf989a$2ee35860$34aab5d4@hagrid> Message-ID: <200003281357.IAA29621@eric.cnri.reston.va.us> > proposal: if anyone changes the API for a fundamental module, and > fails to update the standard library, the change is automatically "minus > one'd" for each major module that no longer works :-) > > (in this case, that would be -5 or so...) Oops. Sigh. While we're pretending that this change goes in, could you point me to those five modules? Also, we need to add test cases to the standard test suite that would have found these! --Guido van Rossum (home page: http://www.python.org/~guido/) From gward at cnri.reston.va.us Tue Mar 28 17:04:47 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Tue, 28 Mar 2000 10:04:47 -0500 Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: ; from ping@lfw.org on Sat, Mar 25, 2000 at 11:37:11PM -0800 References: Message-ID: <20000328100446.A2586@cnri.reston.va.us> On 25 March 2000, Ka-Ping Yee said: > Okay, here's another shot at it. Notice a few things: Damn, I started writing a response to Moshe's original proposal -- and *then* saw this massive thread. Oh well. Turns out I still have a few useful things to say: First, any organization scheme for the standard library (or anything else, for that matter) should have a few simple guidelines. Here are two: * "deep hierarchies considered harmful": ie. avoid sub-packages if at all possible * "everything should have a purpose": every top-level package should be describable with a single, clear sentence of plain language. 
Eg.: net - Internet protocols, data formats, and client/server infrastructure unix - Unix-specific system calls, protocols, and conventions And two somewhat open issues: * "as long as we're renaming...": maybe this would be a good time to standardize naming conventions, eg. "cgi" -> "cgilib" *or* "{http,ftp,url,...}lib" -> "{http,ftp,url}...", "MimeWriter" -> "mimewriter", etc. * "shared namespaces vs system namespaces": the Perl model of "nothing belongs to The System; anyone can add a module in Text:: or Net:: or whatever" works there because Perl doesn't have __init__ files or anything to distinguish module namespaces; they just are. Python's import mechanism would have to change to support this, and the fact that __init__ files may contain arbitrary code makes this feel like a very tricky change to make. Now specific comments... > net > urlparse > urllib > ftplib > gopherlib > imaplib > poplib > nntplib > smtplib > telnetlib > httplib > cgi Rename? Either cgi -> cgilib or foolib -> foo? > server > BaseHTTPServer > CGIHTTPServer > SimpleHTTPServer > SocketServer > asynchat > asyncore This is one good place for a sub-package. It's also a good place to rename: the convention for Python module names seems to be all-lowercase; and "Server" is redundant when you're in the net.server package. How about: net.server.base_http net.server.cgi_http net.server.simple_http net.server.socket Underscores negotiable. They don't seem to be popular in module names, although sometimes they would be real life-savers. > text I think "text" should mean "plain old unstructured, un-marked-up ASCII text", where "unstructured, un-marked-up" really means "not structured or marked up in a well-known standard way". Or maybe not. I'm just trying to come up with an excuse for moving xml to top-level, which I think is where it belongs.
Maybe the excuse should just be, "XML is really important and visible, and anyways Paul Prescod will raise a stink if it isn't put at top-level in Python package-space". > re # general-purpose parsing Top-level: this is a fundamental module that should be treated on a par with 'string'. (Well, except for building RE methods into strings... hmmMMmm...maybe... [no, I'm kidding!]) > sgmllib > htmllib > htmlentitydefs Not sure what to do about these. Someone referred somewhere to a "web" top-level package, which seems to have disappeared. If it reappars, it would be a good place for the HTML modules (not to mention a big chunk of "net") -- this would mainly be for "important and visible" (ie. PR) reasons, rather than sound technical reasons. > xml > whatever the xml-sig puts here Should be top-level. > mail > rfc822 > mailbox > mhlib "mail" should either be top-level or under "net". (Yes, I *know* it's not a wire-level protocol: that's what net.smtplib is for. But last time I checked, email is pretty useless without a network. And vice-versa.) Or maybe these all belong in a top-level "data" package: I'm starting to warm to that. > bin > gzip > zlib > chunk > struct > image > imghdr > colorsys # a bit unsure, but doesn't go anywhere else > imageop > imgfile > rgbimg > yuvconvert > sound > aifc > sndhdr > toaiff > audiodev > sunau > sunaudio > wave > audioop > sunaudiodev I agree with Jack: image and sound (audio?) should be top-level. I don't think I like the idea of an intervening "mm" or "multimedia" or "media" or what-have-you package, though. The other stuff in "bin" is kind of a grab-bag: "chunk" and "struct" might belong in the mythical "data" package. > db > anydbm > whichdb > bsddb > dbm > dbhash > dumbdbm > gdbm Yup. 
# math > math # library functions > cmath > fpectl # type-related > fpetest > array > mpz > fpformat # formatting > locale > bisect # algorithm: also unsure, but doesn't go anywhere else > random # randomness > whrandom > crypt # cryptography > md5 > rotor > sha Hmmm. "locale" has already been dealt with; obviously it should be top-level. I think "array" should be top-level or under the mythical "data". Six crypto-related modules seem like enough to justify a top-level "crypt" package, though. > time > calendar > time > tzparse > sched > timing Yup. > interp > new > linecache # handling .py files [...] > tabnanny > pstats > rlcompleter # this might go in "ui"... I like "python" for this one. (But I'm not sure if tabnanny and rlcompleter belong there.) > security > Bastion > rexec > ihooks What does ihooks have to do with security? > file > dircache > path -- a virtual module which would do a from path import * > nturl2path > macurl2path > filecmp > fileinput > StringIO Lowercase for consistency? > glob > fnmatch > stat > statcache > statvfs > tempfile > shutil > pipes > popen2 > commands > dl No problem until these last two -- 'commands' is a Unix-specific thing that has very little to do with the filesystem per se, and 'dl' is (as I understand it) deep ju-ju with sharp edges that should probably be hidden away in the 'python' ('sys'?) package. Oh yeah, "dl" should be elsewhere -- "python" maybe? Top-level? Perhaps we need a "deepmagic" package for "dl" and "new"? ;-) > data > pickle > shelve > xdrlib > copy > copy_reg > UserDict > UserList > pprint > repr > (cPickle) Oh hey, it's *not* a mythical package! Guess I didn't read far enough ahead. I like it, but would add more stuff to it (obviously): 'struct', 'chunk', 'array' for starters. Should cPickle be renamed to fastpickle? > threads > thread > threading > Queue Lowercase? > ui > _tkinter > curses > Tkinter > cmd > getpass > getopt > readline > users > pwd > grp > nis These belong in "unix".
Possibly "nis" belongs in "net" -- do any non-Unix OSes use NIS? > sgi > al > cd > cl > fl > fm > gl > misc (what used to be sgimodule.c) > sv Should this be "sgi" or "irix"? Ditto for "sun" vs "solaris" if there are a significant number of Sun/Solaris modules. Note that the respective trademark holders might get very antsy about who gets to put names in those namespaces -- that's exactly what happened with Sun, Solaris 8, and Perl. I believe the compromise they arrived at was that the "Solaris::" namespace remains open, but Sun gets the "Sun::" namespace. There should probably be a win32 package, for core registry access stuff if nothing else. There might someday be a "linux" package; it's highly unlikely there would be a "pc" or "alpha" package though. All of those argue over "irix" and "solaris" instead of "sgi" and "sun". Greg From gvwilson at nevex.com Tue Mar 28 17:45:10 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Tue, 28 Mar 2000 10:45:10 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: > > Greg Wilson > > If None becomes a keyword, I would like to ask whether it could be > > used to signal that a method is a class method, as opposed to an > > instance method: > I'd like to know what you mean by "class" method. (I do know C++ and > Java, so I have some idea...). Specifically, my question is: how does > a class method access class variables? They can't be totally > unqualified (because that's very unpythonic). If they are qualified by > the class's name, I see it as a very mild improvement on the current > situation. You could suggest, for example, to qualify class variables > by "class" (so you'd do things like: > > class.x = 1 > > ), but I'm not sure I like it. On the whole, I think it is a much > bigger issue on how be denote class methods. 
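[Editor's note: Moshe's question — how a class method would get at class variables — can be pinned down with code. Python of this era had no spelling for class methods; the sketch below shows the explicit class-name qualification under debate, and, purely as a point of comparison, the classmethod builtin the language eventually grew in 2.2:]

```python
class Parent:
    foo = 3

class Child(Parent):
    foo = 9

    def both(self):
        # Qualifying by class name reaches either binding explicitly.
        return Child.foo, Parent.foo

assert Child().both() == (9, 3)

# What the language later adopted: the method receives the class
# itself (conventionally named 'cls') instead of an instance.
class Counter:
    count = 0

    @classmethod
    def bump(cls):
        cls.count += 1

Counter.bump()
Counter.bump()
assert Counter.count == 2
```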
I don't like overloading the word 'class' this way, as it makes it difficult to distinguish a parent's 'foo' member and a child's 'foo' member: class Parent: foo = 3 ...other stuff... class Child(Parent): foo = 9 def test(): print class.foo # obviously 9, but how to get 3? I think that using the class's name instead of 'self' will be easy to explain, will look like it belongs in the language, will be unlikely to lead to errors, and will handle multiple inheritance with ease: class Child(Parent): foo = 9 def test(): print Child.foo # 9 print Parent.foo # 3 > Also, one slight problem with your method of denoting class methods: > currently, it is possible to add an instance method at run time to a > class by something like > > class C: > pass > > def foo(self): > pass > > C.foo = foo > > In your suggestion, how do you view the possibility of adding class > methods to a class? (Note that "foo", above, is also perfectly usable > as a plain function). Hm, I hadn't thought of this... :-( > > I'd also like to ask (separately) that assignment to None be defined as a > > no-op, so that programmers can write: > > > > year, month, None, None, None, None, weekday, None, None = gmtime(time()) > > > > instead of having to create throw-away variables to fill in slots in > > tuples that they don't care about. > > Currently, I use "_" for that purpose, after I heard the idea from > Fredrik Lundh. I do the same thing when I need to; I just thought that making assignment to "None" special would formalize this in a readable way. From jeremy at cnri.reston.va.us Tue Mar 28 19:31:48 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Tue, 28 Mar 2000 12:31:48 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: <14559.38662.835289.499610@goon.cnri.reston.va.us> Message-ID: <14560.60548.74378.613188@goon.cnri.reston.va.us> >>>>> "KLM" == Ken Manheimer writes: >> The only problematic use of from ... import ...
is >> from text.re import * >> which adds an unspecified set of names to the current >> namespace. KLM> The other gotcha i mean applies when the thing you're importing KLM> is a terminal, ie a non-module. Then, changes to the KLM> assignments of the names in the original module aren't KLM> reflected in the names you've imported - they're decoupled from KLM> the namespace of the original module. This isn't an import issue. Some people simply don't understand that assignment (and import as form of assignment) is name binding. Import binds an imported object to a name in the current namespace. It does not affect bindings in other namespaces, nor should it. KLM> I thought the other problem peter was objecting to, having to KLM> change the import sections in the first place, was going to be KLM> avoided in the 1.x series (if we do this kind of thing) by KLM> inherently extending the import path to include all the KLM> packages, so people need not change their code? Seems like KLM> most of this would be fairly transparent w.r.t. the operation KLM> of existing applications. I'm not sure if there is consensus on backwards compatibility. I'm not in favor of creating a huge sys.path that includes every package's contents. It would be a big performance hit. Jeremy From moshez at math.huji.ac.il Tue Mar 28 19:36:47 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 28 Mar 2000 19:36:47 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: <20000328100446.A2586@cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Greg Ward wrote: > * "deep hierarchies considered harmful": ie. avoid sub-packages if at > all possible > > * "everything should have a purpose": every top-level package should > be describable with a single, clear sentence of plain language. Good guidelines, but they aren't enough. 
And anyway, rules were meant to be broken <0.9 wink> > * "as long as we're renaming...": maybe this would be a good time to > standardize naming conventions, eg. "cgi" -> "cgilib" *or* > "{http,ftp,url,...}lib" -> "{http,ftp,url}...", "MimeWriter" -> > "mimewriter", etc. +1 > * "shared namespaces vs system namespaces": the Perl model of "nothing > belongs to The System; anyone can add a module in Text:: or Net:: or > whatever" works there because Perl doesn't have __init__ files or > anything to distinguish module namespaces; they just are. Python's > import mechanism would have to change to support this, and the fact > that __init__ files may contain arbitrary code makes this feel > like a very tricky change to make. Indeed. But I still feel that "few things should belong to the system" is quite a useful rule... (That's what I referred to when I said Perl's module system is more suited to CPAN (now there's a surprise)) > Rename? Either cgi -> cgilib or foolib -> foo? Yes. But I wanted the first proposal to be just about placing stuff, because that airs out more disagreements. > This is one good place for a sub-package. It's a also a good place to > rename: the convention for Python module names seems to be > all-lowercase; and "Server" is redundant when you're in the net.server > package. How about: > > net.server.base_http > net.server.cgi_http > net.server.simple_http > net.server.socket Hmmmmm......+0 > Underscores negotiable. They don't seem to be popular in module names, > although sometimes they would be real life-savers. Personally, I prefer underscores to CamelCase. > Or maybe not. I'm just trying to come up with an excuse for moving xml > to top-level, which I think is where it belongs. Maybe the excuse > should just be, "XML is really important and visible, and anyways Paul > Prescod will raise a stink if it isn't put at top-level in Python > package-space". I still think "xml" should be a brother to "html" and "sgml". 
Current political trans not withstanding. > Not sure what to do about these. Someone referred somewhere to a "web" > top-level package, which seems to have disappeared. If it reappars, it > would be a good place for the HTML modules (not to mention a big chunk > of "net") -- this would mainly be for "important and visible" (ie. PR) > reasons, rather than sound technical reasons. I think the "web" package should be reinstated. But you won't like it: I'd put xml in web. > "mail" should either be top-level or under "net". (Yes, I *know* it's > not a wire-level protocol: that's what net.smtplib is for. But last > time I checked, email is pretty useless without a network. And > vice-versa.) Ummmm.....I'd disagree, but I lack the strength and the moral conviction. Put it under net and we'll call it a deal > Or maybe these all belong in a top-level "data" package: I'm starting to > warm to that. Ummmm...I don't like the "data" package personally. It seems to disobey your second guideline. > I agree with Jack: image and sound (audio?) should be top-level. I > don't think I like the idea of an intervening "mm" or "multimedia" or > "media" or what-have-you package, though. Definitely multimedia. Okay, I'm bought. > Six crypto-related modules seems like enough to justify a top-level > "crypt" package, though. It seemed obvious to me that "crypt" should be under "math". But maybe that's just the mathematician in me speaking. > I like "python" for this one. (But I'm not sure if tabnanny and > rlcompleter belong there.) I agree, and I'm not sure about rlcompleter, but am sure about tabnanny. > What does ihooks have to do with security? Well, it was more or less written to support rexec. A weak argument, admittedly > No problem until these last two -- 'commands' is a Unix-specific thing > that has very little to do with the filesystem per se Hmmmmm...it is on the same level with popen. Why not move popen too? 
>, and 'dl' is (as I > understand it) deep ju-ju with sharp edges that should probably be > hidden away Ummmmmm.....not in the "python" package: it doesn't have anything to do with the interpreter. > Should this be "sgi" or "irix"? Ditto for "sun" vs "solaris" if there > are a significant number of Sun/Solaris modules. Note that the > respective trademark holders might get very antsy about who gets to put > names in those namespaces -- that's exactly what happened with Sun, > Solaris 8, and Perl. I believe the compromise they arrived at was that > the "Solaris::" namespace remains open, but Sun gets the "Sun::" > namespace. Ummmmm.....I don't see how they have any legal standing. I for one refuse to care about what Sun Microsystem thinks about names for Python packages. > There should probably be a win32 package, for core registry access stuff > if nothing else. And for all the other extensions in win32all Yep! (Just goes to show what happens when you decide to package based on a UNIX system) > All of those > argue over "irix" and "solaris" instead of "sgi" and "sun". Fine with me -- just wanted to move them out of my face -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From andy at reportlab.com Tue Mar 28 20:13:02 2000 From: andy at reportlab.com (Andy Robinson) Date: Tue, 28 Mar 2000 18:13:02 GMT Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: <20000327170031.693531CDF6@dinsdale.python.org> References: <20000327170031.693531CDF6@dinsdale.python.org> Message-ID: <38e0f4cf.24247656@post.demon.co.uk> On Mon, 27 Mar 2000 12:00:31 -0500 (EST), Peter Funk wrote: > Do we need a UserString class? This will probably be useful on top of the i18n stuff in due course, so I'd like it. Something Mike Da Silva and I have discussed a lot is implementing a higher-level 'typed string' library on top of the Unicode stuff. 
A 'typed string' is like a string, but knows what encoding it is in - possibly Unicode, possibly a native encoding and embodies some basic type safety and convenience notions, like not being able to add a Shift-JIS and an EUC string together. Iteration would always be per character, not per byte; and a certain amount of magic would say that if the string was (say) Japanese, it would acquire a few extra methods for doing some Japan-specific things like expanding half-width katakana. Of course, we can do this anyway, but I think defining the API clearly in UserString is a great idea. - Andy Robinson From guido at python.org Tue Mar 28 21:22:43 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 14:22:43 -0500 Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Your message of "Tue, 28 Mar 2000 18:13:02 GMT." <38e0f4cf.24247656@post.demon.co.uk> References: <20000327170031.693531CDF6@dinsdale.python.org> <38e0f4cf.24247656@post.demon.co.uk> Message-ID: <200003281922.OAA03113@eric.cnri.reston.va.us> > > Do we need a UserString class? > > This will probably be useful on top of the i18n stuff in due course, > so I'd like it. > > Something Mike Da Silva and I have discussed a lot is implementing a > higher-level 'typed string' library on top of the Unicode stuff. > A 'typed string' is like a string, but knows what encoding it is in - > possibly Unicode, possibly a native encoding and embodies some basic > type safety and convenience notions, like not being able to add a > Shift-JIS and an EUC string together. Iteration would always be per > character, not per byte; and a certain amount of magic would say that > if the string was (say) Japanese, it would acquire a few extra methods > for doing some Japan-specific things like expanding half-width > katakana. > > Of course, we can do this anyway, but I think defining the API clearly > in UserString is a great idea. Agreed. Please somebody send a patch! 
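[Editor's note: the typed-string idea Andy describes can be roughed out on top of a UserString-style base class. A sketch against today's collections.UserString — the TypedString name and its API are made up for illustration, not the patch Guido asked for:]

```python
from collections import UserString

class TypedString(UserString):
    """A string that knows which encoding its text came from."""

    def __init__(self, data, encoding):
        super().__init__(data)
        self.encoding = encoding

    def __add__(self, other):
        # The basic type safety Andy mentions: refuse to mix encodings.
        if isinstance(other, TypedString) and other.encoding != self.encoding:
            raise TypeError("cannot add %s string to %s string"
                            % (other.encoding, self.encoding))
        return TypedString(self.data + str(other), self.encoding)

ok = TypedString("katakana", "shift_jis") + TypedString("!", "shift_jis")
assert str(ok) == "katakana!" and ok.encoding == "shift_jis"

mixed_rejected = False
try:
    TypedString("a", "shift_jis") + TypedString("b", "euc_jp")
except TypeError:
    mixed_rejected = True
assert mixed_rejected
```

[Encoding-specific conveniences like the half-width katakana expansion Andy mentions would hang off subclasses or be enabled per-encoding; the point here is only that the encoding travels with the data.]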
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Mar 28 21:25:39 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 14:25:39 -0500 Subject: [Python-Dev] First alpha release of Python 1.6 Message-ID: <200003281925.OAA03287@eric.cnri.reston.va.us> I'm hoping to release a first, rough alpha of Python 1.6 by April 1st (no joke!). Not everything needs to be finished by then, but I hope to have the current versions of distutil, expat, and sre in there. Anything else that needs to go into 1.6 and isn't ready yet? (Small stuff doesn't matter, everything currently in the patches queue can probably go in if it isn't rejected by then.) --Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA at ActiveState.com Tue Mar 28 21:40:24 2000 From: DavidA at ActiveState.com (David Ascher) Date: Tue, 28 Mar 2000 11:40:24 -0800 Subject: [Python-Dev] First alpha release of Python 1.6 In-Reply-To: <200003281925.OAA03287@eric.cnri.reston.va.us> Message-ID: > Anything else that needs to go into 1.6 and isn't ready yet? No one seems to have found time to figure out the mmap module support. --david From guido at python.org Tue Mar 28 21:33:29 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 14:33:29 -0500 Subject: [Python-Dev] First alpha release of Python 1.6 In-Reply-To: Your message of "Tue, 28 Mar 2000 11:40:24 PST." References: Message-ID: <200003281933.OAA04896@eric.cnri.reston.va.us> > > Anything else that needs to go into 1.6 and isn't ready yet? > > No one seems to have found time to figure out the mmap module support. I wasn't even aware that that was a priority. If someone submits it, it will go in -- alpha 1 is not a total feature freeze, just a "testing the waters". 
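[Editor's note: for the record, the mmap module that did eventually land exposes a mapped file as a mutable byte buffer. A small sketch against the modern stdlib module, not the 1.6-era patch under discussion here:]

```python
import mmap
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(b"hello world")
    f.flush()
    # Length 0 means "map the whole file".
    m = mmap.mmap(f.fileno(), 0)
    assert m[:5] == b"hello"   # slice access, like a mutable bytes object
    m[:5] = b"HELLO"           # assignment writes through the mapping
    pos = m.find(b"world")     # plus some file-like methods
    snapshot = bytes(m)
    m.close()

assert pos == 6
assert snapshot == b"HELLO world"
```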
--Guido van Rossum (home page: http://www.python.org/~guido/) From tismer at tismer.com Tue Mar 28 21:49:17 2000 From: tismer at tismer.com (Christian Tismer) Date: Tue, 28 Mar 2000 21:49:17 +0200 Subject: [Python-Dev] First alpha release of Python 1.6 References: <200003281925.OAA03287@eric.cnri.reston.va.us> Message-ID: <38E10CBD.C6B71D50@tismer.com> Guido van Rossum wrote: ... > Anything else that needs to go into 1.6 and isn't ready yet? Stackless Python of course, but it *is* ready yet. Just kidding. I will provide a compressed unicode database in a few days. That will be a non-Python-specific module, and (Marc or I) will provide a Python specific wrapper. This will probably not get ready until April 1. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From akuchlin at mems-exchange.org Tue Mar 28 21:51:29 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Tue, 28 Mar 2000 14:51:29 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: References: <200003281925.OAA03287@eric.cnri.reston.va.us> Message-ID: <14561.3393.761177.776684@amarok.cnri.reston.va.us> David Ascher writes: >> Anything else that needs to go into 1.6 and isn't ready yet? >No one seems to have found time to figure out the mmap module support. The issue there is cross-platform compatibility; the Windows and Unix versions take completely different constructor arguments, so how should we paper over the differences? Unix arguments: (file descriptor, size, flags, protection) Win32 arguments:(filename, tagname, size) We could just say, "OK, the args are completely different between Win32 and Unix, despite it being the same function name". 
Maybe that's best, because there seems no way to reconcile those two different sets of arguments. -- A.M. Kuchling http://starship.python.net/crew/amk/ I'm here for the FBI, not the _Weekly World News_. -- Scully in X-FILES #1 From DavidA at ActiveState.com Tue Mar 28 22:06:09 2000 From: DavidA at ActiveState.com (David Ascher) Date: Tue, 28 Mar 2000 12:06:09 -0800 Subject: [Python-Dev] mmapfile module In-Reply-To: <14561.3393.761177.776684@amarok.cnri.reston.va.us> Message-ID: > The issue there is cross-platform compatibility; the Windows and Unix > versions take completely different constructor arguments, so how > should we paper over the differences? > > Unix arguments: (file descriptor, size, flags, protection) > Win32 arguments:(filename, tagname, size) > > We could just say, "OK, the args are completely different between > Win32 and Unix, despite it being the same function name". Maybe > that's best, because there seems no way to reconcile those two > different sets of arguments. I guess my approach would be to provide two platform-specific modules, and to figure out a high-level Python module which could provide a reasonable platform-independent interface on top of it. One problem with that approach is that I think that there is also great value in having a portable mmap interface in the C layer, where i see lots of possible uses in extension modules (much like the threads API). --david From guido at python.org Tue Mar 28 22:00:57 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 15:00:57 -0500 Subject: [Python-Dev] mmapfile module In-Reply-To: Your message of "Tue, 28 Mar 2000 12:06:09 PST." References: Message-ID: <200003282000.PAA11988@eric.cnri.reston.va.us> > > The issue there is cross-platform compatibility; the Windows and Unix > > versions take completely different constructor arguments, so how > > should we paper over the differences? 
> > > > Unix arguments: (file descriptor, size, flags, protection) > > Win32 arguments:(filename, tagname, size) > > > > We could just say, "OK, the args are completely different between > > Win32 and Unix, despite it being the same function name". Maybe > > that's best, because there seems no way to reconcile those two > > different sets of arguments. > > I guess my approach would be to provide two platform-specific modules, and > to figure out a high-level Python module which could provide a reasonable > platform-independent interface on top of it. One problem with that approach > is that I think that there is also great value in having a portable mmap > interface in the C layer, where i see lots of possible uses in extension > modules (much like the threads API). I don't know enough about this, but it seems that there might be two steps: *creating* a mmap object is necessarily platform-specific; but *using* a mmap object could be platform-neutral. What is the API for mmap objects? --Guido van Rossum (home page: http://www.python.org/~guido/) From klm at digicool.com Tue Mar 28 22:07:25 2000 From: klm at digicool.com (Ken Manheimer) Date: Tue, 28 Mar 2000 15:07:25 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14560.60548.74378.613188@goon.cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Jeremy Hylton wrote: > >>>>> "KLM" == Ken Manheimer writes: > > >> The only problematic use of from ... import ... is > >> from text.re import * > >> which adds an unspecified set of names to the current > >> namespace. > > KLM> The other gotcha i mean applies when the thing you're importing > KLM> is a terminal, ie a non-module. Then, changes to the > KLM> assignments of the names in the original module aren't > KLM> reflected in the names you've imported - they're decoupled from > KLM> the namespace of the original module. > > This isn't an import issue. 
Some people simply don't understand > that assignment (and import as form of assignment) is name binding. > Import binds an imported object to a name in the current namespace. > It does not affect bindings in other namespaces, nor should it. I know that - i was addressing the asserted evilness of from ... import ... and how it applied - and didn't - w.r.t. packages. > KLM> I thought the other problem peter was objecting to, having to > KLM> change the import sections in the first place, was going to be > KLM> avoided in the 1.x series (if we do this kind of thing) by > KLM> inherently extending the import path to include all the > KLM> packages, so people need not change their code? Seems like > KLM> most of this would be fairly transparent w.r.t. the operation > KLM> of existing applications. > > I'm not sure if there is consensus on backwards compatibility. I'm > not in favor of creating a huge sys.path that includes every package's > contents. It would be a big performance hit. Yes, someone reminded me that the other (better, i think) option is stub modules in the current places that do the "from ... import *" for the right values of "...". py3k finishes the migration by eliminating the stubs. Ken klm at digicool.com From gward at cnri.reston.va.us Tue Mar 28 22:29:55 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Tue, 28 Mar 2000 15:29:55 -0500 Subject: [Python-Dev] First alpha release of Python 1.6 In-Reply-To: <200003281925.OAA03287@eric.cnri.reston.va.us>; from guido@python.org on Tue, Mar 28, 2000 at 02:25:39PM -0500 References: <200003281925.OAA03287@eric.cnri.reston.va.us> Message-ID: <20000328152955.A3136@cnri.reston.va.us> On 28 March 2000, Guido van Rossum said: > I'm hoping to release a first, rough alpha of Python 1.6 by April 1st > (no joke!). > > Not everything needs to be finished by then, but I hope to have the > current versions of distutil, expat, and sre in there. 
We just need to do a bit of CVS trickery to put Distutils under the Python tree. I'd *like* for Distutils to have its own CVS existence at least until 1.6 is released, but it's not essential. Two of the big Distutils to-do items that I enumerated at IPC8 have been knocked off: the "dist" command has been completely redone (and renamed "sdist", for "source distribution"), as has the "install" command. The really major to-do items left for Distutils are: * implement the "bdist" command with enough marbles to generate RPMs and some sort of Windows installer (Wise?); Solaris packages, Debian packages, and something for the Mac would be nice too. * documentation (started, but only just) And there are some almost-as-important items: * Mac OS support; this has been started, at least for the unfashionable and clunky sounding MPW compiler; CodeWarrior support (via AppleEvents, I think) would be nice * test suite -- at least the fundamental Distutils marbles should get a good exercise; it would also be nice to put together a bunch of toy module distributions and make sure that "build" and "install" on them do the right things... all automatically, of course! * reduce number of tracebacks: right now, certain errors in the setup script or on the command line can result in a traceback, when they should just result in SystemExit with "error in setup script: ..." or "error on command line: ..." * fold in Finn Bock's JPython compat. patch * fold in Michael Muller's "pkginfo" patch * finish and fold in my Python 1.5.1 compat. patch (only necessary as long as Distutils has a life of its own, outside Python) Well, I'd better get cracking ... Guido, we can do the CVS thing any time; I guess I'll mosey on downstairs. 
Greg -- Greg Ward - software developer gward at cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From effbot at telia.com Tue Mar 28 21:46:17 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 28 Mar 2000 21:46:17 +0200 Subject: [Python-Dev] mmapfile module References: <200003281925.OAA03287@eric.cnri.reston.va.us> <14561.3393.761177.776684@amarok.cnri.reston.va.us> Message-ID: <003501bf98ee$50097a20$34aab5d4@hagrid> Andrew M. Kuchling wrote: > The issue there is cross-platform compatibility; the Windows and Unix > versions take completely different constructor arguments, so how > should we paper over the differences? > > Unix arguments: (file descriptor, size, flags, protection) > Win32 arguments:(filename, tagname, size) > > We could just say, "OK, the args are completely different between > Win32 and Unix, despite it being the same function name". Maybe > that's best, because there seems no way to reconcile those two > different sets of arguments. I don't get this. Why expose low-level implementation details to the user (flags, protection, tagname)? (And how come the Windows implementation doesn't support read-only vs. read/write flags?) Unless the current implementation uses something radically different from mmap/MapViewOfFile, wouldn't an interface like: (filename, mode="rb", size=entire file, offset=0) be sufficient? (where mode can be "wb" or "wb+" or "rb+", optionally without the "b") From donb at init.com Tue Mar 28 22:46:06 2000 From: donb at init.com (Donald Beaudry) Date: Tue, 28 Mar 2000 15:46:06 -0500 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <200003282046.PAA18822@zippy.init.com> ...sorry to jump in on the middle of this one, but. A while back I put a lot of thought into how to support class methods and class attributes. 
I feel that I solved the problem in a fairly complete way though the solution does have some warts. Here's an example: >>> class foo(base): ... value = 10 # this is an instance attribute called 'value' ... # as usual, it is shared between all instances ... # until explicitly set on a particular instance ... ... def set_value(self, x): ... print "instance method" ... self.value = x ... ... # ... # here come the weird part ... # ... class __class__: ... value = 5 # this is a class attribute called value ... ... def set_value(cl, x): ... print "class method" ... cl.value = x ... ... def set_instance_default_value(cl, x): ... cl._.value = x ... >>> f = foo() >>> f.value 10 >>> foo.value = 20 >>> f.value 10 >>> f.__class__.value 20 >>> foo._.value 10 >>> foo._.value = 1 >>> f.value 1 >>> foo.set_value(100) class method >>> foo.value 100 >>> f.value 1 >>> f.set_value(40) instance method >>> f.value 40 >>> foo._.value 1 >>> ff=foo() >>> foo.set_instance_default_value(15) >>> ff.value 15 >>> foo._.set_value(ff, 5) instance method >>> ff.value 5 >>> Is anyone still with me? The crux of the problem is that in the current python class/instance implementation, classes dont have attributes of their own. All of those things that look like class attributes are really there as defaults for the instances. To support true class attributes a new name space must be invented. Since I wanted class objects to look like any other object, I chose to move the "instance defaults" name space under the underscore attribute. This allows the class's unqualified namespace to refer to its own attributes. Clear as mud, right? In case you are wondering, yes, the code above is a working example. I released it a while back as the 'objectmodule' and just updated it to work with Python-1.5.2. The update has yet to be released. -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb at init.com Lexington, MA 02421 ...Will hack for sushi... 
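[The "class attributes are really instance defaults" point above is easiest to see in stock Python, without Donald's objectmodule extension. A minimal sketch — class and method names invented for illustration:]

```python
# Plain CPython illustration of the behaviour described above: what looks
# like a class attribute is really a shared default for instances, visible
# until an assignment on a particular instance shadows it.
class Foo:
    value = 10                # looks like a class attribute...

    def set_value(self, x):
        self.value = x        # ...but this binds an *instance* attribute

f = Foo()
assert f.value == 10          # no instance binding yet: falls back to the class
Foo.value = 20
assert f.value == 20          # still shared: the "class attribute" is a live default
f.set_value(1)
assert f.value == 1           # instance binding now shadows the class
assert Foo.value == 20        # the class-level value itself is untouched
```

[This is exactly why a separate namespace (Donald's `_`) is needed before true class methods/attributes can coexist with instance defaults.]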
From akuchlin at mems-exchange.org Tue Mar 28 22:50:18 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Tue, 28 Mar 2000 15:50:18 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <003501bf98ee$50097a20$34aab5d4@hagrid> References: <200003281925.OAA03287@eric.cnri.reston.va.us> <14561.3393.761177.776684@amarok.cnri.reston.va.us> <003501bf98ee$50097a20$34aab5d4@hagrid> Message-ID: <14561.6922.415063.279939@amarok.cnri.reston.va.us> Fredrik Lundh writes: >(And how come the Windows implementation doesn't support >read-only vs. read/write flags?) Good point; that should be fixed. > (filename, mode="rb", size=entire file, offset=0) >be sufficient? (where mode can be "wb" or "wb+" or "rb+", >optionally without the "b") Hmm... maybe we can dispose of the PROT_* argument that way on Unix. But how would you specify MAP_SHARED vs. MAP_PRIVATE, or MAP_ANONYMOUS? (MAP_FIXED seems useless to a Python programmer.) Another character in the mode argument, or a flags argument? Worse, as you pointed out in the same thread, MAP_ANONYMOUS on OSF/1 doesn't want to take a file descriptor at all. Also, the tag name on Windows seems important, from Gordon McMillan's explanation of it: http://www.python.org/pipermail/python-dev/1999-November/002808.html -- A.M. Kuchling http://starship.python.net/crew/amk/ You mustn't kill me. You don't love me. You d-don't even know me. -- The Furies kill Abel, in SANDMAN #66: "The Kindly Ones:10" From guido at python.org Tue Mar 28 23:02:04 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 16:02:04 -0500 Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Your message of "Tue, 28 Mar 2000 15:46:06 EST." <200003282046.PAA18822@zippy.init.com> References: <200003282046.PAA18822@zippy.init.com> Message-ID: <200003282102.QAA13041@eric.cnri.reston.va.us> > A while back I put a lot of thought into how to support class methods > and class attributes. 
I feel that I solved the problem in a fairly > complete way though the solution does have some warts. Here's an > example: [...] > Is anyone still with me? > > The crux of the problem is that in the current python class/instance > implementation, classes dont have attributes of their own. All of > those things that look like class attributes are really there as > defaults for the instances. To support true class attributes a new > name space must be invented. Since I wanted class objects to look > like any other object, I chose to move the "instance defaults" name > space under the underscore attribute. This allows the class's > unqualified namespace to refer to its own attributes. Clear as mud, > right? > > In case you are wondering, yes, the code above is a working example. > I released it a while back as the 'objectmodule' and just updated it > to work with Python-1.5.2. The update has yet to be released. This looks like it would break a lot of code. How do you refer to a superclass method? It seems that ClassName.methodName would refer to the class method, not to the unbound instance method. Also, moving the default instance attributes to a different namespace seems to be a semantic change that could change lots of things. I am still in favor of saying "Python has no class methods -- use module-global functions for that". Between the module, the class and the instance, there are enough namespaces -- we don't need another one. --Guido van Rossum (home page: http://www.python.org/~guido/) From pf at artcom-gmbh.de Tue Mar 28 23:01:29 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Tue, 28 Mar 2000 23:01:29 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: <200003281922.OAA03113@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 28, 2000 2:22:43 pm" Message-ID: I wrote: > > > Do we need a UserString class? 
> > Andy Robinson: > > This will probably be useful on top of the i18n stuff in due course, > > so I'd like it. > > > > Something Mike Da Silva and I have discussed a lot is implementing a > > higher-level 'typed string' library on top of the Unicode stuff. > > A 'typed string' is like a string, but knows what encoding it is in - > > possibly Unicode, possibly a native encoding and embodies some basic > > type safety and convenience notions, like not being able to add a > > Shift-JIS and an EUC string together. Iteration would always be per > > character, not per byte; and a certain amount of magic would say that > > if the string was (say) Japanese, it would acquire a few extra methods > > for doing some Japan-specific things like expanding half-width > > katakana. > > > > Of course, we can do this anyway, but I think defining the API clearly > > in UserString is a great idea. > Guido van Rossum: > Agreed. Please somebody send a patch! I feel unable to do, what Andy proposed. What I had in mind was a simple wrapper class around the builtin string type similar to UserDict and UserList which can be used to derive other classes from. I use UserList and UserDict quite often and find them very useful. They are simple and powerful and easy to extend. May be the things Andy Robinson proposed above belong into a sub class which inherits from a simple UserString class? Do we need an additional UserUnicode class for unicode string objects? Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From guido at python.org Tue Mar 28 23:56:49 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 16:56:49 -0500 Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Your message of "Tue, 28 Mar 2000 23:01:29 +0200." 
References: Message-ID: <200003282156.QAA13361@eric.cnri.reston.va.us> [Peter Funk] > > > > Do we need a UserString class? > > > > Andy Robinson: > > > This will probably be useful on top of the i18n stuff in due course, > > > so I'd like it. > > > > > > Something Mike Da Silva and I have discussed a lot is implementing a > > > higher-level 'typed string' library on top of the Unicode stuff. > > > A 'typed string' is like a string, but knows what encoding it is in - > > > possibly Unicode, possibly a native encoding and embodies some basic > > > type safety and convenience notions, like not being able to add a > > > Shift-JIS and an EUC string together. Iteration would always be per > > > character, not per byte; and a certain amount of magic would say that > > > if the string was (say) Japanese, it would acquire a few extra methods > > > for doing some Japan-specific things like expanding half-width > > > katakana. > > > > > > Of course, we can do this anyway, but I think defining the API clearly > > > in UserString is a great idea. > > > Guido van Rossum: > > Agreed. Please somebody send a patch! [PF] > I feel unable to do, what Andy proposed. What I had in mind was a > simple wrapper class around the builtin string type similar to > UserDict and UserList which can be used to derive other classes from. Yes. I think Andy wanted his class to be a subclass of UserString. > I use UserList and UserDict quite often and find them very useful. > They are simple and powerful and easy to extend. Agreed. > May be the things Andy Robinson proposed above belong into a sub class > which inherits from a simple UserString class? Do we need > an additional UserUnicode class for unicode string objects? It would be great if there was a single UserString class which would work with either Unicode or 8-bit strings. I think that shouldn't be too hard, since it's just a wrapper. So why don't you give the UserString.py a try and leave Andy's wish alone? 
--Guido van Rossum (home page: http://www.python.org/~guido/) From pf at artcom-gmbh.de Tue Mar 28 23:47:59 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Tue, 28 Mar 2000 23:47:59 +0200 (MEST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <02c901bf989b$be203d80$34aab5d4@hagrid> from Fredrik Lundh at "Mar 28, 2000 11:55:19 am" Message-ID: Hi! > Peter Funk wrote: > > Why should modules be moved into packages? I don't get it. > Fredrik Lundh: > fwiw, neither do I... Pheeewww... And I thought I was the only one! ;-) > I'm not so sure that Python really needs a simple reorganization > of the existing set of standard library modules. just moving the > modules around won't solve the real problems with the 1.5.2 std > library... Right. I propose to leave the namespace flat. I like to argue with Brad J. Cox ---the author of the book "Object Oriented Programming - An Evolutionary Approach" Addison Wesley, 1987--- who proposes the idea of what he calls a "Software-IC": He looks closely at the design process of electronic engineers, who usually deal with large data books with prefabricated components. There are often hundreds of them in such a databook and most of them have terse and not very mnemonic names. But the engineers using them all day *know* after a short while that a 7400 chip is a TTL-chip containing 4 NAND gates. Nearly the same holds true for software engineers using Software-ICs like 're' or 'struct' as their daily building blocks. A software engineer who is already familiar with his/her building blocks has absolutely no advantage from a deeply nested namespace. Now for something completely different: Fredrik Lundh about the library documentation: > here's one proposal: > http://www.pythonware.com/people/fredrik/librarybook-contents.htm Whether 'md5', 'getpass' and 'traceback' fit into a category 'Commonly Used Modules' is ....ummmm.... at least a bit questionable.
But we should really focus the discussion on the structure of the documentation. Since many standard library modules belong into several logical catagories at once, a true tree structured organization is simply not sufficient to describe everything. So it is important to set up pointers between related functionality. For example 'string.replace' is somewhat related to 're.sub' or 'getpass' is related to 'crypt', however 'crypt' is related to 'md5' and so on. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From pf at artcom-gmbh.de Wed Mar 29 00:13:02 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 00:13:02 +0200 (MEST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules _tkinter.c,1.91,1.92 In-Reply-To: <200003282007.PAA12045@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 28, 2000 3: 7: 9 pm" Message-ID: Hi! Guido van Rossum: > Modified Files: > _tkinter.c [...] > *** 491,501 **** > > v->interp = Tcl_CreateInterp(); > - > - #if TKMAJORMINOR == 8001 > - TclpInitLibraryPath(baseName); > - #endif /* TKMAJORMINOR */ > > ! #if defined(macintosh) && TKMAJORMINOR >= 8000 > ! /* This seems to be needed since Tk 8.0 */ > ClearMenuBar(); > TkMacInitMenus(v->interp); > --- 475,481 ---- > > v->interp = Tcl_CreateInterp(); > > ! #if defined(macintosh) > ! /* This seems to be needed */ > ClearMenuBar(); > TkMacInitMenus(v->interp); > *************** Are you sure that the call to 'TclpInitLibraryPath(baseName);' is not required in Tcl/Tk 8.1, 8.2, 8.3 ? I would propose the following: +#if TKMAJORMINOR >= 8001 + TclpInitLibraryPath(baseName); +# endif /* TKMAJORMINOR */ Here I quote from the Tcl8.3 source distribution: /* *--------------------------------------------------------------------------- * * TclpInitLibraryPath -- * * Initialize the library path at startup. 
We have a minor * metacircular problem that we don't know the encoding of the * operating system but we may need to talk to operating system * to find the library directories so that we know how to talk to * the operating system. * * We do not know the encoding of the operating system. * We do know that the encoding is some multibyte encoding. * In that multibyte encoding, the characters 0..127 are equivalent * to ascii. * * So although we don't know the encoding, it's safe: * to look for the last slash character in a path in the encoding. * to append an ascii string to a path. * to pass those strings back to the operating system. * * But any strings that we remembered before we knew the encoding of * the operating system must be translated to UTF-8 once we know the * encoding so that the rest of Tcl can use those strings. * * This call sets the library path to strings in the unknown native * encoding. TclpSetInitialEncodings() will translate the library * path from the native encoding to UTF-8 as soon as it determines * what the native encoding actually is. * * Called at process initialization time. * * Results: * None. */ Sorry, but I don't know enough about this in connection with the unicode patches and if we should pay attention to this. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From akuchlin at mems-exchange.org Wed Mar 29 00:21:07 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Tue, 28 Mar 2000 17:21:07 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: <02c901bf989b$be203d80$34aab5d4@hagrid> Message-ID: <14561.12371.857178.550236@amarok.cnri.reston.va.us> Peter Funk quoted: >Fredrik Lundh: >> I'm not so sure that Python really needs a simple reorganization >> of the existing set of standard library modules. 
just moving the >> modules around won't solve the real problems with the 1.5.2 std >> library... >Right. I propose to leave the namespace flat. I third that comment. Arguments against reorganizing for 1.6: 1) I doubt that we have time to do a good job of it for 1.6. (1.7, maybe.) 2) Right now there's no way for third-party extensions to add themselves to a package in the standard library. Once Python finds foo/__init__.py, it won't look for site-packages/foo/__init__.py, so if you grab, say, "crypto" as a package name in the standard library, it's forever lost to third-party extensions. 3) Rearranging the modules is a good chance to break backward compatibility in other ways. If you want to rewrite, say, httplib in a non-compatible way to support HTTP/1.1, then the move from httplib.py to net.http.py is a great chance to do that, and leave httplib.py as-is for old programs. If you just copy httplib.py, rewriting net.http.py is now harder, since you have to either maintain compatibility or break things *again* in the next version of Python. 4) We wanted to get 1.6 out fairly quickly, and therefore limited the number of features that would get in. (Vide the "Python 1.6 timing" thread last ... November, was it?) Packagizing is feature creep that'll slow things down. Maybe we should start a separate list to discuss a package hierarchy for 1.7. But for 1.6, forget it. -- A.M. Kuchling http://starship.python.net/crew/amk/ Posting "Please send e-mail, since I don't read this group": Poster is rendered illiterate by a simple trepanation. -- Kibo, in the Happynet Manifesto From guido at python.org Wed Mar 29 00:24:46 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 17:24:46 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules _tkinter.c,1.91,1.92 In-Reply-To: Your message of "Wed, 29 Mar 2000 00:13:02 +0200."
References: Message-ID: <200003282224.RAA13573@eric.cnri.reston.va.us> > Are you sure that the call to 'TclpInitLibraryPath(baseName);' > is not required in Tcl/Tk 8.1, 8.2, 8.3 ? > I would propose the following: > > +#if TKMAJORMINOR >= 8001 > + TclpInitLibraryPath(baseName); > +# endif /* TKMAJORMINOR */ It is an internal routine which shouldn't be called at all by the user. I believe it is called internally at the right time. Note that we now call Tcl_FindExecutable(), which *is* intended to be called by the user (and exists in all 8.x versions) -- maybe this causes TclpInitLibraryPath() to be called. I tested it on Solaris, with Tcl/Tk versions 8.0.4, 8.1.1, 8.2.3 and 8.3.0, and it doesn't seem to make any difference, as long as that version of Tcl/Tk has actually been installed. (When it's not installed, TclpInitLibraryPath() doesn't help either.) I still have to check this on Windows -- maybe it'll have to go back in. [...] > Sorry, but I don't know enough about this in connection with the > unicode patches and if we should pay attention to this. It seems to be allright... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Mar 29 00:25:27 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 17:25:27 -0500 Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: Your message of "Tue, 28 Mar 2000 17:21:07 EST." <14561.12371.857178.550236@amarok.cnri.reston.va.us> References: <02c901bf989b$be203d80$34aab5d4@hagrid> <14561.12371.857178.550236@amarok.cnri.reston.va.us> Message-ID: <200003282225.RAA13586@eric.cnri.reston.va.us> > Maybe we should start a separate list to discuss a package hierarchy > for 1.7. But for 1.6, forget it. Yes! Please! 
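[Ken Manheimer's "stub modules" migration idea from earlier in this thread can be sketched concretely. `net.http` is Andrew's illustrative name for a packagized httplib, not a real module; the demo below builds a throwaway package in a temp directory purely to show the old flat name forwarding to the new packaged location:]

```python
# Sketch of the stub-module migration: the old flat module name simply
# re-exports the new package location, so old imports keep working.
# Everything here is hypothetical and built on the fly in a temp dir.
import os
import sys
import tempfile

root = tempfile.mkdtemp()

def write(relpath, text):
    full = os.path.join(root, relpath)
    os.makedirs(os.path.dirname(full), exist_ok=True)
    with open(full, "w") as f:
        f.write(text)

# The new, packaged home of the code...
write("net/__init__.py", "")
write("net/http.py", "def urlopen(url):\n    return 'fetched %s' % url\n")
# ...and the backward-compatibility stub under the (nearly) old flat name.
# ("httplib_stub" rather than "httplib" only to avoid shadowing anything real.)
write("httplib_stub.py", "from net.http import *\n")

sys.path.insert(0, root)
import httplib_stub

print(httplib_stub.urlopen("http://example.org"))  # old name, new code
```

[Py3k-style cleanup would then just delete the one-line stub files.]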
--Guido van Rossum (home page: http://www.python.org/~guido/) From donb at init.com Wed Mar 29 00:56:03 2000 From: donb at init.com (Donald Beaudry) Date: Tue, 28 Mar 2000 17:56:03 -0500 Subject: [Python-Dev] None as a keyword / class methods References: <200003282046.PAA18822@zippy.init.com> <200003282102.QAA13041@eric.cnri.reston.va.us> Message-ID: <200003282256.RAA21080@zippy.init.com> Guido van Rossum wrote, > This looks like it would break a lot of code. Only if it were to replace the current implementation. Perhaps I inadvertly made that suggestion. It was not my intention. Another way to look at my post is to say that it was intended to point out why we cant have class methods in the current implementation... it's a name space issue. > How do you refer to a superclass method? It seems that > ClassName.methodName would refer to the class method, not to the > unbound instance method. Right. To get at the unbound instance methods you must go through the 'unbound accessor' which is accessed via the underscore. If you wanted to chain to a superclass method it would look like this: class child(parent): def do_it(self, x): z = parent._.do_it(self, x) return z > Also, moving the default instance attributes to a different > namespace seems to be a semantic change that could change lots of > things. I agree... and that's why I wouldnt suggest doing it to the current class/instance implementation. However, for those who insist on having class attributes and methods I think it would be cool to settle on a standard "syntax". > I am still in favor of saying "Python has no class methods -- use > module-global functions for that". Or use a class/instance implementation provided via an extension module rather than the built-in one. The class named 'base' shown in my example is a class designed for that purpose. > Between the module, the class and the instance, there are enough > namespaces -- we don't need another one. 
The topic comes up often enough to make me think some might disagree. -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb at init.com Lexington, MA 02421 ...So much code, so little time... From moshez at math.huji.ac.il Wed Mar 29 01:24:29 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 29 Mar 2000 01:24:29 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14561.12371.857178.550236@amarok.cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Andrew M. Kuchling wrote: > Peter Funk quoted: > >Fredrik Lundh: > >> I'm not so sure that Python really needs a simple reorganization > >> of the existing set of standard library modules. just moving the > >> modules around won't solve the real problems with the 1.5.2 std > >> library... > >Right. I propose to leave the namespace flat. > > I third that comment. Arguments against reorganizing for 1.6: Let me just note that my original great renaming proposal was titled "1.7". I'm certain I don't want it to affect the 1.6 release -- my god, it's almost alpha time and we don't even know how to reorganize. Strictly 1.7. > 4) We wanted to get 1.6 out fairly quickly, and therefore limited > the number of features that would get in. (Vide the "Python 1.6 > timing" thread last ... November, was it?) Packagizing is feature > creep that'll slow things down Oh yes. I'm waiting for that 1.6....I wouldn't want to stall it for the world. But this is a good chance as any to discuss reasons, before strategies. Here's why I believe we should re-organize Python modules: -- modules fall quite naturally into subpackages. Reducing the number of toplevel modules will lessen the clutter -- it would be easier to synchronize documentation and code (think "automatically generated documentation") -- it would enable us to move toward a CPAN-like module repository, together with the dist-sig efforts. -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gmcm at hypernet.com Wed Mar 29 01:44:27 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Tue, 28 Mar 2000 18:44:27 -0500 Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14561.12371.857178.550236@amarok.cnri.reston.va.us> References: Message-ID: <1257835425-27941123@hypernet.com> Andrew M. Kuchling wrote: [snip] > 2) Right now there's no way for third-party extensions to add > themselves to a package in the standard library. Once Python finds > foo/__init__.py, it won't look for site-packages/foo/__init__.py, so > if you grab, say, "crypto" as a package name in the standard library, > it's forever lost to third-party extensions. That way lies madness. While I'm happy to carp at Java for requiring "com", "net" or whatever as a top level name, their intent is correct: the names grabbed by the Python standard packages belong to no one but the Python standard packages. If you *don't* do that, upgrades are an absolute nightmare. Marc-Andre grabbed "mx". If (as I rather suspect ) he wants to remake the entire standard lib in his image, he's welcome to - *under* mx. What would happen if he (and everyone else) installed themselves *into* my core packages, then I decided I didn't want his stuff? More than likely I'd have to scrub the damn installation and start all over again. - Gordon From DavidA at ActiveState.com Wed Mar 29 02:01:57 2000 From: DavidA at ActiveState.com (David Ascher) Date: Tue, 28 Mar 2000 16:01:57 -0800 Subject: [Python-Dev] yeah! for Jeremy and Greg Message-ID: I'm thrilled to see the extended call syntax patches go in! One less wart in the language! Jeremy ZitBlaster Hylton and Greg Noxzema Ewing! --david From pf at artcom-gmbh.de Wed Mar 29 01:53:50 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 01:53:50 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? 
In-Reply-To: <200003282156.QAA13361@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 28, 2000 4:56:49 pm" Message-ID: Hi! > [Peter Funk] > > > > > Do we need a UserString class? [...] Guido van Rossum: > So why don't you give the UserString.py a try and leave Andy's wish alone? Okay. Here we go. Could someone please keep a close eye on this? I've hacked it up in a hurry. ---- 8< ---- 8< ---- cut here ---- 8< ---- schnipp ---- 8< ---- schnapp ---- #!/usr/bin/env python """A user-defined wrapper around string objects Note: string objects have grown methods in Python 1.6 This module requires Python 1.6 or later. """ import sys # XXX Totally untested and hacked up until 2:00 am with too little sleep ;-) class UserString: def __init__(self, string=""): self.data = string def __repr__(self): return repr(self.data) def __cmp__(self, string): if isinstance(string, UserString): return cmp(self.data, string.data) else: return cmp(self.data, string) def __len__(self): return len(self.data) # methods defined in alphabetical order def capitalize(self): return self.__class__(self.data.capitalize()) def center(self, width): return self.__class__(self.data.center(width)) def count(self, sub, start=0, end=sys.maxint): return self.data.count(sub, start, end) def encode(self, encoding=None, errors=None): # XXX improve this?
if encoding: if errors: return self.__class__(self.data.encode(encoding, errors)) else: return self.__class__(self.data.encode(encoding)) else: return self.__class__(self.data.encode()) def endswith(self, suffix, start=0, end=sys.maxint): return self.data.endswith(suffix, start, end) def find(self, sub, start=0, end=sys.maxint): return self.data.find(sub, start, end) def index(self, sub, start=0, end=sys.maxint): return self.data.index(sub, start, end) def isdecimal(self): return self.data.isdecimal() def isdigit(self): return self.data.isdigit() def islower(self): return self.data.islower() def isnumeric(self): return self.data.isnumeric() def isspace(self): return self.data.isspace() def istitle(self): return self.data.istitle() def isupper(self): return self.data.isupper() def join(self, seq): return self.data.join(seq) def ljust(self, width): return self.__class__(self.data.ljust(width)) def lower(self): return self.__class__(self.data.lower()) def lstrip(self): return self.__class__(self.data.lstrip()) def replace(self, old, new, maxsplit=-1): return self.__class__(self.data.replace(old, new, maxsplit)) def rfind(self, sub, start=0, end=sys.maxint): return self.data.rfind(sub, start, end) def rindex(self, sub, start=0, end=sys.maxint): return self.data.rindex(sub, start, end) def rjust(self, width): return self.__class__(self.data.rjust(width)) def rstrip(self): return self.__class__(self.data.rstrip()) def split(self, sep=None, maxsplit=-1): return self.data.split(sep, maxsplit) def splitlines(self, keepends=0): return self.data.splitlines(keepends) def startswith(self, prefix, start=0, end=sys.maxint): return self.data.startswith(prefix, start, end) def strip(self): return self.__class__(self.data.strip()) def swapcase(self): return self.__class__(self.data.swapcase()) def title(self): return self.__class__(self.data.title()) def translate(self, table, deletechars=""): return self.__class__(self.data.translate(table, deletechars)) def upper(self): return self.__class__(self.data.upper()) def __add__(self, other): if isinstance(other,
UserString): return self.__class__(self.data + other.data) elif isinstance(other, type(self.data)): return self.__class__(self.data + other) else: return self.__class__(self.data + str(other)) def __radd__(self, other): if isinstance(other, type(self.data)): return self.__class__(other + self.data) else: return self.__class__(str(other) + self.data) def __mul__(self, n): return self.__class__(self.data*n) __rmul__ = __mul__ def _test(): s = UserString("abc") u = UserString(u"efg") # XXX add some real tests here? return [0] if __name__ == "__main__": import sys sys.exit(_test()[0]) From effbot at telia.com Wed Mar 29 01:12:55 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 29 Mar 2000 01:12:55 +0200 Subject: [Python-Dev] yeah! for Jeremy and Greg References: Message-ID: <012301bf990b$2a494c80$34aab5d4@hagrid> > I'm thrilled to see the extended call syntax patches go in! One less wart > in the language! but did he compile before checking in? ..\Python\compile.c(1225) : error C2065: 'CALL_FUNCTION_STAR' : undeclared identifier (compile.c and opcode.h both mention this identifier, but nobody defines it... should it be CALL_FUNCTION_VAR, perhaps?) From guido at python.org Wed Mar 29 02:07:34 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 19:07:34 -0500 Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Your message of "Wed, 29 Mar 2000 01:53:50 +0200." References: Message-ID: <200003290007.TAA16081@eric.cnri.reston.va.us> > > [Peter Funk] > > > > > > Do we need a UserString class? > [...] > Guido van Rossum: > > So why don't you give the UserString.py a try and leave Andy's wish alone? [Peter] > Okay. Here we go. Could someone please have a close eye on this? > I've haccked it up in hurry. Good job! Go get some sleep, and tomorrow morning when you're fresh, compare it to UserList. From visual inpsection, you seem to be missing __getitem__ and __getslice__, and maybe more (of course not __set*__). 
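For reference, the two methods Guido flags can be sketched along the lines of UserList; this is a minimal, hypothetical version only (Peter's revised module later in the thread supplies the real ones):

```python
class UserString:
    """Minimal skeleton, just enough to show the two missing methods."""
    def __init__(self, string=""):
        self.data = string

    def __getitem__(self, index):
        # wrap the result so indexing returns another UserString
        return self.__class__(self.data[index])

    def __getslice__(self, start, end):
        # required for s[i:j] in Python 1.6; later Pythons route slices
        # through __getitem__ instead, so this method simply goes unused
        start = max(start, 0); end = max(end, 0)
        return self.__class__(self.data[start:end])
```

As with UserList, the point of wrapping the result in `self.__class__` is that indexing and slicing keep returning wrapper instances rather than bare strings.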
--Guido van Rossum (home page: http://www.python.org/~guido/)

From ping at lfw.org  Wed Mar 29 02:13:24 2000
From: ping at lfw.org (Ka-Ping Yee)
Date: Tue, 28 Mar 2000 18:13:24 -0600 (CST)
Subject: [Python-Dev] yeah! for Jeremy and Greg
In-Reply-To: <012301bf990b$2a494c80$34aab5d4@hagrid>
Message-ID: 

On Wed, 29 Mar 2000, Fredrik Lundh wrote:
> > I'm thrilled to see the extended call syntax patches go in!  One less wart
> > in the language!
>
> but did he compile before checking in?

You beat me to it.  I read David's message and got so excited i just
had to try it right away.  So i updated my CVS tree, did "make", and
got the same error:

make[1]: Entering directory `/home/ping/dev/python/dist/src/Python'
gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c compile.c -o compile.o
compile.c: In function `com_call_function':
compile.c:1225: `CALL_FUNCTION_STAR' undeclared (first use in this function)
compile.c:1225: (Each undeclared identifier is reported only once
compile.c:1225: for each function it appears in.)
make[1]: *** [compile.o] Error 1

> (compile.c and opcode.h both mention this identifier, but
> nobody defines it... should it be CALL_FUNCTION_VAR,
> perhaps?)

But CALL_FUNCTION_STAR is mentioned in the comments...

#define CALL_FUNCTION   131     /* #args + (#kwargs<<8) */
#define MAKE_FUNCTION   132     /* #defaults */
#define BUILD_SLICE     133     /* Number of items */
/* The next 3 opcodes must be contiguous and satisfy
   (CALL_FUNCTION_STAR - CALL_FUNCTION) & 3 == 1 */
#define CALL_FUNCTION_VAR      140  /* #args + (#kwargs<<8) */
#define CALL_FUNCTION_KW       141  /* #args + (#kwargs<<8) */
#define CALL_FUNCTION_VAR_KW   142  /* #args + (#kwargs<<8) */

The condition (CALL_FUNCTION_STAR - CALL_FUNCTION) & 3 == 1 doesn't
make much sense, though...

-- ?!ng

From jeremy at cnri.reston.va.us  Wed Mar 29 02:18:54 2000
From: jeremy at cnri.reston.va.us (Jeremy Hylton)
Date: Tue, 28 Mar 2000 19:18:54 -0500 (EST)
Subject: [Python-Dev] yeah! for Jeremy and Greg
In-Reply-To: <012301bf990b$2a494c80$34aab5d4@hagrid>
References: <012301bf990b$2a494c80$34aab5d4@hagrid>
Message-ID: <14561.19438.157799.810802@goon.cnri.reston.va.us>

>>>>> "FL" == Fredrik Lundh  writes:

  >> I'm thrilled to see the extended call syntax patches go in!  One
  >> less wart in the language!

  FL> but did he compile before checking in?

Indeed, but not often enough :-).

  FL> ..\Python\compile.c(1225) : error C2065: 'CALL_FUNCTION_STAR' :
  FL> undeclared identifier

  FL> (compile.c and opcode.h both mention this identifier, but nobody
  FL> defines it... should it be CALL_FUNCTION_VAR, perhaps?)

This was a last minute change of names.  I had previously compiled
under the old names.  The Makefile doesn't describe the dependency
between opcode.h and compile.c.  And the compile.o file I had worked,
because the only change was to the name of a macro.

It's too bad the Makefile doesn't have all the dependencies.  It seems
that it's necessary to do a make clean before checking in a change
that affects many files.

Jeremy

From klm at digicool.com  Wed Mar 29 02:30:05 2000
From: klm at digicool.com (Ken Manheimer)
Date: Tue, 28 Mar 2000 19:30:05 -0500 (EST)
Subject: [Python-Dev] yeah! for Jeremy and Greg
In-Reply-To: 
Message-ID: 

On Tue, 28 Mar 2000, David Ascher wrote:
> I'm thrilled to see the extended call syntax patches go in!  One less wart
> in the language!

Me too!  Even the lisps i used to know (albeit ancient, according to
eric) couldn't get it as tidy as this.  (Silly me, now i'm imagining
we're going to see operator assignments just around the bend.  "Give
them a tasty morsel, they ask for your dinner..."-)

Ken
klm at digicool.com

From ping at lfw.org  Wed Mar 29 02:35:54 2000
From: ping at lfw.org (Ka-Ping Yee)
Date: Tue, 28 Mar 2000 18:35:54 -0600 (CST)
Subject: [Python-Dev] yeah! for Jeremy and Greg
In-Reply-To: <14561.19438.157799.810802@goon.cnri.reston.va.us>
Message-ID: 

On Tue, 28 Mar 2000, Jeremy Hylton wrote:
> It's too bad the Makefile doesn't have all the dependencies.  It seems
> that it's necessary to do a make clean before checking in a change
> that affects many files.

I updated again and rebuilt.

>>> def sum(*args):
...     s = 0
...     for x in args: s = s + x
...     return s
...
>>> sum(2,3,4)
9
>>> sum(*[2,3,4])
9
>>> x = (2,3,4)
>>> sum(*x)
9
>>> def func(a, b, c):
...     print a, b, c
...
>>> func(**{'a':2, 'b':1, 'c':6})
2 1 6
>>> func(**{'c':8, 'a':1, 'b':9})
1 9 8
>>>

*cool*.

So does this completely obviate the need for "apply", then?

    apply(x, y, z)  <==>  x(*y, **z)

-- ?!ng

From guido at python.org  Wed Mar 29 02:35:17 2000
From: guido at python.org (Guido van Rossum)
Date: Tue, 28 Mar 2000 19:35:17 -0500
Subject: [Python-Dev] yeah! for Jeremy and Greg
In-Reply-To: Your message of "Tue, 28 Mar 2000 18:35:54 CST."
References: 
Message-ID: <200003290035.TAA16278@eric.cnri.reston.va.us>

> *cool*.
>
> So does this completely obviate the need for "apply", then?
>
>     apply(x, y, z)  <==>  x(*y, **z)

I think so (except for backwards compatibility).  The 1.6 docs for
apply should point this out!

--Guido van Rossum (home page: http://www.python.org/~guido/)

From DavidA at ActiveState.com  Wed Mar 29 02:42:20 2000
From: DavidA at ActiveState.com (David Ascher)
Date: Tue, 28 Mar 2000 16:42:20 -0800
Subject: [Python-Dev] yeah! for Jeremy and Greg
In-Reply-To: 
Message-ID: 

> I updated again and rebuilt.
>
> >>> def sum(*args):
> ...     s = 0
> ...     for x in args: s = s + x
> ...     return s
> ...
> >>> sum(2,3,4)
> 9
> >>> sum(*[2,3,4])
> 9
> >>> x = (2,3,4)
> >>> sum(*x)
> 9
> >>> def func(a, b, c):
> ...     print a, b, c
> ...
> >>> func(**{'a':2, 'b':1, 'c':6})
> 2 1 6
> >>> func(**{'c':8, 'a':1, 'b':9})
> 1 9 8
> >>>
>
> *cool*.
But most importantly, IMO:

    class SubClass(Class):
        def __init__(self, a, *args, **kw):
            self.a = a
            Class.__init__(self, *args, **kw)

Much neater.

From bwarsaw at cnri.reston.va.us  Wed Mar 29 02:46:11 2000
From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw)
Date: Tue, 28 Mar 2000 19:46:11 -0500 (EST)
Subject: [Python-Dev] yeah! for Jeremy and Greg
References: <14561.19438.157799.810802@goon.cnri.reston.va.us>
Message-ID: <14561.21075.637108.322536@anthem.cnri.reston.va.us>

Uh oh.  Fresh CVS update and make clean, make:

-------------------- snip snip --------------------
Python 1.5.2+ (#20, Mar 28 2000, 19:37:38)  [GCC 2.8.1] on sunos5
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> def sum(*args):
...     s = 0
...     for x in args: s = s + x
...     return s
...
>>> class Nums:
...     def __getitem__(self, i):
...         if i >= 10 or i < 0: raise IndexError
...         return i
...
>>> n = Nums()
>>> for i in n: print i
...
0
1
2
3
4
5
6
7
8
9
>>> sum(*n)
Traceback (innermost last):
  File "", line 1, in ?
SystemError: bad argument to internal function
-------------------- snip snip --------------------

-Barry

From bwarsaw at cnri.reston.va.us  Wed Mar 29 03:02:16 2000
From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw)
Date: Tue, 28 Mar 2000 20:02:16 -0500 (EST)
Subject: [Python-Dev] yeah! for Jeremy and Greg
References: <14561.19438.157799.810802@goon.cnri.reston.va.us>
	<14561.21075.637108.322536@anthem.cnri.reston.va.us>
Message-ID: <14561.22040.383370.283163@anthem.cnri.reston.va.us>

Changing the definition of class Nums to

    class Nums:
        def __getitem__(self, i):
            if 0 <= i < 10: return i
            raise IndexError
        def __len__(self):
            return 10

I.e. adding the __len__() method avoids the SystemError.  Either the
*arg call should not depend on the sequence being length-able, or it
should error check that the length calculation doesn't return -1 or
raise an exception.  Looking at PySequence_Length() though, it seems
that m->sq_length(s) can return -1 without setting a type_error.
So the fix is either to include a check for a -1 return in
PySequence_Length() when calling sq_length, or instance_length()
should set a TypeError when it has no __len__() method and returns -1.

I gotta run so I can't follow this through -- I'm sure I'll see the
right solution from someone in tomorrow morning's email :)

-Barry

From ping at lfw.org  Wed Mar 29 03:17:27 2000
From: ping at lfw.org (Ka-Ping Yee)
Date: Tue, 28 Mar 2000 19:17:27 -0600 (CST)
Subject: [Python-Dev] yeah! for Jeremy and Greg
In-Reply-To: <14561.22040.383370.283163@anthem.cnri.reston.va.us>
Message-ID: 

On Tue, 28 Mar 2000, Barry A. Warsaw wrote:
> Changing the definition of class Nums to
>
>     class Nums:
>         def __getitem__(self, i):
>             if 0 <= i < 10: return i
>             raise IndexError
>         def __len__(self):
>             return 10
>
> I.e. adding the __len__() method avoids the SystemError.

It should be noted that "apply" has the same problem, with a
different counterintuitive error message:

>>> n = Nums()
>>> apply(sum, n)
Traceback (innermost last):
  File "", line 1, in ?
AttributeError: __len__

-- ?!ng

From jeremy at cnri.reston.va.us  Wed Mar 29 04:59:26 2000
From: jeremy at cnri.reston.va.us (Jeremy Hylton)
Date: Tue, 28 Mar 2000 21:59:26 -0500 (EST)
Subject: [Python-Dev] yeah! for Jeremy and Greg
In-Reply-To: 
References: 
Message-ID: <14561.29070.940238.542509@bitdiddle.cnri.reston.va.us>

>>>>> "DA" == David Ascher  writes:

  DA> But most importantly, IMO:

  DA> class SubClass(Class):
  DA>     def __init__(self, a, *args, **kw):
  DA>         self.a = a
  DA>         Class.__init__(self, *args, **kw)

  DA> Much neater.

This version of method overloading was what I liked most about Greg's
patch.  Note that I also prefer:

    class SubClass(Class):
        super_init = Class.__init__

        def __init__(self, a, *args, **kw):
            self.a = a
            self.super_init(*args, **kw)

I've been happy to have all the overridden methods explicitly labelled
at the top of a class lately.  It is much easier to change the class
hierarchy later.
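Jeremy's variant can be written out as a small, runnable sketch (the concrete class names here are invented for illustration):

```python
class Base:
    def __init__(self, b, c):
        self.b = b
        self.c = c

class Sub(Base):
    super_init = Base.__init__   # overridden method labelled at the top

    def __init__(self, a, *args, **kw):
        self.a = a
        # everything beyond 'a' is forwarded with the extended call syntax
        self.super_init(*args, **kw)

obj = Sub(1, 2, c=3)             # ends up with a=1, b=2, c=3
```

Because `super_init` is an ordinary class attribute, swapping in a different base class later means editing one line at the top of the class rather than hunting through every method.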
Jeremy

From gward at cnri.reston.va.us  Wed Mar 29 05:15:00 2000
From: gward at cnri.reston.va.us (Greg Ward)
Date: Tue, 28 Mar 2000 22:15:00 -0500
Subject: [Python-Dev] __debug__ and py_compile
Message-ID: <20000328221500.A3290@cnri.reston.va.us>

Hi all --

a particularly active member of the Distutils-SIG brought the global
'__debug__' flag to my attention, since I (and thus my code) didn't
know if calling 'py_compile.compile()' would result in a ".pyc" or a
".pyo" file.  It appears that, using __debug__, you can determine what
you're going to get.  Cool!

However, it doesn't look like you can *choose* what you're going to
get.  Is this correct?  Ie. does the presence/absence of -O when the
interpreter starts up *completely* decide how code is compiled?

Also, can I rely on __debug__ being there in the future?  How about in
the past?  I still occasionally ponder making Distutils compatible
with Python 1.5.1.

Thanks --

        Greg

From guido at python.org  Wed Mar 29 06:08:12 2000
From: guido at python.org (Guido van Rossum)
Date: Tue, 28 Mar 2000 23:08:12 -0500
Subject: [Python-Dev] __debug__ and py_compile
In-Reply-To: Your message of "Tue, 28 Mar 2000 22:15:00 EST."
References: <20000328221500.A3290@cnri.reston.va.us>
Message-ID: <200003290408.XAA17991@eric.cnri.reston.va.us>

> a particularly active member of the Distutils-SIG brought the global
> '__debug__' flag to my attention, since I (and thus my code) didn't
> know if calling 'py_compile.compile()' would result in a ".pyc" or a
> ".pyo" file.  It appears that, using __debug__, you can determine
> what you're going to get.  Cool!
>
> However, it doesn't look like you can *choose* what you're going to
> get.  Is this correct?  Ie. does the presence/absence of -O when the
> interpreter starts up *completely* decide how code is compiled?

Correct.  You (currently) can't change the opt setting of the compiler.
(It was part of the compiler restructuring to give more freedom here;
this has been pushed back to 1.7.)

> Also, can I rely on __debug__ being there in the future?  How about in
> the past?  I still occasionally ponder making Distutils compatible with
> Python 1.5.1.

__debug__ is as old as the assert statement, going back to at least
1.5.0.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From moshez at math.huji.ac.il  Wed Mar 29 07:35:51 2000
From: moshez at math.huji.ac.il (Moshe Zadka)
Date: Wed, 29 Mar 2000 07:35:51 +0200 (IST)
Subject: [Python-Dev] Great Renaming? What is the goal?
In-Reply-To: <1257835425-27941123@hypernet.com>
Message-ID: 

On Tue, 28 Mar 2000, Gordon McMillan wrote:
> What would happen if he (and everyone else) installed
> themselves *into* my core packages, then I decided I didn't
> want his stuff?  More than likely I'd have to scrub the damn
> installation and start all over again.

I think Greg Stein answered that objection, by reminding us that the
filesystem isn't the only way to set up a package hierarchy.  In
particular, even with Python's current module system, there is no need
to scrub installations: Python core modules go (under UNIX) in
/usr/local/lib/python1.5, and 3rd party modules go in
/usr/local/lib/python1.5/site-packages.

Need to remove stuff?  Remove whatever is in
/usr/local/lib/python1.5/site-packages.  Need to upgrade?  Just back up
/usr/local/lib/python1.5/site-packages, remove /usr/local/lib/python1.5/,
install, and move the 3rd party modules back from the backup.  This
becomes even easier if the standard installation is in a JAR-like file,
and 3rd party modules are also in a JAR-like file, but specified to be
in their natural place.

Wow!  That was a long rant!

Anyway, I already expressed my preference for the Perl way over the
Java way.  For one thing, I don't want to have to register a domain
just so I could distribute Python code.

-- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com

From bwarsaw at cnri.reston.va.us  Wed Mar 29 07:42:34 2000
From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw)
Date: Wed, 29 Mar 2000 00:42:34 -0500 (EST)
Subject: [Python-Dev] yeah! for Jeremy and Greg
References: <14561.19438.157799.810802@goon.cnri.reston.va.us>
	<14561.21075.637108.322536@anthem.cnri.reston.va.us>
Message-ID: <14561.38858.41246.28460@anthem.cnri.reston.va.us>

>>>>> "BAW" == Barry A Warsaw  writes:

  BAW> Uh oh.  Fresh CVS update and make clean, make:

  >>> sum(*n)
  | Traceback (innermost last):
  |   File "", line 1, in ?
  | SystemError: bad argument to internal function

Here's a proposed patch that will cause a TypeError to be raised
instead.

-Barry

-------------------- snip snip --------------------
Index: abstract.c
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Objects/abstract.c,v
retrieving revision 2.33
diff -c -r2.33 abstract.c
*** abstract.c	2000/03/10 22:55:18	2.33
--- abstract.c	2000/03/29 05:36:21
***************
*** 860,866 ****
  	PyObject *s;
  {
  	PySequenceMethods *m;
! 
  	if (s == NULL) {
  		null_error();
  		return -1;
--- 860,867 ----
  	PyObject *s;
  {
  	PySequenceMethods *m;
! 	int size = -1;
! 
  	if (s == NULL) {
  		null_error();
  		return -1;
***************
*** 868,877 ****
  	m = s->ob_type->tp_as_sequence;
  	if (m && m->sq_length)
! 		return m->sq_length(s);
! 	type_error("len() of unsized object");
! 	return -1;
  }
  
  PyObject *
--- 869,879 ----
  	m = s->ob_type->tp_as_sequence;
  	if (m && m->sq_length)
! 		size = m->sq_length(s);
! 	if (size < 0)
! 		type_error("len() of unsized object");
! 
  	return size;
  }
  
  PyObject *
Index: ceval.c
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Python/ceval.c,v
retrieving revision 2.169
diff -c -r2.169 ceval.c
*** ceval.c	2000/03/28 23:49:16	2.169
--- ceval.c	2000/03/29 05:39:00
***************
*** 1636,1641 ****
--- 1636,1649 ----
  				break;
  			}
  			nstar = PySequence_Length(stararg);
+ 			if (nstar < 0) {
+ 				if (!PyErr_Occurred())
+ 					PyErr_SetString(
+ 						PyExc_TypeError,
+ 						"len() of unsized object");
+ 				x = NULL;
+ 				break;
+ 			}
  		}
  		if (nk > 0) {
  			if (kwdict == NULL) {

From bwarsaw at cnri.reston.va.us  Wed Mar 29 07:46:19 2000
From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us)
Date: Wed, 29 Mar 2000 00:46:19 -0500 (EST)
Subject: [Python-Dev] yeah! for Jeremy and Greg
References: <14561.22040.383370.283163@anthem.cnri.reston.va.us>
Message-ID: <14561.39083.748093.694726@anthem.cnri.reston.va.us>

>>>>> "KY" == Ka-Ping Yee  writes:

  | It should be noted that "apply" has the same problem, with a
  | different counterintuitive error message:

  >> n = Nums()
  >> apply(sum, n)
  | Traceback (innermost last):
  |   File "", line 1, in ?
  | AttributeError: __len__

The patch I just posted fixes this too.  The error message ain't
great, but at least it's consistent with the direct call.

-Barry

-------------------- snip snip --------------------
Traceback (innermost last):
  File "/tmp/doit.py", line 15, in ?
    print apply(sum, n)
TypeError: len() of unsized object

From pf at artcom-gmbh.de  Wed Mar 29 08:30:22 2000
From: pf at artcom-gmbh.de (Peter Funk)
Date: Wed, 29 Mar 2000 08:30:22 +0200 (MEST)
Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class?
In-Reply-To: from Moshe Zadka at "Mar 29, 2000 7:44:42 am"
Message-ID: 

Hi!

> On Wed, 29 Mar 2000, Peter Funk wrote:
> > class UserString:
> >     def __init__(self, string=""):
> >         self.data = string
                       ^^^^^^^
Moshe Zadka wrote:
> Why do you feel there is a need to default?  Strings are immutable

I had something like this in my mind:

class MutableString(UserString):
    """Python strings are immutable objects.  But of course this can
    be changed in a derived class implementing the missing methods.

    >>> s = MutableString()
    >>> s[0:5] = "HUH?"
    """
    def __setitem__(self, index, char):
        ....
    def __setslice__(self, i, j, substring):
        ....

> What about __int__, __long__, __float__, __str__, __hash__?
> And what about __getitem__ and __contains__?
> And __complex__?

I was obviously too tired and too eager to get this out!  Thanks for
reviewing and responding so quickly.  I will add them.

Regards, Peter
--
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen)

From moshez at math.huji.ac.il  Wed Mar 29 08:51:30 2000
From: moshez at math.huji.ac.il (Moshe Zadka)
Date: Wed, 29 Mar 2000 08:51:30 +0200 (IST)
Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class?
In-Reply-To: 
Message-ID: 

On Wed, 29 Mar 2000, Peter Funk wrote:
> Moshe Zadka wrote:
> > Why do you feel there is a need to default?  Strings are immutable
>
> I had something like this in my mind:
>
> class MutableString(UserString):
>     """Python strings are immutable objects.  But of course this can
>     be changed in a derived class implementing the missing methods.

Then add the default in the constructor for MutableString....

eagerly-waiting-for-UserString.py-ly y'rs, Z.
-- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com

From moshez at math.huji.ac.il  Wed Mar 29 09:03:53 2000
From: moshez at math.huji.ac.il (Moshe Zadka)
Date: Wed, 29 Mar 2000 09:03:53 +0200 (IST)
Subject: [Python-Dev] 1.5.2->1.6 Changes
Message-ID: 

I'm starting to compile a list of changes from 1.5.2 to 1.6.
Here's what I came up with so far:

-- string objects now have methods (though they are still immutable)

-- unicode support: Unicode strings are marked with u"string", and
   there is support for arbitrary encoders/decoders

-- "in" operator can now be overridden in user-defined classes to mean
   anything: it calls the magic method __contains__

-- SRE is the new regular expression engine.  re.py became an interface
   to the same engine.  The new engine fully supports unicode regular
   expressions.

-- Some methods which would take multiple arguments and treat them as a
   tuple were fixed: list.{append, insert, remove, count}, socket.connect

-- Some modules were made obsolete

-- filecmp.py (supersedes the old cmp.py and dircmp.py modules)

-- tabnanny.py (make sure the source file doesn't assume a specific
   tab-width)

-- win32reg (win32 registry editor)

-- unicode module, and codecs package

-- New calling syntax: f(*args, **kw) equivalent to apply(f, args, kw)

-- _tkinter now uses the object, rather than string, interface to Tcl.

Please e-mail me personally if you think of any other changes, and I'll
try to integrate them into a complete "changes" document.

Thanks in advance
-- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com

From esr at thyrsus.com  Wed Mar 29 09:21:29 2000
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 29 Mar 2000 02:21:29 -0500
Subject: [Python-Dev] 1.5.2->1.6 Changes
In-Reply-To: ; from Moshe Zadka on Wed, Mar 29, 2000 at 09:03:53AM +0200
References: 
Message-ID: <20000329022129.A15539@thyrsus.com>

Moshe Zadka :
> -- _tkinter now uses the object, rather than string, interface to Tcl.

Hm, does this mean that the annoying requirement to do explicit gets
and sets to move data between the Python world and the Tcl/Tk world is
gone?
--
		Eric S. Raymond

"A system of licensing and registration is the perfect device to deny
gun ownership to the bourgeoisie."
	-- Vladimir Ilyich Lenin

From moshez at math.huji.ac.il  Wed Mar 29 09:22:54 2000
From: moshez at math.huji.ac.il (Moshe Zadka)
Date: Wed, 29 Mar 2000 09:22:54 +0200 (IST)
Subject: [Python-Dev] 1.5.2->1.6 Changes
In-Reply-To: <20000329022129.A15539@thyrsus.com>
Message-ID: 

On Wed, 29 Mar 2000, Eric S. Raymond wrote:
> Moshe Zadka :
> > -- _tkinter now uses the object, rather than string, interface to Tcl.
>
> Hm, does this mean that the annoying requirement to do explicit gets and
> sets to move data between the Python world and the Tcl/Tk world is gone?

I doubt it.  It's just that Python and Tcl have such a different outlook
on variables that I don't think it can be glossed over.
-- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com

From pf at artcom-gmbh.de  Wed Mar 29 11:16:17 2000
From: pf at artcom-gmbh.de (Peter Funk)
Date: Wed, 29 Mar 2000 11:16:17 +0200 (MEST)
Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class?
In-Reply-To: from Moshe Zadka at "Mar 29, 2000 8:51:30 am"
Message-ID: 

Hi!

Moshe Zadka:
> eagerly-waiting-for-UserString.py-ly y'rs, Z.

Well, I've added the missing methods.  Unfortunately I ran out of time
now, and a 'test_userstring.py' derived from 'src/Lib/test/test_string.py'
is still missing.

Regards, Peter

---- 8< ---- 8< ---- cut here ---- 8< ---- schnipp ---- 8< ---- schnapp ----

#!/usr/bin/env python
"""A user-defined wrapper around string objects

Note: string objects have grown methods in Python 1.6
This module requires Python 1.6 or later.
""" from types import StringType, UnicodeType import sys class UserString: def __init__(self, string): self.data = string def __str__(self): return str(self.data) def __repr__(self): return repr(self.data) def __int__(self): return int(self.data) def __long__(self): return long(self.data) def __float__(self): return float(self.data) def __hash__(self): return hash(self.data) def __cmp__(self, string): if isinstance(string, UserString): return cmp(self.data, string.data) else: return cmp(self.data, string) def __contains__(self, char): return char in self.data def __len__(self): return len(self.data) def __getitem__(self, index): return self.__class__(self.data[index]) def __getslice__(self, start, end): start = max(start, 0); end = max(end, 0) return self.__class__(self.data[start:end]) def __add__(self, other): if isinstance(other, UserString): return self.__class__(self.data + other.data) elif isinstance(other, StringType) or isinstance(other, UnicodeType): return self.__class__(self.data + other) else: return self.__class__(self.data + str(other)) def __radd__(self, other): if isinstance(other, StringType) or isinstance(other, UnicodeType): return self.__class__(other + self.data) else: return self.__class__(str(other) + self.data) def __mul__(self, n): return self.__class__(self.data*n) __rmul__ = __mul__ # the following methods are defined in alphabetical order: def capitalize(self): return self.__class__(self.data.capitalize()) def center(self, width): return self.__class__(self.data.center(width)) def count(self, sub, start=0, end=sys.maxint): return self.data.count(sub, start, end) def encode(self, encoding=None, errors=None): # XXX improve this? 
        if encoding:
            if errors:
                return self.__class__(self.data.encode(encoding, errors))
            else:
                return self.__class__(self.data.encode(encoding))
        else:
            return self.__class__(self.data.encode())
    def endswith(self, suffix, start=0, end=sys.maxint):
        return self.data.endswith(suffix, start, end)
    def find(self, sub, start=0, end=sys.maxint):
        return self.data.find(sub, start, end)
    def index(self, sub, start=0, end=sys.maxint):
        return self.data.index(sub, start, end)
    def isdecimal(self):
        return self.data.isdecimal()
    def isdigit(self):
        return self.data.isdigit()
    def islower(self):
        return self.data.islower()
    def isnumeric(self):
        return self.data.isnumeric()
    def isspace(self):
        return self.data.isspace()
    def istitle(self):
        return self.data.istitle()
    def isupper(self):
        return self.data.isupper()
    def join(self, seq):
        return self.data.join(seq)
    def ljust(self, width):
        return self.__class__(self.data.ljust(width))
    def lower(self):
        return self.__class__(self.data.lower())
    def lstrip(self):
        return self.__class__(self.data.lstrip())
    def replace(self, old, new, maxsplit=-1):
        return self.__class__(self.data.replace(old, new, maxsplit))
    def rfind(self, sub, start=0, end=sys.maxint):
        return self.data.rfind(sub, start, end)
    def rindex(self, sub, start=0, end=sys.maxint):
        return self.data.rindex(sub, start, end)
    def rjust(self, width):
        return self.__class__(self.data.rjust(width))
    def rstrip(self):
        return self.__class__(self.data.rstrip())
    def split(self, sep=None, maxsplit=-1):
        return self.data.split(sep, maxsplit)
    def splitlines(self, maxsplit=-1):
        return self.data.splitlines(maxsplit)
    def startswith(self, prefix, start=0, end=sys.maxint):
        return self.data.startswith(prefix, start, end)
    def strip(self):
        return self.__class__(self.data.strip())
    def swapcase(self):
        return self.__class__(self.data.swapcase())
    def title(self):
        return self.__class__(self.data.title())
    def translate(self, table, deletechars=""):
        return self.__class__(self.data.translate(table, deletechars))
    def upper(self):
        return self.__class__(self.data.upper())

class MutableString(UserString):
    """mutable string objects

    Python strings are immutable objects.  This has the advantage that
    strings may be used as dictionary keys.  If this property isn't
    needed and you insist on changing string values in place instead,
    you may cheat and use MutableString.

    But the purpose of this class is an educational one: to prevent
    people from inventing their own mutable string class derived from
    UserString and thereby forgetting to remove (override) the __hash__
    method inherited from UserString.  This would lead to errors that
    would be very hard to track down.

    A faster and better solution is to rewrite the program using lists."""
    def __init__(self, string=""):
        self.data = string
    def __hash__(self):
        raise TypeError, "unhashable type (it is mutable)"
    def __setitem__(self, index, sub):
        if index < 0 or index >= len(self.data): raise IndexError
        self.data = self.data[:index] + sub + self.data[index+1:]
    def __delitem__(self, index):
        if index < 0 or index >= len(self.data): raise IndexError
        self.data = self.data[:index] + self.data[index+1:]
    def __setslice__(self, start, end, sub):
        start = max(start, 0); end = max(end, 0)
        if isinstance(sub, UserString):
            self.data = self.data[:start]+sub.data+self.data[end:]
        elif isinstance(sub, StringType) or isinstance(sub, UnicodeType):
            self.data = self.data[:start]+sub+self.data[end:]
        else:
            self.data = self.data[:start]+str(sub)+self.data[end:]
    def __delslice__(self, start, end):
        start = max(start, 0); end = max(end, 0)
        self.data = self.data[:start] + self.data[end:]
    def immutable(self):
        return UserString(self.data)

def _test():
    s = UserString("abc")
    u = UserString(u"efg")
    # XXX add some real tests here?
    return 0

if __name__ == "__main__":
    sys.exit(_test())

From mal at lemburg.com  Wed Mar 29 11:34:21 2000
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 29 Mar 2000 11:34:21 +0200
Subject: [Python-Dev] Great Renaming? What is the goal?
References: <1257835425-27941123@hypernet.com>
Message-ID: <38E1CE1D.7899B1BC@lemburg.com>

Gordon McMillan wrote:
>
> Andrew M. Kuchling wrote:
> [snip]
> > 2) Right now there's no way for third-party extensions to add
> > themselves to a package in the standard library.  Once Python finds
> > foo/__init__.py, it won't look for site-packages/foo/__init__.py, so
> > if you grab, say, "crypto" as a package name in the standard library,
> > it's forever lost to third-party extensions.
>
> That way lies madness.  While I'm happy to carp at Java for
> requiring "com", "net" or whatever as a top level name, their
> intent is correct: the names grabbed by the Python standard
> packages belong to no one but the Python standard
> packages.  If you *don't* do that, upgrades are an absolute
> nightmare.
>
> Marc-Andre grabbed "mx".  If (as I rather suspect ) he
> wants to remake the entire standard lib in his image, he's
> welcome to - *under* mx.

Right, that's the way I see it too.

BTW, where can I register the "mx" top-level package name ?  Should
these be registered in the NIST registry ?  Will the names registered
there be honored ?

> What would happen if he (and everyone else) installed
> themselves *into* my core packages, then I decided I didn't
> want his stuff?  More than likely I'd have to scrub the damn
> installation and start all over again.

That's a no-no, IMHO.  Unless explicitly allowed, packages should
*not* install themselves as subpackages to other existing top-level
packages.  If they do, it's their problem if the hierarchy changes...

--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/

From moshez at math.huji.ac.il  Wed Mar 29 11:59:47 2000
From: moshez at math.huji.ac.il (Moshe Zadka)
Date: Wed, 29 Mar 2000 11:59:47 +0200 (IST)
Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class?
In-Reply-To: 
Message-ID: 

On Wed, 29 Mar 2000, Peter Funk wrote:
> Hi!
>
> Moshe Zadka:
> > eagerly-waiting-for-UserString.py-ly y'rs, Z.
>
> Well, I've added the missing methods.  Unfortunately I ran out of time
> now and a 'test_userstring.py' derived from 'src/Lib/test/test_string.py'
> is still missing.

Great work, Peter!  I really like UserString.  However, I have two
issues with MutableString:

1. It shouldn't share implementation with UserString, otherwise your
   algorithms are not behaving with the correct big-O properties.  It
   should probably use a char-array (from the array module) as the
   internal representation.

2. It shouldn't share interface with UserString, since it doesn't have
   a proper implementation of __hash__.

All in all, I probably disagree with making MutableString a subclass
of UserString.  If I have time later today, I'm hoping to be able to
make my own MutableString.

From pf at artcom-gmbh.de  Wed Mar 29 12:35:32 2000
From: pf at artcom-gmbh.de (Peter Funk)
Date: Wed, 29 Mar 2000 12:35:32 +0200 (MEST)
Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class?
In-Reply-To: from Moshe Zadka at "Mar 29, 2000 11:59:47 am"
Message-ID: 

Hi!
It shouldn't share interface with UserString, since it doesn't have a > proper implementation of __hash__. What's wrong with my implementation of __hash__ raising a TypeError with the message 'unhashable object'? This is the same behaviour as when you try to use some other mutable object as a key in a dictionary: >>> l = [] >>> d = { l : 'foo' } Traceback (innermost last): File "", line 1, in ? TypeError: unhashable type > All in all, I probably disagree with making MutableString a subclass of > UserString. If I have time later today, I'm hoping to be able to make my > own MutableString As I tried to point out in the docstring of 'MutableString', I don't want people to actually start using the 'MutableString' class. My intention was to prevent people from trying to invent their own, and then probably wrong, MutableString class derived from UserString. Only Newbies will really ever need mutable strings in Python (see FAQ). Maybe my 'MutableString' idea belongs somewhere in the to-be-written src/Doc/libuserstring.tex. But since Newbies tend to ignore docs ... Sigh. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From gmcm at hypernet.com Wed Mar 29 13:07:20 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Wed, 29 Mar 2000 06:07:20 -0500 Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: <1257835425-27941123@hypernet.com> Message-ID: <1257794452-30405909@hypernet.com> Moshe Zadka wrote: > On Tue, 28 Mar 2000, Gordon McMillan wrote: > > > What would happen if he (and everyone else) installed > > themselves *into* my core packages, then I decided I didn't > > want his stuff? More than likely I'd have to scrub the damn > > installation and start all over again. > > I think Greg Stein answered that objection, by reminding us that the > filesystem isn't the only way to set up a package hierarchy.
You mean when Greg said: >Assuming that you use an archive like those found in my "small" distro or > Gordon's distro, then this is no problem. The archive simply recognizes > and maps "text.encoding.macbinary" to its own module. I don't know what this has to do with it. When we get around to the 'macbinary' part, we have already established that 'text.encoding' is the parent which should supply 'macbinary'. > In > particular, even with Python's current module system, there is no need to > scrub installations: Python core modules go (under UNIX) in > /usr/local/lib/python1.5, and 3rd party modules go in > /usr/local/lib/python1.5/site-packages. And if there's a /usr/local/lib/python1.5/text/encoding, there's no way that /usr/local/lib/python1.5/site-packages/text/encoding will get searched. I believe you could hack up an importer that did allow this, and I think you'd be 100% certifiable if you did. Just look at the surprise factor. Hacking stuff into another package is just as evil as math.pi = 42. > Anyway, I already expressed my preference of the Perl way, over the Java > way. For one thing, I don't want to have to register a domain just so I > could distribute Python code I haven't the foggiest what the "Perl way" is; I wouldn't be surprised if it relied on un-Pythonic sociological factors. I already said the Java mechanics are silly; uniqueness is what matters. When Python packages start selling in the four and five figure range, then a registry mechanism will likely be necessary. - Gordon From moshez at math.huji.ac.il Wed Mar 29 13:21:09 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 29 Mar 2000 13:21:09 +0200 (IST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Message-ID: On Wed, 29 Mar 2000, Peter Funk wrote: > > 1. It shouldn't share implementation with UserString, otherwise your > > algorithms are not behaving with correct big-O properties.
It should > > probably use a char-array (from the array module) as the internal > > representation. > > Hmm.... I don't understand what you mean by 'big-O properties'. > The internal representation of any object should be considered ... > umm ... internal. Yes, but s[0] = 'a' should take O(1) time, not O(len(s)). > > 2. It shouldn't share interface with UserString, since it doesn't have a > > proper implementation of __hash__. > > What's wrong with my implementation of __hash__ raising a TypeError with > the message 'unhashable object'? A subtype shouldn't change contracts of its supertypes. hash() was implicitly contracted as "raising no exceptions". -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Wed Mar 29 13:30:59 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 29 Mar 2000 13:30:59 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <1257794452-30405909@hypernet.com> Message-ID: On Wed, 29 Mar 2000, Gordon McMillan wrote: > And if there's a /usr/local/lib/python1.5/text/encoding, there's > no way that /usr/local/lib/python1.5/site-packages/text/encoding > will get searched. Oh my god! I just realized you're right. Well, back to the drawing board. > I haven't the foggiest what the "Perl way" is; I wouldn't be > surprised if it relied on un-Pythonic sociological factors. No, it relies on non-Pythonic (but not unpythonic -- simply different) technical choices. > I > already said the Java mechanics are silly; uniqueness is what > matters. As in all things namespacish ;-) Though I suspect a registry will be needed much sooner. -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From guido at python.org Wed Mar 29 14:26:56 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Mar 2000 07:26:56 -0500 Subject: [Python-Dev] 1.5.2->1.6 Changes In-Reply-To: Your message of "Wed, 29 Mar 2000 02:21:29 EST." <20000329022129.A15539@thyrsus.com> References: <20000329022129.A15539@thyrsus.com> Message-ID: <200003291226.HAA18216@eric.cnri.reston.va.us> > Moshe Zadka : > > -- _tkinter now uses the object, rather than string, interface to Tcl. Eric Raymond: > Hm, does this mean that the annoying requirement to do explicit gets and > sets to move data between the Python world and the Tcl/Tk world is gone? Not sure what you are referring to -- this should be completely transparent to Python/Tkinter users. If you are thinking of the way Tcl variables are created and manipulated in Python, no, this doesn't change, alas (Tcl variables aren't objects -- they are manipulated through get and set commands. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Mar 29 14:32:16 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Mar 2000 07:32:16 -0500 Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: Your message of "Wed, 29 Mar 2000 11:34:21 +0200." <38E1CE1D.7899B1BC@lemburg.com> References: <1257835425-27941123@hypernet.com> <38E1CE1D.7899B1BC@lemburg.com> Message-ID: <200003291232.HAA18234@eric.cnri.reston.va.us> > > Marc-Andre grabbed "mx". If (as I rather suspect) he > > wants to remake the entire standard lib in his image, he's > > welcome to - *under* mx. > > Right, that's the way I see it too. BTW, where can I register > the "mx" top-level package name ? Should these be registered > in the NIST registry ? Will the names registered there be > honored ? I think the NIST registry is a failed experiment -- too cumbersome to maintain or consult.
We can do this the same way as common law handles trade marks: if you have used it as your brand name long enough, even if you didn't register, someone else cannot grab it away from you. > > What would happen if he (and everyone else) installed > > themselves *into* my core packages, then I decided I didn't > > want his stuff? More than likely I'd have to scrub the damn > > installation and start all over again. > > That's a no-no, IMHO. Unless explicitly allowed, packages > should *not* install themselves as subpackages to other > existing top-level packages. If they do, it's their problem > if the hierarchy changes... Agreed. Although some people seem to *want* this. Probably because it's okay to do that in Java and (apparently?) in Perl. And C++, probably. It all probably stems back to Lisp. I admit that I didn't see this subtlety when I designed Python's package architecture. It's too late to change (e.g. because of __init__.py). Is it a problem though? Let's be open-minded about this and think about whether we want to allow this or not, and why... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Mar 29 14:35:33 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Mar 2000 07:35:33 -0500 Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Your message of "Wed, 29 Mar 2000 13:21:09 +0200." References: Message-ID: <200003291235.HAA18249@eric.cnri.reston.va.us> > > What's wrong with my implementation of __hash__ raising a TypeError with > > the message 'unhashable object'? > > A subtype shouldn't change contracts of its supertypes. hash() was > implicitly contracted as "raising no exceptions". Let's not confuse subtypes and subclasses. One of the things implicit in the discussion on types-sig is that not every subclass is a subtype! Yes, this violates something we all learned from C++ -- but it's a great insight.
No time to explain it more, but for me, Peter's subclassing UserString for MutableString to borrow implementation is fine. --Guido van Rossum (home page: http://www.python.org/~guido/) From pf at artcom-gmbh.de Wed Mar 29 15:49:24 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 15:49:24 +0200 (MEST) Subject: [Python-Dev] NIST Registry (was Great Renaming? What is the goal?) In-Reply-To: <200003291232.HAA18234@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 29, 2000 7:32:16 am" Message-ID: Hi! Guido van Rossum: > I think the NIST registry is a failed experiment -- too cumbersome to > maintain or consult. The WEB frontend of the NIST registry is not that bad --- if you are even aware of the fact that such a beast exists! I have used Python since 1994 and discovered the NIST registry incidentally a few weeks ago, when I was really looking for something about the Win32 registry and used the search engine on www.python.org. My first thought was: What a neat clever idea! I think this is an example of how the Python community suffers from poor advertising of good ideas. > We can do this the same way as common law > handles trade marks: if you have used it as your brand name long > enough, even if you didn't register, someone else cannot grab it away > from you. Okay. But a more formal registry wouldn't hurt. Something like the global module index from the current docs supplemented with all contributed modules which can currently be found at www.vex.net would be a useful resource. Regards, Peter From moshez at math.huji.ac.il Wed Mar 29 16:15:36 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 29 Mar 2000 16:15:36 +0200 (IST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: <200003291235.HAA18249@eric.cnri.reston.va.us> Message-ID: On Wed, 29 Mar 2000, Guido van Rossum wrote: > Let's not confuse subtypes and subclasses.
One of the things implicit > in the discussion on types-sig is that not every subclass is a > subtype! Yes, this violates something we all learned from C++ -- but > it's a great insight. No time to explain it more, but for me, Peter's > subclassing UserString for MutableString to borrow implementation is > fine. Oh, I agree with this. An earlier argument which got snipped in the discussion is why it's a bad idea to borrow implementation (a totally different argument) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From fdrake at acm.org Wed Mar 29 18:02:13 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 29 Mar 2000 11:02:13 -0500 (EST) Subject: [Python-Dev] 1.5.2->1.6 Changes In-Reply-To: References: Message-ID: <14562.10501.726637.335088@seahag.cnri.reston.va.us> Moshe Zadka writes: > -- filecmp.py (supersedes the old cmp.py and dircmp.py modules), > -- tabnanny.py (make sure the source file doesn't assume a specific tab-width) Weren't these in 1.5.2? I think filecmp is documented in the released docs... ah, no, I'm safe. ;) > Please e-mail me personally if you think of any other changes, and I'll > try to integrate them into a complete "changes" document. The documentation is updated. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From skip at mojam.com Wed Mar 29 18:57:51 2000 From: skip at mojam.com (Skip Montanaro) Date: Wed, 29 Mar 2000 10:57:51 -0600 Subject: [Python-Dev] CVS woes... Message-ID: <200003291657.KAA22177@beluga.mojam.com> Does anyone else besides me have trouble getting their Python tree to sync with the CVS repository? I've tried all manner of flags to "cvs update", most recently "cvs update -d -A ." with no success. There are still some files I know Fred Drake has patched that show up as different and it refuses to pick up Lib/robotparser.py. I'm going to blast my current tree and start anew after saving one or two necessary files. 
Any thoughts you might have would be much appreciated. (Private emails please, unless for some reason you think this should be a python-dev topic. I only post here because I suspect most of the readers use CVS to keep in frequent sync and may have some insight.) Thx, -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From moshez at math.huji.ac.il Wed Mar 29 19:06:59 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 29 Mar 2000 19:06:59 +0200 (IST) Subject: [Python-Dev] 1.5.2->1.6 Changes In-Reply-To: <14562.10501.726637.335088@seahag.cnri.reston.va.us> Message-ID: On Wed, 29 Mar 2000, Fred L. Drake, Jr. wrote: > > Moshe Zadka writes: > > -- filecmp.py (supersedes the old cmp.py and dircmp.py modules), > > -- tabnanny.py (make sure the source file doesn't assume a specific tab-width) > > Weren't these in 1.5.2? I think filecmp is documented in the > released docs... ah, no, I'm safe. ;) Tabnanny wasn't a module, and filecmp didn't exist at all. > The documentation is updated. ;) Yes, but it was released as a late part of 1.5.2. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From effbot at telia.com Wed Mar 29 18:38:00 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 29 Mar 2000 18:38:00 +0200 Subject: [Python-Dev] CVS woes... References: <200003291657.KAA22177@beluga.mojam.com> Message-ID: <01b701bf999d$267b6740$34aab5d4@hagrid> Skip wrote: > Does anyone else besides me have trouble getting their Python tree to sync > with the CVS repository? I've tried all manner of flags to "cvs update", > most recently "cvs update -d -A ." with no success. There are still some > files I know Fred Drake has patched that show up as different and it refuses > to pick up Lib/robotparser.py. note that robotparser doesn't show up on cvs.python.org either. maybe cnri's cvs admins should look into this...
From fdrake at acm.org Wed Mar 29 20:20:14 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 29 Mar 2000 13:20:14 -0500 (EST) Subject: [Python-Dev] CVS woes... In-Reply-To: <200003291657.KAA22177@beluga.mojam.com> References: <200003291657.KAA22177@beluga.mojam.com> Message-ID: <14562.18782.465814.696099@seahag.cnri.reston.va.us> Skip Montanaro writes: > most recently "cvs update -d -A ." with no success. There are still some > files I know Fred Drake has patched that show up as different and it refuses You should be aware that many of the more recent documentation patches have been in the 1.5.2p2 branch (release-1.5.2p1-patches, I think), rather than the development head. I'm hoping to begin the merge in the next week. I also have a few patches that I haven't had time to look at yet, and I'm not inclined to make any changes until I've merged the 1.5.2p2 docs with the 1.6 tree, mostly to keep the merge from being any more painful than I already expect it to be. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From bwarsaw at cnri.reston.va.us Wed Mar 29 20:22:57 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 29 Mar 2000 13:22:57 -0500 (EST) Subject: [Python-Dev] CVS woes... References: <200003291657.KAA22177@beluga.mojam.com> <01b701bf999d$267b6740$34aab5d4@hagrid> Message-ID: <14562.18945.407398.812930@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> note that robotparser doesn't show up on cvs.python.org FL> either. maybe cnri's cvs admins should look into this... I've just resync'd python/dist and am doing a fresh checkout now. Looks like Lib/robotparser.py is there now. -Barry From guido at python.org Wed Mar 29 20:23:38 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Mar 2000 13:23:38 -0500 Subject: [Python-Dev] CVS woes... In-Reply-To: Your message of "Wed, 29 Mar 2000 10:57:51 CST." 
<200003291657.KAA22177@beluga.mojam.com> References: <200003291657.KAA22177@beluga.mojam.com> Message-ID: <200003291823.NAA20134@eric.cnri.reston.va.us> > Does anyone else besides me have trouble getting their Python tree to sync > with the CVS repository? I've tried all manner of flags to "cvs update", > most recently "cvs update -d -A ." with no success. There are still some > files I know Fred Drake has patched that show up as different and it refuses > to pick up Lib/robotparser.py. My bad. When I move or copy a file around in the CVS repository directly instead of using cvs commit, I have to manually call a script that updates the mirror. I've done that now, and robotparser.py should now be in the mirror. --Guido van Rossum (home page: http://www.python.org/~guido/) From gward at cnri.reston.va.us Wed Mar 29 21:06:14 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Wed, 29 Mar 2000 14:06:14 -0500 Subject: [Python-Dev] Distutils now in Python CVS tree Message-ID: <20000329140613.A5850@cnri.reston.va.us> Hi all -- Distutils is now available through the Python CVS tree *in addition to its own CVS tree*. That is, if you keep on top of developments in the Python CVS tree, then you will be tracking the latest Distutils code in Lib/distutils. Or, you can keep following the Distutils through its own CVS tree. (This is all done through one itty-bitty little symlink in the CNRI CVS repository, and It Just Works. Cool.) Note that only the 'distutils' subdirectory of the distutils distribution is tracked by Python: that is, changes to the documentation, test suites, and example setup scripts are *not* reflected in the Python CVS tree. If you follow neither Python nor Distutils CVS updates, this doesn't affect you. If you've been following Distutils CVS updates, you can continue to do so as you've always done (and as is documented on the Distutils "Anonymous CVS" web page). 
If you've been following Python CVS updates, then you are now following most Distutils CVS updates too -- as long as you do "cvs update -d", of course. If you're interested in following updates in the Distutils documentation, tests, examples, etc. then you should follow the Distutils CVS tree directly. If you've been following *both* Python and Distutils CVS updates, and hacking on the Distutils, then you should pick one or the other as your working directory. If you submit patches, it doesn't really matter if they're relative to the top of the Python tree, the top of the Distutils tree, or what -- I'll probably figure it out. However, it's probably best to continue sending Distutils patches to distutils-sig at python.org, *or* direct to me (gward at python.net) for trivial patches. Unless Guido says otherwise, I don't see a compelling reason to send Distutils patches to patches at python.org. In related news, the distutils-checkins list is probably going to go away, and all Distutils checkin messages will go python-checkins instead. Let me know if you avidly follow distutils-checkins, but do *not* want to follow python-checkins -- if lots of people respond (doubtful, as distutils-checkins only had 3 subscribers last I checked!), we'll reconsider. Greg From fdrake at acm.org Wed Mar 29 21:28:19 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 29 Mar 2000 14:28:19 -0500 (EST) Subject: [Python-Dev] Re: [Distutils] Distutils now in Python CVS tree In-Reply-To: <20000329140525.A5842@cnri.reston.va.us> References: <20000329140525.A5842@cnri.reston.va.us> Message-ID: <14562.22867.998809.897214@seahag.cnri.reston.va.us> Greg Ward writes: > Distutils is now available through the Python CVS tree *in addition to > its own CVS tree*. That is, if you keep on top of developments in the > Python CVS tree, then you will be tracking the latest Distutils code in > Lib/distutils. Or, you can keep following the Distutils through its own > CVS tree. 
(This is all done through one itty-bitty little symlink in > the CNRI CVS repository, and It Just Works. Cool.) Greg, You may want to point out the legalese requirements for patches to the Python tree. ;( That means the patches should probably go to patches at python.org or you should ensure an archive of all the legal statements is maintained at CNRI. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From ping at lfw.org Wed Mar 29 23:44:31 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 29 Mar 2000 15:44:31 -0600 (CST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <02c901bf989b$be203d80$34aab5d4@hagrid> Message-ID: On Tue, 28 Mar 2000, Fredrik Lundh wrote: > > > IMO this subdivision could be discussed and possibly revised. > > here's one proposal: > http://www.pythonware.com/people/fredrik/librarybook-contents.htm Wow. I don't think i hardly ever use any of the modules in your "Commonly Used Modules" category. Except traceback, from time to time, but that's really the only one! Hmm. I'd arrange things a little differently, though i do like the category for Data Representation (it should probably go next to Data Storage though). I would prefer a separate group for interpreter-and-development-related things. The "File Formats" group seems weak... to me, its contents would better belong in a "parsing" or "text processing" classification. urlparse definitely goes with urllib. These comments are kind of random, i know... maybe i'll try putting together another grouping if i have any time. -- ?!ng From adustman at comstar.net Thu Mar 30 02:57:06 2000 From: adustman at comstar.net (Andy Dustman) Date: Wed, 29 Mar 2000 19:57:06 -0500 (EST) Subject: [Python-Dev] socketmodule with SSL enabled In-Reply-To: <200003290150.UAA17819@eric.cnri.reston.va.us> Message-ID: I had to make the following one-line change to socketmodule.c so that it would link properly with openssl-0.9.4. 
In studying the openssl include files, I found: #define SSLeay_add_ssl_algorithms() SSL_library_init() SSL_library_init() seems to be the "correct" call nowadays. I don't know why this isn't being picked up. I also don't know how well the module works, other than it imports, but I sure would like to try it with Zope/ZServer/Medusa... -- andy dustman | programmer/analyst | comstar.net, inc. telephone: 770.485.6025 / 706.549.7689 | icq: 32922760 | pgp: 0xc72f3f1d "Therefore, sweet knights, if you may doubt your strength or courage, come no further, for death awaits you all, with nasty, big, pointy teeth!" Index: socketmodule.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Modules/socketmodule.c,v retrieving revision 1.98 diff -c -r1.98 socketmodule.c *** socketmodule.c 2000/03/24 20:56:56 1.98 --- socketmodule.c 2000/03/30 00:49:09 *************** *** 2384,2390 **** return; #ifdef USE_SSL SSL_load_error_strings(); ! SSLeay_add_ssl_algorithms(); SSLErrorObject = PyErr_NewException("socket.sslerror", NULL, NULL); if (SSLErrorObject == NULL) return; --- 2384,2390 ---- return; #ifdef USE_SSL SSL_load_error_strings(); ! SSL_library_init(); SSLErrorObject = PyErr_NewException("socket.sslerror", NULL, NULL); if (SSLErrorObject == NULL) return; From gstein at lyra.org Thu Mar 30 04:54:27 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 29 Mar 2000 18:54:27 -0800 (PST) Subject: [Python-Dev] installation points (was: Great Renaming? What is the goal?) In-Reply-To: <1257794452-30405909@hypernet.com> Message-ID: On Wed, 29 Mar 2000, Gordon McMillan wrote: > Moshe Zadka wrote: > > On Tue, 28 Mar 2000, Gordon McMillan wrote: > > > What would happen if he (and everyone else) installed > > > themselves *into* my core packages, then I decided I didn't > > > want his stuff? More than likely I'd have to scrub the damn > > > installation and start all over again. 
> > > > I think Greg Stein answered that objection, by reminding us that the > > filesystem isn't the only way to set up a package hierarchy. > > You mean when Greg said: > >Assuming that you use an archive like those found in my "small" distro or > > Gordon's distro, then this is no problem. The archive simply recognizes > > and maps "text.encoding.macbinary" to its own module. > > I don't know what this has to do with it. When we get around > to the 'macbinary' part, we have already established that > 'text.encoding' is the parent which should supply 'macbinary'. good point... > > In > > particular, even with Python's current module system, there is no need to > > scrub installations: Python core modules go (under UNIX) in > > /usr/local/lib/python1.5, and 3rd party modules go in > > /usr/local/lib/python1.5/site-packages. > > And if there's a /usr/local/lib/python1.5/text/encoding, there's > no way that /usr/local/lib/python1.5/site-packages/text/encoding will get searched. > > I believe you could hack up an importer that did allow this, and > I think you'd be 100% certifiable if you did. Just look at the > surprise factor. > > Hacking stuff into another package is just as evil as math.pi = > 42. Not if the package was designed for it. For a "package" like "net", it would be perfectly acceptable to allow third-parties to define that as their installation point. And yes, assume there is an importer that looks into the installed archives for modules. In the example, the harder part is determining where the "text.encoding" package is loaded from. And yah: it may be difficult to arrange for the text.encoding importer to allow for archive searching.
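The archive importer assumed above can be sketched as follows. This is hedged heavily: it is written against the importlib hooks of much later Python versions (nothing like them existed at the time), and the in-memory dict standing in for an archive file, along with everything in it, is made up for illustration:

```python
import importlib.abc
import importlib.util
import sys

class ArchiveFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Map dotted module names to source text held in an 'archive'."""

    def __init__(self, archive):
        self.archive = archive  # dotted name -> source string

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self.archive:
            return None
        # treat a name as a package if the archive holds submodules under it
        is_pkg = any(k.startswith(fullname + '.') for k in self.archive)
        return importlib.util.spec_from_loader(fullname, self, is_package=is_pkg)

    def create_module(self, spec):
        return None  # default module creation is fine

    def exec_module(self, module):
        # run the archived source in the fresh module's namespace
        exec(self.archive[module.__name__], module.__dict__)

archive = {
    'text': '',
    'text.encoding': '',
    'text.encoding.macbinary': 'MAGIC = "mBIN"',
}
sys.meta_path.insert(0, ArchiveFinder(archive))
```

With the finder installed, `import text.encoding.macbinary` resolves entirely out of the archive, which is the "recognizes and maps the name to its own module" behaviour quoted above.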
Cheers, -g -- Greg Stein, http://www.lyra.org/ From thomas.heller at ion-tof.com Thu Mar 30 21:30:25 2000 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Thu, 30 Mar 2000 21:30:25 +0200 Subject: [Python-Dev] Metaclasses, customizing attribute access for classes Message-ID: <021c01bf9a7e$662327c0$4500a8c0@thomasnotebook> Dear Python-developers, Recently I played with metaclasses from within python, also with Jim Fulton's ExtensionClass. I even tried to write my own metaclass in a C-extension, using the famous Don Beaudry hook. It seems that ExtensionClass does not do completely what I want. Metaclasses implemented in python are somewhat slow, and writing them is a lot of work. Writing a metaclass in C is even more work... Well, what do I want? Often, I use the following pattern: class X: def __init__ (self): self.delegate = anObjectImplementedInC(...) def __getattr__ (self, key): return self.delegate.dosomething(key) def __setattr__ (self, key, value): self.delegate.doanotherthing(key, value) def __delattr__ (self, key): self.delegate.doevenmore(key) This is too slow (for me). So what I would like to do is: class X: def __init__ (self): self.__dict__ = aMappingObject(...) and now aMappingObject will automatically receive all the setattr, getattr, and delattr calls. The *only* thing which is required for this is to remove the restriction that the __dict__ attribute must be a dictionary. This is only a small change to classobject.c (which unfortunately I have only implemented for 1.5.2, not for the CVS version). The performance impact for this change is unnoticeable in pystone. What do you think? Should I prepare a patch? Any chance that this can be included in a future python version?
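The first pattern above can be sketched with a plain dict standing in for the delegate (anObjectImplementedInC is the hypothetical C object from the message, so an ordinary dict is used here just to make the sketch runnable):

```python
class Delegating:
    """Route attribute access through a delegate mapping object."""

    def __init__(self):
        # store the delegate via __dict__ directly so that the
        # __setattr__ below is not invoked for it
        self.__dict__['delegate'] = {}

    def __getattr__(self, key):
        # called only when normal lookup fails, i.e. for everything
        # except 'delegate' itself and class attributes
        try:
            return self.delegate[key]
        except KeyError:
            raise AttributeError(key)

    def __setattr__(self, key, value):
        self.delegate[key] = value

    def __delattr__(self, key):
        try:
            del self.delegate[key]
        except KeyError:
            raise AttributeError(key)
```

Every dotted access here pays for a Python-level special-method call, which is exactly the overhead the proposed self.__dict__ = aMappingObject(...) would avoid by letting the mapping object receive the accesses directly.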
Thomas Heller From petrilli at amber.org Thu Mar 30 21:52:02 2000 From: petrilli at amber.org (Christopher Petrilli) Date: Thu, 30 Mar 2000 14:52:02 -0500 Subject: [Python-Dev] Unicode compile Message-ID: <20000330145202.B9078@trump.amber.org> I don't know how much memory other people have in their machines, but on this machine (128Mb), I get the following trying to compile a CVS checkout of Python: gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c ./unicodedatabase.c ./unicodedatabase.c:53482: virtual memory exhausted I hope that this is a temporary thing, or we ship the database in some other manner, but I would argue that you should be able to compile Python on a machine with 32Mb of RAM at MOST.... for an idea of how much VM this machine has, i have 256Mb of SWAP on top of it. Chris -- | Christopher Petrilli | petrilli at amber.org From guido at python.org Thu Mar 30 22:12:22 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 15:12:22 -0500 Subject: [Python-Dev] Unicode compile In-Reply-To: Your message of "Thu, 30 Mar 2000 14:52:02 EST." <20000330145202.B9078@trump.amber.org> References: <20000330145202.B9078@trump.amber.org> Message-ID: <200003302012.PAA22062@eric.cnri.reston.va.us> > I don't know how much memory other people have in their machines, but > on this machine (128Mb), I get the following trying to compile a CVS > checkout of Python: > > gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c ./unicodedatabase.c > ./unicodedatabase.c:53482: virtual memory exhausted > > I hope that this is a temporary thing, or we ship the database in some > other manner, but I would argue that you should be able to compile > Python on a machine with 32Mb of RAM at MOST.... for an idea of how > much VM this machine has, i have 256Mb of SWAP on top of it. I'm not sure how to fix this, short of reading the main database from a file. Marc-Andre?
--Guido van Rossum (home page: http://www.python.org/~guido/) From tismer at tismer.com Thu Mar 30 22:14:55 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 30 Mar 2000 22:14:55 +0200 Subject: [Python-Dev] Unicode compile References: <20000330145202.B9078@trump.amber.org> Message-ID: <38E3B5BF.2D00F930@tismer.com> Christopher Petrilli wrote: > > I don't know how much memory other people have in their machines, but > on this machine (128Mb), I get the following trying to compile a CVS > checkout of Python: > > gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c ./unicodedatabase.c > ./unicodedatabase.c:53482: virtual memory exhausted > > I hope that this is a temporary thing, or we ship the database in some > other manner, but I would argue that you should be able to compile > Python on a machine with 32Mb of RAM at MOST.... for an idea of how > much VM this machine has, i have 256Mb of SWAP on top of it. I had similar effects, which made me work on a compressed database (see older messages). Due to time limits, I will not get ready before 1.6.a1 is out. And then quite a lot of other changes will be necessary by Marc, since the API changes quite a bit. But it will definitely be a less than 20 KB module, proven. ciao - chris(2) -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From akuchlin at mems-exchange.org Thu Mar 30 22:14:27 2000 From: akuchlin at mems-exchange.org (Andrew M.
Kuchling) Date: Thu, 30 Mar 2000 15:14:27 -0500 (EST) Subject: [Python-Dev] Unicode compile In-Reply-To: <200003302012.PAA22062@eric.cnri.reston.va.us> References: <20000330145202.B9078@trump.amber.org> <200003302012.PAA22062@eric.cnri.reston.va.us> Message-ID: <14563.46499.555853.413690@amarok.cnri.reston.va.us> Guido van Rossum writes: >I'm not sure how to fix this, short of reading the main database from >a file. Marc-Andre? Turning off optimization may help. (Or it may not -- it might be creating the data structures for a large static table that's the problem.) --amk From akuchlin at mems-exchange.org Thu Mar 30 22:22:02 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Thu, 30 Mar 2000 15:22:02 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <200003282000.PAA11988@eric.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> Message-ID: <14563.46954.70800.706245@amarok.cnri.reston.va.us> Guido van Rossum writes: >I don't know enough about this, but it seems that there might be two >steps: *creating* a mmap object is necessarily platform-specific; but >*using* a mmap object could be platform-neutral. > >What is the API for mmap objects? You create them; Unix wants a file descriptor, and Windows wants a filename. Then they behave like buffer objects, like mutable strings. I like Fredrik's suggestion of an 'open(filename, mode, ...)' type of interface. If someone can suggest a way to handle the extra flags such as MAP_SHARED and the Windows tag argument, I'll happily implement it. Maybe just keyword arguments that differ across platforms? open(filename, mode, [tag = 'foo',] [flags = mmapfile.MAP_SHARED]). We could preserve the ability to mmap() only a file descriptor on Unix through a separate openfd() function. I'm also strongly tempted to rename the module from mmapfile to just 'mmap'. 
I'd suggest waiting until the interface is finalized before adding the module to the CVS tree -- which means after 1.6a1 -- but I can add the module as it stands if you like. Guido, let me know if you want me to do that. -- A.M. Kuchling http://starship.python.net/crew/amk/ A Puck is harder by far to hurt than some little lord of malice from the lands of ice and snow. We Pucks are old and hard and wild... -- Robin Goodfellow, in SANDMAN #66: "The Kindly Ones:10" From guido at python.org Thu Mar 30 22:23:42 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 15:23:42 -0500 Subject: [Python-Dev] socketmodule with SSL enabled In-Reply-To: Your message of "Wed, 29 Mar 2000 19:57:06 EST." References: Message-ID: <200003302023.PAA22350@eric.cnri.reston.va.us> > I had to make the following one-line change to socketmodule.c so that it > would link properly with openssl-0.9.4. In studying the openssl include > files, I found: > > #define SSLeay_add_ssl_algorithms() SSL_library_init() > > SSL_library_init() seems to be the "correct" call nowadays. I don't know > why this isn't being picked up. I also don't know how well the module > works, other than it imports, but I sure would like to try it with > Zope/ZServer/Medusa... Strange -- the version of OpenSSL I have also calls itself 0.9.4 ("OpenSSL 0.9.4 09 Aug 1999" to be precise) and doesn't have SSL_library_init(). I wonder what gives... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Mar 30 22:25:58 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 15:25:58 -0500 Subject: [Python-Dev] mmapfile module In-Reply-To: Your message of "Thu, 30 Mar 2000 15:22:02 EST." 
<14563.46954.70800.706245@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> Message-ID: <200003302025.PAA22367@eric.cnri.reston.va.us> > Guido van Rossum writes: > >I don't know enough about this, but it seems that there might be two > >steps: *creating* a mmap object is necessarily platform-specific; but > >*using* a mmap object could be platform-neutral. > > > >What is the API for mmap objects? [AMK] > You create them; Unix wants a file descriptor, and Windows wants a > filename. Then they behave like buffer objects, like mutable strings. > > I like Fredrik's suggestion of an 'open(filename, mode, ...)' type of > interface. If someone can suggest a way to handle the extra flags > such as MAP_SHARED and the Windows tag argument, I'll happily > implement it. Maybe just keyword arguments that differ across > platforms? open(filename, mode, [tag = 'foo',] [flags = > mmapfile.MAP_SHARED]). We could preserve the ability to mmap() only a > file descriptor on Unix through a separate openfd() function. Yes, keyword args seem to be the way to go. To avoid an extra function you could add a fileno=... kwarg, in which case the filename is ignored or required to be "". > I'm > also strongly tempted to rename the module from mmapfile to just > 'mmap'. Sure. > I'd suggest waiting until the interface is finalized before adding the > module to the CVS tree -- which means after 1.6a1 -- but I can add the > module as it stands if you like. Guido, let me know if you want me to > do that. Might as well check it in -- the alpha is going to be rough and I expect another alpha to come out shortly to correct the biggest problems. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Thu Mar 30 22:22:08 2000 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Thu, 30 Mar 2000 22:22:08 +0200 Subject: [Python-Dev] Unicode compile References: <20000330145202.B9078@trump.amber.org> <200003302012.PAA22062@eric.cnri.reston.va.us> Message-ID: <38E3B770.6CD61C37@lemburg.com> Guido van Rossum wrote: > > > I don't know how much memory other people have in their machiens, but > > in this machine (128Mb), I get the following trying to compile a CVS > > checkout of Python: > > > > gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c ./unicodedatabase.c > > ./unicodedatabase.c:53482: virtual memory exhausted > > > > I hope that this is a temporary thing, or we ship the database some > > other manner, but I would argue that you should be able to compile > > Python on a machine with 32Mb of RAM at MOST.... for an idea of how > > much VM this machine has, i have 256Mb of SWAP on top of it. > > I'm not sure how to fix this, short of reading the main database from > a file. Marc-Andre? Hmm, the file compiles fine on my 64MB Linux machine with about 100MB of swap. What gcc version do you use ? Anyway, once Christian is ready with his compact replacement I think we no longer have to worry about that chunk of static data :-) Reading in the data from a file is not a very good solution, because it would override the OS optimizations for static data in object files (like e.g. swapping in only those pages which are really needed, etc.). An alternative solution would be breaking the large table into several smaller ones and accessing it via a redirection function. 
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From adustman at comstar.net Thu Mar 30 23:12:51 2000 From: adustman at comstar.net (Andy Dustman) Date: Thu, 30 Mar 2000 16:12:51 -0500 (EST) Subject: [Python-Dev] socketmodule with SSL enabled In-Reply-To: <200003302023.PAA22350@eric.cnri.reston.va.us> Message-ID: On Thu, 30 Mar 2000, Guido van Rossum wrote: > > I had to make the following one-line change to socketmodule.c so that it > > would link properly with openssl-0.9.4. In studying the openssl include > > files, I found: > > > > #define SSLeay_add_ssl_algorithms() SSL_library_init() > > > > SSL_library_init() seems to be the "correct" call nowadays. I don't know > > why this isn't being picked up. I also don't know how well the module > > works, other than it imports, but I sure would like to try it with > > Zope/ZServer/Medusa... > > Strange -- the version of OpenSSL I have also calls itself 0.9.4 > ("OpenSSL 0.9.4 09 Aug 1999" to be precise) and doesn't have > SSL_library_init(). > > I wonder what gives... I don't know. Right after I made the patch, I found that 0.9.5 is available, and I was able to successfully compile against that version (with the patch). -- andy dustman | programmer/analyst | comstar.net, inc. telephone: 770.485.6025 / 706.549.7689 | icq: 32922760 | pgp: 0xc72f3f1d "Therefore, sweet knights, if you may doubt your strength or courage, come no further, for death awaits you all, with nasty, big, pointy teeth!" From akuchlin at mems-exchange.org Thu Mar 30 23:19:45 2000 From: akuchlin at mems-exchange.org (Andrew M. 
Kuchling) Date: Thu, 30 Mar 2000 16:19:45 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <200003302025.PAA22367@eric.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> Message-ID: <14563.50417.909045.81868@amarok.cnri.reston.va.us> Guido van Rossum writes: >Might as well check it in -- the alpha is going to be rough and I >expect another alpha to come out shortly to correct the biggest >problems. Done -- just doing my bit to ensure the first alpha is rough! :) My next task is to add the Expat module. My understanding is that it's OK to add Expat itself, too; where should I put all that code? Modules/expat/* ? -- A.M. Kuchling http://starship.python.net/crew/amk/ I'll bring the Kindly Ones down on his blasted head. -- Desire, in SANDMAN #31: "Three Septembers and a January" From fdrake at acm.org Thu Mar 30 23:29:58 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 30 Mar 2000 16:29:58 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <14563.50417.909045.81868@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> Message-ID: <14563.51030.24773.587972@seahag.cnri.reston.va.us> Andrew M. Kuchling writes: > Done -- just doing my bit to ensure the first alpha is rough! :) > > My next task is to add the Expat module. My understanding is that > it's OK to add Expat itself, too; where should I put all that code? > Modules/expat/* ? Do you have documentation for this? -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From akuchlin at mems-exchange.org Thu Mar 30 23:30:35 2000 From: akuchlin at mems-exchange.org (Andrew M. 
Kuchling) Date: Thu, 30 Mar 2000 16:30:35 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <14563.51030.24773.587972@seahag.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <14563.51030.24773.587972@seahag.cnri.reston.va.us> Message-ID: <14563.51067.560938.367690@amarok.cnri.reston.va.us> Fred L. Drake, Jr. writes: > Do you have documentation for this? Somewhere at home, I think, but not here at work. I'll try to get it checked in before 1.6alpha1, but don't hold me to that. --amk From guido at python.org Thu Mar 30 23:31:58 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 16:31:58 -0500 Subject: [Python-Dev] mmapfile module In-Reply-To: Your message of "Thu, 30 Mar 2000 16:19:45 EST." <14563.50417.909045.81868@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> Message-ID: <200003302131.QAA22897@eric.cnri.reston.va.us> > Done -- just doing my bit to ensure the first alpha is rough! :) When the going gets rough, the rough get going :-) > My next task is to add the Expat module. My understanding is that > it's OK to add Expat itself, too; where should I put all that code? > Modules/expat/* ? Whoa... Not sure. This will give issues with Patrice, at least (even if it is pure Open Source -- given the size). I'd prefer to add instructions to Setup.in about where to get it. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Thu Mar 30 23:34:55 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Thu, 30 Mar 2000 16:34:55 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <14563.51067.560938.367690@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <14563.51030.24773.587972@seahag.cnri.reston.va.us> <14563.51067.560938.367690@amarok.cnri.reston.va.us> Message-ID: <14563.51327.190466.477566@seahag.cnri.reston.va.us> Andrew M. Kuchling writes: > Somewhere at home, I think, but not here at work. I'll try to get it > checked in before 1.6alpha1, but don't hold me to that. The date isn't important; I'm not planning to match alpha/beta releases with Doc releases. I just want to be sure it gets in soon so that the debugging process can kick in for that as well. ;) Thanks! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Thu Mar 30 23:34:02 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 16:34:02 -0500 Subject: [Python-Dev] mmapfile module In-Reply-To: Your message of "Thu, 30 Mar 2000 16:31:58 EST." <200003302131.QAA22897@eric.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> Message-ID: <200003302134.QAA22939@eric.cnri.reston.va.us> > Whoa... Not sure. This will give issues with Patrice, at least (even > if it is pure Open Source -- given the size). For those outside CNRI -- Patrice is CNRI's tough IP lawyer. --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Thu Mar 30 23:48:13 2000 From: akuchlin at mems-exchange.org (Andrew M. 
Kuchling) Date: Thu, 30 Mar 2000 16:48:13 -0500 (EST) Subject: [Python-Dev] Expat module In-Reply-To: <200003302131.QAA22897@eric.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> Message-ID: <14563.52125.401817.986919@amarok.cnri.reston.va.us> Guido van Rossum writes: >> My next task is to add the Expat module. My understanding is that >> it's OK to add Expat itself, too; where should I put all that code? >> Modules/expat/* ? > >Whoa... Not sure. This will give issues with Patrice, at least (even >if it is pure Open Source -- given the size). I'd prefer to add >instructions to Setup.in about where to get it. Fair enough; I'll just add the module itself, then, and we can always change it later. Should we consider replacing the makesetup/Setup.in mechanism with a setup.py script that uses the Distutils? You'd have to compile a minipython with just enough critical modules -- strop and posixmodule are probably the most important ones -- in order to run setup.py. It's something I'd like to look at for 1.6, because then you could be much smarter in automatically enabling modules. -- A.M. Kuchling http://starship.python.net/crew/amk/ This is the way of Haskell or Design by Contract of Eiffel. This one is like wearing a XV century armor, you walk very safely but in a very tiring way. -- Manuel Gutierrez Algaba, 26 Jan 2000 From guido at python.org Fri Mar 31 00:41:45 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 17:41:45 -0500 Subject: [Python-Dev] Expat module In-Reply-To: Your message of "Thu, 30 Mar 2000 16:48:13 EST." 
<14563.52125.401817.986919@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> <14563.52125.401817.986919@amarok.cnri.reston.va.us> Message-ID: <200003302241.RAA23050@eric.cnri.reston.va.us> > Fair enough; I'll just add the module itself, then, and we can always > change it later. OK. > Should we consider replacing the makesetup/Setup.in mechanism with a > setup.py script that uses the Distutils? You'd have to compile a > minipython with just enough critical modules -- strop and posixmodule > are probably the most important ones -- in order to run setup.py. > It's something I'd like to look at for 1.6, because then you could be > much smarter in automatically enabling modules. If you can come up with something that works well enough, that would be great. (Although I'm not sure where the distutils come in.) We still need to use configure/autoconf though. Hardcoding a small complement of modules is no problem. (Why do you think you need strop though? Remember we have string methods!) --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond at skippinet.com.au Fri Mar 31 01:03:39 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 31 Mar 2000 09:03:39 +1000 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/PC python_nt.rc,1.8,1.9 In-Reply-To: <200003302259.RAA23266@eric.cnri.reston.va.us> Message-ID: This is the version number as displayed by Windows Explorer in the "properties" dialog. Mark. > Modified Files: > python_nt.rc > Log Message: > Seems there was a version string here that still looked > like 1.5.2. 
> > > Index: python_nt.rc > ========================================================== > ========= > RCS file: /projects/cvsroot/python/dist/src/PC/python_nt.rc,v > retrieving revision 1.8 > retrieving revision 1.9 > diff -C2 -r1.8 -r1.9 > *** python_nt.rc 2000/03/29 01:50:50 1.8 > --- python_nt.rc 2000/03/30 22:59:09 1.9 > *************** > *** 29,34 **** > > VS_VERSION_INFO VERSIONINFO > ! FILEVERSION 1,5,2,3 > ! PRODUCTVERSION 1,5,2,3 > FILEFLAGSMASK 0x3fL > #ifdef _DEBUG > --- 29,34 ---- > > VS_VERSION_INFO VERSIONINFO > ! FILEVERSION 1,6,0,0 > ! PRODUCTVERSION 1,6,0,0 > FILEFLAGSMASK 0x3fL > #ifdef _DEBUG > > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://www.python.org/mailman/listinfo/python-checkins > From effbot at telia.com Fri Mar 31 00:40:51 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 31 Mar 2000 00:40:51 +0200 Subject: [Python-Dev] SRE: what to do with undocumented attributes? Message-ID: <00b701bf9a99$022339c0$34aab5d4@hagrid> at this time, SRE uses types instead of classes for compiled patterns and matches. these classes provide a documented interface, and a bunch of internal attributes, for example: RegexObjects: code -- a PCRE code object pattern -- the source pattern groupindex -- maps group names to group indices MatchObjects: regs -- same as match.span()? groupindex -- as above re -- the pattern object used for this match string -- the target string used for this match the problem is that some other modules use these attributes directly. for example, xmllib.py uses the pattern attribute, and other code I've seen uses regs to speed things up. in SRE, I would like to get rid of all these (except possibly for the match.string attribute). opinions? From guido at python.org Fri Mar 31 01:31:43 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 18:31:43 -0500 Subject: [Python-Dev] SRE: what to do with undocumented attributes? 
In-Reply-To: Your message of "Fri, 31 Mar 2000 00:40:51 +0200." <00b701bf9a99$022339c0$34aab5d4@hagrid> References: <00b701bf9a99$022339c0$34aab5d4@hagrid> Message-ID: <200003302331.SAA24895@eric.cnri.reston.va.us> > at this time, SRE uses types instead of classes for compiled > patterns and matches. these classes provide a documented > interface, and a bunch of internal attributes, for example: > > RegexObjects: > > code -- a PCRE code object > pattern -- the source pattern > groupindex -- maps group names to group indices > > MatchObjects: > > regs -- same as match.span()? > groupindex -- as above > re -- the pattern object used for this match > string -- the target string used for this match > > the problem is that some other modules use these attributes > directly. for example, xmllib.py uses the pattern attribute, and > other code I've seen uses regs to speed things up. > > in SRE, I would like to get rid of all these (except possibly for > the match.string attribute). > > opinions? Sounds reasonable. All std lib modules that violate this will need to be fixed once sre.py replaces re.py. (Checkin of sre is next.) --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Fri Mar 31 01:40:16 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Thu, 30 Mar 2000 18:40:16 -0500 (EST) Subject: [Python-Dev] SRE: what to do with undocumented attributes? In-Reply-To: <00b701bf9a99$022339c0$34aab5d4@hagrid> References: <00b701bf9a99$022339c0$34aab5d4@hagrid> Message-ID: <14563.58848.109072.339060@amarok.cnri.reston.va.us> Fredrik Lundh writes: >RegexObjects: > code -- a PCRE code object > pattern -- the source pattern > groupindex -- maps group names to group indices pattern and groupindex are documented in the Library Reference, and they're part of the public interface. .code is not, so you can drop it. >MatchObjects: > regs -- same as match.span()? 
> groupindex -- as above > re -- the pattern object used for this match > string -- the target string used for this match .re and .string are documented. I don't see a reference to MatchObject.groupindex anywhere, and .regs isn't documented, so those two can be ignored; xmllib or whatever external modules use them are being very naughty, so go ahead and break them. -- A.M. Kuchling http://starship.python.net/crew/amk/ Imagine a thousand thousand fireflies of every shape and color; Oh, that was Baghdad at night in those days. -- From SANDMAN #50: "Ramadan" From effbot at telia.com Fri Mar 31 01:05:15 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 31 Mar 2000 01:05:15 +0200 Subject: [Python-Dev] SRE: what to do with undocumented attributes? References: <00b701bf9a99$022339c0$34aab5d4@hagrid> <14563.58848.109072.339060@amarok.cnri.reston.va.us> Message-ID: <00e901bf9a9c$6c036240$34aab5d4@hagrid> Andrew wrote: > >RegexObjects: > > code -- a PCRE code object > > pattern -- the source pattern > > groupindex -- maps group names to group indices > > pattern and groupindex are documented in the Library Reference, and > they're part of the public interface. hmm. I could have sworn... guess I didn't look carefully enough (or someone's used his time machine again :-). oh well, more bloat... btw, "pattern" doesn't make much sense in SRE -- who says the pattern object was created by re.compile? guess I'll just set it to None in other cases (e.g. sregex, sreverb, sgema...) From bwarsaw at cnri.reston.va.us Fri Mar 31 02:35:16 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 30 Mar 2000 19:35:16 -0500 (EST) Subject: [Python-Dev] SRE: what to do with undocumented attributes? References: <00b701bf9a99$022339c0$34aab5d4@hagrid> <14563.58848.109072.339060@amarok.cnri.reston.va.us> <00e901bf9a9c$6c036240$34aab5d4@hagrid> Message-ID: <14563.62148.860971.360871@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> hmm. I could have sworn... 
guess I didn't look carefully FL> enough (or someone's used his time machine again :-). Yep, sorry. If it's documented as in the public interface, it should be kept. Anything else can go (he says without yet grep'ing through his various code bases). -Barry From bwarsaw at cnri.reston.va.us Fri Mar 31 06:34:15 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 30 Mar 2000 23:34:15 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> Message-ID: <14564.10951.90258.729547@anthem.cnri.reston.va.us> >>>>> "Guido" == Guido van Rossum writes: Guido> Modified Files: mmapmodule.c Log Message: Hacked for Win32 Guido> by Mark Hammond. Reformatted for 8-space tabs and fitted Guido> into 80-char lines by GvR. Can we change the 8-space-tab rule for all new C code that goes in? I know that we can't practically change existing code right now, but for new C code, I propose we use no tab characters, and we use a 4-space block indentation. -Barry From DavidA at ActiveState.com Fri Mar 31 07:07:02 2000 From: DavidA at ActiveState.com (David Ascher) Date: Thu, 30 Mar 2000 21:07:02 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 In-Reply-To: <14564.10951.90258.729547@anthem.cnri.reston.va.us> Message-ID: > Can we change the 8-space-tab rule for all new C code that goes in? I > know that we can't practically change existing code right now, but for > new C code, I propose we use no tab characters, and we use a 4-space > block indentation. Heretic! 
+1, FWIW =) From bwarsaw at cnri.reston.va.us Fri Mar 31 07:16:48 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 31 Mar 2000 00:16:48 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <14564.10951.90258.729547@anthem.cnri.reston.va.us> Message-ID: <14564.13504.310866.835201@anthem.cnri.reston.va.us> >>>>> "DA" == David Ascher writes: DA> Heretic! DA> +1, FWIW =) I hereby offer to so untabify and reformat any C code in the standard distribution that Guido will approve of. -Barry From mhammond at skippinet.com.au Fri Mar 31 07:16:26 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 31 Mar 2000 15:16:26 +1000 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 In-Reply-To: Message-ID: +1 for me too. It also brings all source files under the same guidelines (rather than separate ones for .py and .c) Mark. From bwarsaw at cnri.reston.va.us Fri Mar 31 07:40:16 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 31 Mar 2000 00:40:16 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: Message-ID: <14564.14912.629414.970309@anthem.cnri.reston.va.us> >>>>> "MH" == Mark Hammond writes: MH> +1 for me too. It also brings all source files under the same MH> guidelines (rather than separate ones for .py and .c) BTW, I further propose that if Guido lets me reformat the C code, that we freeze other checkins for the duration and I temporarily turn off the python-checkins email. That is, unless you guys /want/ to be bombarded with boatloads of useless diffs. :) -Barry From pf at artcom-gmbh.de Fri Mar 31 08:45:45 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Fri, 31 Mar 2000 08:45:45 +0200 (MEST) Subject: Indentation of Python interpreter C source (was Re: [Python-Dev] Re: [Python-chec....)
In-Reply-To: <14564.14912.629414.970309@anthem.cnri.reston.va.us> from "bwarsaw@cnri.reston.va.us" at "Mar 31, 2000 0:40:16 am" Message-ID: Hi! sigh :-( > >>>>> "MH" == Mark Hammond writes: > > MH> +1 for me too. It also brings all source files under the same > MH> guidelines (rather than separate ones for .py and .c) bwarsaw at cnri.reston.va.us: > BTW, I further propose that if Guido lets me reformat the C code, that > we freeze other checkins for the duration and I temporarily turn off > the python-checkins email. That is, unless you guys /want/ to be > bombarded with boatloads of useless diffs. :) -1 for C reformatting. The 4-space indentation seems reasonable for Python sources, but I disagree for C code. C is not Python. Let me cite a very prominent member of the open source community (pasted from /usr/src/linux/Documentation/CodingStyle): Chapter 1: Indentation Tabs are 8 characters, and thus indentations are also 8 characters. There are heretic movements that try to make indentations 4 (or even 2!) characters deep, and that is akin to trying to define the value of PI to be 3. Rationale: The whole idea behind indentation is to clearly define where a block of control starts and ends. Especially when you've been looking at your screen for 20 straight hours, you'll find it a lot easier to see how the indentation works if you have large indentations. Now, some people will claim that having 8-character indentations makes the code move too far to the right, and makes it hard to read on a 80-character terminal screen. The answer to that is that if you need more than 3 levels of indentation, you're screwed anyway, and should fix your program. In short, 8-char indents make things easier to read, and have the added benefit of warning you when you're nesting your functions too deep. Heed that warning. Also, the Python interpreter has no strong relationship with the Linux kernel; I agree with Linus on this topic.
Python source code is another thing: Python identifiers are usually longer due to qualifying and Python operands are often lists, tuples or the like, so lines contain more stuff. disliking-yet-another-white-space-discussion-ly y'rs - peter From mhammond at skippinet.com.au Fri Mar 31 09:11:50 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 31 Mar 2000 17:11:50 +1000 Subject: Indentation of Python interpreter C source (was Re: [Python-Dev] Re: [Python-chec....) In-Reply-To: Message-ID: > Rationale: The whole idea behind indentation is to > clearly define where > a block of control starts and ends. Especially when Ironically, this statement is a strong argument for insisting on Python using real tab characters! "Clearly define" is upgraded to "used to define". > 80-character terminal screen. The answer to that is > that if you need > more than 3 levels of indentation, you're screwed > anyway, and should fix > your program. Yeah, right! int foo() { // one level for the privilege of being here. switch (bar) { // uh oh - running out of room... case WTF: // Oh no - if I use an "if" statement, // my code is "screwed"?? } } > disliking-yet-another-white-space-discussion-ly y'rs - peter Like-death-and-taxes-ly y'rs - Mark. From moshez at math.huji.ac.il Fri Mar 31 10:04:32 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 31 Mar 2000 10:04:32 +0200 (IST) Subject: [Python-Dev] mmapfile module In-Reply-To: <200003302134.QAA22939@eric.cnri.reston.va.us> Message-ID: On Thu, 30 Mar 2000, Guido van Rossum wrote: > > Whoa... Not sure. This will give issues with Patrice, at least (even > > if it is pure Open Source -- given the size). > > For those outside CNRI -- Patrice is CNRI's tough IP lawyer. It was understandable from the context... Personally, I'd rather it were folded in by value, and not by reference: one reason is versioning problems, and another is pure laziness on my part.
what-do-you-have-when-you-got-a-lawyer-up-to-his-neck-in-the-sand-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal at lemburg.com Fri Mar 31 09:42:04 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 31 Mar 2000 09:42:04 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> Message-ID: <38E456CC.1A49334A@lemburg.com> "Barry A. Warsaw" wrote: > > >>>>> "Guido" == Guido van Rossum writes: > > Guido> Modified Files: mmapmodule.c Log Message: Hacked for Win32 > Guido> by Mark Hammond. Reformatted for 8-space tabs and fitted > Guido> into 80-char lines by GvR. > > Can we change the 8-space-tab rule for all new C code that goes in? I > know that we can't practically change existing code right now, but for > new C code, I propose we use no tab characters, and we use a 4-space > block indentation. Why not just leave new code formatted as it is (except maybe to bring the used TAB width to the standard 8 spaces used throughout the Python C source code) ? BTW, most of the new unicode stuff uses 4-space indents. Unfortunately, it mixes whitespace and tabs since Emacs c-mode doesn't do the python-mode magic yet (is there a way to turn it on ?). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From effbot at telia.com Fri Mar 31 11:14:49 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 31 Mar 2000 11:14:49 +0200 Subject: Indentation of Python interpreter C source (was Re: [Python-Dev] Re: [Python-chec....) References: Message-ID: <01ae01bf9af1$927b1940$34aab5d4@hagrid> Peter Funk wrote: > Also the Python interpreter has no strong relationship with Linux kernel > a agree with Linus on this topic. 
> Python source code is another thing: > Python identifiers are usually longer due to qualifying and Python > operands are often lists, tuples or the like, so lines contain more stuff. you're just guessing, right? (if you check, you'll find that the actual difference is very small. iirc, that's true for c, c++, java, python, tcl, and probably a few more languages. dunno about perl, though... :-) From effbot at telia.com Fri Mar 31 11:17:42 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 31 Mar 2000 11:17:42 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> <38E456CC.1A49334A@lemburg.com> Message-ID: <01b501bf9af1$f9b44500$34aab5d4@hagrid> M.-A. Lemburg wrote: > Why not just leave new code formatted as it is (except maybe > to bring the used TAB width to the standard 8 spaces used throughout > the Python C source code) ? > > BTW, most of the new unicode stuff uses 4-space indents. > Unfortunately, it mixes whitespace and tabs since Emacs > c-mode doesn't do the python-mode magic yet (is there a > way to turn it on ?). http://www.jwz.org/doc/tabs-vs-spaces.html contains some hints. From moshez at math.huji.ac.il Fri Mar 31 13:24:05 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 31 Mar 2000 13:24:05 +0200 (IST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes Message-ID: Here is a new list of things that will change in the next release. Thanks to all the people who gave me hints and information! If you have anything you think I missed, or mistreated, please e-mail me personally -- I'll post an updated version soon.

Obligatory
==========
A lot of bug-fixes, some optimizations, many improvements in the documentation

Core changes
============
Deleting objects is safe even for deeply nested data structures.
Long/int unifications: long integers can be used in seek() calls, as slice indexes.
str(1L) --> '1', not '1L' (repr() is still the same)
Builds on NT Alpha
UnboundLocalError is raised when a local variable is undefined
long, int take optional "base" parameter
string objects now have methods (though they are still immutable)
unicode support: Unicode strings are marked with u"string", and there is support for arbitrary encoders/decoders
"in" operator can now be overridden in user-defined classes to mean anything: it calls the magic method __contains__
New calling syntax: f(*args, **kw) equivalent to apply(f, args, kw)
Some methods which would take multiple arguments and treat them as a tuple were fixed: list.{append, insert, remove, count}, socket.connect

New modules
===========
winreg - Windows registry interface.
Distutils - tools for distributing Python modules
robotparser - parse a robots.txt file (for writing web spiders)
linuxaudio - audio for Linux
mmap - treat a file as a memory buffer
sre - regular expressions (fast, supports unicode)
filecmp - supersedes the old cmp.py and dircmp.py modules
tabnanny - check Python sources for tab-width dependence
unicode - support for unicode
codecs - support for Unicode encoders/decoders

Module changes
==============
re - changed to be a frontend to sre
readline, ConfigParser, cgi, calendar, posix, xmllib, aifc, chunk, wave, random, shelve, nntplib - minor enhancements
socket, httplib, urllib - optional OpenSSL support
_tkinter - support for 8.1,8.2,8.3 (no support for versions older than 8.0)

Tool changes
============
IDLE -- complete overhaul

(Andrew, I'm still waiting for the expat support and integration to add to this list -- other than that, please contact me if you want something less telegraphic ) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping at lfw.org Fri Mar 31 14:01:21 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 31 Mar 2000 04:01:21 -0800 (PST) Subject: [Python-Dev] Roundup et al.
Message-ID: Hi -- there was some talk on this list earlier about nosy lists, managing patches, and such things, so i just wanted to mention, for anybody interested, that i threw together Roundup very quickly for you to try out. http://www.lfw.org/python/ There's a tar file there -- it's very messy code, and i apologize (it was hastily hacked out of the running prototype implementation), but it should be workable enough to play with. There's a test installation to play with at http://www.lfw.org/ping/roundup/roundup.cgi Dummy user:password pairs are test:test, spam:spam, eggs:eggs. A fancier design, still in the last stages of coming together (which will be my submission to the Software Carpentry contest) is up at http://crit.org/http://www.lfw.org/ping/sctrack.html and i welcome your thoughts and comments on that if you have the spare time (ha!) and generous inclination to contribute them. Thank you and apologies for the interruption. -- ?!ng "To be human is to continually change. Your desire to remain as you are is what ultimately limits you." -- The Puppet Master, Ghost in the Shell From guido at python.org Fri Mar 31 14:10:45 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Mar 2000 07:10:45 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 In-Reply-To: Your message of "Thu, 30 Mar 2000 23:34:15 EST." <14564.10951.90258.729547@anthem.cnri.reston.va.us> References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> Message-ID: <200003311210.HAA29010@eric.cnri.reston.va.us> > Can we change the 8-space-tab rule for all new C code that goes in? I > know that we can't practically change existing code right now, but for > new C code, I propose we use no tab characters, and we use a 4-space > block indentation. Actually, this one was formatted for 8-space indents but using 4-space tabs, so in my editor it looked like 16-space indents! 
Given that we don't want to change existing code, I'd prefer to stick with 1-tab 8-space indents. --Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at math.huji.ac.il Fri Mar 31 15:10:06 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 31 Mar 2000 15:10:06 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc ACKS,1.51,1.52 In-Reply-To: <200003311301.IAA29221@eric.cnri.reston.va.us> Message-ID: On Fri, 31 Mar 2000, Guido van Rossum wrote: > + Christian Tismer > + Christian Tismer Ummmmm....I smell something fishy here. Are there two Christian Tismers? That would explain how Christian has so much time to work on Stackless. Well, between the both of them, Guido will have no chance but to put Stackless in the standard distribution. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From fredrik at pythonware.com Fri Mar 31 15:16:16 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 31 Mar 2000 15:16:16 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc ACKS,1.51,1.52 References: <200003311301.IAA29221@eric.cnri.reston.va.us> Message-ID: <000d01bf9b13$4be1db00$0500a8c0@secret.pythonware.com> > Tracy Tims > + Christian Tismer > + Christian Tismer > R Lindsay Todd two christians? From bwarsaw at cnri.reston.va.us Fri Mar 31 15:55:13 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 31 Mar 2000 08:55:13 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> <38E456CC.1A49334A@lemburg.com> Message-ID: <14564.44609.221250.471147@anthem.cnri.reston.va.us> >>>>> "M" == M writes: M> BTW, most of the new unicode stuff uses 4-space indents. 
M> Unfortunately, it mixes whitespace and tabs since Emacs M> c-mode doesn't do the python-mode magic yet (is there a M> way to turn it on ?). (setq indent-tabs-mode nil) I could add that to the "python" style. And to zap all your existing tab characters: C-M-h M-x untabify RET -Barry From skip at mojam.com Fri Mar 31 16:04:46 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 31 Mar 2000 08:04:46 -0600 (CST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: References: Message-ID: <14564.45182.460160.589244@beluga.mojam.com> Moshe, I would highlight those bits that are likely to warrant a little closer scrutiny. The list.{append,insert,...} and socket.connect change certainly qualify. Perhaps split the Core Changes section into two subsections, one set of changes likely to require some adaptation and one set that should be backwards-compatible. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From guido at python.org Fri Mar 31 16:47:31 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Mar 2000 09:47:31 -0500 Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: Your message of "Fri, 31 Mar 2000 08:04:46 CST." <14564.45182.460160.589244@beluga.mojam.com> References: <14564.45182.460160.589244@beluga.mojam.com> Message-ID: <200003311447.JAA29633@eric.cnri.reston.va.us> See what I've done to Moshe's list: http://www.python.org/1.6/ --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Fri Mar 31 17:28:56 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 31 Mar 2000 09:28:56 -0600 (CST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: <200003311447.JAA29633@eric.cnri.reston.va.us> References: <14564.45182.460160.589244@beluga.mojam.com> <200003311447.JAA29633@eric.cnri.reston.va.us> Message-ID: <14564.50232.734778.152933@beluga.mojam.com> Guido> See what I've done to Moshe's list: http://www.python.org/1.6/ Looks good. Attached are a couple nitpicky diffs. 
Skip -------------- next part -------------- A non-text attachment was scrubbed... Name: 1.6.diff Type: application/octet-stream Size: 1263 bytes Desc: diffs to 1.6 Release Notes URL: From guido at python.org Fri Mar 31 17:47:56 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Mar 2000 10:47:56 -0500 Subject: [Python-Dev] Windows installer pre-prelease Message-ID: <200003311547.KAA15538@eric.cnri.reston.va.us> The Windows installer is always hard to get just right. If you have a moment, go to http://www.python.org/1.6/ and download the Windows Installer prerelease. Let me know what works, what doesn't! I've successfully installed it on Windows NT 4.0 and on Windows 98, both with default install target and with a modified install target. I'd love to hear that it also installs cleanly on Windows 95. Please test IDLE from the start menu! --Guido van Rossum (home page: http://www.python.org/~guido/) From gward at cnri.reston.va.us Fri Mar 31 18:18:43 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Fri, 31 Mar 2000 11:18:43 -0500 Subject: [Python-Dev] Distutils for the std. library (was: Expat module) In-Reply-To: <14563.52125.401817.986919@amarok.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Thu, Mar 30, 2000 at 04:48:13PM -0500 References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> <14563.52125.401817.986919@amarok.cnri.reston.va.us> Message-ID: <20000331111842.A8060@cnri.reston.va.us> On 30 March 2000, Andrew M. Kuchling said: > Should we consider replacing the makesetup/Setup.in mechanism with a > setup.py script that uses the Distutils? You'd have to compile a > minipython with just enough critical modules -- strop and posixmodule > are probably the most important ones -- in order to run setup.py. 
> It's something I'd like to look at for 1.6, because then you could be > much smarter in automatically enabling modules. Gee, I didn't think anyone was gonna open *that* can of worms for 1.6. Obviously, I'd love to see the Distutils used to build parts of the Python library. Some possible problems:

* Distutils relies heavily on the sys, os, string, and re modules, so those would have to be built and included in the mythical mini-python (as would everything they rely on -- strop, pcre, ... ?)

* Distutils currently assumes that it's working with an installed Python -- it doesn't know anything about working in the Python source tree. I think this could be fixed just by tweaking the distutils.sysconfig module, but there might be subtle assumptions elsewhere in the code.

* I haven't written the mythical Autoconf-in-Python yet, so we'd still have to rely on either the configure script or user intervention to find out whether library X is installed, and where its header and library files live (for X in zlib, tcl, tk, ...).

Of course, the configure script would still be needed to build the mini-python, so it's not going away any time soon. Greg From skip at mojam.com Fri Mar 31 18:26:55 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 31 Mar 2000 10:26:55 -0600 (CST) Subject: [Python-Dev] Distutils for the std.
library (was: Expat module) In-Reply-To: <20000331111842.A8060@cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> <14563.52125.401817.986919@amarok.cnri.reston.va.us> <20000331111842.A8060@cnri.reston.va.us> Message-ID: <14564.53711.803509.962248@beluga.mojam.com> Greg> * Distutils relies heavily on the sys, os, string, and re Greg> modules, so those would have to be built and included in the Greg> mythical mini-python (as would everything they rely on -- Greg> strop, pcre, ... ?) With string methods in 1.6, reliance on the string and strop modules should be lessened or eliminated, right? re and os may need a tweak or two to use string methods themselves. The sys module is always available. Perhaps it would make sense to put sre(module)?.c into the Python directory where sysmodule.c lives. That way, a Distutils-capable mini-python could be built without messing around in the Modules directory at all... -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From moshez at math.huji.ac.il Fri Mar 31 18:25:11 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 31 Mar 2000 18:25:11 +0200 (IST) Subject: [Python-Dev] Distutils for the std. library (was: Expat module) In-Reply-To: <20000331111842.A8060@cnri.reston.va.us> Message-ID: On Fri, 31 Mar 2000, Greg Ward wrote: > Gee, I didn't think anyone was gonna open *that* can of worms for 1.6. Well, it's not like it's not a lot of work, but it could be done, with liberal interpretation of "mini": include in "mini" Python *all* modules which do not rely on libraries not distributed with the Python core -- zlib, expat and Tkinter go right out the window, but most everything else can stay. That way, Distutils can use all modules it currently uses . 
The other problem, file location, is one I have talked about earlier: it *cannot* be assumed that the default place for putting new libraries is the same place the Python interpreter resides, for many reasons. Why not ask the user explicitly? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gward at cnri.reston.va.us Fri Mar 31 18:29:33 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Fri, 31 Mar 2000 11:29:33 -0500 Subject: [Python-Dev] Distutils for the std. library (was: Expat module) In-Reply-To: <14564.53711.803509.962248@beluga.mojam.com>; from skip@mojam.com on Fri, Mar 31, 2000 at 10:26:55AM -0600 References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> <14563.52125.401817.986919@amarok.cnri.reston.va.us> <20000331111842.A8060@cnri.reston.va.us> <14564.53711.803509.962248@beluga.mojam.com> Message-ID: <20000331112933.B8060@cnri.reston.va.us> On 31 March 2000, Skip Montanaro said: > With string methods in 1.6, reliance on the string and strop modules should > be lessened or eliminated, right? re and os may need a tweak or two to use > string methods themselves. The sys module is always available. Perhaps it > would make sense to put sre(module)?.c into the Python directory where > sysmodule.c lives. That way, a Distutils-capable mini-python could be built > without messing around in the Modules directory at all... But I'm striving to maintain compatibility with (at least) Python 1.5.2 in Distutils. That need will fade with time, but it's not going to disappear the moment Python 1.6 is released. (Guess I'll have to find somewhere else to play with string methods and extended call syntax).
Greg From thomas.heller at ion-tof.com Fri Mar 31 19:09:41 2000 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Fri, 31 Mar 2000 19:09:41 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: distutils/distutils msvccompiler.py References: <200003311653.LAA08175@thrak.cnri.reston.va.us> Message-ID: <038701bf9b33$e7c49240$4500a8c0@thomasnotebook> > Simplified Thomas Heller's registry patch: just assign all those > HKEY_* and Reg* names once, rather than having near-duplicate code > in the two import attempts. Your change won't work, the function names in win32api and winreg are not the same: Example: win32api.RegEnumValue <-> winreg.EnumValue > > Also dropped the leading underscore on all the imported symbols, > as it's not appropriate (they're not local to this module). Are they used anywhere else? Or do you think they *could* be used somewhere else? Thomas Heller From mal at lemburg.com Fri Mar 31 12:19:58 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 31 Mar 2000 12:19:58 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> <38E456CC.1A49334A@lemburg.com> <01b501bf9af1$f9b44500$34aab5d4@hagrid> Message-ID: <38E47BCE.94E4E012@lemburg.com> Fredrik Lundh wrote: > > M.-A. Lemburg wrote: > > Why not just leave new code formatted as it is (except maybe > > to bring the used TAB width to the standard 8 spaces used throughout > > the Python C source code) ? > > > > BTW, most of the new unicode stuff uses 4-space indents. > > Unfortunately, it mixes whitespace and tabs since Emacs > > c-mode doesn't do the python-mode magic yet (is there a > > way to turn it on ?). > > http://www.jwz.org/doc/tabs-vs-spaces.html > contains some hints. Ah, cool. 
Thanks :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From pf at artcom-gmbh.de Fri Mar 31 20:56:40 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Fri, 31 Mar 2000 20:56:40 +0200 (MEST) Subject: [Python-Dev] 'make install' should create lib/site-packages IMO In-Reply-To: <200003311513.KAA00790@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 31, 2000 10:13:20 am" Message-ID: Hi! Guido van Rossum: [...] > Modified Files: > Makefile.in > Log Message: > Added distutils and distutils/command to LIBSUBDIRS. Noted by Andrew > Kuchling. [...] > ! LIBSUBDIRS= lib-old lib-tk test test/output encodings \ > ! distutils distutils/command $(MACHDEPS) [...] What about 'site-packages'? SuSE added this to their Python packaging and I think it is a good idea to have an empty 'site-packages' directory installed by default. Regards, Peter From akuchlin at mems-exchange.org Fri Mar 31 22:16:53 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Fri, 31 Mar 2000 15:16:53 -0500 (EST) Subject: [Python-Dev] SRE: what to do with undocumented attributes? In-Reply-To: <00e901bf9a9c$6c036240$34aab5d4@hagrid> References: <00b701bf9a99$022339c0$34aab5d4@hagrid> <14563.58848.109072.339060@amarok.cnri.reston.va.us> <00e901bf9a9c$6c036240$34aab5d4@hagrid> Message-ID: <14565.1973.361549.291817@amarok.cnri.reston.va.us> Fredrik Lundh writes: >btw, "pattern" doesn't make much sense in SRE -- who says >the pattern object was created by re.compile? guess I'll just >set it to None in other cases (e.g. sregex, sreverb, sgema...) Good point; I can imagine fabulously complex patterns assembled programmatically, for which no summary could be made. I guess there could be another attribute that also gives the class (module? function?) used to compile the pattern, but more likely, the pattern attribute should be deprecated and eventually dropped. -- A.M. 
Kuchling http://starship.python.net/crew/amk/ You know how she is when she gets an idea into her head. I mean, when one finally penetrates. -- Desire describes Delirium, in SANDMAN #41: "Brief Lives:1" From pf at artcom-gmbh.de Fri Mar 31 22:14:41 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Fri, 31 Mar 2000 22:14:41 +0200 (MEST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: <200003311447.JAA29633@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 31, 2000 9:47:31 am" Message-ID: Hi! Guido van Rossum : > See what I've done to Moshe's list: http://www.python.org/1.6/ Very fine, but I have a few small annotations:

1. 'linuxaudio' has been renamed to 'linuxaudiodev'

2. The following text: "_tkinter - support for 8.1,8.2,8.3 (no support for versions older than 8.0)." looks a bit misleading, since it is not explicit about Version 8.0.x. I suggest the following wording: "_tkinter - supports Tcl/Tk from version 8.0 up to the current 8.3. Support for versions older than 8.0 has been dropped."

3. 'src/Tools/i18n/pygettext.py' by Barry should be mentioned. This is a very useful utility. I suggest to append the following text: "New utility pygettext.py -- Python equivalent of xgettext(1). A message text extraction tool used for internationalizing applications written in Python"

Regards, Peter From fdrake at acm.org Fri Mar 31 22:30:00 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 31 Mar 2000 15:30:00 -0500 (EST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: References: <200003311447.JAA29633@eric.cnri.reston.va.us> Message-ID: <14565.2760.665022.206361@seahag.cnri.reston.va.us> Peter Funk writes: > I suggest the following wording: ... > a very useful utility. I suggest to append the following text: Peter, I'm beginning to figure this out -- you really just want to get published! ;) You forgot the legalese. ;( -Fred -- Fred L. Drake, Jr.
Corporation for National Research Initiatives From guido at python.org Fri Mar 31 23:30:42 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Mar 2000 16:30:42 -0500 Subject: [Python-Dev] Python 1.6 alpha 1 released Message-ID: <200003312130.QAA04361@eric.cnri.reston.va.us> I've just released a source tarball and a Windows installer for Python 1.6 alpha 1 to the Python website: http://www.python.org/1.6/ Probably the biggest news (if you hadn't heard the rumors) is Unicode support. More news on the above webpage. Note: this is an alpha release. Some of the code is very rough! Please give it a try with your favorite Python application, but don't trust it for production use yet. I plan to release several more alpha and beta releases over the next two months, culminating in a 1.6 final release around June first. We need your help to make the final 1.6 release as robust as possible -- please test this alpha release!!! --Guido van Rossum (home page: http://www.python.org/~guido/) From gandalf at starship.python.net Fri Mar 31 23:56:16 2000 From: gandalf at starship.python.net (Vladimir Ulogov) Date: Fri, 31 Mar 2000 16:56:16 -0500 (EST) Subject: [Python-Dev] Re: Python 1.6 alpha 1 released In-Reply-To: <200003312130.QAA04361@eric.cnri.reston.va.us> Message-ID: Guido, """where you used to write sock.connect(host, port) you must now write sock.connect((host, port))""" Is it possible to keep the old notation? I understand (according to your past mail about the parameters of connect) this may not be what you have in mind, but we use this notation a lot, and for us it will mean creating a workaround for the socket.connect function. It's inconvenient.
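[Editor's note: the workaround alluded to above can be sketched as a small shim. The helper name `connect_compat` and the dummy socket class are hypothetical, used here only so the sketch runs without a network; a real socket's connect takes a single (host, port) tuple.]

```python
def connect_compat(sock, *args):
    # Accept both the old connect(host, port) spelling and the
    # new connect((host, port)) spelling, forwarding a single
    # address tuple to the underlying socket object.
    if len(args) == 1:
        address = args[0]
    else:
        address = args
    return sock.connect(address)

# Dummy stand-in for a real socket, for illustration only.
class FakeSocket:
    def connect(self, address):
        self.address = address

s = FakeSocket()
connect_compat(s, "localhost", 8080)     # old two-argument style
assert s.address == ("localhost", 8080)

s2 = FakeSocket()
connect_compat(s2, ("localhost", 8080))  # new tuple style
assert s2.address == ("localhost", 8080)
```

Either call style ends up passing the same address tuple downward, which is why such a shim lets old code limp along unchanged.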
In general, I'm thinking the socket.connect(Host, Port) looks prettier :)) than socket.connect((Host, Port)) Vladimir From gstein at lyra.org Wed Mar 1 00:47:55 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 29 Feb 2000 15:47:55 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <38BC2375.5C832488@tismer.com> Message-ID: On Tue, 29 Feb 2000, Christian Tismer wrote: > Greg Stein wrote: > > +1 on breaking it now, rather than deferring it Yet Again. > > > > IMO, there has been plenty of warning, and there is plenty of time to > > correct the software. > > > > I'm +0 on adding a warning architecture to Python to support issuing a > > warning/error when .append is called with multiple arguments. > > Well, the (bad) effect of this patch is that you cannot run > PythonWin any longer unless Mark either supplies an updated > distribution, or one corrects the two barfing Scintilla > support scripts by hand. Yes, but there is no reason to assume this won't happen. Why don't we simply move forward with the assumption that PythonWin and Scintilla will be updated? If we stand around pointing at all the uses of append that are incorrect and claim that is why we can't move forward, then we won't get anywhere. Instead, let's just *MOVE* and see that software authors update accordingly. It isn't like it is a difficult change to make. Heck, PythonWin and Scintilla could be updated within the week and re-released. *WAY* ahead of the 1.6 release. > Bad for me, since I'm building Stackless Python against 1.5.2+, > and that means the users will see PythonWin barf when installing SLP. If you're building a system using an interim release of Python, then I think you need to take responsibility for that. If you don't want those people to have problems, then you can back out the list.append change. Or you can release patches to PythonWin. I don't think the Python world at large should be hampered because somebody is using an unstable/interim version of Python.
Again: we couldn't move forward. > Adding a warning instead of raising an exception would be nice IMHO, > since the warning could probably contain the file name and line > number to change, and I would leave my users with this easy task. Yes, this would be nice. But somebody has to take the time to code it up. The warning won't appear out of nowhere... Cheers, -g -- Greg Stein, http://www.lyra.org/ From mhammond at skippinet.com.au Wed Mar 1 00:57:38 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed, 1 Mar 2000 10:57:38 +1100 Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: > Why don't we simply move forward with the assumption that PythonWin and > Scintilla will be updated? Done :-) However, I think dropping it now _is_ a little heavy handed. I decided to do a wider search and found a few in, eg, Sam Rushings calldll based ODBC package. Personally, I would much prefer a warning now, and drop it later. _Then_ we can say we have made enough noise about it. It would only be 2 years ago that I became aware that this "feature" of append was not a feature at all - up until then I used it purposely, and habits are sometimes hard to change :-) MArk. From gstein at lyra.org Wed Mar 1 01:12:29 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 29 Feb 2000 16:12:29 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Wed, 1 Mar 2000, Mark Hammond wrote: > > Why don't we simply move forward with the assumption that PythonWin and > > Scintilla will be updated? > > Done :-) hehe... > However, I think dropping it now _is_ a little heavy handed. I decided to > do a wider search and found a few in, eg, Sam Rushings calldll based ODBC > package. > > Personally, I would much prefer a warning now, and drop it later. _Then_ we > can say we have made enough noise about it. 
It would only be 2 years ago > that I became aware that this "feature" of append was not a feature at all - > up until then I used it purposely, and habits are sometimes hard to change > :-) What's the difference between a warning and an error? If you're running a program and it suddenly spits out a warning about a misuse of list.append, I'd certainly see that as "the program did something unexpected; that is an error." But this is all moot. Guido has already said that we would be amenable to a warning/error infrastructure which list.append could use. His description used some awkward sentences, so I'm not sure (without spending some brain cycles to parse the email) exactly what his desired defaults and behavior are. But hey... the possibility is there, and is just waiting for somebody to code it. IMO, Guido has left an out for people that are upset with the current hard-line approach. One of those people just needs to spend a bit of time coming up with a patch :-) And yes, Guido is also the Benevolent Dictator and can certainly have his mind changed, so people can definitely continue pestering him to back away from the hard-line approach... Cheers, -g -- Greg Stein, http://www.lyra.org/ From ping at lfw.org Wed Mar 1 01:20:07 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 29 Feb 2000 18:20:07 -0600 (CST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Greg Stein wrote: > > What's the difference between a warning and an error? If you're running a > program and it suddenly spits out a warning about a misuse of list.append, > I'd certainly see that as "the program did something unexpected; that is > an error." A big, big difference. Perhaps to one of us, it's the minor inconvenience of reading the error message and inserting a couple of parentheses in the appropriate file -- but to the end user, it's the difference between the program working (albeit noisily) and *not* working. 
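[Editor's note: the warning-versus-error distinction being argued can be sketched with the warning machinery Python later grew. `checked_append` is a hypothetical helper written for this illustration, not the actual patch under discussion.]

```python
import warnings

def checked_append(lst, *args, strict=False):
    # Hypothetical illustration of the two policies:
    #   strict=False -> keep working, but emit a DeprecationWarning
    #   strict=True  -> refuse the multi-argument form outright
    if len(args) > 1:
        if strict:
            raise TypeError("append() takes exactly one argument "
                            "(%d given)" % len(args))
        warnings.warn("passing multiple arguments to append() is "
                      "deprecated; pass a tuple instead",
                      DeprecationWarning, stacklevel=2)
        args = (args,)
    lst.append(args[0])

items = []
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    checked_append(items, 1, 2)   # warns, but the program keeps running
assert items == [(1, 2)]
assert caught[0].category is DeprecationWarning

try:
    checked_append(items, 1, 2, strict=True)  # the hard-line behaviour
except TypeError:
    pass                          # the program stops here instead
```

With the lenient policy the end user sees noise but a working program; with the strict one, the same call is a dead stop, which is exactly the difference Ping describes.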
When the program throws an exception and stops, it is safe to say most users will declare it broken and give up. We can't assume that they're going to be able to figure out what to edit (or be brave enough to try) just by reading the error message... or even what interpreter flag to give, if errors (rather than warnings) are the default behaviour. -- ?!ng From klm at digicool.com Wed Mar 1 01:37:09 2000 From: klm at digicool.com (Ken Manheimer) Date: Tue, 29 Feb 2000 19:37:09 -0500 (EST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Wed, 1 Mar 2000, Mark Hammond wrote: > > Why don't we simply move forward with the assumption that PythonWin and > > Scintilla will be updated? > > Done :-) > > However, I think dropping it now _is_ a little heavy handed. I decided to > do a wider search and found a few in, eg, Sam Rushings calldll based ODBC > package. > > Personally, I would much prefer a warning now, and drop it later. _Then_ we > can say we have made enough noise about it. It would only be 2 years ago > that I became aware that this "feature" of append was not a feature at all - > up until then I used it purposely, and habits are sometimes hard to change > :-) I agree with mark. Why the sudden rush?? It seems to me to be unfair to make such a change - one that will break peoples code - without advanced warning, which typically is handled by a deprecation period. There *are* going to be people who won't be informed of the change in the short span of less than a single release. Just because it won't cause you pain isn't a good reason to disregard the pain of those that will suffer, particularly when you can do something relatively low-cost to avoid it. Ken klm at digicool.com From gstein at lyra.org Wed Mar 1 01:57:56 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 29 Feb 2000 16:57:56 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Ken Manheimer wrote: >... > I agree with mark. 
Why the sudden rush?? It seems to me to be unfair to > make such a change - one that will break peoples code - without advanced > warning, which typically is handled by a deprecation period. There *are* > going to be people who won't be informed of the change in the short span > of less than a single release. Just because it won't cause you pain isn't > a good reason to disregard the pain of those that will suffer, > particularly when you can do something relatively low-cost to avoid it. Sudden rush?!? Mark said he knew about it for a couple years. Same here. It was a long while ago that .append()'s semantics were specified to "no longer" accept multiple arguments. I see in the HISTORY file, that changes were made to Python 1.4 (October, 1996) to avoid calling append() with multiple arguments. So, that is over three years that append() has had multiple-args deprecated. There was probably discussion even before that, but I can't seem to find something to quote. Seems like plenty of time -- far from rushed. Cheers, -g -- Greg Stein, http://www.lyra.org/ From klm at digicool.com Wed Mar 1 02:02:02 2000 From: klm at digicool.com (Ken Manheimer) Date: Tue, 29 Feb 2000 20:02:02 -0500 (EST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Greg Stein wrote: > On Tue, 29 Feb 2000, Ken Manheimer wrote: > >... > > I agree with mark. Why the sudden rush?? It seems to me to be unfair to > > make such a change - one that will break peoples code - without advanced > > warning, which typically is handled by a deprecation period. There *are* > > going to be people who won't be informed of the change in the short span > > of less than a single release. Just because it won't cause you pain isn't > > a good reason to disregard the pain of those that will suffer, > > particularly when you can do something relatively low-cost to avoid it. > > Sudden rush?!? > > Mark said he knew about it for a couple years. Same here. 
It was a long > while ago that .append()'s semantics were specified to "no longer" accept > multiple arguments. > > I see in the HISTORY file, that changes were made to Python 1.4 (October, > 1996) to avoid calling append() with multiple arguments. > > So, that is over three years that append() has had multiple-args > deprecated. There was probably discussion even before that, but I can't > seem to find something to quote. Seems like plenty of time -- far from > rushed. None the less, for those practicing it, the incorrectness of it will be fresh news. I would be less sympathetic with them if there was recent warning, eg, the schedule for changing it in the next release was part of the current release. But if you tell somebody you're going to change something, and then don't for a few years, you probably need to renew the warning before you make the change. Don't you think so? Why not? Ken klm at digicool.com From paul at prescod.net Wed Mar 1 03:56:33 2000 From: paul at prescod.net (Paul Prescod) Date: Tue, 29 Feb 2000 18:56:33 -0800 Subject: [Python-Dev] breaking list.append() References: Message-ID: <38BC86E1.53F69776@prescod.net> Software configuration management is HARD. Every sudden backwards incompatible change (warranted or not) makes it harder. Multi-arg append is not hurting anyone as much as a sudden change to it would. It would be better to leave append() alone and publicize its near-term removal rather than cause random, part-time supported modules to stop working because their programmers may be too busy to update them right now. So no, I'm not stepping up to do it. But I'm also saying that the better "lazy" option is to put something in a prominent place in the documentation and otherwise leave it alone. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "The calculus and the rich body of mathematical analysis to which it gave rise made modern science possible, but it was the algorithm that made possible the modern world."
- from "Advent of the Algorithm" David Berlinski http://www.opengroup.com/mabooks/015/0151003386.shtml From guido at python.org Wed Mar 1 05:11:02 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Feb 2000 23:11:02 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: Your message of "Tue, 29 Feb 2000 18:56:33 PST." <38BC86E1.53F69776@prescod.net> References: <38BC86E1.53F69776@prescod.net> Message-ID: <200003010411.XAA12988@eric.cnri.reston.va.us> > Software configuration management is HARD. Every sudden backwards > incompatible change (warranted or not) makes it harder. Mutli-arg append > is not hurting anyone as much as a sudden change to it would. It would > be better to leave append() alone and publicize its near-term removal > rather than cause random, part-time supported modules to stop working > because their programmers may be too busy to update them right now. I'm tired of this rhetoric. It's not like I'm changing existing Python installations retroactively. I'm planning to release a new version of Python which no longer supports certain long-obsolete and undocumented behavior. If you maintain a non-core Python module, you should test it against the new release and fix anything that comes up. This is why we have an alpha and beta test cycle and even before that the CVS version. If you are a Python user who depends on a 3rd party module, you need to find out whether the new version is compatible with the 3rd party code you are using, or whether there's a newer version available that solves the incompatibility. There are people who still run Python 1.4 (really!) because they haven't upgraded. I don't have a problem with that -- they don't get much support, but it's their choice, and they may not need the new features introduced since then. I expect that lots of people won't upgrade their Python 1.5.2 to 1.6 right away -- they'll wait until the other modules/packages they need are compatible with 1.6. 
Multi-arg append probably won't be the only reason why e.g. Digital Creations may need to release an update to Zope for Python 1.6. Zope comes with its own version of Python anyway, so they have control over when they make the switch. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Wed Mar 1 06:04:35 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 00:04:35 -0500 Subject: [Python-Dev] Size of int across machines (was RE: Blowfish in Python?) In-Reply-To: Message-ID: <000201bf833b$a3b01bc0$412d153f@tim> [Markus Stenberg] > ... > speed was horrendous. > > I think the main reason was the fact that I had to use _long ints_ for > calculations, as the normal ints are signed, and apparently the bitwise > operators do not work as advertised when bit32 is set (=number is > negative). [Tim, takes "bitwise operators" to mean & | ^ ~, and expresses surprise] [Markus, takes umbrage, and expresses umbrage ] > Hmm.. As far as I'm concerned, shifts for example do screw up. Do you mean "for example" as in "there are so many let's just pick one at random", or as in "this is the only one I've stumbled into" <0.9 wink>? > i.e. > > 0xffffffff >> 30 > > [64bit Python: 3] > [32bit Python: -1] > > As far as I'm concerned, that should _not_ happen. Or maybe it's just me. I could not have guessed that your complaint was about 64-bit Python from your "when bit32 is set (=number is negative)" description . The behavior shown in a Python compiled under a C in which sizeof(long)==4 matches the Reference Manual (see the "Integer and long integer literals" and "shifting operations" sections). So that can't be considered broken (you may not *like* it, but it's functioning as designed & as documented). The behavior under a sizeof(long)==8 C seems more of an ill-documented (and debatable to me too) feature. 
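The two readings Tim contrasts can be reproduced side by side on a wide-int build by sign-extending explicitly; a small sketch (the helper name is ours, purely illustrative):

```python
def as_int32(x):
    """Reinterpret the low 32 bits of x as a signed 32-bit integer."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x

print(0xFFFFFFFF >> 30)            # 3  (the wide-int reading)
print(as_int32(0xFFFFFFFF))        # -1 (the 32-bit reading: "bit 32 set = negative")
print(as_int32(0xFFFFFFFF) >> 30)  # -1 (sign-extending right shift)
```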
The possibility is mentioned in the "The standard type hierarchy" section (under Numbers -> Integers -> Plain integers) but really not fleshed out, and the "Integer and long integer literals" section plainly contradicts it. Python's going to have to clean up its act here -- 64-bit machines are getting more common. There's a move afoot to erase the distinction between Python ints and longs (in the sense of auto-converting from one to the other under the covers, as needed). In that world, your example would work like the "64bit Python" one. There are certainly compatibility issues, though, in that int left shifts are end-off now, and on a 32-bit machine any int for which i & 0x80000000 is true "is negative" (and so sign-extends on a right shift; note that Python guarantees sign-extending right shifts *regardless* of what the platform C does (C doesn't define what happens here -- Python does)). [description of pain getting a fast C-like "mod 2**32 int +" to work too] Python really wasn't designed for high-performance bit-fiddling, so you're (as you've discovered) swimming upstream with every stroke. Given that you can't write a C module here, there's nothing better than to do the ^ & | ~ parts with ints, and fake the rest slowly & painfully. Note that you can at least determine the size of a Python int via inspecting sys.maxint. sympathetically-unhelpfully y'rs - tim From guido at python.org Wed Mar 1 06:44:10 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 00:44:10 -0500 Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python In-Reply-To: Your message of "Tue, 29 Feb 2000 15:34:21 MST." <20000229153421.A16502@acs.ucalgary.ca> References: <20000229153421.A16502@acs.ucalgary.ca> Message-ID: <200003010544.AAA13155@eric.cnri.reston.va.us> [I don't like to cross-post to patches and python-dev, but I think this belongs in patches because it's a followup to Neil's post there and also in -dev because of its longer-term importance.]
Thanks for the new patches, Neil! We had a visitor here at CNRI today, Eric Tiedemann , who had a look at your patches before. Eric knows his way around the Scheme, Lisp and GC literature, and presented a variant on your approach which takes the bite out of the recursive passes. Eric had commented earlier on Neil's previous code, and I had used the morning to make myself familiar with Neil's code. This was relatively easy because Neil's code is very clear. Today, Eric proposed to do away with Neil's hash table altogether -- as long as we're wasting memory, we might as well add 3 fields to each container object rather than allocating the same amount in a separate hash table. Eric expects that this will run faster, although this obviously needs to be tried. Container types are: dict, list, tuple, class, instance; plus potentially user-defined container types such as kjbuckets. I have a feeling that function objects should also be considered container types, because of the cycle involving globals. Eric's algorithm, then, consists of the following parts. Each container object has three new fields: gc_next, gc_prev, and gc_refs. (Eric calls the gc_refs "refcount-zero".) We color objects white (initial), gray (root), black (scanned root). (The terms are explained later; we believe we don't actually need bits in the objects to store the color; see later.) All container objects are chained together in a doubly-linked list -- this is the same as Neil's code except Neil does it only for dicts. (Eric postulates that you need a list header.) When GC is activated, all objects are colored white; we make a pass over the entire list and set gc_refs equal to the refcount for each object. Next, we make another pass over the list to collect the internal references. Internal references are (just like in Neil's version) references from other container types. 
In Neil's version, this was recursive; in Eric's version, we don't need recursion, since the list already contains all containers. So we simply visit the containers in the list in turn, and for each one we go over all the objects it references and subtract one from *its* gc_refs field. (Eric left out the little detail that we need to be able to distinguish between container and non-container objects amongst those references; this can be a flag bit in the type field.) Now, similar to Neil's version, all objects for which gc_refs == 0 have only internal references, and are potential garbage; all objects for which gc_refs > 0 are "roots". These have references to them from other places, e.g. from globals or stack frames in the Python virtual machine. We now start a second list, to which we will move all roots. The way to do this is to go over the first list again and to move each object that has gc_refs > 0 to the second list. Objects placed on the second list in this phase are considered colored gray (roots). Of course, some roots will reference some non-roots, which keeps those non-roots alive. We now make a pass over the second list, where for each object on the second list, we look at every object it references. If a referenced object is a container and is still in the first list (colored white) we *append* it to the second list (colored gray). Because we append, objects thus added to the second list will eventually be considered by this same pass; when we stop finding objects that are still white, we stop appending to the second list, and we will eventually terminate this pass. Conceptually, objects on the second list that have been scanned in this pass are colored black (scanned root); but there is no need to actually make the distinction. (How do we know whether an object pointed to is white (in the first list) or gray or black (in the second)? We could use an extra bitfield, but that's a waste of space. Better: we could set gc_refs to a magic value (e.g.
0xffffffff) when we move the object to the second list. During the meeting, I proposed to set the back pointer to NULL; that might work too but I think the gc_refs field is more elegant. We could even just test for a non-zero gc_refs field; the roots moved to the second list initially all have a non-zero gc_refs field already, and for the objects with a zero gc_refs field we could indeed set it to something arbitrary.) Once we reach the end of the second list, all objects still left in the first list are garbage. We can destroy them in a way similar to the way Neil does this in his code. Neil calls PyDict_Clear on the dictionaries, and ignores the rest. Under Neil's assumption that all cycles (that he detects) involve dictionaries, that is sufficient. In our case, we may need a type-specific "clear" function for containers in the type object. We discussed more things, but not as thoroughly. Eric & Eric stressed the importance of making excellent statistics available about the rate of garbage collection -- probably as data structures that Python code can read rather than debugging print statements. Eric T also sketched an incremental version of the algorithm, usable for real-time applications. This involved keeping the gc_refs field ("external" reference counts) up-to-date at all times, which would require two different versions of the INCREF/DECREF macros: one for adding/deleting a reference from a container, and another for adding/deleting a root reference. Also, a 4th color (red) was added, to distinguish between scanned roots and scanned non-roots. We decided not to work this out in more detail because the overhead cost appeared to be much higher than for the previous algorithm; instead, we recommend that for real-time requirements the whole GC be disabled (there should be run-time controls for this, not just compile-time). We also briefly discussed possibilities for generational schemes.
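As a concreteness check, the passes described above can be modelled in a few lines of Python. This is a sketch only: plain dicts stand in for the doubly-linked lists and the per-object gc_refs fields, and the function and variable names are ours, not from any patch.

```python
def collect(refs, refcnt):
    """refs: {obj: list of container objects it references}
    refcnt: {obj: its total refcount (internal + external)}.
    Returns the set of unreachable (cyclic garbage) objects."""
    # Pass 1: copy each object's refcount into its gc_refs slot.
    gc_refs = dict(refcnt)
    # Pass 2: subtract one for every internal (container-to-container) reference.
    for obj in refs:
        for target in refs[obj]:
            gc_refs[target] -= 1
    # Objects with gc_refs > 0 are roots: something outside still holds them.
    gray = [obj for obj in refs if gc_refs[obj] > 0]
    reachable = set(gray)
    # Scan the gray list, appending any still-white object a root references;
    # the list grows as we iterate, which is exactly the append-and-scan pass.
    for obj in gray:
        for target in refs[obj]:
            if target not in reachable:
                reachable.add(target)
                gray.append(target)
    # Whatever never reached the second list is garbage.
    return set(refs) - reachable

# Demo: a <-> b form an unreferenced cycle; d is kept alive by one
# external reference (its refcount exceeds its internal references).
refs = {'a': ['b'], 'b': ['a'], 'c': ['d'], 'd': []}
refcnt = {'a': 1, 'b': 1, 'c': 1, 'd': 2}
print(sorted(collect(refs, refcnt)))   # ['a', 'b']
```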
The general opinion was that we should first implement and test the algorithm as sketched above, and then changes or extensions could be made. I was pleasantly surprised to find Neil's code in my inbox when we came out of the meeting; I think it would be worthwhile to compare and contrast the two approaches. (Hm, maybe there's a paper in it?) The rest of the afternoon was spent discussing continuations, coroutines and generators, and the fundamental reason why continuations are so hard (the C stack getting in the way everywhere). But that's a topic for another mail, maybe. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Wed Mar 1 06:57:49 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 00:57:49 -0500 Subject: need .append patch (was RE: [Python-Dev] Re: Python-checkins digest, Vol 1 #370 - 8 msgs) In-Reply-To: <200002291302.IAA04581@eric.cnri.reston.va.us> Message-ID: <000601bf8343$13575040$412d153f@tim> [Tim, runs checkappend.py over the entire CVS tree, comes up with surprisingly many remaining problems, and surprisingly few false hits] [Guido fixes mailerdaemon.py, and argues for nuking Demo\tkinter\www\ (the whole directory) Demo\sgi\video\VcrIndex.py (unclear whether the dir or just the file) Demo\sgi\gl\glstdwin\glstdwin.py (stdwin-related) Demo\ibrowse\ibrowse.py (stdwin-related) > All these are stdwin-related. Stdwin will also go out of service per > 1.6. ] Then the sooner someone nukes them from the CVS tree, the sooner my automated hourly checkappend complaint generator will stop pestering Python-Dev about them . > (Conclusion: most multi-arg append() calls are *very* old, But part of that is because we went thru this exercise a couple years ago too, and you repaired all the ones in the less obscure parts of the distribution then. > or contributed by others. Sigh. I must've given bad examples long > ago...) Na, I doubt that. 
Most people will not read a language defn, at least not until "something doesn't work". If the compiler accepts a thing, they simply *assume* it's correct. It's pretty easy (at least for me!) to make this particular mistake as a careless typo, so I assume that's the "source origin" for many of these too. As soon you *notice* you've done it, and that nothing bad happened, the natural tendencies are to (a) believe it's OK, and (b) save 4 keystrokes (incl. the SHIFTs) over & over again in the glorious indefinite future . Reminds me of a c.l.py thread a while back, wherein someone did stuff like None, x, y, None = function_returning_a_4_tuple to mean that they didn't care what the 1st & 4th values were. It happened to work, so they did it more & more. Eventually a function containing this mistake needed to reference None after that line, and "suddenly for no reason at all Python stopped working". To the extent that you're serious about CP4E, you're begging for more of this, not less . newbies-even-keep-on-doing-things-that-*don't*-work!-ly y'rs - tim From tim_one at email.msn.com Wed Mar 1 07:50:44 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 01:50:44 -0500 Subject: [Python-Dev] Unicode mapping tables In-Reply-To: <38BBD1A2.CD29AADD@lemburg.com> Message-ID: <000701bf834a$77acdfe0$412d153f@tim> [M.-A. Lemburg] > ... > Currently, mapping tables map characters to Unicode characters > and vice-versa. Now the .translate method will use a different > kind of table: mapping integer ordinals to integer ordinals. You mean that if I want to map u"a" to u"A", I have to set up some sort of dict mapping ord(u"a") to ord(u"A")? I simply couldn't follow this. > Question: What is more of efficient: having lots of integers > in a dictionary or lots of characters ? My bet is "lots of integers", to reduce both space use and comparison time. > ... > Something else that changed is the way .capitalize() works. 
The > Unicode version uses the Unicode algorithm for it (see TechRep. 13 > on the www.unicode.org site). #13 is "Unicode Newline Guidelines". I assume you meant #21 ("Case Mappings"). > Here's the new doc string: > > S.capitalize() -> unicode > > Return a capitalized version of S, i.e. words start with title case > characters, all remaining cased characters have lower case. > > Note that *all* characters are touched, not just the first one. > The change was needed to get it in sync with the .iscapitalized() > method which is based on the Unicode algorithm too. > > Should this change be propagated to the string implementation ? Unicode makes distinctions among "upper case", "lower case" and "title case", and you're trying to get away with a single "capitalize" function. Java has separate toLowerCase, toUpperCase and toTitleCase methods, and that's the way to do it. Whatever you do, leave .capitalize alone for 8-bit strings -- there's no reason to break code that currently works. "capitalize" seems a terrible choice of name for a titlecase method anyway, because of its baggage connotations from 8-bit strings. Since this stuff is complicated, I say it would be much better to use the same names for these things as the Unicode and Java folk do: there's excellent documentation elsewhere for all this stuff, and it's Bad to make users mentally translate unique Python terminology to make sense of the official docs. So my vote is: leave capitalize the hell alone. Do not implement capitalize for Unicode strings. Introduce a new titlecase method for Unicode strings. Add a new titlecase method to 8-bit strings too. Unicode strings should also have methods to get at uppercase and lowercase (as Unicode defines those).
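As it turned out, later Pythons shipped essentially the split Tim votes for here: capitalize() keeps its narrow meaning, while a separate title()/istitle() pair handles per-word titlecasing. In modern spelling:

```python
s = "hello world WORLD"
print(s.capitalize())       # 'Hello world world'  (first char up, rest lowered)
print(s.title())            # 'Hello World World'  (per-word titlecase)
print(s.title().istitle())  # True
```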
From tim_one at email.msn.com Wed Mar 1 08:36:03 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 02:36:03 -0500 Subject: [Python-Dev] Re: Python / Haskell (fwd) In-Reply-To: Message-ID: <000801bf8350$cc4ec580$412d153f@tim> [Greg Wilson, quoting Philip Wadler] > Well, what I most want is typing. But you already know that. So invite him to contribute to the Types-SIG <0.5 wink>. > Next after typing? Full lexical scoping for closures. I want to write: > > fun x: fun y: x+y > > Not: > > fun x: fun y, x=x: x+y > > Lexically scoped closures would be a big help for the embedding technique > I described [GVW: in a posting to the Software Carpentry discussion list, > archived at > > http://software-carpentry.codesourcery.com/lists/sc-discuss/msg00068.html > > which discussed how to build a flexible 'make' alternative in Python]. So long as we're not deathly concerned over saving a few lines of easy boilerplate code, Python already supports this approach wonderfully well -- but via using classes with __call__ methods instead of lexical closures. I can't make time to debate this now, but suffice it to say dozens on c.l.py would be delighted to . Philip is understandably attached to the "functional way of spelling things", but Python's way is at least as usable for this (and many-- including me --would say more so). > Next after closures? Disjoint sums. E.g., > > fun area(shape) : > switch shape: > case Circle(r): > return pi*r*r > case Rectangle(h,w): > return h*w > > (I'm making up a Python-like syntax.) This is an alternative to the OO > approach. With the OO approach, it is hard to add area, unless you modify > the Circle and Rectangle class definitions. Python allows adding new methods to classes dynamically "from the outside" -- the original definitions don't need to be touched (although it's certainly preferable to add new methods directly!). 
Take this complaint to the extreme, and I expect you end up reinventing multimethods (suppose you need to add an intersection(shape1, shape2) method: N**2 nesting of "disjoint sums" starts to appear ludicrous). In any case, the Types-SIG already seems to have decided that some form of "typecase" stmt will be needed; see the archives for that; I expect the use above would be considered abuse, though; Python has no "switch" stmt of any kind today, and the use above can already be spelled via if isinstance(shape, Circle): etc elif isinstance(shape, Rectangle): etc else: raise TypeError(etc) From gstein at lyra.org Wed Mar 1 08:51:29 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 29 Feb 2000 23:51:29 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Ken Manheimer wrote: >... > None the less, for those practicing it, the incorrectness of it will be > fresh news. I would be less sympathetic with them if there was recent > warning, eg, the schedule for changing it in the next release was part of > the current release. But if you tell somebody you're going to change > something, and then don't for a few years, you probably need to renew the > warning before you make the change. Don't you think so? Why not? I agree. Note that Guido posted a note to c.l.py on Monday. I believe that meets your notification criteria. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Wed Mar 1 09:10:28 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 1 Mar 2000 00:10:28 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <200003010411.XAA12988@eric.cnri.reston.va.us> Message-ID: On Tue, 29 Feb 2000, Guido van Rossum wrote: > I'm tired of this rhetoric. It's not like I'm changing existing > Python installations retroactively. I'm planning to release a new > version of Python which no longer supports certain long-obsolete and > undocumented behavior.
If you maintain a non-core Python module, you > should test it against the new release and fix anything that comes up. > This is why we have an alpha and beta test cycle and even before that > the CVS version. If you are a Python user who depends on a 3rd party > module, you need to find out whether the new version is compatible > with the 3rd party code you are using, or whether there's a newer > version available that solves the incompatibility. > > There are people who still run Python 1.4 (really!) because they > haven't upgraded. I don't have a problem with that -- they don't get > much support, but it's their choice, and they may not need the new > features introduced since then. I expect that lots of people won't > upgrade their Python 1.5.2 to 1.6 right away -- they'll wait until the > other modules/packages they need are compatible with 1.6. Multi-arg > append probably won't be the only reason why e.g. Digital Creations > may need to release an update to Zope for Python 1.6. Zope comes with > its own version of Python anyway, so they have control over when they > make the switch. I wholeheartedly support his approach. Just ask Mark Hammond :-) how many times I've said "let's change the code to make it Right; people aren't required to upgrade [and break their code]." Of course, his counter is that people need to upgrade to fix other, unrelated problems. So I relax and try again later :-). But I still maintain that they can independently grab the specific fixes and leave the other changes we make. Maybe it is grey, but I think this change is quite fine. Especially given Tim's tool. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one at email.msn.com Wed Mar 1 09:22:06 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 03:22:06 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: <000b01bf8357$3af08d60$412d153f@tim> [Greg Stein] > ... > Maybe it is grey, but I think this change is quite fine. 
Especially given > Tim's tool. What the heck does Tim's one-eyed trouser snake have to do with this? I know *it* likes to think it's the measure of all things, but, frankly, my tool barely affects the world at all a mere two feet beyond its base . tim-and-his-tool-think-the-change-is-a-mixed-thing-but-on-balance- the-best-thing-ly y'rs - tim From effbot at telia.com Wed Mar 1 09:40:01 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 1 Mar 2000 09:40:01 +0100 Subject: [Python-Dev] breaking list.append() References: Message-ID: <00fb01bf8359$c8196a20$34aab5d4@hagrid> Greg Stein wrote: > Note that Guido posted a note to c.l.py on Monday. I believe that meets > your notification criteria. ahem. do you seriously believe that everyone in the Python universe reads comp.lang.python? afaik, most Python programmers don't. ... so as far as I'm concerned, this was officially deprecated with Guido's post. afaik, no official python documentation has explicitly mentioned this (and the fact that it doesn't explicitly allow it doesn't really matter, since the docs don't explicitly allow the x[a, b, c] syntax either. both work in 1.5.2). has anyone checked the recent crop of Python books, btw? the eff-bot guide uses old syntax in two examples out of 320. how about the others? ... sigh. running checkappend over a 50k LOC application, I just realized that it doesn't catch a very common append pydiom. how fun. even though 99% of all append calls are "legal", this "minor" change will break every single application and library we have :-( oh, wait. xmlrpclib isn't affected. always something! From gstein at lyra.org Wed Mar 1 09:43:02 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 1 Mar 2000 00:43:02 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <00fb01bf8359$c8196a20$34aab5d4@hagrid> Message-ID: On Wed, 1 Mar 2000, Fredrik Lundh wrote: > Greg Stein wrote: > > Note that Guido posted a note to c.l.py on Monday. 
I believe that meets > > your notification criteria. > > ahem. do you seriously believe that everyone in the > Python universe reads comp.lang.python? > > afaik, most Python programmers don't. Now you're simply taking my comments out of context. Not a proper thing to do. Ken said that he wanted notification along certain guidelines. I said that I believed Guido's post did just that. Period. Personally, I think it is fine. I also think that a CHANGES file that arrives with 1.6 that points out the incompatibility is also fine. >... > sigh. running checkappend over a 50k LOC application, I > just realized that it doesn't catch a very common append > pydiom. And which is that? Care to help out? Maybe just a little bit? Or do you just want to talk about how bad this change is? :-( Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Wed Mar 1 10:01:52 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 1 Mar 2000 01:01:52 -0800 (PST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <000b01bf8357$3af08d60$412d153f@tim> Message-ID: On Wed, 1 Mar 2000, Tim Peters wrote: > [Greg Stein] > > ... > > Maybe it is grey, but I think this change is quite fine. Especially given > > Tim's tool. > > What the heck does Tim's one-eyed trouser snake have to do with this? I > know *it* likes to think it's the measure of all things, but, frankly, my > tool barely affects the world at all a mere two feet beyond its base . > > tim-and-his-tool-think-the-change-is-a-mixed-thing-but-on-balance- > the-best-thing-ly y'rs - tim Heh. Now how is one supposed to respond to *that* ??! All right. Fine. +3 cool points go to Tim. 
:-) -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Wed Mar 1 10:03:32 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 1 Mar 2000 01:03:32 -0800 (PST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src Makefile.in,1.82,1.83 In-Reply-To: <14523.56638.286603.340358@weyr.cnri.reston.va.us> Message-ID: On Tue, 29 Feb 2000, Fred L. Drake, Jr. wrote: > Guido van Rossum writes: > > You can already extract this from the updated documetation on the > > website (which has a list of obsolete modules). > > > > But you're righ,t it would be good to be open about this. I'll think > > about it. > > Note that the updated documentation isn't yet "published"; there are > no links to it and it hasn't been checked as much as I need it to be > before announcing it. Isn't the documentation better than what has been released? In other words, if you release now, how could you make things worse? If something does turn up during a check, you can always release again... Cheers, -g -- Greg Stein, http://www.lyra.org/ From effbot at telia.com Wed Mar 1 10:13:13 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 1 Mar 2000 10:13:13 +0100 Subject: [Python-Dev] breaking list.append() References: Message-ID: <011001bf835e$600d1da0$34aab5d4@hagrid> Greg Stein wrote: > On Wed, 1 Mar 2000, Fredrik Lundh wrote: > > Greg Stein wrote: > > > Note that Guido posted a note to c.l.py on Monday. I believe that meets > > > your notification criteria. > > > > ahem. do you seriously believe that everyone in the > > Python universe reads comp.lang.python? > > > > afaik, most Python programmers don't. > > Now you're simply taking my comments out of context. Not a proper thing to > do. Ken said that he wanted notification along certain guidelines. I said > that I believed Guido's post did just that. Period. my point was that most Python programmers won't see that notification. 
when these people download 1.6 final and find that all their apps just broke, they probably won't be happy with a pointer to dejanews. > And which is that? Care to help out? Maybe just a little bit? this rather common pydiom: append = list.append for x in something: append(...) it's used a lot where performance matters. > Or do you just want to talk about how bad this change is? :-( yes, I think it's bad. I've been using Python since 1.2, and no other change has had the same consequences (wrt. time/money required to fix it) call me a crappy programmer if you want, but I'm sure there are others out there who are nearly as bad. and lots of them won't be aware of this change until someone upgrades the python interpreter on their server. From mal at lemburg.com Wed Mar 1 09:38:52 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 01 Mar 2000 09:38:52 +0100 Subject: [Python-Dev] Unicode mapping tables References: <000701bf834a$77acdfe0$412d153f@tim> Message-ID: <38BCD71C.3592E6A@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > ... > > Currently, mapping tables map characters to Unicode characters > > and vice-versa. Now the .translate method will use a different > > kind of table: mapping integer ordinals to integer ordinals. > > You mean that if I want to map u"a" to u"A", I have to set up some sort of > dict mapping ord(u"a") to ord(u"A")? I simply couldn't follow this. I meant: 'a': u'A' vs. ord('a'): ord(u'A') The latter wins ;-) Reasoning for the first was that it allows character sequences to be handled by the same mapping algorithm. I decided to leave those techniques to some future implementation, since mapping integers has the nice side-effect of also allowing sequences to be used as mapping tables... resulting in some speedup at the cost of memory consumption. BTW, there are now three different ways to do char translations: 1. char -> unicode (char mapping codec's decode) 2. unicode -> char (char mapping codec's encode) 3.
unicode -> unicode (unicode's .translate() method) > > Question: What is more efficient: having lots of integers > > in a dictionary or lots of characters ? > > My bet is "lots of integers", to reduce both space use and comparison time. Right. That's what I found too... it's "lots of integers" now :-) > > ... > > Something else that changed is the way .capitalize() works. The > > Unicode version uses the Unicode algorithm for it (see TechRep. 13 > > on the www.unicode.org site). > > #13 is "Unicode Newline Guidelines". I assume you meant #21 ("Case > Mappings"). Dang. You're right. Here's the URL in case someone wants to join in: http://www.unicode.org/unicode/reports/tr21/tr21-2.html > > Here's the new doc string: > > > > S.capitalize() -> unicode > > > > Return a capitalized version of S, i.e. words start with title case > > characters, all remaining cased characters have lower case. > > > > Note that *all* characters are touched, not just the first one. > > The change was needed to get it in sync with the .iscapitalized() > > method which is based on the Unicode algorithm too. > > > > Should this change be propagated to the string implementation ? > > Unicode makes distinctions among "upper case", "lower case" and "title > case", and you're trying to get away with a single "capitalize" function. > Java has separate toLowerCase, toUpperCase and toTitleCase methods, and > that's the way to do it. The Unicode implementation has the corresponding: .upper(), .lower() and .capitalize() They work just like .toUpperCase, .toLowerCase, .toTitleCase resp. (well at least they should ;). > Whatever you do, leave .capitalize alone for 8-bit > strings -- there's no reason to break code that currently works. > "capitalize" seems a terrible choice of name for a titlecase method anyway, > because of its baggage connotations from 8-bit strings.
Since this stuff is > complicated, I say it would be much better to use the same names for these > things as the Unicode and Java folk do: there's excellent documentation > elsewhere for all this stuff, and it's Bad to make users mentally translate > unique Python terminology to make sense of the official docs. Hmm, that's an argument but it breaks the current method naming scheme of all lowercase letters. Perhaps I should simply provide a new method for .toTitleCase(), e.g. .title(), and leave the previous definition of .capitalize() intact... > So my vote is: leave capitalize the hell alone . Do not implement > capitalize for Unicode strings. Introduce a new titlecase method for > Unicode strings. Add a new titlecase method to 8-bit strings too. Unicode > strings should also have methods to get at uppercase and lowercase (as > Unicode defines those). ...looks like you're more or less on the same wavelength here ;-) Here's what I'll do: * implement .capitalize() in the traditional way for Unicode objects (simply convert the first char to uppercase) * implement u.title() to mean the same as Java's toTitleCase() * don't implement s.title(): the reasoning here is that it would confuse the user when she gets different return values for the same string (titlecase chars usually live in higher Unicode code ranges not reachable in Latin-1) Thanks for the feedback, -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim_one at email.msn.com Wed Mar 1 11:06:58 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 05:06:58 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <00fb01bf8359$c8196a20$34aab5d4@hagrid> Message-ID: <000e01bf8365$e1e0b9c0$412d153f@tim> [/F] > ... > so as far as I'm concerned, this was officially deprecated > with Guido's post.
afaik, no official python documentation > has explicitly mentioned this (and the fact that it doesn't > explicitly allow it doesn't really matter, since the docs don't > explicitly allow the x[a, b, c] syntax either. both work in > 1.5.2). The "Subscriptions" section of the Reference Manual explicitly allows for dict[a, b, c] and explicitly does not allow for sequence[a, b, c] The "Mapping Types" section of the Library Ref does not explicitly allow for it, though, and if you read it as implicitly allowing for it (based on the Reference Manual's clarification of "key" syntax), you would also have to read the Library Ref as allowing for dict.has_key(a, b, c) Which 1.5.2 does allow, but which Guido very recently patched to treat as a syntax error. > ... > sigh. running checkappend over a 50k LOC application, I > just realized that it doesn't catch a very common append > pydiom. [And, later, after prodding by GregS] > this rather common pydiom: > > append = list.append > for x in something: > append(...) This limitation was pointed out in checkappend's module docstring. Doesn't make it any easier for you to swallow, but I needed to point out that you didn't *have* to stumble into this the hard way . > how fun. even though 99% of all append calls are "legal", > this "minor" change will break every single application and > library we have :-( > > oh, wait. xmlrpclib isn't affected. always something! What would you like to do, then? The code will be at least as broken a year from now, and probably more so -- unless you fix it. So this sounds like an indirect argument for never changing Python's behavior here. Frankly, I expect you could fix the 50K LOC in less time than it took me to write this naggy response <0.50K wink>. 
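For readers following along, the bound-method idiom under discussion and the breakage it hides can be reproduced in a few lines. This is a minimal sketch in modern Python, where the post-1.6 single-argument behaviour is the only one available; a source scan for multi-argument `.append(` calls presumably never sees the renamed bound method, which is the limitation noted in checkappend's docstring:

```python
# A minimal sketch of the bound-method append idiom under discussion.
# Under 1.5.2, list.append quietly packed extra arguments into a tuple;
# since the 1.6 change (and in every modern Python), it takes exactly one.
items = []
append = items.append        # hoist the attribute lookup, for speed

for x in range(3):
    append(x)                # single argument: fine before and after

try:
    append(4, 5)             # the multi-arg form that 1.6 made an error
except TypeError as exc:
    print("now an error:", exc)

print(items)                 # -> [0, 1, 2]
```

Because the failing call site reads `append(4, 5)` rather than `something.append(4, 5)`, a checker that greps for `.append(` with multiple arguments has nothing to match against.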
embrace-change-ly y'rs - tim From tim_one at email.msn.com Wed Mar 1 11:31:12 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 05:31:12 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <000e01bf8365$e1e0b9c0$412d153f@tim> Message-ID: <001001bf8369$453e9fc0$412d153f@tim> [Tim, needing sleep] > dict.has_key(a, b, c) > > Which 1.5.2 does allow, but which Guido very recently patched to > treat as a syntax error. No, a runtime error. haskeynanny.py, anyone? not-me-ly y'rs - tim From fredrik at pythonware.com Wed Mar 1 12:14:18 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 1 Mar 2000 12:14:18 +0100 Subject: [Python-Dev] breaking list.append() References: <000e01bf8365$e1e0b9c0$412d153f@tim> Message-ID: <002101bf836f$4a012220$f29b12c2@secret.pythonware.com> Tim Peters wrote: > The "Subscriptions" section of the Reference Manual explicitly allows for > > dict[a, b, c] > > and explicitly does not allow for > > sequence[a, b, c] I'd thought we'd agreed that nobody reads the reference manual ;-) > What would you like to do, then? more time to fix it, perhaps? it's surely a minor code change, but fixing it can be harder than you think (just witness Gerrit's bogus patches) after all, python might be free, but more and more people are investing lots of money in using it [1]. > The code will be at least as broken a year > from now, and probably more so -- unless you fix it. sure. we've already started. but it's a lot of work, and it's quite likely that it will take a while until we can be 100% confident that all the changes are properly done. (not all software has a 100% complete test suite that simply says "yes, this works" or "no, it doesn't") 1) fwiw, some poor soul over here posted a short note to the pythonworks mailing list, mentioning that we've now fixed the price.
a major flamewar erupted, and my mailbox is now full of mail from unknowns telling me that I must be a complete moron that doesn't understand that Python is just a toy system, which everyone uses just because they cannot afford anything better... From tim_one at email.msn.com Wed Mar 1 12:26:21 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 1 Mar 2000 06:26:21 -0500 Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python In-Reply-To: <200003010544.AAA13155@eric.cnri.reston.va.us> Message-ID: <001101bf8370$f881dfa0$412d153f@tim> Very briefly: [Guido] > ... > Today, Eric proposed to do away with Neil's hash table altogether -- > as long as we're wasting memory, we might as well add 3 fields to each > container object rather than allocating the same amount in a separate > hash table. Eric expects that this will run faster, although this > obviously needs to be tried. No, it doesn't : it will run faster. > Container types are: dict, list, tuple, class, instance; plus > potentially user-defined container types such as kjbuckets. I > have a feeling that function objects should also be considered > container types, because of the cycle involving globals. Note that the list-migrating steps you sketch later are basically the same as (but hairier than) the ones JimF and I worked out for M&S-on-RC a few years ago, right down to using appending to effect a breadth-first traversal without requiring recursion -- except M&S doesn't have to bother accounting for sources of refcounts. Since *this* scheme does more work per item per scan, to be as fast in the end it has to touch less stuff than M&S. But the more kinds of types you track, the more stuff this scheme will have to chase. The tradeoffs are complicated & unclear, so I'll just raise an uncomfortable meta-point : you balked at M&S the last time around because of the apparent need for two link fields + a bit or two per object of a "chaseable type".
If that's no longer perceived as being a showstopper, M&S should be reconsidered too. I happen to be a fan of both approaches . The worst part of M&S-on-RC (== the one I never had a good answer for) is that a non-cooperating extension type E can't be chased, hence objects reachable only from objects of type E never get marked, so are vulnerable to bogus collection. In the Neil/Toby scheme, objects of type E merely act as sources of "external" references, so the scheme fails safe (in the sense of never doing a bogus collection due to non-cooperating types). Hmm ... if both approaches converge on keeping a list of all chaseable objects, and being careful of uncooperating types, maybe the only real difference in the end is whether the root set is given explicitly (as in traditional M&S) or inferred indirectly (but where "root set" has a different meaning in the scheme you sketched). > ... > In our case, we may need a type-specific "clear" function for containers > in the type object. I think definitely, yes. full-speed-sideways-ly y'rs - tim From mal at lemburg.com Wed Mar 1 11:40:36 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 01 Mar 2000 11:40:36 +0100 Subject: [Python-Dev] breaking list.append() References: <011001bf835e$600d1da0$34aab5d4@hagrid> Message-ID: <38BCF3A4.1CCADFCE@lemburg.com> Fredrik Lundh wrote: > > Greg Stein wrote: > > On Wed, 1 Mar 2000, Fredrik Lundh wrote: > > > Greg Stein wrote: > > > > Note that Guido posted a note to c.l.py on Monday. I believe that meets > > > > your notification criteria. > > > > > > ahem. do you seriously believe that everyone in the > > > Python universe reads comp.lang.python? > > > > > > afaik, most Python programmers don't. > > > > Now you're simply taking my comments out of context. Not a proper thing to > > do. Ken said that he wanted notification along certain guidelines. I said > > that I believed Guido's post did just that. Period.
> > my point was that most Python programmers won't > see that notification. when these people download > 1.6 final and find that all their apps just broke, they > probably won't be happy with a pointer to dejanews. Ditto. Anyone remember the str(2L) == '2' change, BTW ? That one will cost lots of money in case someone implemented an eShop using the common str(2L)[:-1] idiom... There will need to be a big warning sign somewhere that people see *before* finding the download link. (IMHO, anyways.) > > And which is that? Care to help out? Maybe just a little bit? > > this rather common pydiom: > > append = list.append > for x in something: > append(...) > > it's used a lot where performance matters. Same here. checkappend.py doesn't find these (a great tool BTW, thanks Tim; I noticed that it leaks memory badly though). > > Or do you just want to talk about how bad this change is? :-( > > yes, I think it's bad. I've been using Python since 1.2, > and no other change has had the same consequences > (wrt. time/money required to fix it) > > call me a crappy programmer if you want, but I'm sure > there are others out there who are nearly as bad. and > lots of them won't be aware of this change until someone > upgrades the python interpreter on their server. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Wed Mar 1 13:07:42 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 07:07:42 -0500 Subject: need .append patch (was RE: [Python-Dev] Re: Python-checkins digest, Vol 1 #370 - 8 msgs) In-Reply-To: Your message of "Wed, 01 Mar 2000 00:57:49 EST." <000601bf8343$13575040$412d153f@tim> References: <000601bf8343$13575040$412d153f@tim> Message-ID: <200003011207.HAA13342@eric.cnri.reston.va.us> > To the extent that you're serious about CP4E, you're begging for more of > this, not less .
Which is exactly why I am breaking multi-arg append now -- this is my last chance. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Mar 1 13:27:10 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 07:27:10 -0500 Subject: [Python-Dev] Unicode mapping tables In-Reply-To: Your message of "Wed, 01 Mar 2000 09:38:52 +0100." <38BCD71C.3592E6A@lemburg.com> References: <000701bf834a$77acdfe0$412d153f@tim> <38BCD71C.3592E6A@lemburg.com> Message-ID: <200003011227.HAA13396@eric.cnri.reston.va.us> > Here's what I'll do: > > * implement .capitalize() in the traditional way for Unicode > objects (simply convert the first char to uppercase) > * implement u.title() to mean the same as Java's toTitleCase() > * don't implement s.title(): the reasoning here is that it would > confuse the user when she get's different return values for > the same string (titlecase chars usually live in higher Unicode > code ranges not reachable in Latin-1) Huh? For ASCII at least, titlecase seems to map to ASCII; in your current implementation, only two Latin-1 characters (u'\265' and u'\377', I have no easy way to show them in Latin-1) map outside the Latin-1 range. Anyway, I would suggest to add a title() call to 8-bit strings as well; then we can do away with string.capwords(), which does something similar but different, mostly by accident. --Guido van Rossum (home page: http://www.python.org/~guido/) From jack at oratrix.nl Wed Mar 1 13:34:42 2000 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 01 Mar 2000 13:34:42 +0100 Subject: [Python-Dev] Re: A warning switch? In-Reply-To: Message by Guido van Rossum , Mon, 28 Feb 2000 12:35:12 -0500 , <200002281735.MAA27771@eric.cnri.reston.va.us> Message-ID: <20000301123442.7DEF8371868@snelboot.oratrix.nl> > > What about adding a command-line switch for enabling warnings, as has > > been suggested long ago? 
The .append() change could then print a > > warning in 1.6alphas (and betas?), but still run, and be turned into > > an error later. > > That's better. I propose that the warnings are normally on, and that > there are flags to turn them off or thrn them into errors. Can we then please have an interface to the "give warning" call (in stead of a simple fprintf)? On the mac (and possibly also in PythonWin) it's probably better to pop up a dialog (possibly with a "don't show again" button) than do a printf which may get lost. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido at python.org Wed Mar 1 13:55:42 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 07:55:42 -0500 Subject: [Python-Dev] Re: A warning switch? In-Reply-To: Your message of "Wed, 01 Mar 2000 13:34:42 +0100." <20000301123442.7DEF8371868@snelboot.oratrix.nl> References: <20000301123442.7DEF8371868@snelboot.oratrix.nl> Message-ID: <200003011255.HAA13489@eric.cnri.reston.va.us> > Can we then please have an interface to the "give warning" call (in > stead of a simple fprintf)? On the mac (and possibly also in > PythonWin) it's probably better to pop up a dialog (possibly with a > "don't show again" button) than do a printf which may get lost. Sure. All you have to do is code it (or get someone else to code it). <0.9 wink> --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed Mar 1 14:32:02 2000 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 01 Mar 2000 14:32:02 +0100 Subject: [Python-Dev] Unicode mapping tables References: <000701bf834a$77acdfe0$412d153f@tim> <38BCD71C.3592E6A@lemburg.com> <200003011227.HAA13396@eric.cnri.reston.va.us> Message-ID: <38BD1BD2.792E9B73@lemburg.com> Guido van Rossum wrote: > > > Here's what I'll do: > > > > * implement .capitalize() in the traditional way for Unicode > > objects (simply convert the first char to uppercase) > > * implement u.title() to mean the same as Java's toTitleCase() > > * don't implement s.title(): the reasoning here is that it would > > confuse the user when she gets different return values for > > the same string (titlecase chars usually live in higher Unicode > > code ranges not reachable in Latin-1) > > Huh? For ASCII at least, titlecase seems to map to ASCII; in your > current implementation, only two Latin-1 characters (u'\265' and > u'\377', I have no easy way to show them in Latin-1) map outside the > Latin-1 range. You're right, sorry for the confusion. I was thinking of other encodings like e.g. cp437 which have corresponding characters in the higher Unicode ranges. > Anyway, I would suggest to add a title() call to 8-bit strings as > well; then we can do away with string.capwords(), which does something > similar but different, mostly by accident. Ok, I'll do it this way then: s.title() will use C's toupper() and tolower() for case mapping and u.title() the Unicode routines. This will be in sync with the rest of the 8-bit string world (which is locale aware on many platforms AFAIK), even though it might not return the same string as the corresponding u.title() call. u.capwords() will be disabled in the Unicode implementation...
it wasn't even implemented for the string implementation, so there's no breakage ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From akuchlin at mems-exchange.org Wed Mar 1 15:59:07 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Wed, 1 Mar 2000 09:59:07 -0500 (EST) Subject: [Python-Dev] breaking list.append() In-Reply-To: <011001bf835e$600d1da0$34aab5d4@hagrid> References: <011001bf835e$600d1da0$34aab5d4@hagrid> Message-ID: <14525.12347.120543.804804@amarok.cnri.reston.va.us> Fredrik Lundh writes: >yes, I think it's bad. I've been using Python since 1.2, >and no other change has had the same consequences >(wrt. time/money required to fix it) There are more things in 1.6 that might require fixing existing code: str(2L) returning '2', the int/long changes, the Unicode changes, and if it gets added, garbage collection -- and bugs caused by those changes might not be catchable by a nanny. IMHO it's too early to point at the .append() change as breaking too much existing code; there may be changes that break a lot more. I'd wait and see what happens once the 1.6 alphas become available; if c.l.p is filled with shrieks and groans, GvR might decide to back the offending change out. (Or he might not...) -- A.M. Kuchling http://starship.python.net/crew/amk/ I have no skills with machines. I fear them, and because I cannot help attributing human qualities to them, I suspect that they hate me and will kill me if they can. -- Robertson Davies, "Reading" From klm at digicool.com Wed Mar 1 16:37:49 2000 From: klm at digicool.com (Ken Manheimer) Date: Wed, 1 Mar 2000 10:37:49 -0500 (EST) Subject: [Python-Dev] breaking list.append() In-Reply-To: Message-ID: On Tue, 29 Feb 2000, Greg Stein wrote: > On Tue, 29 Feb 2000, Ken Manheimer wrote: > >...
> > None the less, for those practicing it, the incorrectness of it will be > > fresh news. I would be less sympathetic with them if there was recent > > warning, eg, the schedule for changing it in the next release was part of > > the current release. But if you tell somebody you're going to change > > something, and then don't for a few years, you probably need to renew the > > warning before you make the change. Don't you think so? Why not? > > I agree. > > Note that Guido posted a note to c.l.py on Monday. I believe that meets > your notification criteria. Actually, by "part of the current release", I meant having the deprecation/impending-deletion warning in the release notes for the release before the one where the deletion happens - saying it's being deprecated now, will be deleted next time around. Ken klm at digicool.com I mean, you tell one guy it's blue. He tells his guy it's brown, and it lands on the page sorta purple. Wavy Gravy/Hugh Romney From marangoz at python.inrialpes.fr Wed Mar 1 18:07:07 2000 From: marangoz at python.inrialpes.fr (Vladimir Marangozov) Date: Wed, 1 Mar 2000 18:07:07 +0100 (CET) Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python In-Reply-To: <200003010544.AAA13155@eric.cnri.reston.va.us> from "Guido van Rossum" at Mar 01, 2000 12:44:10 AM Message-ID: <200003011707.SAA01310@python.inrialpes.fr> Guido van Rossum wrote: > > Thanks for the new patches, Neil! Thanks from me too! I notice, however, that hash_resize() still uses a malloc call instead of PyMem_NEW. Neil, please correct this in your version immediately ;-) > > We had a visitor here at CNRI today, Eric Tiedemann > , who had a look at your patches before. Eric > knows his way around the Scheme, Lisp and GC literature, and presented > a variant on your approach which takes the bite out of the recursive > passes. Avoiding the recursion is valuable, as long as we're optimizing the implementation of one particular scheme.
It doesn't bother me that Neil's scheme is recursive, because I still perceive his code as a proof of concept. You're presenting here another scheme based on refcount arithmetic, generalized for all container types. The linked list implementation of this generalized scheme is not directly related to the logic. I have some suspicions about the logic, so you'll probably want to elaborate a bit more on it, and convince me that this scheme would actually work. > Today, Eric proposed to do away with Neil's hash table altogether -- > as long as we're wasting memory, we might as well add 3 fields to each > container object rather than allocating the same amount in a separate > hash table. I cannot agree so easily with this statement, but you should have expected this from me :-) If we're about to optimize storage, I have good reasons to believe that we don't need 3 additional slots per container (but 1 for gc_refs, yes). We could certainly envision allocating the containers within memory pools of 4K (just as it is done in pymalloc, and close to what we have for ints & floats). These pools would be labeled as "container's memory", they would obviously be under our control, and we'd have additional slots per pool, not per object. As long as we isolate the containers from the rest, we can enumerate them easily by walking through the pools. But I'm willing to defer this question for now, as it involves the object allocators (the builtin allocators + PyObject_NEW for extension types E -- user objects of type E would be automatically taken into account for GC if there's a flag in the type struct which identifies them as containers). > Eric expects that this will run faster, although this obviously needs > to be tried. Definitely, although I trust Eric & Tim :-) > > Container types are: dict, list, tuple, class, instance; plus > potentially user-defined container types such as kjbuckets.
I have a > feeling that function objects should also be considered container > types, because of the cycle involving globals. + other extension container types. And I insist. Don't forget that we're planning to merge types and classes... > > Eric's algorithm, then, consists of the following parts. > > Each container object has three new fields: gc_next, gc_prev, and > gc_refs. (Eric calls the gc_refs "refcount-zero".) > > We color objects white (initial), gray (root), black (scanned root). > (The terms are explained later; we believe we don't actually need bits > in the objects to store the color; see later.) > > All container objects are chained together in a doubly-linked list -- > this is the same as Neil's code except Neil does it only for dicts. > (Eric postulates that you need a list header.) > > When GC is activated, all objects are colored white; we make a pass > over the entire list and set gc_refs equal to the refcount for each > object. Step 1: for all containers, c->gc_refs = c->ob_refcnt > > Next, we make another pass over the list to collect the internal > references. Internal references are (just like in Neil's version) > references from other container types. In Neil's version, this was > recursive; in Eric's version, we don't need recursion, since the list > already contains all containers. So we simply visit the containers in > the list in turn, and for each one we go over all the objects it > references and subtract one from *its* gc_refs field. (Eric left out > the little detail that we need to be able to distinguish between > container and non-container objects amongst those references; this can > be a flag bit in the type field.) Step 2: c->gc_refs = c->gc_refs - Nb_referenced_containers_from_c I guess that you realize that after this step, gc_refs can be zero or negative. I'm not sure that you collect "internal" references here (references from other container types).
A list referencing 20 containers, being itself referenced by one container + one static variable + two times from the runtime stack, has an initial refcount == 4, so we'll end up with gc_refs == -16. A tuple referencing 1 list, referenced once by the stack, will end up with gc_refs == 0. Neil's scheme doesn't seem to have this "property". > > Now, similar to Neil's version, all objects for which gc_refs == 0 > have only internal references, and are potential garbage; all objects > for which gc_refs > 0 are "roots". These have references to them from > other places, e.g. from globals or stack frames in the Python virtual > machine. > Agreed, some roots have gc_refs > 0. I'm not sure that all of them have it, though... Do they? > We now start a second list, to which we will move all roots. The way > to do this is to go over the first list again and to move each object > that has gc_refs > 0 to the second list. Objects placed on the second > list in this phase are considered colored gray (roots). > Step 3: Roots with gc_refs > 0 go to the 2nd list. All c->gc_refs <= 0 stay in the 1st list. > Of course, some roots will reference some non-roots, which keeps those > non-roots alive. We now make a pass over the second list, where for > each object on the second list, we look at every object it references. > If a referenced object is a container and is still in the first list > (colored white) we *append* it to the second list (colored gray). > Because we append, objects thus added to the second list will > eventually be considered by this same pass; when we stop finding > objects that are still white, we stop appending to the second list, > and we will eventually terminate this pass. Conceptually, objects on > the second list that have been scanned in this pass are colored black > (scanned root); but there is no need to actually make the > distinction. > Step 4: Closure on reachable containers which are all moved to the 2nd list.
(Assuming that the objects are checked only via their type, without involving gc_refs) > (How do we know whether an object pointed to is white (in the first > list) or gray or black (in the second)? Good question :-) > We could use an extra bitfield, but that's a waste of space. > Better: we could set gc_refs to a magic value (e.g. 0xffffffff) when > we move the object to the second list. I doubt that this would work for the reasons mentioned above. > During the meeting, I proposed to set the back pointer to NULL; that > might work too but I think the gc_refs field is more elegant. We could > even just test for a non-zero gc_refs field; the roots moved to the > second list initially all have a non-zero gc_refs field already, and > for the objects with a zero gc_refs field we could indeed set it to > something arbitrary.) Not sure that "arbitrary" is a good choice if the differentiation is based solely on gc_refs. > > Once we reach the end of the second list, all objects still left in > the first list are garbage. We can destroy them in a way similar to the > way Neil does this in his code. Neil calls PyDict_Clear on the > dictionaries, and ignores the rest. Under Neil's assumption that all > cycles (that he detects) involve dictionaries, that is sufficient. In > our case, we may need a type-specific "clear" function for containers > in the type object. Couldn't this be done in the object's dealloc function? Note that both Neil's and this scheme assume that garbage _detection_ and garbage _collection_ are one atomic operation. I must say that I don't mind having some living garbage if it doesn't hurt my work. IOW, the used criterion for triggering the detection phase _may_ eventually differ from the one used for the collection phase. But this is where we reach the incremental approaches, implying different reasoning as a whole.
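The refcount arithmetic being debated here can be checked with a toy model. This is a pure-Python sketch with assumed names throughout (dicts stand in for the chained container list, and `external` stands in for the part of ob_refcnt that comes from outside the container world: globals, the stack, and so on), not the proposed C implementation:

```python
# Toy model of the four-step scheme sketched above -- not the proposed C code.
# Containers are named nodes; `refs` lists the containers each one references,
# and `external` is an assumed count of references from outside the container
# world, so total refcount = external + internal.

refs = {                 # a <-> b is an unreachable cycle; r keeps c alive
    'a': ['b'],
    'b': ['a'],
    'r': ['c'],
    'c': [],
}
external = {'a': 0, 'b': 0, 'r': 1, 'c': 0}

# Step 1: gc_refs starts out as the full reference count.
internal = {k: 0 for k in refs}
for k in refs:
    for child in refs[k]:
        internal[child] += 1
gc_refs = {k: external[k] + internal[k] for k in refs}

# Step 2: subtract one for every container-to-container reference.
for k in refs:
    for child in refs[k]:
        gc_refs[child] -= 1

# Step 3: whatever still has gc_refs > 0 is referenced from outside: a root.
roots = [k for k in refs if gc_refs[k] > 0]

# Step 4: closure -- append referenced containers while scanning the same
# list, giving the breadth-first traversal described above.
reachable = list(roots)
for k in reachable:
    for child in refs[k]:
        if child not in reachable:
            reachable.append(child)

garbage = sorted(k for k in refs if k not in reachable)
print(roots, garbage)    # -> ['r'] ['a', 'b']
```

In this toy model, gc_refs after step 2 equals the external count and so never goes negative; a negative value can only arise if step 2 subtracts a reference that step 1's refcount never included.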
My point is that the introduction of a "clear" function depends on the adopted scheme, whose logic depends on pertinent statistics on memory consumption of the cyclic garbage. To make it simple, we first need stats on memory consumption, then we can discuss objectively how to implement some particular GC scheme. I second Eric on the need for excellent statistics. > > The general opinion was that we should first implement and test the > algorithm as sketched above, and then changes or extensions could be > made. I'd like to see it discussed first in conjunction with (1) the possibility of having a proprietary malloc, (2) the envisioned type/class unification. Perhaps I'm getting too deep, but once something gets in, it's difficult to take it out, even when a better solution is found subsequently. Although I'm enthusiastic about this work on GC, I'm not in a position to evaluate the true benefits of the proposed schemes, as I still don't have a basis for evaluating how much garbage my program generates and whether it hurts the interpreter compared to its overall memory consumption. > > I was pleasantly surprised to find Neil's code in my inbox when we > came out of the meeting; I think it would be worthwhile to compare and > contrast the two approaches. (Hm, maybe there's a paper in it?) I'm all for it!
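As a present-day aside: the kind of cyclic-garbage statistics asked for here can be gathered with the gc module that eventually grew out of this discussion. The API shown below is the modern one and did not exist when this mail was written; this is a sketch, not part of the proposal under discussion.

```python
import gc

class Node:
    def __init__(self):
        self.ref = None

gc.collect()                 # start from a clean slate
for _ in range(10):
    n = Node()
    n.ref = n                # self-referencing cycle
del n                        # now every cycle built above is unreachable

# gc.collect() returns the number of unreachable objects it found,
# which is one crude measure of how much cyclic garbage a program makes.
found = gc.collect()
```

Each Node cycle contributes at least the instance itself (plus its instance dict), so `found` comes out at 10 or more here.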
-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From jeremy at cnri.reston.va.us Wed Mar 1 18:53:13 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Wed, 1 Mar 2000 12:53:13 -0500 (EST) Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python In-Reply-To: <200003011707.SAA01310@python.inrialpes.fr> References: <200003010544.AAA13155@eric.cnri.reston.va.us> <200003011707.SAA01310@python.inrialpes.fr> Message-ID: <14525.22793.963077.707198@goon.cnri.reston.va.us> >>>>> "VM" == Vladimir Marangozov writes: [">>" == Guido explaining Eric Tiedemann's GC design] >> Next, we make another pass over the list to collect the internal >> references. Internal references are (just like in Neil's >> version) references from other container types. In Neil's >> version, this was recursive; in Eric's version, we don't need >> recursion, since the list already contains all containers. So we >> simply visit the containers in the list in turn, and for each one >> we go over all the objects it references and subtract one from >> *its* gc_refs field. (Eric left out the little detail that we >> need to be able to distinguish between container and >> non-container objects amongst those references; this can be a >> flag bit in the type field.) VM> Step 2: c->gc_refs = c->gc_refs - VM> Nb_referenced_containers_from_c VM> I guess that you realize that after this step, gc_refs can be VM> zero or negative. I think Guido's explanation is slightly ambiguous. When he says, "subtract one from *its* gc_refs field" he means subtract one from the _contained_ object's gc_refs field. VM> I'm not sure that you collect "internal" references here VM> (references from other container types).
A list referencing 20 VM> containers, being itself referenced by one container + one VM> static variable + two times from the runtime stack, has an VM> initial refcount == 4, so we'll end up with gc_refs == -16. The strategy is not that the container's gc_refs is decremented once for each object it contains. Rather, the container decrements each contained object's gc_refs by one. So you should never end up with gc_refs < 0. >> During the meeting, I proposed to set the back pointer to NULL; >> that might work too but I think the gc_refs field is more >> elegant. We could even just test for a non-zero gc_refs field; >> the roots moved to the second list initially all have a non-zero >> gc_refs field already, and for the objects with a zero gc_refs >> field we could indeed set it to something arbitrary.) I believe we discussed this further and concluded that setting the back pointer to NULL would not work. If we make the second list doubly-linked (like the first one), it is trivial to end GC by swapping the first and second lists. If we've zapped the pointers to NULL, then we have to go back and re-set them all. Jeremy From mal at lemburg.com Wed Mar 1 19:44:58 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 01 Mar 2000 19:44:58 +0100 Subject: [Python-Dev] Unicode Snapshot 2000-03-01 Message-ID: <38BD652A.EA2EB0A3@lemburg.com> There is a new Unicode implementation snapshot available at the secret URL. It contains quite a few small changes to the internal APIs, doc strings for all methods and some new methods (e.g. .title()) on the Unicode and the string objects. The code page mappings are now integer->integer which should make them more performant. Some of the C codec APIs have changed, so you may need to adapt code that already uses these (Fredrik ?!). Still missing is a MSVC project file... haven't gotten around yet to build one. The code does compile on WinXX though, as Finn Bock told me in private mail. Please try out the new stuff...
Most interesting should be the code in Lib/codecs.py as it provides a very high level interface to all those builtin codecs. BTW: I would like to implement a .readline() method using only the .read() method as a basis. Does anyone have a good idea on how this could be done without buffering? (Unicode has a slightly larger choice of line break chars than C; the .splitlines() method will deal with these) Gotta run... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From effbot at telia.com Wed Mar 1 20:20:12 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 1 Mar 2000 20:20:12 +0100 Subject: [Python-Dev] breaking list.append() References: <011001bf835e$600d1da0$34aab5d4@hagrid> <14525.12347.120543.804804@amarok.cnri.reston.va.us> Message-ID: <034a01bf83b3$e97c8620$34aab5d4@hagrid> Andrew M. Kuchling wrote: > There are more things in 1.6 that might require fixing existing code: > str(2L) returning '2', the int/long changes, the Unicode changes, and > if it gets added, garbage collection -- and bugs caused by those > changes might not be catchable by a nanny. hey, you make it sound like "1.6" should really be "2.0" ;-) From nascheme at enme.ucalgary.ca Wed Mar 1 20:29:02 2000 From: nascheme at enme.ucalgary.ca (nascheme at enme.ucalgary.ca) Date: Wed, 1 Mar 2000 12:29:02 -0700 Subject: [Python-Dev] Re: [Patches] Reference cycle collection for Python In-Reply-To: <200003011707.SAA01310@python.inrialpes.fr>; from marangoz@python.inrialpes.fr on Wed, Mar 01, 2000 at 06:07:07PM +0100 References: <200003010544.AAA13155@eric.cnri.reston.va.us> <200003011707.SAA01310@python.inrialpes.fr> Message-ID: <20000301122902.B7773@acs.ucalgary.ca> On Wed, Mar 01, 2000 at 06:07:07PM +0100, Vladimir Marangozov wrote: > Guido van Rossum wrote: > > Once we reach the end of the second list, all objects still left in > > the first list are garbage.
We can destroy them in a way similar to the > > way Neil does this in his code. Neil calls PyDict_Clear on the > > dictionaries, and ignores the rest. Under Neil's assumption that all > > cycles (that he detects) involve dictionaries, that is sufficient. In > > our case, we may need a type-specific "clear" function for containers > > in the type object. > > Couldn't this be done in the object's dealloc function? No, I don't think so. The object still has references to it. You have to be careful about how you break cycles so that memory is not accessed after it is freed. Neil -- "If elected mayor, my first act will be to kill the whole lot of you, and burn your town to cinders!" -- Groundskeeper Willie From gvwilson at nevex.com Wed Mar 1 21:19:30 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Wed, 1 Mar 2000 15:19:30 -0500 (EST) Subject: [Python-Dev] DDJ article on Python GC Message-ID: Jon Erickson (editor-in-chief) of "Doctor Dobb's Journal" would like an article on what's involved in adding garbage collection to Python. Please email me if you're interested in tackling it... Thanks, Greg From fdrake at acm.org Wed Mar 1 21:37:49 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 1 Mar 2000 15:37:49 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src Makefile.in,1.82,1.83 In-Reply-To: References: <14523.56638.286603.340358@weyr.cnri.reston.va.us> Message-ID: <14525.32669.909212.716484@weyr.cnri.reston.va.us> Greg Stein writes: > Isn't the documentation better than what has been released? In other > words, if you release now, how could you make things worse? If something > does turn up during a check, you can always release again... Releasing is still somewhat tedious, and I don't want to ask people to do several substantial downloads & installs. So far, a major navigation bug has been found in the test version I posted (just now fixed online); *that's* why I don't like to release too hastily!
I don't think waiting two more weeks is a problem. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Wed Mar 1 23:53:26 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 01 Mar 2000 17:53:26 -0500 Subject: [Python-Dev] DDJ article on Python GC In-Reply-To: Your message of "Wed, 01 Mar 2000 15:19:30 EST." References: Message-ID: <200003012253.RAA16056@eric.cnri.reston.va.us> > Jon Erickson (editor-in-chief) of "Doctor Dobb's Journal" would like an > article on what's involved in adding garbage collection to Python. Please > email me if you're interested in tackling it... I might -- although I should get Neil, Eric and Tim as co-authors. I'm halfway through implementing the scheme that Eric showed yesterday. It's very elegant, but I don't have an idea about its performance impact yet. Say hi to Jon -- we've met a few times. I liked his March editorial, having just read the same book and had the same feeling of "wow, an open source project in the 19th century!" --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond at skippinet.com.au Thu Mar 2 00:09:23 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 2 Mar 2000 10:09:23 +1100 Subject: [Python-Dev] Re: A warning switch? In-Reply-To: <200003011255.HAA13489@eric.cnri.reston.va.us> Message-ID: > > Can we then please have an interface to the "give warning" call (in > > stead of a simple fprintf)? On the mac (and possibly also in > > PythonWin) it's probably better to pop up a dialog (possibly with a > > "don't show again" button) than do a printf which may get lost. > > Sure. All you have to do is code it (or get someone else to code it). How about just having either a "sys.warning" function, or maybe even a sys.stdwarn stream?
Then a simple C API to call this, and we are done :-) sys.stdwarn sounds OK - it just defaults to sys.stdout, so the Mac and Pythonwin etc should "just work" by sending the output wherever sys.stdout goes today... Mark. From tim_one at email.msn.com Thu Mar 2 06:08:39 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 2 Mar 2000 00:08:39 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <38BCF3A4.1CCADFCE@lemburg.com> Message-ID: <001001bf8405$5f9582c0$732d153f@tim> [/F] > append = list.append > for x in something: > append(...) [M.-A. Lemburg] > Same here. checkappend.py doesn't find these As detailed in a c.l.py posting, I have yet to find a single instance of this actually called with multiple arguments. Pointing out that it's *possible* isn't the same as demonstrating it's an actual problem. I'm quite willing to believe that it is, but haven't yet seen evidence of it. For whatever reason, people seem much (and, in my experience so far, infinitely ) more prone to make the list.append(1, 2, 3) error than the maybethisisanappend(1, 2, 3) error. > (a great tool BTW, thanks Tim; I noticed that it leaks memory badly > though). Which Python? Which OS? How do you know? What were you running it over? Using 1.5.2 under Win95, according to wintop, & over the whole CVS tree, the total (code + data) virtual memory allocated to it peaked at about 2Mb a few seconds into the run, and actually decreased as time went on. So, akin to the bound method multi-argument append problem, the "checkappend leak problem" is something I simply have no reason to believe . Check your claim again? checkappend.py itself obviously creates no cycles or holds on to any state across files, so if you're seeing a leak it must be a bug in some other part of the version of Python + std libraries you're using. Maybe a new 1.6 bug? Something you did while adding Unicode? Etc. Tell us what you were running. Has anyone else seen a leak? 
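For reference, the 1.6-proof spelling of the bound-method idiom quoted at the top of Tim's message is to pass a single tuple argument. A sketch:

```python
squares = []
append = squares.append      # hoist the bound method out of the loop
for x in range(3):
    append((x, x * x))       # one tuple argument: legal in 1.5 and 1.6
# The multi-argument form append(x, x * x) relied on the undocumented
# behavior that 1.6 turns into an error.
```

The extra parentheses are the whole fix; the hoisted bound method keeps its speed advantage.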
From tim_one at email.msn.com Thu Mar 2 06:50:19 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 2 Mar 2000 00:50:19 -0500 Subject: [Python-Dev] str vs repr at prompt again (FW: String printing behavior?) Message-ID: <001401bf840b$3177ba60$732d153f@tim> Another unsolicited testimonial that countless users are oppressed by auto-repr (as opposed to auto-str) at the interpreter prompt. Just trying to keep a once-hot topic from going stone cold forever . -----Original Message----- From: python-list-admin at python.org [mailto:python-list-admin at python.org] On Behalf Of Ted Drain Sent: Wednesday, March 01, 2000 5:42 PM To: python-list at python.org Subject: String printing behavior? Hi all, I've got a question about the string printing behavior. If I define a function as:

>>> def foo():
...     return "line1\nline2"
>>> foo()
'line1\012line2'
>>> print foo()
line1
line2
>>>

It seems to me that the default printing behavior for strings should match the behavior of the print routine. I realize that some people may want to see embedded control codes, but I would advocate a separate method for printing raw byte sequences. We are using the python interactive prompt as a pseudo-matlab like user interface and the current printing behavior is very confusing to users. It also means that functions that return text (like help routines) must print the string rather than returning it. Returning the string is much more flexible because it allows the string to be captured easily and redirected. Any thoughts? Ted -- Ted Drain Jet Propulsion Laboratory Ted.Drain at jpl.nasa.gov -- http://www.python.org/mailman/listinfo/python-list From mal at lemburg.com Thu Mar 2 08:42:33 2000 From: mal at lemburg.com (M.-A.
Lemburg) Date: Thu, 02 Mar 2000 08:42:33 +0100 Subject: [Python-Dev] breaking list.append() References: <001001bf8405$5f9582c0$732d153f@tim> Message-ID: <38BE1B69.E0B88B41@lemburg.com> Tim Peters wrote:
> > [/F]
> > append = list.append
> > for x in something:
> > append(...)
> > [M.-A. Lemburg]
> > Same here. checkappend.py doesn't find these
> > As detailed in a c.l.py posting, I have yet to find a single instance of > this actually called with multiple arguments. Pointing out that it's > *possible* isn't the same as demonstrating it's an actual problem. I'm > quite willing to believe that it is, but haven't yet seen evidence of it. Haven't had time to check this yet, but I'm pretty sure there are some instances of this idiom in my code. Note that I did in fact code like this on purpose: it saves a tuple construction for every append, which can make a difference in tight loops... > For whatever reason, people seem much (and, in my experience so far, > infinitely ) more prone to make the > > list.append(1, 2, 3) > > error than the > > maybethisisanappend(1, 2, 3) > > error. Of course... still there are hidden instances of the problem which are yet to be revealed. For my own code the situation is even worse, since I sometimes did:

add = list.append
for x in y:
    add(x,1,2)

> > (a great tool BTW, thanks Tim; I noticed that it leaks memory badly > > though). > > Which Python? Which OS? How do you know? What were you running it over? That's Python 1.5 on Linux2. I let the script run over a large lib directory and my projects directory. In the projects directory the script consumed as much as 240MB of process size. > Using 1.5.2 under Win95, according to wintop, & over the whole CVS tree, the > total (code + data) virtual memory allocated to it peaked at about 2Mb a few > seconds into the run, and actually decreased as time went on.
So, akin to > the bound method multi-argument append problem, the "checkappend leak > problem" is something I simply have no reason to believe . Check your > claim again? checkappend.py itself obviously creates no cycles or holds on > to any state across files, so if you're seeing a leak it must be a bug in > some other part of the version of Python + std libraries you're using. > Maybe a new 1.6 bug? Something you did while adding Unicode? Etc. Tell us > what you were running. I'll try the same thing again using Python 1.5.2 and the CVS version. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Thu Mar 2 08:46:49 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 02 Mar 2000 08:46:49 +0100 Subject: [Python-Dev] breaking list.append() References: <001001bf8405$5f9582c0$732d153f@tim> <38BE1B69.E0B88B41@lemburg.com> Message-ID: <38BE1C69.C8A9E6B0@lemburg.com> "M.-A. Lemburg" wrote: > > > > (a great tool BTW, thanks Tim; I noticed that it leaks memory badly > > > though). > > > > Which Python? Which OS? How do you know? What were you running it over? > > That's Python 1.5 on Linux2. I let the script run over > a large lib directory and my projects directory. In the > projects directory the script consumed as much as 240MB > of process size. > > > Using 1.5.2 under Win95, according to wintop, & over the whole CVS tree, the > > total (code + data) virtual memory allocated to it peaked at about 2Mb a few > > seconds into the run, and actually decreased as time went on. So, akin to > > the bound method multi-argument append problem, the "checkappend leak > > problem" is something I simply have no reason to believe . Check your > > claim again?
checkappend.py itself obviously creates no cycles or holds on > > to any state across files, so if you're seeing a leak it must be a bug in > > some other part of the version of Python + std libraries you're using. > > Maybe a new 1.6 bug? Something you did while adding Unicode? Etc. Tell us > > what you were running. > > I'll try the same thing again using Python 1.5.2 and the CVS version. Using the Unicode patched CVS version there's no leak anymore. Couldn't find a 1.5.2 version on my machine... I'll build one later. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Thu Mar 2 16:32:32 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 02 Mar 2000 10:32:32 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? Message-ID: <200003021532.KAA17088@eric.cnri.reston.va.us> I was looking at the code that invokes __del__, with the intent to implement a feature from Java: in Java, a finalizer is only called once per object, even if calling it makes the object live longer. To implement this, we need a flag in each instance that means "__del__ was called". I opened the creation code for instances, looking for the right place to set the flag. I then realized that it might be smart, now that we have this flag anyway, to set it to "true" during initialization. There are a number of exits from the initialization where the object is created but not fully initialized, where the new object is DECREF'ed and NULL is returned. When such an exit is taken, __del__ is called on an incompletely initialized object! Example:

>>> class C:
...     def __del__(self):
...         print "deleting", self
...
>>> x = C(1)
!--> deleting <__main__.C instance at 1686d8>
Traceback (innermost last):
  File "", line 1, in ?
TypeError: this constructor takes no arguments
>>>

Now I have a choice to make.
If the class has an __init__, should I clear the flag only after __init__ succeeds? This means that if __init__ raises an exception, __del__ is never called. This is an incompatibility. It's possible that someone has written code that relies on __del__ being called even when __init__ fails halfway, and then their code would break. But it is just as likely that calling __del__ on a partially uninitialized object is a bad mistake, and I am doing all these cases a favor by not calling __del__ when __init__ failed! Any opinions? If nobody speaks up, I'll make the change. --Guido van Rossum (home page: http://www.python.org/~guido/) From bwarsaw at cnri.reston.va.us Thu Mar 2 17:44:00 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 2 Mar 2000 11:44:00 -0500 (EST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__? References: <200003021532.KAA17088@eric.cnri.reston.va.us> Message-ID: <14526.39504.36065.657527@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> Now I have a choice to make. If the class has an __init__, GvR> should I clear the flag only after __init__ succeeds? This GvR> means that if __init__ raises an exception, __del__ is never GvR> called. This is an incompatibility. It's possible that GvR> someone has written code that relies on __del__ being called GvR> even when __init__ fails halfway, and then their code would GvR> break. It reminds me of the separation between object allocation and initialization in ObjC. GvR> But it is just as likely that calling __del__ on a partially GvR> uninitialized object is a bad mistake, and I am doing all GvR> these cases a favor by not calling __del__ when __init__ GvR> failed! GvR> Any opinions? If nobody speaks up, I'll make the change. I think you should set the flag right before you call __init__(), i.e. after (nearly all) the C level initialization has occurred. 
Here's why: your "favor" can easily be accomplished by Python constructs in the __init__():

class MyBogo:
    def __init__(self):
        self.get_delified = 0
        do_sumtin_exceptional()
        self.get_delified = 1

    def __del__(self):
        if self.get_delified:
            ah_sweet_release()

-Barry From gstein at lyra.org Thu Mar 2 18:14:35 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 2 Mar 2000 09:14:35 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: <200003021532.KAA17088@eric.cnri.reston.va.us> Message-ID: On Thu, 2 Mar 2000, Guido van Rossum wrote: >... > But it is just as likely that calling __del__ on a partially > uninitialized object is a bad mistake, and I am doing all these cases > a favor by not calling __del__ when __init__ failed! > > Any opinions? If nobody speaks up, I'll make the change. +1 on calling __del__ IFF __init__ completes successfully. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jeremy at cnri.reston.va.us Thu Mar 2 18:15:14 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Thu, 2 Mar 2000 12:15:14 -0500 (EST) Subject: [Python-Dev] str vs repr at prompt again (FW: String printing behavior?) In-Reply-To: <001401bf840b$3177ba60$732d153f@tim> References: <001401bf840b$3177ba60$732d153f@tim> Message-ID: <14526.41378.374653.497993@goon.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: TP> Another unsolicited testimonial that countless users are TP> oppressed by auto-repr (as opposed to auto-str) at the TP> interpreter prompt. Just trying to keep a once-hot topic from TP> going stone cold forever . [Signature from the included message:] >> -- Ted Drain Jet Propulsion Laboratory Ted.Drain at jpl.nasa.gov -- This guy is probably a rocket scientist. We want the language to be useful for everybody, not just rocket scientists.
Jeremy From guido at python.org Thu Mar 2 23:45:37 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 02 Mar 2000 17:45:37 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: Your message of "Thu, 02 Mar 2000 11:44:00 EST." <14526.39504.36065.657527@anthem.cnri.reston.va.us> References: <200003021532.KAA17088@eric.cnri.reston.va.us> <14526.39504.36065.657527@anthem.cnri.reston.va.us> Message-ID: <200003022245.RAA20265@eric.cnri.reston.va.us> > >>>>> "GvR" == Guido van Rossum writes: > > GvR> Now I have a choice to make. If the class has an __init__, > GvR> should I clear the flag only after __init__ succeeds? This > GvR> means that if __init__ raises an exception, __del__ is never > GvR> called. This is an incompatibility. It's possible that > GvR> someone has written code that relies on __del__ being called > GvR> even when __init__ fails halfway, and then their code would > GvR> break. [Barry] > It reminds me of the separation between object allocation and > initialization in ObjC. Is that good or bad? > GvR> But it is just as likely that calling __del__ on a partially > GvR> uninitialized object is a bad mistake, and I am doing all > GvR> these cases a favor by not calling __del__ when __init__ > GvR> failed! > > GvR> Any opinions? If nobody speaks up, I'll make the change. > > I think you should set the flag right before you call __init__(), > i.e. after (nearly all) the C level initialization has occurred. > Here's why: your "favor" can easily be accomplished by Python > constructs in the __init__():
>
> class MyBogo:
>     def __init__(self):
>         self.get_delified = 0
>         do_sumtin_exceptional()
>         self.get_delified = 1
>
>     def __del__(self):
>         if self.get_delified:
>             ah_sweet_release()

But the other behavior (call __del__ even when __init__ fails) can also easily be accomplished in Python:

class C:
    def __init__(self):
        try:
            ...stuff that may fail...
        except:
            self.__del__()
            raise

    def __del__(self):
        ...cleanup...

I believe that in almost all cases the programmer would be happier if __del__ wasn't called when their __init__ fails. This makes it easier to write a __del__ that can assume that all the object's fields have been properly initialized. In my code, typically when __init__ fails, this is a symptom of a really bad bug (e.g. I just renamed one of __init__'s arguments and forgot to fix all references), and I don't care much about cleanup behavior. --Guido van Rossum (home page: http://www.python.org/~guido/) From bwarsaw at cnri.reston.va.us Thu Mar 2 23:52:31 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Thu, 2 Mar 2000 17:52:31 -0500 (EST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__? References: <200003021532.KAA17088@eric.cnri.reston.va.us> <14526.39504.36065.657527@anthem.cnri.reston.va.us> <200003022245.RAA20265@eric.cnri.reston.va.us> Message-ID: <14526.61615.362973.624022@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> But the other behavior (call __del__ even when __init__ GvR> fails) can also easily be accomplished in Python: It's a fair cop. GvR> I believe that in almost all cases the programmer would be GvR> happier if __del__ wasn't called when their __init__ fails. GvR> This makes it easier to write a __del__ that can assume that GvR> all the object's fields have been properly initialized. That's probably fine; I don't have strong feelings either way. -Barry P.S. Interesting what X-Oblique-Strategy was randomly inserted in this message (but I'm not sure which approach is more "explicit" :). -Barry From tim_one at email.msn.com Fri Mar 3 06:38:59 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 3 Mar 2000 00:38:59 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__?
In-Reply-To: <200003021532.KAA17088@eric.cnri.reston.va.us> Message-ID: <000001bf84d2$c711e2e0$092d153f@tim> [Guido] > I was looking at the code that invokes __del__, with the intent to > implement a feature from Java: in Java, a finalizer is only called > once per object, even if calling it makes the object live longer. Why? That is, in what way is this an improvement over current behavior? Note that Java is a bit subtle: a finalizer is only called once by magic; explicit calls "don't count". The Java rules add up to quite a confusing mish-mash. Python's rules are *currently* clearer. I deal with possible exceptions in Python constructors the same way I do in C++ and Java: if there's a destructor, don't put anything in __init__ that may raise an uncaught exception. Anything dangerous is moved into a separate .reset() (or .clear() or ...) method. This works well in practice. > To implement this, we need a flag in each instance that means "__del__ > was called". At least . > I opened the creation code for instances, looking for the right place > to set the flag. I then realized that it might be smart, now that we > have this flag anyway, to set it to "true" during initialization. There > are a number of exits from the initialization where the object is created > but not fully initialized, where the new object is DECREF'ed and NULL is > returned. When such an exit is taken, __del__ is called on an > incompletely initialized object! I agree *that* isn't good. Taken on its own, though, it argues for adding an "instance construction completed" flag that __del__ later checks, as if its body were: if self.__instance_construction_completed: body That is, the problem you've identified here could be addressed directly. > Now I have a choice to make. If the class has an __init__, should I > clear the flag only after __init__ succeeds? This means that if > __init__ raises an exception, __del__ is never called. This is an > incompatibility. 
It's possible that someone has written code that > relies on __del__ being called even when __init__ fails halfway, and > then their code would break. > > But it is just as likely that calling __del__ on a partially > uninitialized object is a bad mistake, and I am doing all these cases > a favor by not calling __del__ when __init__ failed! > > Any opinions? If nobody speaks up, I'll make the change. I'd be in favor of fixing the actual problem; I don't understand the point to the rest of it, especially as it has the potential to break existing code and I don't see a compensating advantage (surely not compatibility w/ JPython -- JPython doesn't invoke __del__ methods at all by magic, right? or is that changing, and that's what's driving this?). too-much-magic-is-dizzying-ly y'rs - tim From bwarsaw at cnri.reston.va.us Fri Mar 3 06:50:16 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 3 Mar 2000 00:50:16 -0500 (EST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__? References: <200003021532.KAA17088@eric.cnri.reston.va.us> <000001bf84d2$c711e2e0$092d153f@tim> Message-ID: <14527.21144.9421.958311@anthem.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: TP> (surely not compatibility w/ JPython -- JPython doesn't invoke TP> __del__ methods at all by magic, right? or is that changing, TP> and that's what's driving this?). No, JPython doesn't invoke __del__ methods by magic, and I don't have any plans to change that. -Barry From ping at lfw.org Fri Mar 3 10:00:21 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 3 Mar 2000 01:00:21 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: Message-ID: On Thu, 2 Mar 2000, Greg Stein wrote: > On Thu, 2 Mar 2000, Guido van Rossum wrote: > >... 
> > But it is just as likely that calling __del__ on a partially > > uninitialized object is a bad mistake, and I am doing all these cases > > a favor by not calling __del__ when __init__ failed! > > > > Any opinions? If nobody speaks up, I'll make the change. > > +1 on calling __del__ IFF __init__ completes successfully. That would be my vote as well. What convinced me of this is the following: If it's up to the implementation of __del__ to deal with a problem that happened during initialization, you only know about the problem with very coarse granularity. It's a pain (or even impossible) to then rediscover the information you need to recover adequately. If on the other hand you deal with the problem in __init__, then you have much better control over what is happening, because you can position try/except blocks precisely where you need them to deal with specific potential problems. Each block can take care of its case appropriately, and re-raise if necessary. In general, it seems to me that what you want to do when __init__ runs afoul is going to be different from what you want to do to take care of object cleanup in __del__. So it doesn't belong there -- it belongs in an except: clause in __init__. Even though it's an incompatibility, i really think this is the right behaviour. -- ?!ng "To be human is to continually change. Your desire to remain as you are is what ultimately limits you." -- The Puppet Master, Ghost in the Shell From guido at python.org Fri Mar 3 17:13:16 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 03 Mar 2000 11:13:16 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: Your message of "Fri, 03 Mar 2000 00:38:59 EST." 
<000001bf84d2$c711e2e0$092d153f@tim> References: <000001bf84d2$c711e2e0$092d153f@tim> Message-ID: <200003031613.LAA21571@eric.cnri.reston.va.us> > [Guido] > > I was looking at the code that invokes __del__, with the intent to > > implement a feature from Java: in Java, a finalizer is only called > > once per object, even if calling it makes the object live longer. [Tim] > Why? That is, in what way is this an improvement over current behavior? > > Note that Java is a bit subtle: a finalizer is only called once by magic; > explicit calls "don't count". Of course. Same in my proposal. But I wouldn't call it "by magic" -- just "on behalf of the garbage collector". > The Java rules add up to quite a confusing mish-mash. Python's rules are > *currently* clearer. I don't find the Java rules confusing. It seems quite useful that the GC promises to call the finalizer at most once -- this can simplify the finalizer logic. (Otherwise it may have to ask itself, "did I clean this already?" and leave notes for itself.) Explicit finalizer calls are always a mistake and thus "don't count" -- the response to that should in general be "don't do that" (unless you have particularly stupid callers -- or very fearful lawyers :-). > I deal with possible exceptions in Python constructors the same way I do in > C++ and Java: if there's a destructor, don't put anything in __init__ that > may raise an uncaught exception. Anything dangerous is moved into a > separate .reset() (or .clear() or ...) method. This works well in practice. Sure, but the rule "if __init__ fails, __del__ won't be called" means that we don't have to program our __init__ or __del__ quite so defensively. Most people who design a __del__ probably assume that __init__ has run to completion. The typical scenario (which has happened to me! And I *implemented* the damn thing!) is this: __init__ opens a file and assigns it to an instance variable; __del__ closes the file. This is tested a few times and it works great. 
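Guido's scenario is easy to reproduce. In today's CPython (new-style classes), __del__ still runs even when __init__ raises, so a __del__ has to be written defensively; the class and attribute names below are illustrative, not from the original thread:

```python
events = []

class Resource:
    def __init__(self, fail):
        self.acquired = False
        if fail:
            raise OSError("open failed")   # bail out halfway through __init__
        self.acquired = True

    def __del__(self):
        # A __del__ in the spirit of Guido's example: it must not assume
        # __init__ finished.  getattr() guards even against a completely
        # uninitialized instance.
        events.append(("del", getattr(self, "acquired", False)))

try:
    r = Resource(fail=True)    # __init__ fails; in modern CPython __del__
except OSError:                # still runs, on the half-built object
    pass

ok = Resource(fail=False)
del ok                         # the normal case: __init__ had completed
```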
Now in production the file somehow unexpectedly fails to be openable. Sure, the programmer should've expected that, but she didn't. Now, at best, the failed __del__ creates an additional confusing error message on top of the traceback generated by IOError. At worst, the failed __del__ could wreck the original traceback.

Note that I'm not proposing to change the C level behavior; when a Py_New() function is halfway through its initialization and decides to bail out, it does a DECREF(self) and you bet that at this point the _dealloc() function gets called (via self->ob_type->tp_dealloc). Occasionally I need to initialize certain fields to NULL so that the dealloc() function doesn't try to free memory that wasn't allocated. Often it's as simple as using XDECREF instead of DECREF in the dealloc() function (XDECREF is safe when the argument is NULL; DECREF dumps core, saving a load-and-test if you are sure its arg is a valid object).

> > To implement this, we need a flag in each instance that means "__del__
> > was called".
>
> At least .
>
> > I opened the creation code for instances, looking for the right place
> > to set the flag.  I then realized that it might be smart, now that we
> > have this flag anyway, to set it to "true" during initialization.  There
> > are a number of exits from the initialization where the object is created
> > but not fully initialized, where the new object is DECREF'ed and NULL is
> > returned.  When such an exit is taken, __del__ is called on an
> > incompletely initialized object!
>
> I agree *that* isn't good.  Taken on its own, though, it argues for adding
> an "instance construction completed" flag that __del__ later checks, as if
> its body were:
>
>     if self.__instance_construction_completed:
>         body
>
> That is, the problem you've identified here could be addressed directly.
Sure -- but I would argue that when __del__ returns, __instance_construction_completed should be reset to false, because the destruction (conceptually, at least) cancels out the construction!

> > Now I have a choice to make.  If the class has an __init__, should I
> > clear the flag only after __init__ succeeds?  This means that if
> > __init__ raises an exception, __del__ is never called.  This is an
> > incompatibility.  It's possible that someone has written code that
> > relies on __del__ being called even when __init__ fails halfway, and
> > then their code would break.
> >
> > But it is just as likely that calling __del__ on a partially
> > uninitialized object is a bad mistake, and I am doing all these cases
> > a favor by not calling __del__ when __init__ failed!
> >
> > Any opinions?  If nobody speaks up, I'll make the change.
>
> I'd be in favor of fixing the actual problem; I don't understand the point
> to the rest of it, especially as it has the potential to break existing code
> and I don't see a compensating advantage (surely not compatibility w/
> JPython -- JPython doesn't invoke __del__ methods at all by magic, right?
> or is that changing, and that's what's driving this?).

JPython's a red herring here. I think that the proposed change probably *fixes* much more code that is subtly wrong than it breaks code that is relying on __del__ being called after a partial __init__. All the rules relating to __del__ are confusing (e.g. what __del__ can expect to survive in its globals).

Also note Ping's observation:

| If it's up to the implementation of __del__ to deal with a problem
| that happened during initialization, you only know about the problem
| with very coarse granularity.  It's a pain (or even impossible) to
| then rediscover the information you need to recover adequately.
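Ping's approach, positioning the try/except inside __init__ at the exact point where you still know which acquisitions succeeded, might be sketched like this (the TeeWriter class and the file paths are made up for illustration):

```python
import os
import tempfile

class TeeWriter:
    """Acquires two resources; recovers precisely if the second fails."""

    def __init__(self, path_a, path_b):
        self.a = open(path_a, "w")          # first acquisition
        try:
            self.b = open(path_b, "w")      # second acquisition may fail
        except OSError:
            # At this exact point we know precisely what exists so far:
            # self.a and nothing else.  Undo it and re-raise.
            self.a.close()
            raise

    def close(self):
        self.b.close()
        self.a.close()

tmp = tempfile.mkdtemp()
t = TeeWriter(os.path.join(tmp, "a.txt"), os.path.join(tmp, "b.txt"))
t.close()

failed = False
try:
    # opening a file in a nonexistent directory raises OSError
    TeeWriter(os.path.join(tmp, "a.txt"),
              os.path.join(tmp, "no_such_dir", "b.txt"))
except OSError:
    failed = True
```

Nothing is left for __del__ to guess at: each except block cleans up exactly the state that is known to exist at that point.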
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Fri Mar 3 17:49:52 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 3 Mar 2000 11:49:52 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: <200003031613.LAA21571@eric.cnri.reston.va.us> Message-ID: <000501bf8530$7f8c78a0$b0a0143f@tim> [Tim] >> Note that Java is a bit subtle: a finalizer is only called >> once by magic; explicit calls "don't count". [Guido] > Of course. Same in my proposal. OK -- that wasn't clear. > But I wouldn't call it "by magic" -- just "on behalf of the garbage > collector". Yup, magically called . >> The Java rules add up to quite a confusing mish-mash. Python's >> rules are *currently* clearer. > I don't find the Java rules confusing. "add up" == "taken as a whole"; include the Java spec's complex state machine for cleanup semantics, and the later complications added by three (four?) distinct flavors of weak reference, and I doubt 1 Java programmer in 1,000 actually understands the rules. This is why I'm wary of moving in the Java *direction* here. Note that Java programmers in past c.l.py threads have generally claimed Java's finalizers are so confusing & unpredictable they don't use them at all! Which, in the end, is probably a good idea in Python too <0.5 wink>. > It seems quite useful that the GC promises to call the finalizer at > most once -- this can simplify the finalizer logic. Granting that explicit calls are "use at your own risk", the only user-visible effect of "called only once" is in the presence of resurrection. Now in my Python experience, on the few occasions I've resurrected an object in __del__, *of course* I expected __del__ to get called again if the object is about to die again! 
Typical:

    def __del__(self):
        if oops_i_still_need_to_stay_alive:
            resurrect(self)
        else:
            # really going away
            release(self.critical_resource)

Call __del__ only once, and code like this is busted bigtime. OTOH, had I written __del__ logic that relied on being called only once, switching the implementation to call it more than once would break *that* bigtime. Neither behavior is an obvious all-cases win to me, or even a plausibly most-cases win. But Python already took a stand on this & so I think you need a *good* reason to change semantics now.

> ...
> Sure, but the rule "if __init__ fails, __del__ won't be called" means
> that we don't have to program our __init__ or __del__ quite so
> defensively.  Most people who design a __del__ probably assume that
> __init__ has run to completion. ...

This is (or can easily be made) a separate issue, & I agreed the first time this seems worth fixing (although if nobody has griped about it in a decade of use, it's hard to call it a major bug ).

> ...
> Sure -- but I would argue that when __del__ returns,
> __instance_construction_completed should be reset to false, because
> the destruction (conceptually, at least) cancels out the construction!

In the __del__ above (which is typical of the cases of resurrection I've seen), there is no such implication. Perhaps this is philosophical abuse of Python's intent, but if so it relied only on trusting its advertised semantics.

> I think that the proposed change probably *fixes* much more code that
> is subtly wrong than it breaks code that is relying on __del__ being
> called after a partial __init__.

Yes, again, I have no argument against refusing to call __del__ unless __init__ succeeded. Going beyond that to a new "called at most once" rule is indeed going beyond that, *will* break reasonable old code, and holds no particular attraction that I can see (it trades making one kind of resurrection scenario easier at the cost of making other kinds harder).
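For what it's worth, CPython eventually settled this question Guido's way: since PEP 442 (Python 3.4), __del__ is invoked at most once per object, so a resurrecting __del__ like Tim's runs on the first death only. A minimal demonstration (names illustrative):

```python
log = []
pool = []

class Phoenix:
    def __del__(self):
        log.append("del")
        pool.append(self)   # resurrection: a new reference appears

p = Phoenix()
del p              # refcount hits zero; __del__ runs and resurrects the object
survivor = pool.pop()
del survivor       # dies again -- under PEP 442 (3.4+) __del__ is NOT re-run
```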
If there needs to be incompatible change here, curiously enough I'd be more in favor of making resurrection illegal period (which could *really* simplify gc's headaches).

> All the rules relating to __del__ are confusing (e.g. what __del__ can
> expect to survive in its globals).

Problems unique to final shutdown don't seem relevant here.

> Also note Ping's observation: ...

I can't agree with that yet another time without being quadruply redundant .

From guido at python.org Fri Mar 3 17:50:08 2000
From: guido at python.org (Guido van Rossum)
Date: Fri, 03 Mar 2000 11:50:08 -0500
Subject: [Python-Dev] Design question: call __del__ for cyclical garbage?
In-Reply-To: Your message of "Wed, 01 Mar 2000 00:44:10 EST." <200003010544.AAA13155@eric.cnri.reston.va.us>
References: <20000229153421.A16502@acs.ucalgary.ca> <200003010544.AAA13155@eric.cnri.reston.va.us>
Message-ID: <200003031650.LAA21647@eric.cnri.reston.va.us>

We now have two implementations of Eric Tiedemann's idea: Neil and I both implemented it. It's too soon to post the patch sets (both are pretty rough) but I've got another design question.

Once we've identified a bunch of objects that are only referring to each other (i.e., one or more cycles) we have to dispose of them. The question is, how? We can't just call free on each of the objects; some may not be allocated with malloc, and some may contain pointers to other malloc'ed memory that also needs to be freed. So we have to get their destructors involved. But how? Calling ob->ob_type->tp_dealloc(ob) for an object whose reference count is nonzero is unsafe -- this will destroy the object while there are still references to it! Those references are all coming from other objects that are part of the same cycle; those objects will also be deallocated and they will reference the deallocated objects (if only to DECREF them).

Neil uses the same solution that I use when finalizing the Python interpreter -- find the dictionaries and call PyDict_Clear() on them.
(In his unpublished patch, he also clears the lists using PyList_SetSlice(list, 0, list->ob_size, NULL). He's also generalized this so that *every* object can define a tp_clear function in its type object.)

As long as every cycle contains at least one dictionary or list object, this will break cycles reliably and get rid of all the garbage. (If you wonder why: clearing the dict DECREFs the next object(s) in the cycle; if the last dict referencing a particular object is cleared, the last DECREF will deallocate that object, which will in turn DECREF the objects it references, and so forth. Since none of the objects in the cycle has incoming references from outside the cycle, we can prove that this will delete all objects as long as there's a dict or list in each cycle.)

However, there's a snag. It's the same snag as what finalizing the Python interpreter runs into -- it has to do with __del__ methods and the undefined order in which the dictionaries are cleared. For example, it's quite possible that the first dictionary we clear is the __dict__ of an instance, so this zaps all its instance variables. Suppose this breaks the cycle, so then the instance itself gets DECREFed to zero. Its deallocator will be called. If it's got a __del__, this __del__ will be called -- but all the instance variables have already been zapped, so it will fail miserably! It's also possible that the __dict__ of a class involved in a cycle gets cleared first, in which case the __del__ no longer "exists", and again the cleanup is skipped.

So the question is: What to *do*? My solution is to make an extra pass over all the garbage objects *before* we clear dicts and lists, and for those that are instances and have __del__ methods, call their __del__ ("by magic", as Tim calls it in another post). The code in instance_dealloc() already does the right thing here: it calls __del__, then discovers that the reference count is > 0 ("I'm not dead yet" :-), and returns without freeing the object.
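Modern CPython (via PEP 442) resolves the ordering snag essentially this way: the collector runs the finalizers of cyclic garbage first, in an unspecified order, and only then clears the underlying storage. A small demonstration with illustrative names:

```python
import gc

calls = []

class Node:
    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        # runs while self.name still exists: finalizers fire before
        # the collector clears the instance dicts
        calls.append(self.name)

a, b = Node("a"), Node("b")
a.other, b.other = b, a   # the two-instance cycle Guido describes
del a, b                  # unreachable, but refcounts never reach zero
gc.collect()              # the cycle detector must break the cycle itself
```

Each __del__ runs exactly once here; which of the two runs first is deliberately unspecified, matching the "don't depend on the other __del__" advice below.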
(This is also why I want to introduce a flag ensuring that __del__ gets called by instance_dealloc at most once: later when the instance gets DECREFed to 0, instance_dealloc is called again and will correctly free the object; but we don't want __del__ called again.) [Note for Neil: somehow I forgot to add this logic to the code; in_del_called isn't used! The change is obvious though.]

This still leaves a problem for the user: if two class instances reference each other and both have a __del__, we can't predict whose __del__ is called first when they are called as part of cycle collection. The solution is to write each __del__ so that it doesn't depend on the other __del__.

Someone (Tim?) in the past suggested a different solution (probably found in another language): for objects that are collected as part of a cycle, the destructor isn't called at all. The memory is freed (since it's no longer reachable), but the destructor is not called -- it is as if the object lives on forever. This is theoretically superior, but not practical: when I have an object that creates a temp file, I want to be able to reliably delete the temp file in my destructor, even when I'm part of a cycle!

--Guido van Rossum (home page: http://www.python.org/~guido/)

From jack at oratrix.nl Fri Mar 3 17:57:54 2000
From: jack at oratrix.nl (Jack Jansen)
Date: Fri, 03 Mar 2000 17:57:54 +0100
Subject: [Python-Dev] Design question: call __del__ for cyclical garbage?
In-Reply-To: Message by Guido van Rossum , Fri, 03 Mar 2000 11:50:08 -0500 , <200003031650.LAA21647@eric.cnri.reston.va.us>
Message-ID: <20000303165755.490EA371868@snelboot.oratrix.nl>

The __init__ rule for calling __del__ has me confused. Is this per-class or per-object? I.e. what will happen in the following case:

    class Purse:
        def __init__(self):
            self.balance = WithdrawCashFromBank(1000)

        def __del__(self):
            PutCashBackOnBank(self.balance)
            self.balance = 0

    class LossyPurse(Purse):
        def __init__(self):
            Purse.__init__(self)
            raise 'kaboo! kaboo!'

If the new scheme means that the __del__ method of Purse isn't called I think I don't like it. In the current scheme I can always program defensively:

    def __del__(self):
        try:
            b = self.balance
            self.balance = 0
        except AttributeError:
            pass
        else:
            PutCashBackOnBank(b)

but in a new scheme with a per-object "__del__ must be called" flag I can't...
--
Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From guido at python.org Fri Mar 3 18:05:00 2000
From: guido at python.org (Guido van Rossum)
Date: Fri, 03 Mar 2000 12:05:00 -0500
Subject: [Python-Dev] Design question: call __del__ only after successful __init__?
In-Reply-To: Your message of "Fri, 03 Mar 2000 11:49:52 EST." <000501bf8530$7f8c78a0$b0a0143f@tim>
References: <000501bf8530$7f8c78a0$b0a0143f@tim>
Message-ID: <200003031705.MAA21700@eric.cnri.reston.va.us>

OK, so we're down to this one point: if __del__ resurrects the object, should __del__ be called again later? Additionally, should resurrection be made illegal?

I can easily see how __del__ could *accidentally* resurrect the object as part of its normal cleanup -- e.g. you make a call to some other routine that helps with the cleanup, passing self as an argument, and this other routine keeps a helpful cache of the last argument for some reason. I don't see how we could forbid this type of resurrection. (What are you going to do? You can't raise an exception from instance_dealloc, since it is called from DECREF. You can't track down the reference and replace it with a None easily.) In this example, the helper routine will eventually delete the object from its cache, at which point it is truly deleted. It would be harmful, not helpful, if __del__ was called again at this point.

Now, it is true that the current docs for __del__ imply that resurrection is possible.
The intention of that note was to warn __del__ writers that in the case of accidental resurrection __del__ might be called again. The intention certainly wasn't to allow or encourage intentional resurrection.

Would there really be someone out there who uses *intentional* resurrection? I severely doubt it. I've never heard of this.

[Jack just finds a snag]

> The __init__ rule for calling __del__ has me confused.  Is this per-class or
> per-object?
>
> I.e. what will happen in the following case:
>
>     class Purse:
>         def __init__(self):
>             self.balance = WithdrawCashFromBank(1000)
>
>         def __del__(self):
>             PutCashBackOnBank(self.balance)
>             self.balance = 0
>
>     class LossyPurse(Purse):
>         def __init__(self):
>             Purse.__init__(self)
>             raise 'kaboo! kaboo!'
>
> If the new scheme means that the __del__ method of Purse isn't called I think
> I don't like it.  In the current scheme I can always program defensively:
>
>     def __del__(self):
>         try:
>             b = self.balance
>             self.balance = 0
>         except AttributeError:
>             pass
>         else:
>             PutCashBackOnBank(b)
>
> but in a new scheme with a per-object "__del__ must be called" flag I can't...

Yes, that's a problem. But there are other ways for the subclass to break the base class's invariant (e.g. it could override __del__ without calling the base class' __del__). So I think it's a red herring.

In Python 3000, typechecked classes may declare invariants that are enforced by the inheritance mechanism; then we may need to keep track of which base class constructors succeeded and only call corresponding destructors.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From mal at lemburg.com Fri Mar 3 19:17:11 2000
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 03 Mar 2000 19:17:11 +0100
Subject: [Python-Dev] Design question: call __del__ only after successful __init__?
References: <000501bf8530$7f8c78a0$b0a0143f@tim> <200003031705.MAA21700@eric.cnri.reston.va.us>
Message-ID: <38C001A7.6CF8F365@lemburg.com>

Guido van Rossum wrote:
>
> OK, so we're down to this one point: if __del__ resurrects the object,
> should __del__ be called again later?  Additionally, should
> resurrection be made illegal?

Yes and no :-)

One example comes to mind: implementations of weak references, which manage weak object references themselves (as soon as __del__ is called the weak reference implementation takes over the object).

Another example is that of free-list-like implementations which reduce object creation times by implementing smart object recycling, e.g. objects could keep allocated dictionaries alive or connections to databases open, etc.

As for the second point: Calling __del__ again is certainly needed to keep application logic sane... after all, __del__ should be called whenever the refcount reaches 0 -- and that can happen more than once in the object's lifetime if reanimation occurs.

> I can easily see how __del__ could *accidentally* resurrect the object
> as part of its normal cleanup -- e.g. you make a call to some other
> routine that helps with the cleanup, passing self as an argument, and
> this other routine keeps a helpful cache of the last argument for some
> reason.  I don't see how we could forbid this type of resurrection.
> (What are you going to do?  You can't raise an exception from
> instance_dealloc, since it is called from DECREF.  You can't track
> down the reference and replace it with a None easily.)
> In this example, the helper routine will eventually delete the object
> from its cache, at which point it is truly deleted.  It would be
> harmful, not helpful, if __del__ was called again at this point.

I'd say this is an application logic error -- nothing that the mechanism itself can help with automagically. OTOH, turning multi calls to __del__ off would make certain techniques impossible.
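MAL's free-list idea can be sketched as a tiny recycling pool (all names here are illustrative). Note that under an at-most-once __del__ rule this trick only works for the object's first death, which is exactly the technique his objection is about:

```python
pool = []

class Connection:
    def __init__(self):
        self.handle = object()   # stands in for an expensive resource

    def __del__(self):
        # recycle instead of dying: the resource stays allocated.
        # (Under at-most-once semantics this fires only the first time.)
        pool.append(self)

def get_connection():
    # reuse a recycled instance if one is available
    return pool.pop() if pool else Connection()

c = get_connection()
handle = c.handle
del c                   # resurrected into the pool, resource kept alive
c2 = get_connection()   # reuse: no new resource is allocated
```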
> Now, it is true that the current docs for __del__ imply that > resurrection is possible. The intention of that note was to warn > __del__ writers that in the case of accidental resurrection __del__ > might be called again. The intention certainly wasn't to allow or > encourage intentional resurrection. I don't think that docs are the right argument here ;-) It is simply the reference counting logic that plays its role: __del__ is called when refcount reaches 0, which usually means that the object is about to be garbage collected... unless the object is rereferenced by some other object and thus gets reanimated. > Would there really be someone out there who uses *intentional* > resurrection? I severely doubt it. I've never heard of this. BTW, I can't see what the original question has to do with this discussion ... calling __del__ only after successful __init__ is ok, IMHO, but what does this have to do with the way __del__ itself is implemented ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Fri Mar 3 19:30:36 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 03 Mar 2000 19:30:36 +0100 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? References: <20000229153421.A16502@acs.ucalgary.ca> <200003010544.AAA13155@eric.cnri.reston.va.us> <200003031650.LAA21647@eric.cnri.reston.va.us> Message-ID: <38C004CC.1FE0A501@lemburg.com> [Guido about ways to cleanup cyclic garbage] FYI, I'm using a special protocol for disposing of cyclic garbage: the __cleanup__ protocol. The purpose of this call is probably similar to Neil's tp_clear: it is intended to let objects break possible cycles in their own storage scope, e.g. instances can delete instance variables which they know can cause cyclic garbage. 
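A toy version of such a cycle-breaking protocol might look like the following; the names are illustrative, and the real protocol (including who calls __cleanup__ and when) is defined by mxProxy:

```python
closed = []

class Node:
    def __init__(self, name):
        self.name = name
        self.peer = None       # may come to participate in a cycle

    def __cleanup__(self):
        # break cycles in our own storage scope: *we* know which
        # attributes are dangerous, the collector doesn't
        self.peer = None

    def __del__(self):
        # still safe after __cleanup__, because self.name was left alone
        closed.append(self.name)

a, b = Node("a"), Node("b")
a.peer, b.peer = b, a          # reference cycle
a.__cleanup__()                # break the cycle from inside the object
del a, b                       # refcounts now reach zero without gc's help
```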
The idea is simple: give all power to the objects rather than try to solve everything with one magical master plan. The mxProxy package has details on the protocol. The __cleanup__ method is called by the Proxy when the Proxy is about to be deleted. If all references to an object go through the Proxy, the __cleanup__ method call can easily break cycles to have the refcount reach zero in which case __del__ is called. Since the object knows about this scheme it can take precautions to make sure that __del__ still works after __cleanup__ was called. Anyway, just a thought... there are probably many ways to do all this. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tismer at tismer.com Fri Mar 3 19:51:55 2000 From: tismer at tismer.com (Christian Tismer) Date: Fri, 03 Mar 2000 19:51:55 +0100 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? References: <000501bf8530$7f8c78a0$b0a0143f@tim> <200003031705.MAA21700@eric.cnri.reston.va.us> Message-ID: <38C009CB.72BD49CA@tismer.com> Guido van Rossum wrote: > > OK, so we're down to this one point: if __del__ resurrects the object, > should __del__ be called again later? Additionally, should > resurrection be made illegal? [much stuff] Just a random note: What if we had a __del__ with zombie behavior? Assume an instance that is about to be destructed. Then __del__ is called via normal method lookup. What we want is to let this happen only once. Here the Zombie: After method lookup, place a dummy __del__ into the to-be-deleted instance dict, and we are sure that this does not harm. Kinda "yes its there, but a broken link ". The zombie always works by doing nothing. Makes some sense? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home

From gstein at lyra.org Sat Mar 4 00:09:48 2000
From: gstein at lyra.org (Greg Stein)
Date: Fri, 3 Mar 2000 15:35:09 -0800 (PST)
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17
In-Reply-To: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us>
Message-ID: 

You may as well remove the entire "vi" concept from ConfigParser. Since "vi" can be *only* a '=' or ':', then you aren't truly checking anything in the "if" statement. Further, "vi" is used nowhere else, so that variable and the corresponding regex group can be nuked altogether.

IMO, I'm not sure why the ";" comment form was initially restricted to just one option format in the first place.

Cheers,
-g

On Fri, 3 Mar 2000, Jeremy Hylton wrote:
> Update of /projects/cvsroot/python/dist/src/Lib
> In directory bitdiddle:/home/jhylton/python/src/Lib
>
> Modified Files:
>         ConfigParser.py
> Log Message:
> allow comments beginning with ; in key: value as well as key = value
>
>
> Index: ConfigParser.py
> ===================================================================
> RCS file: /projects/cvsroot/python/dist/src/Lib/ConfigParser.py,v
> retrieving revision 1.16
> retrieving revision 1.17
> diff -C2 -r1.16 -r1.17
> *** ConfigParser.py   2000/02/28 23:23:55     1.16
> --- ConfigParser.py   2000/03/03 20:43:57     1.17
> ***************
> *** 359,363 ****
>               optname, vi, optval = mo.group('option', 'vi', 'value')
>               optname = string.lower(optname)
> !             if vi == '=' and ';' in optval:
>                   # ';' is a comment delimiter only if it follows
>                   # a spacing character
> --- 359,363 ----
>               optname, vi, optval = mo.group('option', 'vi', 'value')
>               optname = string.lower(optname)
> !             if vi in ('=', ':') and ';' in optval:
>                   # ';' is a comment delimiter only if it follows
>                   # a spacing character
>
>
> _______________________________________________
> Python-checkins mailing list
> Python-checkins at python.org
> http://www.python.org/mailman/listinfo/python-checkins
>

--
Greg Stein, http://www.lyra.org/

From jeremy at cnri.reston.va.us Sat Mar 4 00:15:32 2000
From: jeremy at cnri.reston.va.us (Jeremy Hylton)
Date: Fri, 3 Mar 2000 18:15:32 -0500 (EST)
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17
In-Reply-To: 
References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us>
Message-ID: <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us>

Thanks for catching that. I didn't look at the context. I'm going to wait, though, until I talk to Fred to mess with the code any more.

General question for python-dev readers: What are your experiences with ConfigParser? I just used it to build a simple config parser for IDLE and found it hard to use for several reasons. The biggest problem was that the file format is undocumented. I also found it clumsy to have to specify section and option arguments. I ended up writing a proxy that specializes on section so that get takes only an option argument.

It sounds like ConfigParser code and docs could use a general cleanup. Are there any other issues to take care of as part of that cleanup?

Jeremy

From gstein at lyra.org Sat Mar 4 00:35:09 2000
From: gstein at lyra.org (Greg Stein)
Date: Fri, 3 Mar 2000 15:35:09 -0800 (PST)
Subject: [Python-Dev] ConfigParser stuff (was: CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17)
In-Reply-To: <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us>
Message-ID: 

On Fri, 3 Mar 2000, Jeremy Hylton wrote:
> Thanks for catching that.  I didn't look at the context.  I'm going to
> wait, though, until I talk to Fred to mess with the code any more.

Not a problem. I'm glad that diffs are now posted to -checkins.
:-)

> General question for python-dev readers: What are your experiences
> with ConfigParser?

Love it!

> I just used it to build a simple config parser for
> IDLE and found it hard to use for several reasons.  The biggest
> problem was that the file format is undocumented.

In my most complex use of ConfigParser, I had to override SECTCRE to allow periods in the section name. Of course, that was quite interesting since the variable is __SECTRE in 1.5.2 (i.e. I had to compensate for the munging). I also changed OPTCRE to allow a few more characters ("@" in particular, which even the update doesn't do). Not a problem nowadays since those are public.

My subclass also defines a set() method and a delsection() method. These are used because I write the resulting changes back out to a file. It might be nice to have a method which writes out a config file (with an "AUTOGENERATED BY ConfigParser.py -- DO NOT EDIT BY HAND"; or maybe "... BY ...").

> I also found it
> clumsy to have to specify section and option arguments.

I found these were critical in my application. I also take advantage of the sections in my "edna" application for logical organization.

> I ended up
> writing a proxy that specializes on section so that get takes only an
> option argument.
>
> It sounds like ConfigParser code and docs could use a general cleanup.
> Are there any other issues to take care of as part of that cleanup?

A set() method and a writefile() type of method would be nice.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From tim_one at email.msn.com Sat Mar 4 02:38:43 2000
From: tim_one at email.msn.com (Tim Peters)
Date: Fri, 3 Mar 2000 20:38:43 -0500
Subject: [Python-Dev] Design question: call __del__ for cyclical garbage?
In-Reply-To: <200003031650.LAA21647@eric.cnri.reston.va.us>
Message-ID: <000001bf857a$60b45ac0$c6a0143f@tim>

[Guido]
> ...
> Someone (Tim?)
in the past suggested a different solution (probably > found in another language): for objects that are collected as part of > a cycle, the destructor isn't called at all. The memory is freed > (since it's no longer reachable), but the destructor is not called -- > it is as if the object lives on forever. Stroustrup has written in favor of this for C++. It's exactly the kind of overly slick "good argument" he would never accept from anyone else <0.1 wink>. > This is theoretically superior, but not practical: when I have an > object that creates a temp file, I want to be able to reliably delete > the temp file in my destructor, even when I'm part of a cycle! A member of the C++ committee assured me Stroustrup is overwhelmingly opposed on this. I don't even agree it's theoretically superior: it relies on the fiction that gc "may never occur", and that's just silly in practice. You're moving down the Java path. I can't possibly do a better job of explaining the Java rules than the Java Language Spec. does for itself. So pick that up and study section 12.6 (Finalization of Class Instances). The end result makes little sense to users, but is sufficient to guarantee that Java itself never blows up. Note, though, that there is NO good answer to finalizers in cycles! The implementation cannot be made smart enough to both avoid trouble and "do the right thing" from the programmer's POV, because the latter is unknowable. Somebody has to lose, one way or another. Rather than risk doing a wrong thing, the BDW collector lets cycles with finalizers leak. But it also has optional hacks to support exceptions for use with C++ (which sometimes creates self-cycles) and Java. See http://reality.sgi.com/boehm_mti/finalization.html for Boehm's best concentrated thoughts on the subject. The only principled approach I know of comes out of the Scheme world. Scheme has no finalizers, of course. 
But it does have gc, and the concept of "guardians" was invented to address all gc finalization problems in one stroke. It's extremely Scheme-like in providing a perfectly general mechanism with no policy whatsoever. You (the Scheme programmer) can create guardian objects, and "register" other objects with a guardian. At any time, you can ask a guardian whether some object registered with it is "ready to die" (i.e., the only thing keeping it alive is its registration with the guardian). If so, you can ask it to give you one. Everything else is up to you: if you want to run a finalizer, your problem. If there are cycles, also your problem. Even if there are simple non-cyclic dependencies, your problem. Etc. So those are the extremes: BDW avoids blame by refusing to do anything. Java avoids blame by exposing an impossibly baroque implementation-driven finalization model. Scheme avoids blame by refusing to do anything "by magic", but helps you to shoot yourself with the weapon of your choice. The bad news is that I don't know of a scheme *not* at an extreme! It's extremely un-Pythonic to let things leak (despite that it has let things leak for a decade ), but also extremely un-Pythonic to make some wild-ass guess. So here's what I'd consider doing: explicit is better than implicit, and in the face of ambiguity refuse the temptation to guess. If a trash cycle contains a finalizer (my, but that has to be rare in practice, in well-designed code!), don't guess, but make it available to the user. A gc.guardian() call could expose such beasts, or perhaps a callback could be registered, invoked when gc finds one of these things. Anyone crazy enough to create cyclic trash with finalizers then has to take responsibility for breaking the cycle themself. This puts the burden on the person creating the problem, and they can solve it in the way most appropriate to *their* specific needs.
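To make the shape of that proposal concrete, here is a toy sketch of the callback variant. None of these names (register_cycle_handler, report_trash_cycle) exist in any real collector; the gc is simulated by an ordinary function call.

```python
# Toy sketch of the "hand trash cycles back to the programmer" idea.
# Nothing here touches a real collector; register_cycle_handler and
# report_trash_cycle are invented names, and the gc is simulated by hand.

_cycle_handlers = []

def register_cycle_handler(fn):
    """Register a callback to receive unreachable cycles that have finalizers."""
    _cycle_handlers.append(fn)

def report_trash_cycle(objects):
    """Stand-in for the collector: hand a discovered trash cycle to the user."""
    for fn in _cycle_handlers:
        fn(objects)

class Resource:
    def __init__(self, name):
        self.name = name
        self.peer = None        # the link that will form the cycle

    def release(self):
        self.peer = None        # break the link; real code would free resources

def break_cycle(objs):
    # Only the *programmer* knows the right order and policy here;
    # the collector, by the argument above, cannot.
    for obj in objs:
        obj.release()

register_cycle_handler(break_cycle)

a, b = Resource("a"), Resource("b")
a.peer, b.peer = b, a           # create a reference cycle
report_trash_cycle([a, b])      # pretend gc just found it unreachable
```

Once the handler returns, nothing in the cycle refers to anything else, so ordinary reference counting can reclaim both objects.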
IOW, the only people who lose under this scheme are the ones begging to lose, and their "loss" consists of taking responsibility. when-a-problem-is-impossible-to-solve-favor-sanity-ly y'rs - tim From gstein at lyra.org Sat Mar 4 03:59:26 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 3 Mar 2000 18:59:26 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <000001bf857a$60b45ac0$c6a0143f@tim> Message-ID: On Fri, 3 Mar 2000, Tim Peters wrote: >... > Note, though, that there is NO good answer to finalizers in cycles! The "Note" ?? Not just a note, but I'd say an axiom :-) By definition, you have two objects referring to each other in some way. How can you *definitely* know how to break the link between them? Do you call A's finalizer or B's first? If they're instances, do you just whack their __dict__ and hope for the best? >... > So here's what I'd consider doing: explicit is better than implicit, and in > the face of ambiguity refuse the temptation to guess. If a trash cycle > contains a finalizer (my, but that has to be rare in practice, in > well-designed code!), don't guess, but make it available to the user. A > gc.guardian() call could expose such beasts, or perhaps a callback could be > registered, invoked when gc finds one of these things. Anyone crazy enough > to create cyclic trash with finalizers then has to take responsibility for > breaking the cycle themself. This puts the burden on the person creating > the problem, and they can solve it in the way most appropriate to *their* > specific needs. IOW, the only people who lose under this scheme are the > ones begging to lose, and their "loss" consists of taking responsibility. I'm not sure if Tim is saying the same thing, but I'll write down a concrete idea for cleaning garbage cycles. First, a couple observations: * Some objects can always be reliably "cleaned": lists, dicts, tuples.
They just drop their contents, with no invocations against any of them. Note that an instance without a __del__ has no opinion on how it is cleaned. (this is related to Tim's point about whether a cycle has a finalizer) * The other objects may need to *use* their referenced objects in some way to clean out cycles. Since the second set of objects (possibly) need more care during their cleanup, we must concentrate on how to solve their problem. Back up a step: to determine where an object falls, let's define a tp_clean type slot. It returns an integer and takes one parameter: an operation integer. Py_TPCLEAN_CARE_CHECK /* check whether care is needed */ Py_TPCLEAN_CARE_EXEC /* perform the careful cleaning */ Py_TPCLEAN_EXEC /* perform a non-careful cleaning */ Given a set of objects that require special cleaning mechanisms, there is no way to tell where to start first. So... just pick the first one. Call its tp_clean type slot with CARE_EXEC. For instances, this maps to __clean__. If the instance does not have a __clean__, then tp_clean returns FALSE meaning that it could not clean this object. The algorithm moves on to the next object in the set. If tp_clean returns TRUE, then the object has been "cleaned" and is moved to the "no special care needed" list of objects, awaiting its reference count to hit zero. Note that objects in the "care" and "no care" lists may disappear during the careful-cleaning process. If the careful-cleaning algorithm hits the end of the careful set of objects and the set is non-empty, then throw an exception: GCImpossibleError. The objects in this set each said they could not be cleaned carefully AND they were not dealloc'd during other objects' cleaning. [ it could be possible to define a *dynamic* CARE_EXEC that will succeed if you call it during a second pass; I'm not sure this is a Good Thing to allow, however. 
] This also implies that a developer should almost *always* consider writing a __clean__ method whenever they write a __del__ method. That method MAY be called when cycles need to be broken; the object should delete any non-essential variables in such a way that integrity is retained (e.g. it fails gracefully when methods are called and __del__ won't raise an error). For example, __clean__ could call a self.close() to shut down its operation. Whatever... you get the idea. At the end of the iteration of the "care" set, then you may have objects remaining in the "no care" set. By definition, these objects don't care about their internal references to other objects (they don't need them during deallocation). We iterate over this set, calling tp_clean(EXEC). For lists, dicts, and tuples, the tp_clean(EXEC) call simply clears out the references to other objects (but does not dealloc the object!). Again: objects in the "no care" set will go away during this process. By the end of the iteration over the "no care" set, it should be empty. [ note: the iterations over these sets should probably INCREF/DECREF across the calls; otherwise, the object could be dealloc'd during the tp_clean call. ] [ if the set is NOT empty, then tp_clean(EXEC) did not remove all possible references to other objects; not sure what this means. is it an error? maybe you just force a tp_dealloc on the remaining objects. ] Note that the tp_clean mechanism could probably be used during the Python finalization, where Python does a bunch of special-casing to clean up modules. Specifically: a module does not care about its contents during its deallocation, so it is a "no care" object; it responds to tp_clean(EXEC) by clearing its dictionary. Class objects are similar: they can clear their dict (which contains a module reference which usually causes a loop) during tp_clean(EXEC). 
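At the Python level, the __clean__ method being described might look like this. This is a sketch of the proposed protocol only -- __clean__ is not an existing hook, and the class here is invented for illustration:

```python
# Sketch of the proposed __clean__ protocol: release external resources and
# leave the object inert, so a later __del__ (or a second cleaning pass)
# cannot fail.  __clean__ is a proposal in this thread, not a real hook.

class TempFileHolder:
    def __init__(self, path):
        self.path = path        # illustrative; no file is actually created
        self.closed = False

    def close(self):
        # Real code would remove the temp file here.
        self.closed = True

    def __clean__(self):
        # What tp_clean(CARE_EXEC) would invoke on instances in a trash cycle.
        self.close()
        return True             # TRUE: "I was able to clean myself"

    def __del__(self):
        if not self.closed:     # fails gracefully if __clean__ already ran
            self.close()

holder = TempFileHolder("scratch.tmp")
cleaned = holder.__clean__()
```

After __clean__ runs, the instance is harmless: a later __del__ finds nothing left to release and returns quietly.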
Module cleanup is easy once objects with CARE_CHECK have been handled -- all that funny logic in there is to deal with "care" objects. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one at email.msn.com Sat Mar 4 04:26:54 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 3 Mar 2000 22:26:54 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message-ID: <000401bf8589$7d1364e0$c6a0143f@tim> [Tim] > Note, though, that there is NO good answer to finalizers in cycles! The [Greg Stein] > "Note" ?? Not just a note, but I'd say an axiom :-) An axiom is accepted without proof: we have plenty of proof that there's no thoroughly good answer (i.e., every language that has ever addressed this issue -- along with every language that ever will ). > By definition, you have two objects referring to each other in some way. > How can you *definitely* know how to break the link between them? Do you > call A's finalizer or B's first? If they're instances, do you just whack > their __dict__ and hope for the best? Exactly. The *programmer* may know the right thing to do, but the Python implementation can't possibly know. Facing both facts squarely constrains the possibilities to the only ones that are all of understandable, predictable and useful. Cycles with finalizers must be a Magic-Free Zone else you lose at least one of those three: even Guido's kung fu isn't strong enough to outguess this. [a nice implementation sketch, of what seems an overly elaborate scheme, if you believe cycles with finalizers are rare in intelligently designed code) ] Provided Guido stays interested in this, he'll make his own fun. I'm just inviting him to move in a sane direction <0.9 wink>. One caution: > ... > If the careful-cleaning algorithm hits the end of the careful set of > objects and the set is non-empty, then throw an exception: > GCImpossibleError. Since gc "can happen at any time", this is very severe (c.f. 
Guido's objection to making resurrection illegal). Hand a trash cycle back to the programmer instead, via callback or request or whatever, and it's all explicit without more cruft in the implementation. It's alive again when they get it back, and they can do anything they want with it (including resurrecting it, or dropping it again, or breaking cycles -- anything). I'd focus on the cycles themselves, not on the types of objects involved. I'm not pretending to address the "order of finalization at shutdown" question, though (although I'd agree they're deeply related: how do you follow a topological sort when there *isn't* one? well, you don't, because you can't). realistically y'rs - tim From gstein at lyra.org Sat Mar 4 09:43:45 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 4 Mar 2000 00:43:45 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <000401bf8589$7d1364e0$c6a0143f@tim> Message-ID: On Fri, 3 Mar 2000, Tim Peters wrote: >... > [a nice implementation sketch, of what seems an overly elaborate scheme, > if you believe cycles with finalizers are rare in intelligently designed > code) > ] Nah. Quite simple to code up, but a bit longer to explain in English :-) The hardest part is finding the cycles, but Guido already posted a long explanation about that. Once that spits out the doubly-linked list of objects, then you're set. 1) scan the list calling tp_clean(CARE_CHECK), shoving "care needed" objects to a second list 2) scan the care-needed list calling tp_clean(CARE_EXEC). if TRUE is returned, then the object was cleaned and moves to the "no care" list. 3) assert len(care-needed list) == 0 4) scan the no-care list calling tp_clean(EXEC) 5) (questionable) assert len(no-care list) == 0 The background makes it longer. The short description of the algorithm is easy. Step (1) could probably be merged right into one of the scans in the GC algorithm (e.g. 
during the placement into the "these are cyclical garbage" list) > Provided Guido stays interested in this, he'll make his own fun. I'm just > inviting him to move in a sane direction <0.9 wink>. hehe... Agreed. > One caution: > > > ... > > If the careful-cleaning algorithm hits the end of the careful set of > > objects and the set is non-empty, then throw an exception: > > GCImpossibleError. > > Since gc "can happen at any time", this is very severe (c.f. Guido's > objection to making resurrection illegal). GCImpossibleError would simply be a subclass of MemoryError. Makes sense to me, and definitely allows for its "spontaneity." > Hand a trash cycle back to the > programmer instead, via callback or request or whatever, and it's all > explicit without more cruft in the implementation. It's alive again when > they get it back, and they can do anything they want with it (including > resurrecting it, or dropping it again, or breaking cycles -- anything). I'd > focus on the cycles themselves, not on the types of objects involved. I'm > not pretending to address the "order of finalization at shutdown" question, > though (although I'd agree they're deeply related: how do you follow a > topological sort when there *isn't* one? well, you don't, because you > can't). I disagree. I don't think a Python-level function is going to have a very good idea of what to do. IMO, this kind of semantics belong down in the interpreter with a specific, documented algorithm. Throwing it out to Python won't help -- that function will still have to use a "standard pattern" for getting the cyclical objects to toss themselves. I think that standard pattern should be a language definition. Without a standard pattern, then you're saying the application will know what to do, but that is kind of weird -- what happens when an unexpected cycle arrives? 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Sat Mar 4 10:50:19 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 4 Mar 2000 11:50:19 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> Message-ID: On Fri, 3 Mar 2000, Jeremy Hylton wrote: > It sounds like ConfigParser code and docs could use a general cleanup. > Are there any other issues to take care of as part of that cleanup? One thing that bothered me once: I want to be able to have something like: [section] tag = 1 tag = 2 And be able to retrieve ("section", "tag") -> ["1", "2"]. Can be awfully useful for things that make sense several times. Perhaps there should be two functions, one that reads a single-tag and one that reads a multi-tag? File format: I'm sure I'm going to get yelled at, but why don't we make it XML? Hard to edit, yadda, yadda, but you can easily write a special purpose widget to edit XConfig (that's what we'll call the DTD) files. hopefull-yet-not-naive-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From gstein at lyra.org Sat Mar 4 11:05:15 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 4 Mar 2000 02:05:15 -0800 (PST) Subject: [Python-Dev] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Moshe Zadka wrote: > On Fri, 3 Mar 2000, Jeremy Hylton wrote: > > It sounds like ConfigParser code and docs could use a general cleanup. > > Are there any other issues to take care of as part of that cleanup? > > One thing that bothered me once: > > I want to be able to have something like: > > [section] > tag = 1 > tag = 2 > > And be able to retrieve ("section", "tag") -> ["1", "2"]. > Can be awfully useful for things that make sense several times. > Perhaps there should be two functions, one that reads a single-tag and > one that reads a multi-tag?
Structured values would be nice. Several times, I've needed to decompose the right hand side into lists. > File format: I'm sure I'm going to get yelled at, but why don't we > make it XML? Hard to edit, yadda, yadda, but you can easily write a > special purpose widget to edit XConfig (that's what we'll call the DTD) > files. Write a whole new module. ConfigParser is for files that look like the above. There isn't a reason to NOT use XML, but it shouldn't go into ConfigParser. I find the above style much easier for *humans*, than an XML file, to specify options. XML is good for computers; not so good for humans. Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Sat Mar 4 11:46:40 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 4 Mar 2000 12:46:40 +0200 (IST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <000001bf857a$60b45ac0$c6a0143f@tim> Message-ID: [Tim Peters] > ...If a trash cycle > contains a finalizer (my, but that has to be rare in practice, in > well-designed code!), This shows something Tim himself has often said -- he never programmed a GUI. It's very hard to build a GUI (especially with Tkinter) which is cycle-less, but the classes implementing the GUI often have __del__'s to release system-allocated resources. So, it's not as rare as we would like to believe, which is the reason I haven't given this answer. which-is-not-the-same-thing-as-disagreeing-with-it-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From moshez at math.huji.ac.il Sat Mar 4 12:16:19 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 4 Mar 2000 13:16:19 +0200 (IST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Greg Stein wrote: > I disagree. I don't think a Python-level function is going to have a very > good idea of what to do Much better than the Python interpreter...
> Throwing it out to Python won't help > what happens when an unexpected cycle arrives? Don't delete it. It's as simple as that, since it's a bug. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From moshez at math.huji.ac.il Sat Mar 4 12:29:33 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 4 Mar 2000 13:29:33 +0200 (IST) Subject: [Python-Dev] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Greg Stein wrote: > Write a whole new module. ConfigParser is for files that look like the > above. Gotcha. One problem: two configuration modules might cause the classic "which should I use?" confusion. > > I find the above style much easier for *humans*, than an XML file, to > specify options. XML is good for computers; not so good for humans. > Of course: what human could delimit his text with and ? oh-no-another-c.l.py-bot-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From gstein at lyra.org Sat Mar 4 12:38:46 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 4 Mar 2000 03:38:46 -0800 (PST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Moshe Zadka wrote: > On Sat, 4 Mar 2000, Greg Stein wrote: > > I disagree. I don't think a Python-level function is going to have a very > > good idea of what to do > > > Much better than the Python interpreter... If your function receives two instances (A and B), what are you going to do? How can you know what their policy is for cleaning up in the face of a cycle? I maintain that you would call the equivalent of my proposed __clean__. There isn't much else you'd be able to do, unless you had a completely closed system, you expected cycles between specific types of objects, and you knew a way to clean them up. Even then, you would still be calling something like __clean__ to let the objects do whatever they needed.
I'm suggesting that __clean__ should be formalized (as part of tp_clean). Throwing the handling "up to Python" isn't going to do much for you. Seriously... I'm all for coding more stuff in Python rather than C, but this just doesn't feel right. Getting the objects GC'd is a language feature, and a specific pattern/method/recommendation is best formulated as an interpreter mechanism. > > > Throwing it out to Python won't help > > > what happens when an unexpected cycle arrives? > > Don't delete it. > It's as simple as that, since it's a bug. The point behind this stuff is to get rid of it, rather than let it linger on. If the objects have finalizers (which is how we get to this step!), then it typically means there is a resource they must release. Getting the object cleaned and dealloc'd becomes quite important. Cheers, -g p.s. did you send in a patch for the instance_contains() thing yet? -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sat Mar 4 12:43:12 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 4 Mar 2000 03:43:12 -0800 (PST) Subject: [Python-Dev] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: Message-ID: On Sat, 4 Mar 2000, Moshe Zadka wrote: > On Sat, 4 Mar 2000, Greg Stein wrote: > > Write a whole new module. ConfigParser is for files that look like the > > above. > > Gotcha. > > One problem: two configuration modules might cause the classic "which > should I use?" confusion. Nah. They wouldn't *both* be called ConfigParser. And besides, I see the XML format more as a persistence mechanism rather than a configuration mechanism. I'd call the module something like "XMLPersist". > > > > I find the above style much easier for *humans*, than an XML file, to > > specify options. XML is good for computers; not so good for humans. > > > > Of course: what human could delimit his text with and ? Feh. As a communication mechanism, dropping in that stuff... it's easy. ButI wouldnotwant ... bleck.
I wouldn't want to use XML for configuration stuff. It just gets ugly. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gvwilson at nevex.com Sat Mar 4 17:46:24 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Sat, 4 Mar 2000 11:46:24 -0500 (EST) Subject: [Python-Dev] HTMLgen-style interface to SQL? Message-ID: [short form] I'm looking for an object-oriented toolkit that will do for SQL what Perl's CGI.pm module, or Python's HTMLgen, does for HTML. Pointers, examples, or expressions of interest would be welcome. [long form] Lincoln Stein's CGI.pm module for Perl allows me to build HTML in an object-oriented way, instead of getting caught in the Turing tarpit of string substitution and printf. DOM does the same (in a variety of languages) for XML. Right now, if I want to interact with an SQL database from Perl or Python, I have to embed SQL strings in my programs. I would like to have a DOM-like ability to build and manipulate queries as objects, then call a method that translates the query structure into SQL to send to the database. Alternatively, if there is an XML DTD for SQL (how's that for a chain of TLAs?), and some tool to convert the XML/SQL to pure SQL, so that I could build my query using DOM, that would be cool too. RSVP, Greg Wilson gvwilson at nevex.com From moshez at math.huji.ac.il Sat Mar 4 19:02:54 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 4 Mar 2000 20:02:54 +0200 (IST) Subject: [Python-Dev] Re: [Patches] selfnanny.py: checking for "self" in every method In-Reply-To: <200003041724.MAA05053@eric.cnri.reston.va.us> Message-ID: On Sat, 4 Mar 2000, Guido van Rossum wrote: > Before we all start writing nannies and checkers, how about a standard > API design first? I thoroughly agree -- we should have a standard API.
I tried to write selfnanny so it could be callable from any API possible (e.g., it can take either a file, a string, an ast or a tuple representation). > I will want to call various nannies from a "Check" > command that I plan to add to IDLE. Very cool: what I imagine is a sort of modular PyLint. > I already did this with tabnanny, > and found that it's barely possible -- it's really written to run like > a script. Mine definitely isn't: it's designed to run both like a script and like a module. One outstanding bug: no docos. To be supplied upon request <0.5 wink>. I just wanted to float it out and see if people think that this particular nanny is worth while. > Since parsing is expensive, we probably want to share the parse tree. Yes. Probably as an AST, and transform to tuples/lists inside the checkers. > Ideas? Here's a strawman API: There's a package called Nanny Every module in that package should have a function called check_ast. Its argument is an AST object, and its output should be a list of three-tuples: (line-number, error-message, None) or (line-number, error-message, (column-begin, column-end)) (each tuple can be a different form). Problems? (I'm CCing to python-dev. Please follow up to that discussion to python-dev only, as I don't believe it belongs in patches) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From gvwilson at nevex.com Sat Mar 4 19:26:20 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Sat, 4 Mar 2000 13:26:20 -0500 (EST) Subject: [Python-Dev] Re: selfnanny.py / nanny architecture In-Reply-To: Message-ID: > > Guido van Rossum wrote: > > Before we all start writing nannies and checkers, how about a standard > > API design first? > Moshe Zadka wrote: > Here's a strawman API: > There's a package called Nanny > Every module in that package should have a function called check_ast.
> It's argument is an AST object, and it's output should be a list > of three-tuples: (line-number, error-message, None) or > (line-number, error-message, (column-begin, column-end)) (each tuple can > be a different form). Greg Wilson wrote: The SUIF (Stanford University Intermediate Format) group has been working on an extensible compiler framework for about ten years now. The framework is based on an extensible AST spec; anyone can plug in a new analysis or optimization algorithm by writing one or more modules that read and write decorated ASTs. (See http://suif.stanford.edu for more information.) Based on their experience, I'd suggest that every nanny take an AST as an argument, and add complaints in place as decorations to the nodes. A terminal nanny could then collect these and display them to the user. I think this architecture will make it simpler to write meta-nannies. I'd further suggest that the AST be something that can be manipulated through DOM, since (a) it's designed for tree-crunching, (b) it's already documented reasonably well, (c) it'll save us re-inventing a wheel, and (d) generating human-readable output in a variety of customizable formats ought to be simple (well, simpler than the alternatives). Greg From jeremy at cnri.reston.va.us Sun Mar 5 03:10:28 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Sat, 4 Mar 2000 21:10:28 -0500 (EST) Subject: [Python-Dev] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: References: Message-ID: <14529.49684.219826.466310@bitdiddle.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: MZ> On Sat, 4 Mar 2000, Greg Stein wrote: >> Write a whole new module. ConfigParser is for files that look >> like the above. MZ> Gotcha. MZ> One problem: two configurations modules might cause the classic MZ> "which should I use?" confusion. I don't think this is a hard decision to make. ConfigParser is good for simple config files that are going to be maintained by humans with a text editor. 
An XML-based configuration file is probably the right solution when humans aren't going to maintain the config files by hand. Perhaps XML will eventually be the right solution in both cases, but only if XML editors are widely available. >> I find the above style much easier for *humans*, than an >> XML file, to specify options. XML is good for computers; not so >> good for humans. MZ> Of course: what human could delimit his text with and MZ> ? Could? I'm sure there are more ways on Linux and Windows to mark up text than are dreamt of in your philosophy, Moshe . The question is what is easiest to read and understand? Jeremy From tim_one at email.msn.com Sun Mar 5 03:22:16 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 21:22:16 -0500 Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" in every method In-Reply-To: <200003041724.MAA05053@eric.cnri.reston.va.us> Message-ID: <000201bf8649$a17383e0$f42d153f@tim> [Guido van Rossum] > Before we all start writing nannies and checkers, how about a standard > API design first? I will want to call various nannies from a "Check" > command that I plan to add to IDLE. I already did this with tabnanny, > and found that it's barely possible -- it's really written to run like > a script. I like Moshe's suggestion fine, except with an abstract base class named Nanny with a virtual method named check_ast. Nannies should (of course) derive from that. > Since parsing is expensive, we probably want to share the parse tree. What parse tree? Python's parser module produces an AST not nearly "A enough" for reasonably productive nanny writing. GregS & BillT have improved on that, but it's not in the std distrib. Other "problems" include the lack of original source lines in the trees, and lack of column-number info. Note that by the time Python has produced a parse tree, all evidence of the very thing tabnanny is looking for has been removed. That's why she used the tokenize module to begin with. 
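For flavor, a stripped-down sketch of that tokenize-driven style of checking -- this is not checkappend's actual state machine, just the general idea of detecting multi-argument list.append calls from the token stream (written against today's tokenize API):

```python
import io
import tokenize

OPEN, CLOSE = "([{", ")]}"

def find_multiarg_appends(source):
    """Return line numbers where something.append(...) gets more than one argument."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    hits = []
    for i, tok in enumerate(tokens):
        # Look for ".append(" in the token stream.
        if (tok.type == tokenize.NAME and tok.string == "append"
                and i >= 1 and tokens[i - 1].string == "."
                and i + 1 < len(tokens) and tokens[i + 1].string == "("):
            depth = 0
            for t in tokens[i + 1:]:
                if t.string and t.string in OPEN:
                    depth += 1
                elif t.string and t.string in CLOSE:
                    depth -= 1
                    if depth == 0:
                        break
                elif t.string == "," and depth == 1:
                    # A comma at the top level of the call: multiple arguments.
                    hits.append(tok.start[0])
                    break
    return hits
```

A call like x.append(1, 2) is flagged, while the repaired x.append((1, 2)) form passes, since its comma sits one paren level deeper.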
God knows tokenize is too funky to use, too, when life gets harder (check out checkappend.py's tokeneater state machine for a preliminary taste of that). So the *only* solution is to adopt Christian's Stackless so I can rewrite tokenize as a coroutine like God intended . Seriously, I don't know of anything that produces a reasonably usable (for nannies) parse tree now, except via modifying a Python grammar for use with John Aycock's SPARK; the latter also comes with very pleasant & powerful tree pattern-matching abilities. But it's probably too slow for everyday "just folks" use. Grabbing the GregS/BillT enhancement is probably the most practical thing we could build on right now (but tabnanny will have to remain a special case). unsure-about-the-state-of-simpleparse-on-mxtexttools-for-this-ly y'rs - tim From tim_one at email.msn.com Sun Mar 5 04:24:18 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 22:24:18 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <38BE1B69.E0B88B41@lemburg.com> Message-ID: <000301bf8652$4aadaf00$f42d153f@tim> Just noting that two instances of this were found in Zope. [/F] > append = list.append > for x in something: > append(...) [Tim] > As detailed in a c.l.py posting, I have yet to find a single instance of > this actually called with multiple arguments. Pointing out that it's > *possible* isn't the same as demonstrating it's an actual problem. I'm > quite willing to believe that it is, but haven't yet seen evidence of it. From fdrake at acm.org Sun Mar 5 04:55:27 2000 From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Sat, 4 Mar 2000 22:55:27 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> Message-ID: <14529.55983.263225.691427@weyr.cnri.reston.va.us> Jeremy Hylton writes: > Thanks for catching that. I didn't look at the context. I'm going to > wait, though, until I talk to Fred to mess with the code any more. I did it that way since the .ini format allows comments after values (the ';' comments after a '=' are .ini style; '#' comments are a ConfigParser thing), but there's no equivalent concept for RFC822 parsing, other than '(...)' in addresses. The code was trying to allow what was expected from the .ini crowd without breaking the "native" use of ConfigParser. > General question for python-dev readers: What are your experiences > with ConfigParser? I just used it to build a simple config parser for > IDLE and found it hard to use for several reasons. The biggest > problem was that the file format is undocumented. I also found it > clumsy to have to specify section and option arguments. I ended up > writing a proxy that specializes on section so that get takes only an > option argument. > > It sounds like ConfigParser code and docs could use a general cleanup. > Are there any other issues to take care of as part of that cleanup? I agree that the API to ConfigParser sucks, and I think also that the use of it as a general solution is a big mistake. It's a messy bit of code that doesn't need to be, supports a really nasty mix of syntaxes, and can easily bite users who think they're getting something .ini-like (the magic names and interpolation are a bad idea!). While it suited the original application well enough, something with .ini syntax and interpolation from a subclass would have been *much* better.
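A rough sketch of the kind of design being suggested here: a tiny base class that implements bare .ini syntax and exposes one overridable hook where a subclass can layer on interpolation or type coercion. All names are invented for illustration; this is not a real module.

```python
# Illustrative only: a minimal .ini reader with a single extension hook.

class IniParser:
    def __init__(self):
        self.sections = {}

    def transform(self, section, option, value):
        # Subclasses override this hook (interpolation, coercion, ...).
        return value

    def read_string(self, text):
        current = None
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith((";", "#")):
                continue                      # blank line or comment
            if line.startswith("[") and line.endswith("]"):
                current = line[1:-1]          # [section] header
                self.sections.setdefault(current, {})
            elif "=" in line and current is not None:
                option, _, value = line.partition("=")
                self.sections[current][option.strip()] = self.transform(
                    current, option.strip(), value.strip())

    def get(self, section, option):
        return self.sections[section][option]

class IntIniParser(IniParser):
    """Example subclass: coerce all-digit values to integers."""
    def transform(self, section, option, value):
        return int(value) if value.isdigit() else value

demo = IntIniParser()
demo.read_string("[server]\nhost = localhost\nport = 8080\n; a comment\n")
```

The base class stays dumb about value semantics; all the "magic" lives in the subclass, which is the separation being argued for.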
I think we should create a new module, inilib, that implements exactly .ini syntax in a base class that can be intelligently extended. ConfigParser should be deprecated. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From tim_one at email.msn.com Sun Mar 5 05:11:12 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 23:11:12 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: <200003031705.MAA21700@eric.cnri.reston.va.us> Message-ID: <000601bf8658$d81d34e0$f42d153f@tim> [Guido] > OK, so we're down to this one point: if __del__ resurrects the object, > should __del__ be called again later? Additionally, should > resurrection be made illegal? I give up on the latter, so it really is just one. > I can easily see how __del__ could *accidentally* resurrect the object > as part of its normal cleanup ... > In this example, the helper routine will eventually delete the object > from its cache, at which point it is truly deleted. It would be > harmful, not helpful, if __del__ was called again at this point. If this is something that happens easily, and current behavior is harmful, don't you think someone would have complained about it by now? That is, __del__ *is* "called again at this point" now, and has been for years & years. And if it happens easily, it *is* happening now, and in an unknown amount of existing code. (BTW, I doubt it happens at all -- people tend to write very simple __del__ methods, so far as I've ever seen) > Now, it is true that the current docs for __del__ imply that > resurrection is possible. "imply" is too weak. The Reference Manual's "3.3.1 Basic customization" flat-out says it's possible ("though not recommended"). The precise meaning of the word "may" in the following sentence is open to debate, though. 
> The intention of that note was to warn __del__ writers that in the case > of accidental resurrection Sorry, but I can't buy this: saying that *accidents* are "not recommended" is just too much of a stretch . > __del__ might be called again. That's a plausible reading of the following "may", but not the only one. I believe it's the one you intended, but it's not the meaning I took prior to this. > The intention certainly wasn't to allow or encourage intentional resurrection. Well, I think it plainly says it's supported ("though not recommended"). I used it intentionally at KSR, and even recommended it on c.l.py in the dim past (in one of those "dark & useless" threads ). > Would there really be someone out there who uses *intentional* > resurrection? I severely doubt it. I've never heard of this. Why would anyone tell you about something that *works*?! You rarely hear the good stuff, you know. I gave the typical pattern in the preceding msg. To flesh out the motivation more, you have some external resource that's very expensive to set up (in KSR's case, it was an IPC connection to a remote machine). Rights to use that resource are handed out in the form of an object. When a client is done using the resource, they *should* explicitly use the object's .release() method, but you can't rely on that. So the object's __del__ method looks like (for example): def __del__(self): # Code not shown to figure out whether to disconnect: the downside to # disconnecting is that it can cost a bundle to create a new connection. # If the whole app is shutting down, then of course we want to disconnect. # Or if a timestamp trace shows that we haven't been making good use of # all the open connections lately, we may want to disconnect too. 
if decided_to_disconnect: self.external_resource.disconnect() else: # keep the connection alive for reuse global_available_connection_objects.append(self) This is simple & effective, and it relies on both intentional resurrection and __del__ getting called repeatedly. I don't claim there's no other way to write it, just that there's *been* no problem doing this for a millennium . Note that MAL spontaneously sketched similar examples, although I can't say whether he's actually done stuff like this. Going back up a level, in another msg you finally admitted that you want "__del__ called only once" for the same reason Java wants it: because gc has no idea what to do when faced with finalizers in a trash cycle, and settles for an unprincipled scheme whose primary virtue is that "it doesn't blow up" -- and "__del__ called only once" happens to be convenient for that scheme. But toss such cycles back to the user to deal with at the Python level, and all those problems go away (along with the artificial need to change __del__). The user can break the cycles in an order that makes sense to the app (or they can let 'em leak! up to them). >>> print gc.get_cycle.__doc__ Return a list of objects comprising a single garbage cycle; [] if none. At least one of the objects has a finalizer, so Python can't determine the intended order of destruction. If you don't break the cycle, Python will neither run any finalizers for the contained objects nor reclaim their memory. If you do break the cycle, and dispose of the list, Python will follow its normal reference-counting rules for running finalizers and reclaiming memory. That this "won't blow up" either is just the least of its virtues . you-break-it-you-buy-it-ly y'rs - tim From tim_one at email.msn.com Sun Mar 5 05:56:54 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 23:56:54 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? 
In-Reply-To: Message-ID: <000001bf865f$3acb99a0$432d153f@tim> [Tim sez "toss insane cycles back on the user"] [Greg Stein] > I disagree. I don't think a Python-level function is going to have a very > good idea of what to do. You've already assumed that Python coders know exactly what to do, else they couldn't have coded the new __clean__ method your proposal relies on. I'm taking what strikes me as the best part of Scheme's Guardian idea: don't assume *anything* about what users "should" do to clean up their trash. Leave it up to them: their problem, their solution. I think finalizers in trash cycles should be so rare in well-written code that it's just not worth adding much of anything in the implementation to cater to it. > IMO, this kind of semantics belong down in the interpreter with a > specific, documented algorithm. Throwing it out to Python won't help > -- that function will still have to use a "standard pattern" for getting > the cyclical objects to toss themselves. They can use any pattern they want, and if the pattern doesn't *need* to be coded in C as part of the implementation, it shouldn't be. > I think that standard pattern should be a language definition. I distrust our ability to foresee everything users may need over the next 10 years: how can we know today that the first std pattern you dreamed up off the top of your head is the best approach to an unbounded number of problems we haven't yet seen a one of ? > Without a standard pattern, then you're saying the application will know > what to do, but that is kind of weird -- what happens when an unexpected > cycle arrives? With the hypothetical gc.get_cycle() function I mentioned before, they should inspect objects in the list they get back, and if they find they don't know what to do with them, they can still do anything they want. 
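[The calling pattern for the proposed gc.get_cycle() might look like the sketch below. gc.get_cycle() was never implemented; the fake_get_cycle() stand-in and the Resource class are invented solely to illustrate the "inspect, break in an app-chosen order, drop the list" pattern being described:

```python
# gc.get_cycle() is a *proposed* API -- it does not exist in any Python
# release.  To show the calling pattern, fake it with a toy stand-in
# that hands back one trash cycle of objects carrying finalizers.

class Resource:
    def __init__(self, name):
        self.name = name
        self.peer = None          # will point into the cycle
    def release(self):            # stands in for a __del__-style finalizer
        self.peer = None

def fake_get_cycle():
    a, b = Resource("a"), Resource("b")
    a.peer, b.peer = b, a         # a <-> b: a cycle with finalizers
    return [a, b]

# The caller pattern: inspect the objects in the cycle, break it in an
# order the *application* understands, then dispose of the list.
cycle = fake_get_cycle()
for obj in cycle:
    if isinstance(obj, Resource):
        obj.release()             # app-chosen break order
    else:
        raise RuntimeError("unexpected object in trash cycle: %r" % obj)
broken = all(obj.peer is None for obj in cycle)
print(broken)                     # prints: True
```

Once the cycle is broken, ordinary reference counting can reclaim the objects; if the caller instead doesn't know what to do, it can raise, log, or just drop the list and let the cycle become trash again.]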
Examples include raising an exception, dialing my home pager at 3am to insist I come in to look at it, or simply let the list go away (at which point the objects in the list will again become a trash cycle containing a finalizer). If several distinct third-party modules get into this act, I *can* see where it could become a mess. That's why Scheme "guardians" is plural: a given module could register its "problem objects" in advance with a specific guardian of its own, and query only that guardian later for things ready to die. This probably can't be implemented in Python, though, without support for weak references (or lots of brittle assumptions about specific refcount values). agreeably-disagreeing-ly y'rs - tim From tim_one at email.msn.com Sun Mar 5 05:56:58 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 4 Mar 2000 23:56:58 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Message-ID: <000101bf865f$3cb0d460$432d153f@tim> [Tim] > ...If a trash cycle contains a finalizer (my, but that has to be rare. > in practice, in well-designed code!), [Moshe Zadka] > This shows something Tim himself has often said -- he never programmed a > GUI. It's very hard to build a GUI (especially with Tkinter) which is > cycle-less, but the classes implementing the GUI often have __del__'s > to break system-allocated resources. > > So, it's not as rare as we would like to believe, which is the reason > I haven't given this answer. I wrote Cyclops.py when trying to track down leaks in IDLE. The extraordinary thing we discovered is that "even real gc" would not have reclaimed the cycles. They were legitimately reachable, because, indeed, "everything points to everything else". Guido fixed almost all of them by explicitly calling new "close" methods. I believe IDLE has no __del__ methods at all now. Tkinter.py currently contains two. 
so-they-contained-__del__-but-weren't-trash-ly y'rs - tim From tim_one at email.msn.com Sun Mar 5 07:05:24 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 5 Mar 2000 01:05:24 -0500 Subject: [Python-Dev] Unicode mapping tables In-Reply-To: <38BCD71C.3592E6A@lemburg.com> Message-ID: <000601bf8668$cbbdd640$432d153f@tim> [M.-A. Lemburg] > ... > Here's what I'll do: > > * implement .capitalize() in the traditional way for Unicode > objects (simply convert the first char to uppercase) Given .title(), is .capitalize() of use for Unicode strings? Or is it just a temptation to do something senseless in the Unicode world? If it doesn't make sense, leave it out (this *seems* like compulsion to implement all current string methods in *some* way for Unicode, whether or not they make sense). From moshez at math.huji.ac.il Sun Mar 5 07:16:22 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 5 Mar 2000 08:16:22 +0200 (IST) Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" in every method In-Reply-To: <000201bf8649$a17383e0$f42d153f@tim> Message-ID: On Sat, 4 Mar 2000, Tim Peters wrote: > I like Moshe's suggestion fine, except with an abstract base class named > Nanny with a virtual method named check_ast. Nannies should (of course) > derive from that. Why? The C++ you're programming damaged your common sense cycles? > > Since parsing is expensive, we probably want to share the parse tree. > > What parse tree? Python's parser module produces an AST not nearly "A > enough" for reasonably productive nanny writing. As a note, selfnanny uses the parser module AST. > GregS & BillT have > improved on that, but it's not in the std distrib. Other "problems" include > the lack of original source lines in the trees, The parser module has source lines. > and lack of column-number info. Yes, that sucks. > Note that by the time Python has produced a parse tree, all evidence of the > very thing tabnanny is looking for has been removed. 
That's why she used > the tokenize module to begin with. Well, it's one of the few nannies which would be in that position. > God knows tokenize is too funky to use too when life gets harder (check out > checkappend.py's tokeneater state machine for a preliminary taste of that). Why doesn't checkappend.py uses the parser module? > Grabbing the GregS/BillT enhancement is probably the most > practical thing we could build on right now You got some pointers? > (but tabnanny will have to remain a special case). tim-will-always-be-a-special-case-in-our-hearts-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From tim_one at email.msn.com Sun Mar 5 08:01:12 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 5 Mar 2000 02:01:12 -0500 Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" inevery method In-Reply-To: Message-ID: <000901bf8670$97d8f320$432d153f@tim> [Tim] >> [make Nanny a base class] [Moshe Zadka] > Why? Because it's an obvious application for OO design. A common base class formalizes the interface and can provide useful utilities for subclasses. > The C++ you're programming damaged your common sense cycles? Yes, very, but that isn't relevant here . It's good Python sense too. >> [parser module produces trees far too concrete for comfort] > As a note, selfnanny uses the parser module AST. Understood, but selfnanny has a relatively trivial task. Hassling with tuples nested dozens deep for even relatively simple stmts is both a PITA and a time sink. >> [parser doesn't give source lines] > The parser module has source lines. No, it does not (it only returns terminals, as isolated strings). The tokenize module does deliver original source lines in their entirety (as well as terminals, as isolated strings; and column numbers). >> and lack of column-number info. > Yes, that sucks. > ... > Why doesn't checkappend.py uses the parser module? 
Because it wanted to display the actual source line containing an offending "append" (which, again, the parse module does not supply). Besides, it was a trivial variation on tabnanny.py, of which I have approximately 300 copies on my disk . >> Grabbing the GregS/BillT enhancement is probably the most >> practical thing we could build on right now > You got some pointers? Download python2c (http://www.mudlib.org/~rassilon/p2c/) and grab transformer.py from the zip file. The latter supplies a very useful post-processing pass over the parse module's output, squashing it *way* down. From moshez at math.huji.ac.il Sun Mar 5 08:08:41 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 5 Mar 2000 09:08:41 +0200 (IST) Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" inevery method In-Reply-To: <000901bf8670$97d8f320$432d153f@tim> Message-ID: On Sun, 5 Mar 2000, Tim Peters wrote: > [Tim] > >> [make Nanny a base class] > > [Moshe Zadka] > > Why? > > Because it's an obvious application for OO design. A common base class > formalizes the interface and can provide useful utilities for subclasses. The interface is just one function. You're welcome to have a do-nothing nanny that people *can* derive from: I see no point in making them derive from a base class. > > As a note, selfnanny uses the parser module AST. > > Understood, but selfnanny has a relatively trivial task. That it does, and it was painful. > >> [parser doesn't give source lines] > > > The parser module has source lines. > > No, it does not (it only returns terminals, as isolated strings). Sorry, misunderstanding: it seemed obvious to me you wanted line numbers. For lines, use the linecache module... > > You got some pointers? > > Download python2c (http://www.mudlib.org/~rassilon/p2c/) and grab > transformer.py from the zip file. I'll have a look. Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html From effbot at telia.com Sun Mar 5 10:24:37 2000 From: effbot at telia.com (Fredrik Lundh) Date: Sun, 5 Mar 2000 10:24:37 +0100 Subject: [Python-Dev] return statements in lambda Message-ID: <006f01bf8686$391ced80$34aab5d4@hagrid> from "Python for Lisp Programmers": http://www.norvig.com/python-lisp.html > Don't forget return. Writing def twice(x): x+x is tempting > and doesn't signal a warning or exception, but you probably > meant to have a return in there. This is particularly irksome > because in a lambda you are prohibited from writing return, > but the semantics is to do the return. maybe adding an (optional but encouraged) "return" to lambda would be an improvement? lambda x: x + 10 vs. lambda x: return x + 10 or is this just more confusing... opinions? From guido at python.org Sun Mar 5 13:04:56 2000 From: guido at python.org (Guido van Rossum) Date: Sun, 05 Mar 2000 07:04:56 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: Your message of "Sat, 04 Mar 2000 22:55:27 EST." <14529.55983.263225.691427@weyr.cnri.reston.va.us> References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> <14529.55983.263225.691427@weyr.cnri.reston.va.us> Message-ID: <200003051204.HAA05367@eric.cnri.reston.va.us> [Fred] > I agree that the API to ConfigParser sucks, and I think also that > the use of it as a general solution is a big mistake. It's a messy > bit of code that doesn't need to be, supports a really nasty mix of > syntaxes, and can easily bite users who think they're getting > something .ini-like (the magic names and interpolation is a bad > idea!). While it suited the original application well enough, > something with .ini syntax and interpolation from a subclass would > have been *much* better.
> I think we should create a new module, inilib, that implements > exactly .ini syntax in a base class that can be intelligently > extended. ConfigParser should be deprecated. Amen. Some thoughts: - You could put it all in ConfigParser.py but with new classnames. (Not sure though, since the ConfigParser class, which is really a kind of weird variant, will be assumed to be the main class because its name is that of the module.) - Variants on the syntax could be given through some kind of option system rather than through subclassing -- they should be combinable independently. Some possible options (maybe I'm going overboard here) could be: - comment characters: ('#', ';', both, others?) - comments after variables allowed? on sections? - variable characters: (':', '=', both, others?) - quoting of values with "..." allowed? - backslashes in "..." allowed? - does backslash-newline mean a continuation? - case sensitivity for section names (default on) - case sensitivity for option names (default off) - variables allowed before first section name? - first section name? (default "main") - character set allowed in section names - character set allowed in variable names - %(...) substitution? (Well maybe the whole substitution thing should really be done through a subclass -- it's too weird for normal use.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun Mar 5 13:17:31 2000 From: guido at python.org (Guido van Rossum) Date: Sun, 05 Mar 2000 07:17:31 -0500 Subject: [Python-Dev] Unicode mapping tables In-Reply-To: Your message of "Sun, 05 Mar 2000 01:05:24 EST." <000601bf8668$cbbdd640$432d153f@tim> References: <000601bf8668$cbbdd640$432d153f@tim> Message-ID: <200003051217.HAA05395@eric.cnri.reston.va.us> > [M.-A. Lemburg] > > ...
> > Here's what I'll do: > > > > * implement .capitalize() in the traditional way for Unicode > > objects (simply convert the first char to uppercase) [Tim] > Given .title(), is .capitalize() of use for Unicode strings?  Or is it just > a temptation to do something senseless in the Unicode world?  If it doesn't > make sense, leave it out (this *seems* like compulsion to implement > all current string methods in *some* way for Unicode, whether or not they > make sense). The intention of this is to make code that does something using strings do exactly the same thing if those strings happen to be Unicode strings with the same values. The capitalize method returns self[0].upper() + self[1:] -- that may not make sense for e.g. Japanese, but it certainly does for Russian or Greek. It also does this in JPython. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun Mar 5 13:24:41 2000 From: guido at python.org (Guido van Rossum) Date: Sun, 05 Mar 2000 07:24:41 -0500 Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" inevery method In-Reply-To: Your message of "Sun, 05 Mar 2000 02:01:12 EST." <000901bf8670$97d8f320$432d153f@tim> References: <000901bf8670$97d8f320$432d153f@tim> Message-ID: <200003051224.HAA05410@eric.cnri.reston.va.us> > >> [parser doesn't give source lines] > > > The parser module has source lines. > > No, it does not (it only returns terminals, as isolated strings).  The > tokenize module does deliver original source lines in their entirety (as > well as terminals, as isolated strings; and column numbers). Moshe meant line numbers -- it has those. > > Why doesn't checkappend.py uses the parser module? > > Because it wanted to display the actual source line containing an offending > "append" (which, again, the parse module does not supply).  Besides, it was > a trivial variation on tabnanny.py, of which I have approximately 300 copies > on my disk . Of course another argument for making things more OO.
(The code used in tabnanny.py to process files and directories recursively from sys.argv is replicated a thousand times in various scripts of mine -- Tim took it from my now-defunct takpolice.py. This should be in the std library somehow...) > >> Grabbing the GregS/BillT enhancement is probably the most > >> practical thing we could build on right now > > > You got some pointers? > > Download python2c (http://www.mudlib.org/~rassilon/p2c/) and grab > transformer.py from the zip file.  The latter supplies a very useful > post-processing pass over the parse module's output, squashing it *way* > down. Those of you who have seen the compiler-sig should know that Jeremy made an improvement which will find its way into p2c. It's currently on display in the Python CVS tree in the nondist branch: see http://www.python.org/pipermail/compiler-sig/2000-February/000011.html and the ensuing thread for more details. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun Mar 5 14:46:13 2000 From: guido at python.org (Guido van Rossum) Date: Sun, 05 Mar 2000 08:46:13 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: Your message of "Fri, 03 Mar 2000 22:26:54 EST." <000401bf8589$7d1364e0$c6a0143f@tim> References: <000401bf8589$7d1364e0$c6a0143f@tim> Message-ID: <200003051346.IAA05539@eric.cnri.reston.va.us> I'm beginning to believe that handing cycles with finalizers to the user is better than calling __del__ with a different meaning, and I tentatively withdraw my proposal to change the rules for when __del__ is called (even when __init__ fails; I haven't had any complaints about that either). There seem to be two competing suggestions for solutions: (1) call some kind of __cleanup__ (Marc-Andre) or tp_clean (Greg) method of the object; (2) Tim's proposal of an interface to ask the garbage collector for a trash cycle with a finalizer (or for an object with a finalizer in a trash cycle?).
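[A toy model of proposal (1) may make the contrast concrete. It assumes __cleanup__ works roughly as Marc-Andre sketched it -- the collector calls __cleanup__ on each instance in an unreachable cycle before clobbering anything; the Node class and the collect() driver below are invented for illustration, not part of any proposal text:

```python
# Toy model of the __cleanup__ proposal: before the collector clobbers
# an unreachable cycle, it calls __cleanup__ on each instance that
# defines one.  The names Node and collect() are illustrative only.

class Node:
    def __init__(self, name):
        self.name = name
        self.next = None
        self.cleaned = False
    def __cleanup__(self):
        self.next = None          # break the reference that forms the cycle
        self.cleaned = True

def collect(cycle):
    # What the collector would do: a cleanup pass first...
    for obj in cycle:
        cleanup = getattr(obj, "__cleanup__", None)
        if cleanup is not None:
            cleanup()
    # ...then ordinary refcounting reclaims the now-acyclic objects.

a, b = Node("a"), Node("b")
a.next, b.next = b, a             # a <-> b cycle
collect([a, b])
print(a.cleaned and b.cleaned)    # prints: True
```

Under proposal (2), by contrast, no collect()-side protocol exists: the whole list of cycle members is handed back to the application, which does the equivalent breaking itself.]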
Somehow Tim's version looks less helpful to me, because it *seems* that whoever gets to handle the cycle (the main code of the program?) isn't necessarily responsible for creating it (some library you didn't even know was used under the covers of some other library you called). Of course, it's also possible that a trash cycle is created by code outside the responsibility of the finalizer. But still, I have a hard time understanding how Tim's version would be used. Greg or Marc-Andre's version I understand. What keeps nagging me though is what to do when there's a finalizer but no cleanup method. I guess the trash cycle remains alive. Is this acceptable? (I guess so, because we've given the programmer a way to resolve the trash: provide a cleanup method.) If we detect individual cycles (the current algorithm doesn't do that yet, though it seems easy enough to do another scan), could we special-case cycles with only one finalizer and no cleaner-upper? (I'm tempted to call the finalizer because it seems little harm can be done -- but then of course there's the problem of the finalizer being called again when the refcount really goes to zero. :-( ) > Exactly. The *programmer* may know the right thing to do, but the Python > implementation can't possibly know. Facing both facts squarely constrains > the possibilities to the only ones that are all of understandable, > predictable and useful. Cycles with finalizers must be a Magic-Free Zone > else you lose at least one of those three: even Guido's kung fu isn't > strong enough to outguess this. > > [a nice implementation sketch, of what seems an overly elaborate scheme, > if you believe cycles with finalizers are rare in intelligently designed > code) > ] > > Provided Guido stays interested in this, he'll make his own fun. I'm just > inviting him to move in a sane direction <0.9 wink>.
My current tendency is to go with the basic __cleanup__ and nothing more, calling each instance's __cleanup__ before clobbering dictionaries and lists -- which should break all cycles safely. > One caution: > > > ... > > If the careful-cleaning algorithm hits the end of the careful set of > > objects and the set is non-empty, then throw an exception: > > GCImpossibleError. > > Since gc "can happen at any time", this is very severe (c.f. Guido's > objection to making resurrection illegal). Not quite. Cycle detection is presumably only called every once in a while on memory allocation, and memory *allocation* (as opposed to deallocation) is allowed to fail. Of course, this will probably run into various coding bugs where allocation failure isn't dealt with properly, because in practice this happens so rarely... > Hand a trash cycle back to the > programmer instead, via callback or request or whatever, and it's all > explicit without more cruft in the implementation. It's alive again when > they get it back, and they can do anything they want with it (including > resurrecting it, or dropping it again, or breaking cycles -- > anything). That was the idea with calling the finalizer too: it would be called between INCREF/DECREF, so the object would be considered alive for the duration of the finalizer call. Here's another way of looking at my error: for dicts and lists, I would call a special *clear* function; but for instances, I would call *dealloc*, however intending it to perform a *clear*. I wish we didn't have to special-case finalizers on class instances (since each dealloc function is potentially a combination of a finalizer and a deallocation routine), but the truth is that they *are* special -- __del__ has no responsibility for deallocating memory, only for deallocating external resources (such as temp files).
And even if we introduced a tp_clean protocol that would clear dicts and lists and call __cleanup__ for instances, we'd still want to call it first for instances, because an instance depends on its __dict__ for its __cleanup__ to succeed (but the __dict__ doesn't depend on the instance for its cleanup). Greg's 3-phase tp_clean protocol seems indeed overly elaborate but I guess it deals with such dependencies in the most general fashion. > I'd focus on the cycles themselves, not on the types of objects > involved. I'm not pretending to address the "order of finalization > at shutdown" question, though (although I'd agree they're deeply > related: how do you follow a topological sort when there *isn't* > one? well, you don't, because you can't). In theory, you just delete the last root (a C global pointing to sys.modules) and you run the garbage collector. It might be more complicated in practice to track down all roots. Another practical consideration is that now there are cycles of the form function <=> module, which suggests that we should make function objects traceable. Also, modules can cross-reference, so module objects should be made traceable. I don't think that this will grow the sets of traced objects by too much (since the dicts involved are already traced, and a typical program has way fewer functions and modules than it has class instances). On the other hand, we may also have to trace (un)bound method objects, and these may be tricky because they are allocated and deallocated at high rates (once per typical method call). Back to the drawing board... --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Sun Mar 5 17:42:30 2000 From: skip at mojam.com (Skip Montanaro) Date: Sun, 5 Mar 2000 10:42:30 -0600 (CST) Subject: [Python-Dev] Design question: call __del__ for cyclical garbage?
In-Reply-To: <200003051346.IAA05539@eric.cnri.reston.va.us> References: <000401bf8589$7d1364e0$c6a0143f@tim> <200003051346.IAA05539@eric.cnri.reston.va.us> Message-ID: <14530.36471.11654.666900@beluga.mojam.com> Guido> What keeps nagging me though is what to do when there's a Guido> finalizer but no cleanup method. I guess the trash cycle remains Guido> alive. Is this acceptable? (I guess so, because we've given the Guido> programmer a way to resolve the trash: provide a cleanup method.) That assumes the programmer even knows there's a cycle, right? I'd like to see this scheme help provide debugging assistance. If a cycle is discovered but the programmer hasn't declared a cleanup method for the object it wants to clean up, a default cleanup method is called if it exists (e.g. sys.default_cleanup), which would serve mostly as an alert (print magic hex values to stderr, popup a Tk bomb dialog, raise the blue screen of death, ...) as opposed to actually breaking any cycles. Presumably the programmer would define sys.default_cleanup during development and leave it undefined during production. Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From paul at prescod.net Sat Mar 4 02:04:43 2000 From: paul at prescod.net (Paul Prescod) Date: Fri, 03 Mar 2000 17:04:43 -0800 Subject: [Python-Dev] breaking list.append() References: <38BC86E1.53F69776@prescod.net> <200003010411.XAA12988@eric.cnri.reston.va.us> Message-ID: <38C0612B.7C92F8C4@prescod.net> Guido van Rossum wrote: > > .. > Multi-arg > append probably won't be the only reason why e.g. Digital Creations > may need to release an update to Zope for Python 1.6. Zope comes with > its own version of Python anyway, so they have control over when they > make the switch. My concern is when I want to build an application with a module that only works with Python 1.5.2 and another one that only works with Python 1.6. If we can avoid that situation by making 1.6 compatible with 1.5.2, we should.
By the time 1.7 comes around I will accept that everyone has had enough time to update their modules. Remember that many module authors are just part time volunteers. They may only use Python every few months when they get a spare weekend! I really hope that Andrew is wrong when he predicts that there may be lots of different places where Python 1.6 breaks code! I'm in favor of being a total jerk when it comes to Py3K but Python has been pretty conservative thus far. Could someone remind me in one sentence what the downside is for treating this as a warning condition as Java does with its deprecated features? Then the CP4E people don't get into bad habits and those same CP4E people trying to use older modules don't run into frustrating runtime errors. Do it for the CP4E people! (how's that for rhetoric) -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "We still do not know why mathematics is true and whether it is certain. But we know what we do not know in an immeasurably richer way than we did. And learning this has been a remarkable achievement, among the greatest and least known of the modern era." - from "Advent of the Algorithm" David Berlinski http://www.opengroup.com/mabooks/015/0151003386.shtml From jeremy at cnri.reston.va.us Sun Mar 5 18:46:14 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Sun, 5 Mar 2000 12:46:14 -0500 (EST) Subject: [Python-Dev] RE: [Patches] selfnanny.py: checking for "self" inevery method In-Reply-To: <000901bf8670$97d8f320$432d153f@tim> References: <000901bf8670$97d8f320$432d153f@tim> Message-ID: <14530.40294.593407.777859@bitdiddle.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: >>> Grabbing the GregS/BillT enhancement is probably the most >>> practical thing we could build on right now >> You got some pointers? TP> Download python2c (http://www.mudlib.org/~rassilon/p2c/) and TP> grab transformer.py from the zip file.
The latter supplies a TP> very useful post-processing pass over the parse module's output, TP> squashing it *way* down. The compiler tools in python/nondist/src/Compiler include Bill & Greg's transformer code, a class-based AST (each node is a subclass of the generic node), and a visitor framework for walking the AST. The APIs and organization are in a bit of flux; Mark Hammond suggested some reorganization that I've not finished yet. I may finish it up this evening. The transformer module does a good job of including line numbers, but I've occasionally run into a node that didn't have a lineno attribute when I expected it would. I haven't taken the time to figure out if my expectation was unreasonable or if the transformer should be fixed. The compiler-sig might be a good place to discuss this further. A warning framework was one of my original goals for the SIG. I imagine we could convince Guido to move warnings + compiler tools into the standard library if they end up being useful. Jeremy From mal at lemburg.com Sun Mar 5 20:57:32 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 05 Mar 2000 20:57:32 +0100 Subject: [Python-Dev] Unicode mapping tables References: <000601bf8668$cbbdd640$432d153f@tim> Message-ID: <38C2BC2C.FFEB72C3@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > ... > > Here's what I'll do: > > > > * implement .capitalize() in the traditional way for Unicode > > objects (simply convert the first char to uppercase) > > Given .title(), is .capitalize() of use for Unicode strings? Or is it just > a temptation to do something senseless in the Unicode world? If it doesn't > make sense, leave it out (this *seems* like compulsion to implement > all current string methods in *some* way for Unicode, whether or not they > make sense).
.capitalize() only touches the first char of the string - not sure whether it makes sense in both worlds ;-) Anyhow, the difference is there but subtle: string.capitalize() will use C's toupper() which is locale dependent, while unicode.capitalize() uses Unicode's toTitleCase() for the first character. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Sun Mar 5 21:15:47 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 05 Mar 2000 21:15:47 +0100 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? References: <000601bf8658$d81d34e0$f42d153f@tim> Message-ID: <38C2C073.CD51688@lemburg.com> Tim Peters wrote: > > [Guido] > > Would there really be someone out there who uses *intentional* > > resurrection? I severely doubt it. I've never heard of this. > > Why would anyone tell you about something that *works*?! You rarely hear > the good stuff, you know. I gave the typical pattern in the preceding msg. > To flesh out the motivation more, you have some external resource that's > very expensive to set up (in KSR's case, it was an IPC connection to a > remote machine). Rights to use that resource are handed out in the form of > an object. When a client is done using the resource, they *should* > explicitly use the object's .release() method, but you can't rely on that. > So the object's __del__ method looks like (for example):
>
>     def __del__(self):
>         # Code not shown to figure out whether to disconnect: the downside to
>         # disconnecting is that it can cost a bundle to create a new connection.
>         # If the whole app is shutting down, then of course we want to disconnect.
>         # Or if a timestamp trace shows that we haven't been making good use of
>         # all the open connections lately, we may want to disconnect too.
>         if decided_to_disconnect:
>             self.external_resource.disconnect()
>         else:
>             # keep the connection alive for reuse
>             global_available_connection_objects.append(self)
>
> This is simple & effective, and it relies on both intentional resurrection > and __del__ getting called repeatedly. I don't claim there's no other way > to write it, just that there's *been* no problem doing this for a millennium > <wink>. > > Note that MAL spontaneously sketched similar examples, although I can't say > whether he's actually done stuff like this. Not exactly this, but similar things in the weak reference implementation of mxProxy. The idea came from a different area: the C implementation of Python uses free lists a lot and these are basically implementations of the same idiom: save an allocated resource for reviving it at some later point. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From nascheme at enme.ucalgary.ca Mon Mar 6 01:27:54 2000 From: nascheme at enme.ucalgary.ca (nascheme at enme.ucalgary.ca) Date: Sun, 5 Mar 2000 17:27:54 -0700 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <000001bf857a$60b45ac0$c6a0143f@tim>; from tim_one@email.msn.com on Fri, Mar 03, 2000 at 08:38:43PM -0500 References: <200003031650.LAA21647@eric.cnri.reston.va.us> <000001bf857a$60b45ac0$c6a0143f@tim> Message-ID: <20000305172754.A14998@acs.ucalgary.ca> On Fri, Mar 03, 2000 at 08:38:43PM -0500, Tim Peters wrote: > So here's what I'd consider doing: explicit is better than implicit, and in > the face of ambiguity refuse the temptation to guess. I like Marc's suggestion. Here is my proposal: Allow classes to have a new method, __cleanup__ or whatever you want to call it. When tp_clear is called for an instance, it checks for this method. If it exists, call it, otherwise delete the container objects from the instance's dictionary.
When collecting cycles, call tp_clear for instances first. It's simple and allows the programmer to cleanly break cycles if they insist on creating them and using __del__ methods. Neil From tim_one at email.msn.com Mon Mar 6 08:13:21 2000 From: tim_one at email.msn.com (Tim Peters) Date: Mon, 6 Mar 2000 02:13:21 -0500 Subject: [Python-Dev] breaking list.append() In-Reply-To: <38C0612B.7C92F8C4@prescod.net> Message-ID: <000401bf873b$745f8320$ea2d153f@tim> [Paul Prescod] > ... > Could someone remind in one sentence what the downside is for treating > this as a warning condition as Java does with its deprecated features? Simply the lack of anything to build on: Python has no sort of runtime warning system now, and nobody has volunteered to create one. If you do <wink>, remember that stdout & stderr may go to the bit bucket in a GUI app. The bit about dropping the "L" suffix on longs seems unwarnable-about in any case (short of warning every time anyone uses long()). remember-that-you-asked-for-the-problems-not-for-solutions-ly y'rs - tim From tim_one at email.msn.com Mon Mar 6 08:33:49 2000 From: tim_one at email.msn.com (Tim Peters) Date: Mon, 6 Mar 2000 02:33:49 -0500 Subject: [Python-Dev] Design question: call __del__ only after successful __init__? In-Reply-To: <38C2C073.CD51688@lemburg.com> Message-ID: <000701bf873e$5032eca0$ea2d153f@tim> [M.-A. Lemburg, on the resurrection/multiple-__del__ "idiom"] > ... > The idea came from a different area: the C implementation > of Python uses free lists a lot and these are basically > implementations of the same idiom: save an allocated > resource for reviving it at some later point. Excellent analogy! Thanks. Now that you phrased it in this clarifying way, I recall that very much the same point was raised in the papers that resulted in the creation of guardians in Scheme. I don't know that anyone is actually using Python __del__ this way today (I am not), but you reminded me why I thought it was natural at one time <wink>.
generally-__del__-aversive-now-except-in-c++-where-destructors-are-guaranteed-to-be-called-when-you-expect-them-to-be-ly y'rs - tim From tim_one at email.msn.com Mon Mar 6 09:12:06 2000 From: tim_one at email.msn.com (Tim Peters) Date: Mon, 6 Mar 2000 03:12:06 -0500 Subject: [Python-Dev] return statements in lambda In-Reply-To: <006f01bf8686$391ced80$34aab5d4@hagrid> Message-ID: <000901bf8743$a9f61aa0$ea2d153f@tim> [/F] > maybe adding an (optional but encouraged) "return" > to lambda would be an improvement? > > lambda x: x + 10 > > vs. > > lambda x: return x + 10 > > or is this just more confusing... opinions? It was an odd complaint to begin with, since Lisp-heads aren't used to using "return" anyway. More of a symptom of taking a shallow syntactic approach to a new (to them) language. For non-Lisp heads, I think it's more confusing in the end, blurring the distinction between stmts and expressions ("the body of a lambda must be an expression" ... "ok, i lied, unless it's a 'return' stmt"). If Guido had it to do over again, I vote he rejects the original patch <wink>. Short of that, it would have been better if the lambda arglist required parens, and if the body were required to be a single return stmt (that would sure end the "lambda x: print x" FAQ -- few would *expect* "return print x" to work!). hindsight-is-great-ly y'rs - tim From tim_one at email.msn.com Mon Mar 6 10:09:45 2000 From: tim_one at email.msn.com (Tim Peters) Date: Mon, 6 Mar 2000 04:09:45 -0500 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? In-Reply-To: <200003051346.IAA05539@eric.cnri.reston.va.us> Message-ID: <000b01bf874b$b6fe9da0$ea2d153f@tim> [Guido] > I'm beginning to believe that handing cycles with finalizers to the > user is better than calling __del__ with a different meaning, You won't be sorry: Python has the chance to be the first language that's both useful and sane here!
> and I tentatively withdraw my proposal to change the rules for when > __del__ is called (even when __init__ fails; I haven't had any complaints > about that either). Well, everyone liked the parenthetical half of that proposal, although Jack's example did point out a real surprise with it. > There seem to be two competing suggestions for solutions: (1) call > some kind of __cleanup__ (Marc-Andre) or tp_clean (Greg) method of the > object; (2) Tim's proposal of an interface to ask the garbage > collector for a trash cycle with a finalizer (or for an object with a > finalizer in a trash cycle?). Or a maximal strongly-connected component, or *something* -- unsure. > Somehow Tim's version looks less helpful to me, because it *seems* > that whoever gets to handle the cycle (the main code of the program?) > isn't necessarily responsible for creating it (some library you didn't > even know was used under the covers of some other library you called). Yes, to me too. This is the Scheme "guardian" idea in a crippled form (Scheme supports as many distinct guardians as the programmer cares to create), and even in its full-blown form it supplies "a perfectly general mechanism with no policy whatsoever". Greg convinced me (although I haven't admitted this yet <wink>) that "no policy whatsoever" is un-Pythonic too. *Some* policy is helpful, so I won't be pushing the guardian idea any more (although see immediately below for an immediate backstep on that <wink>). > ... > What keeps nagging me though is what to do when there's a finalizer > but no cleanup method. I guess the trash cycle remains alive. Is > this acceptable? (I guess so, because we've given the programmer a > way to resolve the trash: provide a cleanup method.) BDW considers it better to leak than to risk doing a wrong thing, and I agree wholeheartedly with that. GC is one place you want to have a "100% language".
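[Editor's note: the cleanup-method escape hatch under discussion (Marc-Andre's __cleanup__ / Greg's tp_clean) is easy to prototype in pure Python. Everything below is hypothetical — a sketch of the proposal's dispatch logic, not an API that existed; `clear_instance` stands in for what tp_clear would do for instances.]

```python
def clear_instance(obj):
    """Sketch of the proposed tp_clear for instances: prefer an explicit
    __cleanup__ hook; failing that, drop container attributes so the
    collector can break the cycle itself."""
    cleanup = getattr(obj, "__cleanup__", None)
    if cleanup is not None:
        cleanup()
        return
    # no hook: delete container-valued attributes from the instance dict
    for name, value in list(vars(obj).items()):
        if isinstance(value, (list, tuple, dict, set)):
            delattr(obj, name)

class WithHook:
    def __init__(self):
        self.broken = False
    def __cleanup__(self):
        self.broken = True  # the class decides how to break its cycles

class Plain:
    def __init__(self):
        self.children = []  # a container attribute the fallback would drop

w, p = WithHook(), Plain()
clear_instance(w)  # calls w.__cleanup__()
clear_instance(p)  # deletes p.children
```

The point of the proposal is visible in the two branches: a class that defines the hook keeps full control, while a class that doesn't still gets its cycles broken, at the cost of losing attributes without warning.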
This is where something like a guardian can remain useful: while leaking is OK because you've given them an easy & principled alternative, leaking without giving them a clear way to *know* about it is not OK. If gc pushes the leaked stuff off to the side, the gc module should (say) supply an entry point that returns all the leaked stuff in a list. Then users can *know* they're leaking, know how badly they're leaking, and examine exactly the objects that are leaking. Then they've got the info they need to repair their program (or at least track down the 3rd-party module that's leaking). As with a guardian, they *could* also build a reclamation scheme on top of it, but that would no longer be the main (or even an encouraged) thrust. > If we detect individual cycles (the current algorithm doesn't do that > yet, though it seems easy enough to do another scan), could we > special-case cycles with only one finalizer and no cleaner-upper? > (I'm tempted to call the finalizer because it seems little harm can be > done -- but then of course there's the problem of the finalizer being > called again when the refcount really goes to zero. :-( ) "Better safe than sorry" is my immediate view on this -- you can't know that the finalizer won't resurrect the cycle, and "finalizer called iff refcount hits 0" is a wonderfully simple & predictable rule. That's worth a lot to preserve, unless & until it proves to be a disaster in practice. As to the details of cleanup, I haven't succeeded in making the time to understand all the proposals. But I've done my primary job here if I've harassed everyone into not repeating the same mistakes all previous languages have made <0.9 wink>. > ... 
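[Editor's note: the entry point Tim asks for above — push the leaked cycles off to the side and hand users a list of exactly the objects that leaked — is essentially what later shipped as gc.garbage, which historically collected uncollectable cycles involving finalizers. In today's CPython such cycles are collected anyway, but the DEBUG_SAVEALL flag still demonstrates the mechanism: everything the collector finds unreachable is parked in gc.garbage for inspection instead of being freed. A small sketch:]

```python
import gc

class Node:
    """Two of these form a reference cycle via .peer."""

gc.disable()                     # keep the cycle alive until we ask
a, b = Node(), Node()
a.peer, b.peer = b, a
del a, b                         # the cycle is now unreachable

gc.set_debug(gc.DEBUG_SAVEALL)   # park collected garbage in gc.garbage
gc.collect()
leaked = [o for o in gc.garbage if isinstance(o, Node)]
gc.set_debug(0)
gc.enable()
```

This gives exactly the debugging story Tim describes: the program can *know* it is leaking, see how badly, and examine the leaked objects themselves.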
> I wish we didn't have to special-case finalizers on class instances > (since each dealloc function is potentially a combination of a > finalizer and a deallocation routine), but the truth is that they > *are* special -- __del__ has no responsibility for deallocating > memory, only for deallocating external resources (such as temp files). And the problem is that __del__ can do anything whatsoever than can be expressed in Python, so there's not a chance in hell of outguessing it. > ... > Another practical consideration is that now there are cycles of the form > > <=> > > which suggests that we should make function objects traceable. Also, > modules can cross-reference, so module objects should be made > traceable. I don't think that this will grow the sets of traced > objects by too much (since the dicts involved are already traced, and > a typical program has way fewer functions and modules than it has > class instances). On the other hand, we may also have to trace > (un)bound method objects, and these may be tricky because they are > allocated and deallocated at high rates (once per typical method > call). This relates to what I was trying to get at with my response to your gc implementation sketch: mark-&-sweep needs to chase *everything*, so the set of chased types is maximal from the start. Adding chased types to the "indirectly infer what's unreachable via accounting for internal refcounts within the transitive closure" scheme can end up touching nearly as much as a full M-&-S pass per invocation. I don't know where the break-even point is, but the more stuff you chase in the latter scheme the less often you want to run it. About high rates, so long as a doubly-linked list allows efficient removal of stuff that dies via refcount exhaustion, you won't actually *chase* many bound method objects (i.e., they'll usually go away by themselves). 
Note in passing that bound method objects often showed up in cycles in IDLE, although you usually managed to break those in other ways. > Back to the drawing board... Good! That means you're making real progress <wink>. glad-someone-is-ly y'rs - tim From mal at lemburg.com Mon Mar 6 11:01:31 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 06 Mar 2000 11:01:31 +0100 Subject: [Python-Dev] Design question: call __del__ for cyclical garbage? References: <200003031650.LAA21647@eric.cnri.reston.va.us> <000001bf857a$60b45ac0$c6a0143f@tim> <20000305172754.A14998@acs.ucalgary.ca> Message-ID: <38C381FB.E222D6E4@lemburg.com> nascheme at enme.ucalgary.ca wrote: > > On Fri, Mar 03, 2000 at 08:38:43PM -0500, Tim Peters wrote: > > So here's what I'd consider doing: explicit is better than implicit, and in > > the face of ambiguity refuse the temptation to guess. > > I like Marc's suggestion. Here is my proposal: > > Allow classes to have a new method, __cleanup__ or whatever you > want to call it. When tp_clear is called for an instance, it > checks for this method. If it exists, call it, otherwise delete > the container objects from the instance's dictionary. When > collecting cycles, call tp_clear for instances first. > > It's simple and allows the programmer to cleanly break cycles if > they insist on creating them and using __del__ methods. Right :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Mar 6 12:57:29 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 06 Mar 2000 12:57:29 +0100 Subject: [Python-Dev] Unicode character property methods Message-ID: <38C39D29.A29CE67F@lemburg.com> As you may have noticed, the Unicode objects provide new methods .islower(), .isupper() and .istitle(). Finn Bock mentioned that Java also provides .isdigit() and .isspace().
Question: should Unicode also provide these character property methods: .isdigit(), .isnumeric(), .isdecimal() and .isspace() ? Plus maybe .digit(), .numeric() and .decimal() for the corresponding decoding ? Similar APIs are already available through the unicodedata module, but could easily be moved to the Unicode object (they cause the builtin interpreter to grow a bit in size due to the new mapping tables). BTW, string.atoi et al. are currently not mapped to string methods... should they be ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Mon Mar 6 14:29:04 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 06 Mar 2000 08:29:04 -0500 Subject: [Python-Dev] Unicode character property methods In-Reply-To: Your message of "Mon, 06 Mar 2000 12:57:29 +0100." <38C39D29.A29CE67F@lemburg.com> References: <38C39D29.A29CE67F@lemburg.com> Message-ID: <200003061329.IAA09529@eric.cnri.reston.va.us> > As you may have noticed, the Unicode objects provide > new methods .islower(), .isupper() and .istitle(). Finn Bock > mentioned that Java also provides .isdigit() and .isspace(). > > Question: should Unicode also provide these character > property methods: .isdigit(), .isnumeric(), .isdecimal() > and .isspace() ? Plus maybe .digit(), .numeric() and > .decimal() for the corresponding decoding ? What would be the difference between isdigit, isnumeric, isdecimal? I'd say don't do more than Java. I don't understand what the "corresponding decoding" refers to. What would "3".decimal() return? > Similar APIs are already available through the unicodedata > module, but could easily be moved to the Unicode object > (they cause the builtin interpreter to grow a bit in size > due to the new mapping tables). > > BTW, string.atoi et al. are currently not mapped to > string methods... should they be ? They are mapped to int() c.s. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Mon Mar 6 16:09:55 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 6 Mar 2000 10:09:55 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <200003051204.HAA05367@eric.cnri.reston.va.us> References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> <14529.55983.263225.691427@weyr.cnri.reston.va.us> <200003051204.HAA05367@eric.cnri.reston.va.us> Message-ID: <14531.51779.650532.881626@weyr.cnri.reston.va.us> Guido van Rossum writes: > - You could put it all in ConfigParser.py but with new classnames. > (Not sure though, since the ConfigParser class, which is really a > kind of weird variant, will be assumed to be the main class because > its name is that of the module.) The ConfigParser class could be clearly marked as deprecated both in the source/docstring and in the documentation. But the class itself should not be used in any way. > - Variants on the syntax could be given through some kind of option > system rather than through subclassing -- they should be combinable > independently. Some possible options (maybe I'm going overboard here) > could be: Yes, you are going overboard. It should contain exactly what's right for .ini files, and that's it. There are really three aspects to the beast: reading, using, and writing. I think there should be a class which does the right thing for using the information in the file, and reading & writing can be handled through functions or helper classes. That separates the parsing issues from the use issues, and alternate syntaxes will be easy enough to implement by subclassing the helper or writing a new function. An "editable" version that allows loading & saving without throwing away comments, ordering, etc.
would require a largely separate implementation of all three aspects (or at least the reader and writer). > (Well maybe the whole substitution thing should really be done through > a subclass -- it's too weird for normal use.) That and the ad hoc syntax are my biggest beefs with ConfigParser. But it can easily be added by a subclass as long as the method to override is clearly specified in the documentation (it should only require one!). -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Mon Mar 6 18:47:44 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 6 Mar 2000 12:47:44 -0500 (EST) Subject: [Python-Dev] PyBufferProcs Message-ID: <14531.61248.941076.803617@weyr.cnri.reston.va.us> While working on the documentation, I've noticed a naming inconsistency regarding PyBufferProcs; its peers are all named Py*Methods (PySequenceMethods, PyNumberMethods, etc.). I'd like to propose that a synonym, PyBufferMethods, be made for PyBufferProcs, and use that in the core implementations and the documentation. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From jeremy at cnri.reston.va.us Mon Mar 6 20:28:12 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Mon, 6 Mar 2000 14:28:12 -0500 (EST) Subject: [Python-Dev] example checkers based on compiler package Message-ID: <14532.1740.90292.440395@goon.cnri.reston.va.us> There was some discussion on python-dev over the weekend about generating warnings, and Moshe Zadke posted a selfnanny that warned about methods that didn't have self as the first argument. I think these kinds of warnings are useful, and I'd like to see a more general framework for them built around the Python abstract syntax originally from P2C. Ideally, they would be available as command line tools and integrated into GUIs like IDLE in some useful way.
I've included a couple of quick examples I coded up last night based on the compiler package (recently re-factored) that is resident in python/nondist/src/Compiler. The analysis on the one that checks for name errors is a bit of a mess, but the overall structure seems right. I'm hoping to collect a few more examples of checkers and generalize from them to develop a framework for checking for errors and reporting them. Jeremy

------------ checkself.py ------------
"""Check for methods that do not have self as the first argument"""

from compiler import parseFile, walk, ast, misc

class Warning:
    def __init__(self, filename, klass, method, lineno, msg):
        self.filename = filename
        self.klass = klass
        self.method = method
        self.lineno = lineno
        self.msg = msg

    _template = "%(filename)s:%(lineno)s %(klass)s.%(method)s: %(msg)s"

    def __str__(self):
        return self._template % self.__dict__

class NoArgsWarning(Warning):
    super_init = Warning.__init__

    def __init__(self, filename, klass, method, lineno):
        self.super_init(filename, klass, method, lineno, "no arguments")

class NotSelfWarning(Warning):
    super_init = Warning.__init__

    def __init__(self, filename, klass, method, lineno, argname):
        self.super_init(filename, klass, method, lineno,
                        "self slot is named %s" % argname)

class CheckSelf:
    def __init__(self, filename):
        self.filename = filename
        self.warnings = []
        self.scope = misc.Stack()

    def inClass(self):
        if self.scope:
            return isinstance(self.scope.top(), ast.Class)
        return 0

    def visitClass(self, klass):
        self.scope.push(klass)
        self.visit(klass.code)
        self.scope.pop()
        return 1

    def visitFunction(self, func):
        if self.inClass():
            classname = self.scope.top().name
            if len(func.argnames) == 0:
                w = NoArgsWarning(self.filename, classname, func.name,
                                  func.lineno)
                self.warnings.append(w)
            elif func.argnames[0] != "self":
                w = NotSelfWarning(self.filename, classname, func.name,
                                   func.lineno, func.argnames[0])
                self.warnings.append(w)
        self.scope.push(func)
        self.visit(func.code)
        self.scope.pop()
        return 1

def check(filename):
    global p, check
    p = parseFile(filename)
    check = CheckSelf(filename)
    walk(p, check)
    for w in check.warnings:
        print w

if __name__ == "__main__":
    import sys
    # XXX need to do real arg processing
    check(sys.argv[1])

------------ badself.py ------------
def foo():
    return 12

class Foo:
    def __init__():
        pass

    def foo(self, foo):
        pass

    def bar(this, that):
        def baz(this=that):
            return this
        return baz

def bar():
    class Quux:
        def __init__(self):
            self.sum = 1
        def quam(x, y):
            self.sum = self.sum + (x * y)
    return Quux()

------------ checknames.py ------------
"""Check for NameErrors"""

from compiler import parseFile, walk
from compiler.misc import Stack, Set

import __builtin__
from UserDict import UserDict

class Warning:
    def __init__(self, filename, funcname, lineno):
        self.filename = filename
        self.funcname = funcname
        self.lineno = lineno

    def __str__(self):
        return self._template % self.__dict__

class UndefinedLocal(Warning):
    super_init = Warning.__init__

    def __init__(self, filename, funcname, lineno, name):
        self.super_init(filename, funcname, lineno)
        self.name = name

    _template = "%(filename)s:%(lineno)s %(funcname)s undefined local %(name)s"

class NameError(UndefinedLocal):
    _template = "%(filename)s:%(lineno)s %(funcname)s undefined name %(name)s"

class NameSet(UserDict):
    """Track names and the line numbers where they are referenced"""
    def __init__(self):
        self.data = self.names = {}

    def add(self, name, lineno):
        l = self.names.get(name, [])
        l.append(lineno)
        self.names[name] = l

class CheckNames:
    def __init__(self, filename):
        self.filename = filename
        self.warnings = []
        self.scope = Stack()
        self.gUse = NameSet()
        self.gDef = NameSet()
        # _locals is the stack of local namespaces
        # locals is the top of the stack
        self._locals = Stack()
        self.lUse = None
        self.lDef = None
        self.lGlobals = None  # var declared global
        # holds scope,def,use,global triples for later analysis
        self.todo = []

    def enterNamespace(self, node):
##        print node.name
        self.scope.push(node)
        self.lUse = use = NameSet()
        self.lDef = _def = NameSet()
        self.lGlobals = gbl = NameSet()
        self._locals.push((use, _def, gbl))

    def exitNamespace(self):
##        print
        self.todo.append((self.scope.top(), self.lDef, self.lUse,
                          self.lGlobals))
        self.scope.pop()
        self._locals.pop()
        if self._locals:
            self.lUse, self.lDef, self.lGlobals = self._locals.top()
        else:
            self.lUse = self.lDef = self.lGlobals = None

    def warn(self, warning, funcname, lineno, *args):
        args = (self.filename, funcname, lineno) + args
        self.warnings.append(apply(warning, args))

    def defName(self, name, lineno, local=1):
##        print "defName(%s, %s, local=%s)" % (name, lineno, local)
        if self.lUse is None:
            self.gDef.add(name, lineno)
        elif local == 0:
            self.gDef.add(name, lineno)
            self.lGlobals.add(name, lineno)
        else:
            self.lDef.add(name, lineno)

    def useName(self, name, lineno, local=1):
##        print "useName(%s, %s, local=%s)" % (name, lineno, local)
        if self.lUse is None:
            self.gUse.add(name, lineno)
        elif local == 0:
            self.gUse.add(name, lineno)
            self.lUse.add(name, lineno)
        else:
            self.lUse.add(name, lineno)

    def check(self):
        for s, d, u, g in self.todo:
            self._check(s, d, u, g, self.gDef)
        # XXX then check the globals

    def _check(self, scope, _def, use, gbl, globals):
        # check for NameError
        # a name is defined iff it is in def.keys()
        # a name is global iff it is in gdefs.keys()
        gdefs = UserDict()
        gdefs.update(globals)
        gdefs.update(__builtin__.__dict__)
        defs = UserDict()
        defs.update(gdefs)
        defs.update(_def)
        errors = Set()
        for name in use.keys():
            if not defs.has_key(name):
                firstuse = use[name][0]
                self.warn(NameError, scope.name, firstuse, name)
                errors.add(name)
        # check for UndefinedLocalNameError
        # order == use & def sorted by lineno
        # elements are lineno, flag, name
        # flag = 0 if use, flag = 1 if def
        order = []
        for name, lines in use.items():
            if gdefs.has_key(name) and not _def.has_key(name):
                # this is a global ref, we can skip it
                continue
            for lineno in lines:
                order.append(lineno, 0, name)
        for name, lines in _def.items():
            for lineno in lines:
                order.append(lineno, 1, name)
        order.sort()
        # ready contains names that have been defined or warned about
        ready = Set()
        for lineno, flag, name in order:
            if flag == 0:  # use
                if not ready.has_elt(name) and not errors.has_elt(name):
                    self.warn(UndefinedLocal, scope.name, lineno, name)
                    ready.add(name)  # don't warn again
            else:
                ready.add(name)

    # below are visitor methods

    def visitFunction(self, node, noname=0):
        for expr in node.defaults:
            self.visit(expr)
        if not noname:
            self.defName(node.name, node.lineno)
        self.enterNamespace(node)
        for name in node.argnames:
            self.defName(name, node.lineno)
        self.visit(node.code)
        self.exitNamespace()
        return 1

    def visitLambda(self, node):
        return self.visitFunction(node, noname=1)

    def visitClass(self, node):
        for expr in node.bases:
            self.visit(expr)
        self.defName(node.name, node.lineno)
        self.enterNamespace(node)
        self.visit(node.code)
        self.exitNamespace()
        return 1

    def visitName(self, node):
        self.useName(node.name, node.lineno)

    def visitGlobal(self, node):
        for name in node.names:
            self.defName(name, node.lineno, local=0)

    def visitImport(self, node):
        for name in node.names:
            self.defName(name, node.lineno)

    visitFrom = visitImport

    def visitAssName(self, node):
        self.defName(node.name, node.lineno)

def check(filename):
    global p, checker
    p = parseFile(filename)
    checker = CheckNames(filename)
    walk(p, checker)
    checker.check()
    for w in checker.warnings:
        print w

if __name__ == "__main__":
    import sys
    # XXX need to do real arg processing
    check(sys.argv[1])

------------ badnames.py ------------
# XXX can we detect race conditions on accesses to global variables?
# probably can (conservatively) by noting variables _created_ by
# global decls in funcs

import string
import time

def foo(x):
    return x + y

def foo2(x):
    return x + z

a = 4

def foo3(x):
    a, b = x, a

def bar(x):
    z = x
    global z

def bar2(x):
    f = string.strip
    a = f(x)
    import string
    return string.lower(a)

def baz(x, y):
    return x + y + z

def outer(x):
    def inner(y):
        return x + y
    return inner

From gstein at lyra.org Mon Mar 6 22:09:33 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 6 Mar 2000 13:09:33 -0800 (PST) Subject: [Python-Dev] PyBufferProcs In-Reply-To: <14531.61248.941076.803617@weyr.cnri.reston.va.us> Message-ID: On Mon, 6 Mar 2000, Fred L. Drake, Jr. wrote: > While working on the documentation, I've noticed a naming > inconsistency regarding PyBufferProcs; its peers are all named > Py*Methods (PySequenceMethods, PyNumberMethods, etc.). > I'd like to propose that a synonym, PyBufferMethods, be made for > PyBufferProcs, and use that in the core implementations and the > documentation. +0 Although.. I might say that it should be renamed, and a synonym (#define or typedef?) be provided for the old name. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Mon Mar 6 23:04:14 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 06 Mar 2000 23:04:14 +0100 Subject: [Python-Dev] Unicode character property methods References: <38C39D29.A29CE67F@lemburg.com> <200003061329.IAA09529@eric.cnri.reston.va.us> Message-ID: <38C42B5E.42801755@lemburg.com> Guido van Rossum wrote: > > > As you may have noticed, the Unicode objects provide > > new methods .islower(), .isupper() and .istitle(). Finn Bock > > mentioned that Java also provides .isdigit() and .isspace().
> > What would be the difference between isdigit, isnumeric, isdecimal? > I'd say don't do more than Java. I don't understand what the > "corresponding decoding" refers to. What would "3".decimal() return? These originate in the Unicode database; see ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.html Here are the descriptions:

"""
6   Decimal digit value   normative
    This is a numeric field. If the character has the decimal digit
    property, as specified in Chapter 4 of the Unicode Standard, the
    value of that digit is represented with an integer value in this
    field.

7   Digit value   normative
    This is a numeric field. If the character represents a digit, not
    necessarily a decimal digit, the value is here. This covers digits
    which do not form decimal radix forms, such as the compatibility
    superscript digits.

8   Numeric value   normative
    This is a numeric field. If the character has the numeric property,
    as specified in Chapter 4 of the Unicode Standard, the value of that
    character is represented with an integer or rational number in this
    field. This includes fractions as, e.g., "1/5" for U+2155 VULGAR
    FRACTION ONE FIFTH. Also included are numerical values for
    compatibility characters such as circled numbers.
"""

u"3".decimal() would return 3; u"\u2155".numeric() would return 0.2.

Some more examples from the unicodedata module (which makes all fields of the database available in Python):

>>> unicodedata.decimal(u"3")
3
>>> unicodedata.decimal(u"?")
2
>>> unicodedata.digit(u"?")
2
>>> unicodedata.numeric(u"?")
2.0
>>> unicodedata.numeric(u"\u2155")
0.2
>>> unicodedata.numeric(u'\u215b')
0.125

> > Similar APIs are already available through the unicodedata > module, but could easily be moved to the Unicode object > (they cause the builtin interpreter to grow a bit in size > due to the new mapping tables). > > BTW, string.atoi et al. are currently not mapped to > string methods... should they be ? > > They are mapped to int() c.s. Hmm, I just noticed that int() et friends don't like Unicode...
shouldn't they use the "t" parser marker instead of requiring a string or tp_int compatible type ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Tue Mar 7 00:12:33 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 06 Mar 2000 18:12:33 -0500 Subject: [Python-Dev] Unicode character property methods In-Reply-To: Your message of "Mon, 06 Mar 2000 23:04:14 +0100." <38C42B5E.42801755@lemburg.com> References: <38C39D29.A29CE67F@lemburg.com> <200003061329.IAA09529@eric.cnri.reston.va.us> <38C42B5E.42801755@lemburg.com> Message-ID: <200003062312.SAA11697@eric.cnri.reston.va.us> [MAL] > > > As you may have noticed, the Unicode objects provide > > > new methods .islower(), .isupper() and .istitle(). Finn Bock > > > mentioned that Java also provides .isdigit() and .isspace(). > > > > > > Question: should Unicode also provide these character > > > property methods: .isdigit(), .isnumeric(), .isdecimal() > > > and .isspace() ? Plus maybe .digit(), .numeric() and > > > .decimal() for the corresponding decoding ? [Guido] > > What would be the difference between isdigit, isnumeric, isdecimal? > > I'd say don't do more than Java. I don't understand what the > > "corresponding decoding" refers to. What would "3".decimal() return? [MAL] > These originate in the Unicode database; see > > ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.html > > Here are the descriptions: > > """ > 6 > Decimal digit value > normative > This is a numeric field. If the > character has the decimal digit > property, as specified in Chapter > 4 of the Unicode Standard, the > value of that digit is represented > with an integer value in this field > 7 > Digit value > normative > This is a numeric field. If the > character represents a digit, not > necessarily a decimal digit, the > value is here. 
This covers digits > which do not form decimal radix > forms, such as the compatibility > superscript digits > 8 > Numeric value > normative > This is a numeric field. If the > character has the numeric > property, as specified in Chapter > 4 of the Unicode Standard, the > value of that character is > represented with an integer or > rational number in this field. This > includes fractions as, e.g., "1/5" for > U+2155 VULGAR FRACTION > ONE FIFTH Also included are > numerical values for compatibility > characters such as circled > numbers. > > u"3".decimal() would return 3; u"\u2155".numeric() would return 0.2. > > Some more examples from the unicodedata module (which makes > all fields of the database available in Python): > > >>> unicodedata.decimal(u"3") > 3 > >>> unicodedata.decimal(u"?") > 2 > >>> unicodedata.digit(u"?") > 2 > >>> unicodedata.numeric(u"?") > 2.0 > >>> unicodedata.numeric(u"\u2155") > 0.2 > >>> unicodedata.numeric(u'\u215b') > 0.125 Hm, very Unicode centric. Probably best left out of the general string methods. Isspace() seems useful, and an isdigit() that is only true for ASCII '0' - '9' also makes sense. What about "123".isdigit()? What does Java say? Or do these only apply to single chars there? I think "123".isdigit() should be true if "abc".islower() is true. > > > Similar APIs are already available through the unicodedata > > > module, but could easily be moved to the Unicode object > > > (they cause the builtin interpreter to grow a bit in size > > > due to the new mapping tables). > > > > > > BTW, string.atoi et al. are currently not mapped to > > > string methods... should they be ? > > > > They are mapped to int() c.s. > > Hmm, I just noticed that int() et friends don't like > Unicode... shouldn't they use the "t" parser marker > instead of requiring a string or tp_int compatible > type ? Good catch. Go ahead.
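The database lookups quoted in this exchange are easy to check against the unicodedata module directly. A minimal sketch in modern (Python 3) spelling; note that the "?" characters in the quoted session are mis-encoded non-ASCII digits, so U+0662 ARABIC-INDIC DIGIT TWO is used below only as a plausible stand-in, not as the character MAL actually typed:

```python
import unicodedata

# the unambiguous examples from the message
assert unicodedata.decimal("3") == 3
assert unicodedata.numeric("\u2155") == 0.2    # VULGAR FRACTION ONE FIFTH
assert unicodedata.numeric("\u215b") == 0.125  # VULGAR FRACTION ONE EIGHTH

# a character carrying all three properties (stand-in for the
# garbled "?" in the quoted session): U+0662 ARABIC-INDIC DIGIT TWO
assert unicodedata.decimal("\u0662") == 2
assert unicodedata.digit("\u0662") == 2
assert unicodedata.numeric("\u0662") == 2.0

# a digit that is *not* a decimal digit: U+00B2 SUPERSCRIPT TWO
assert unicodedata.digit("\u00b2") == 2
```

This is exactly the decimal/digit/numeric distinction from fields 6, 7, and 8 of the Unicode database quoted above.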
--Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at math.huji.ac.il Tue Mar 7 06:25:43 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 7 Mar 2000 07:25:43 +0200 (IST) Subject: [Python-Dev] Re: example checkers based on compiler package In-Reply-To: <14532.1740.90292.440395@goon.cnri.reston.va.us> Message-ID: On Mon, 6 Mar 2000, Jeremy Hylton wrote: > I think these kinds of warnings are useful, and I'd like to see a more > general framework for them built around the Python abstract syntax originally > from P2C. Ideally, they would be available as command line tools and > integrated into GUIs like IDLE in some useful way. Yes! Guido already suggested we have a standard API to them. One thing I suggested was that the abstract API include not only the input (one form or another of an AST), but the output: so IDE's wouldn't have to parse strings, but get a warning class. Something like this: An output of a warning can be a subclass of GeneralWarning, and should implement the following methods:

1. line-no() -- returns an integer
2. columns() -- returns either a pair of integers, or None
3. message() -- returns a string containing a message
4. __str__() -- comes for free if inheriting GeneralWarning, and formats the warning message.

> I've included a couple of quick examples I coded up last night based > on the compiler package (recently re-factored) that is resident in > python/nondist/src/Compiler. The analysis on the one that checks for > name errors is a bit of a mess, but the overall structure seems right. One thing I had trouble with is that in my implementation of selfnanny, I used Python's stack for recursion while you used an explicit stack. It's probably because of the visitor pattern, which is just another argument for co-routines and generators. > I'm hoping to collect a few more examples of checkers and generalize > from them to develop a framework for checking for errors and reporting > them. Cool!
Brainstorming: what kind of warnings would people find useful? In selfnanny, I wanted to include checking for assignment to self, and checking for "possible use before definition of local variables" sounds good. Another check could be a CP4E "checking that no two identifiers differ only by case". I might code up a few if I have the time... What I'd really want (but it sounds really hard) is a framework for partial ASTs: warning people as they write code. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From mwh21 at cam.ac.uk Tue Mar 7 09:31:23 2000 From: mwh21 at cam.ac.uk (Michael Hudson) Date: 07 Mar 2000 08:31:23 +0000 Subject: [Python-Dev] Re: [Compiler-sig] Re: example checkers based on compiler package In-Reply-To: Moshe Zadka's message of "Tue, 7 Mar 2000 07:25:43 +0200 (IST)" References: Message-ID: Moshe Zadka writes: > On Mon, 6 Mar 2000, Jeremy Hylton wrote: > > > I think these kinds of warnings are useful, and I'd like to see a more > > general framework for them built around the Python abstract syntax originally > > from P2C. Ideally, they would be available as command line tools and > > integrated into GUIs like IDLE in some useful way. > > Yes! Guido already suggested we have a standard API to them. One thing > I suggested was that the abstract API include not only the input (one form > or another of an AST), but the output: so IDE's wouldn't have to parse > strings, but get a warning class. That would be seriously cool. > Something like this: > > An output of a warning can be a subclass of GeneralWarning, and should > implement the following methods: > > 1. line-no() -- returns an integer > 2. columns() -- returns either a pair of integers, or None > 3. message() -- returns a string containing a message > 4. __str__() -- comes for free if inheriting GeneralWarning, > and formats the warning message. Wouldn't it make sense to include function/class name here too? A checker is likely to know, and it would save reparsing to find it out.
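Moshe's proposed interface, extended with the function/class-name accessor Michael asks for, might look like this minimal sketch (method names adapted to legal Python identifiers; every name here is a hypothetical illustration, not code from the thread):

```python
class GeneralWarning:
    """Base class for checker warnings (hypothetical sketch)."""

    def __init__(self, lineno, message, columns=None, funcname=None):
        self._lineno = lineno
        self._message = message
        self._columns = columns    # (start, end) pair, or None
        self._funcname = funcname  # enclosing function/class, per MWH

    def line_no(self):
        return self._lineno

    def columns(self):
        return self._columns

    def message(self):
        return self._message

    def __str__(self):
        # "comes for free" for subclasses
        where = "line %d" % self._lineno
        if self._funcname:
            where += " in %s" % self._funcname
        return "%s: %s" % (where, self._message)


class UndefinedLocalWarning(GeneralWarning):
    pass
```

A checker would then yield instances like `UndefinedLocalWarning(12, "local 'a' used before definition", funcname="foo3")`, and an IDE could call `line_no()` and `columns()` instead of parsing strings.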
[little snip] > > I'm hoping to collect a few more examples of checkers and generalize > > from them to develop a framework for checking for errors and reporting > > them. > > Cool! > Brainstorming: what kind of warnings would people find useful? In > selfnanny, I wanted to include checking for assignment to self, and > checking for "possible use before definition of local variables" sounds > good. Another check could be a CP4E "checking that no two identifiers > differ only by case". I might code up a few if I have the time... Is there stuff in the current Compiler code to do control flow analysis? You'd need that to check for use before definition in meaningful cases, and also if you ever want to do any optimisation... > What I'd really want (but it sounds really hard) is a framework for > partial ASTs: warning people as they write code. I agree (on both points). Cheers, M. -- very few people approach me in real life and insist on proving they are drooling idiots. -- Erik Naggum, comp.lang.lisp From mal at lemburg.com Tue Mar 7 10:14:25 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 07 Mar 2000 10:14:25 +0100 Subject: [Python-Dev] Unicode character property methods References: <38C39D29.A29CE67F@lemburg.com> <200003061329.IAA09529@eric.cnri.reston.va.us> <38C42B5E.42801755@lemburg.com> <200003062312.SAA11697@eric.cnri.reston.va.us> Message-ID: <38C4C871.F47E17A3@lemburg.com> Guido van Rossum wrote: > [MAL about adding .isdecimal(), .isdigit() and .isnumeric()] > > Some more examples from the unicodedata module (which makes > > all fields of the database available in Python): > > > > >>> unicodedata.decimal(u"3") > > 3 > > >>> unicodedata.decimal(u"?") > > 2 > > >>> unicodedata.digit(u"?") > > 2 > > >>> unicodedata.numeric(u"?") > > 2.0 > > >>> unicodedata.numeric(u"\u2155") > > 0.2 > > >>> unicodedata.numeric(u'\u215b') > > 0.125 > > Hm, very Unicode centric. Probably best left out of the general > string methods.
Isspace() seems useful, and an isdigit() that is only > true for ASCII '0' - '9' also makes sense. Well, how about having all three on Unicode objects and only .isdigit() on string objects ? > What about "123".isdigit()? What does Java say? Or do these only > apply to single chars there? I think "123".isdigit() should be true > if "abc".islower() is true. In the current uPython implementation u"123".isdigit() is true; same for the other two methods. > > > > Similar APIs are already available through the unicodedata > > > > module, but could easily be moved to the Unicode object > > > > (they cause the builtin interpreter to grow a bit in size > > > > due to the new mapping tables). > > > > > > > > BTW, string.atoi et al. are currently not mapped to > > > > string methods... should they be ? > > > > > > They are mapped to int() c.s. > > > > Hmm, I just noticed that int() et friends don't like > > Unicode... shouldn't they use the "t" parser marker > > instead of requiring a string or tp_int compatible > > type ? > > Good catch. Go ahead. Done. float(), int() and long() now accept charbuf compatible objects as argument. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Mar 7 10:23:35 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 07 Mar 2000 10:23:35 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects Message-ID: <38C4CA97.5D0AA9D@lemburg.com> Before starting to code away, I would like to know which of the new Unicode methods should also be available on string objects. 
Here are the currently available methods:

Unicode objects    string objects
------------------------------------
capitalize         capitalize
center
count              count
encode
endswith           endswith
expandtabs
find               find
index              index
isdecimal
isdigit
islower
isnumeric
isspace
istitle
isupper
join               join
ljust
lower              lower
lstrip             lstrip
replace            replace
rfind              rfind
rindex             rindex
rjust
rstrip             rstrip
split              split
splitlines
startswith         startswith
strip              strip
swapcase           swapcase
title              title
translate          translate (*)
upper              upper
zfill

(*) The two have slightly different implementations, e.g. deletions are handled differently.

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik at pythonware.com Tue Mar 7 12:54:56 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 7 Mar 2000 12:54:56 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects References: <38C4CA97.5D0AA9D@lemburg.com> Message-ID: <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> > Unicode objects string objects > expandtabs yes. I'm pretty sure there's "expandtabs" code in the strop module. maybe barry missed it? > center > ljust > rjust probably. the implementation is trivial, and ljust/rjust are somewhat useful, so you might as well add them all (just cut and paste from the unicode class). what about rguido and lguido, btw? > zfill no. From guido at python.org Tue Mar 7 14:52:00 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 07 Mar 2000 08:52:00 -0500 Subject: [Python-Dev] finalization again Message-ID: <200003071352.IAA13571@eric.cnri.reston.va.us> Warning: long message. If you're not interested in reading all this, please skip to "Conclusion" at the end. At Tim's recommendation I had a look at what section 12.6 of the Java language spec says about finalizers. The stuff there is sure seductive for language designers...
Have a look at the diagram at http://java.sun.com/docs/books/jls/html/12.doc.html#48746. In all its (seeming) complexity, it helped me understand some of the issues of finalization better. Rather than the complex 8-state state machine that it appears to be, think of it as a simple 3x3 table. The three rows represent the categories reachable, finalizer-reachable (abbreviated in the diagram as f-reachable), and unreachable. These categories correspond directly to categories of objects that the Schemenauer-Tiedemann cycle-reclamation scheme deals with: after moving all the reachable objects to the second list (first the roots and then the objects reachable from the roots), the first list is left with the unreachable and finalizer-reachable objects. If we want to distinguish between unreachable and finalizer-reachable at this point, a straightforward application of the same algorithm will work well: Create a third list (this will contain the finalizer-reachable objects). Start by filling it with all the objects from the first list (which contains the potential garbage at this point) that have a finalizer. We can look for objects that have __del__ or __clean__ or for which tp_clean(CARE_EXEC)==true; it doesn't matter here. (*) Then walk through the third list, following each object's references, and move all referenced objects that are still in the first list to the third list. Now, we have:

List 1: truly unreachable objects. These have no finalizers and can be discarded right away.

List 2: truly reachable objects. (Roots and objects reachable from roots.) Leave them alone.

List 3: finalizer-reachable objects. This contains objects that are unreachable but have a finalizer, and objects that are only reachable through those.

We now have to decide on a policy for invoking finalizers. Java suggests the following: Remember the "roots" of the third list -- the nodes that were moved there directly from the first list because they have a finalizer.
These objects are marked *finalizable* (a category corresponding to the second *column* of the Java diagram). The Java spec allows the Java garbage collector to call all of these finalizers in any order -- even simultaneously in separate threads. Java never allows an object to go back from the finalizable to the unfinalized state (there are no arrows pointing left in the diagram). The first finalizer that is called could make its object reachable again (up arrow), thereby possibly making other finalizable objects reachable too. But this does not cancel their scheduled finalization! The conclusion is that Java can sometimes call finalization on unreachable objects -- but only if those objects have gone through a phase in their life where they were unreachable or at least finalizer-unreachable. I agree that this is the best that Java can do: if there are cycles containing multiple objects with finalizers, there is no way (short of asking the programmer(s)) to decide which object to finalize first. We could pick one at random, run its finalizer, and start garbage collection all over -- if the finalizer doesn't resurrect anything, this will give us the same set of unreachable objects, from which we could pick the next finalizable object, and so on. That looks very inefficient, might not terminate (the same object could repeatedly show up as the candidate for finalization), and it's still arbitrary: the programmer(s) still can't predict which finalizer in a cycle with multiple finalizers will be called first. Assuming the recommended characteristics of finalizers (brief and robust), it won't make much difference if we call all finalizers (of the now-finalizeable objects) "without looking back". Sure, some objects may find themselves in a perfectly reachable position with their finalizer called -- but they did go through a "near-death experience". I don't find this objectionable, and I don't see how Java could possibly do better for cycles with multiple finalizers. 
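The three-list construction described above can be sketched in a few lines of Python. This is purely an illustration of the algorithm: `refs` and `has_finalizer` are hypothetical stand-ins for the tp_traverse-style information the real C-level collector would use, and the objects are just hashable tokens:

```python
def partition(objects, roots, refs, has_finalizer):
    """Split `objects` into (unreachable, reachable, finalizer_reachable).

    objects: all candidate objects; roots: externally referenced ones;
    refs[obj]: list of objects that obj points to.
    """
    # List 2: everything reachable from the roots.
    reachable = set()
    stack = [o for o in objects if o in roots]
    while stack:
        o = stack.pop()
        if o not in reachable:
            reachable.add(o)
            stack.extend(refs.get(o, []))

    # Seed list 3 with unreachable objects that have a finalizer...
    fin_reachable = set(o for o in objects
                        if o not in reachable and has_finalizer(o))
    # ...then add everything reachable only through them.
    stack = list(fin_reachable)
    while stack:
        for r in refs.get(stack.pop(), []):
            if r not in reachable and r not in fin_reachable:
                fin_reachable.add(r)
                stack.append(r)

    # List 1: truly unreachable, no finalizers -- safe to discard.
    unreachable = [o for o in objects
                   if o not in reachable and o not in fin_reachable]
    return unreachable, reachable, fin_reachable
```

With a trash cycle a <-> b where only a has a finalizer, both a and b land in the finalizer-reachable list; a is the "root" of that list whose finalizer would be scheduled.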
Now let's look again at the rule that an object's finalizer will be called at most once automatically by the garbage collector. The transitions between the columns of the Java diagram enforce this: the columns are labeled from left to right with unfinalized, finalizable, and finalized, and there are no transition arrows pointing left. (In my description above, already finalized objects are considered not to have a finalizer.) I think this rule makes a lot of sense given Java's multi-threaded garbage collection: the invocation of finalizers could run concurrently with another garbage collection, and we don't want this to find some of the same finalizable objects and call their finalizers again! We could mark them with a "finalization in progress" flag only while their finalizer is running, but in a cycle with multiple finalizers it seems we should keep this flag set until *all* finalizers for objects in the cycle have run. But we don't actually know exactly what the cycles are: all we know is "these objects are involved in trash cycles". More detailed knowledge would require yet another sweep, plus a more hairy two-dimensional data structure (a list of separate cycles). And for what? As soon as we run finalizers from two separate cycles, those cycles could be merged again (e.g. the first finalizer could resurrect its cycle, and the second one could link to it). Now we have a pool of objects that are marked "finalization in progress" until all their finalizations terminate. For an incremental concurrent garbage collector, this seems a pain, since it may continue to find new finalizable objects and add them to the pile. Java takes the logical conclusion: the "finalization in progress" flag is never cleared -- and renamed to "finalized".

Conclusion
----------

Are the Java rules complex? Yes. Are there better rules possible? I'm not so sure, given the requirement of allowing concurrent incremental garbage collection algorithms that haven't even been invented yet.
(Plus the implied requirement that finalizers in trash cycles should be invoked.) Are the Java rules difficult for the user? Only for users who think they can trick finalizers into doing things for them that they were not designed to do. I would think the following guidelines should do nicely for the rest of us:

1. Avoid finalizers if you can; use them only to release *external* (e.g. OS) resources.

2. Write your finalizer to be as robust as you can, with as little use of other objects as you can.

3. You only get one chance. Use it.

Unlike Scheme guardians or the proposed __cleanup__ mechanism, you don't have to know whether your object is involved in a cycle -- your finalizer will still be called.

I am reconsidering using the __del__ method as the finalizer. As a compromise to those who want their __del__ to run whenever the reference count reaches zero, the finalized flag can be cleared explicitly. I am considering the following implementation: after retrieving the __del__ method, but before calling it, self.__del__ is set to None (better, self.__dict__['__del__'] = None, to avoid confusing __setattr__ hooks). The object can remove self.__del__ to clear the finalized flag. I think I'll use the same mechanism to prevent __del__ from being called upon a failed initialization.

Final note: the semantics "__del__ is called whenever the reference count reaches zero" cannot be defended in the light of a migration to different forms of garbage collection (e.g. JPython). There may not be a reference count.

--Guido van Rossum (home page: http://www.python.org/~guido/)

____ (*) Footnote: there's one complication: to ask a Python class instance if it has a finalizer, we have to use PyObject_Getattr(obj, ...). If the object's class has a __getattr__ hook, this can invoke arbitrary Python code -- even if the answer to the question is "no"! This can make the object reachable again (in the Java diagram, arrows pointing up or up and right).
We could either use instance_getattr1(), which avoids the __getattr__ hook, or mark all class instances as finalizable until proven innocent.

From gward at cnri.reston.va.us Tue Mar 7 15:04:30 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Tue, 7 Mar 2000 09:04:30 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17 In-Reply-To: <200003051204.HAA05367@eric.cnri.reston.va.us>; from guido@python.org on Sun, Mar 05, 2000 at 07:04:56AM -0500 References: <200003032044.PAA08614@bitdiddle.cnri.reston.va.us> <14528.18324.283508.577221@bitdiddle.cnri.reston.va.us> <14529.55983.263225.691427@weyr.cnri.reston.va.us> <200003051204.HAA05367@eric.cnri.reston.va.us> Message-ID: <20000307090430.A16948@cnri.reston.va.us> On 05 March 2000, Guido van Rossum said:

> - Variants on the syntax could be given through some kind of option
>   system rather than through subclassing -- they should be combinable
>   independently. Some possible options (maybe I'm going overboard here)
>   could be:
>
> - comment characters: ('#', ';', both, others?)
> - comments after variables allowed? on sections?
> - variable characters: (':', '=', both, others?)
> - quoting of values with "..." allowed?
> - backslashes in "..." allowed?
> - does backslash-newline mean a continuation?
> - case sensitivity for section names (default on)
> - case sensitivity for option names (default off)
> - variables allowed before first section name?
> - first section name? (default "main")
> - character set allowed in section names
> - character set allowed in variable names
> - %(...) substitution?

I agree with Fred that this level of flexibility is probably overkill for a config file parser; you don't want every application author who uses the module to have to explain his particular variant of the syntax. However, if you're interested in a class that *does* provide some of the above flexibility, I have written such a beast.
It's currently used to parse the Distutils MANIFEST.in file, and I've considered using it for the mythical Distutils config files. (And it also gets heavy use in my day job.) It's really a class for reading a file in preparation for "text processing the Unix way", though: it doesn't say anything about syntax, it just worries about blank lines, comments, continuations, and a few other things. Here's the class docstring:

class TextFile:

    """Provides a file-like object that takes care of all the things you
       commonly want to do when processing a text file that has some
       line-by-line syntax: strip comments (as long as "#" is your comment
       character), skip blank lines, join adjacent lines by escaping the
       newline (ie. backslash at end of line), strip leading and/or
       trailing whitespace, and collapse internal whitespace.  All of
       these are optional and independently controllable.

       Provides a 'warn()' method so you can generate warning messages
       that report physical line number, even if the logical line in
       question spans multiple physical lines.  Also provides
       'unreadline()' for implementing line-at-a-time lookahead.

       Constructor is called as:

           TextFile (filename=None, file=None, **options)

       It bombs (RuntimeError) if both 'filename' and 'file' are None;
       'filename' should be a string, and 'file' a file object (or
       something that provides 'readline()' and 'close()' methods).  It is
       recommended that you supply at least 'filename', so that TextFile
       can include it in warning messages.  If 'file' is not supplied,
       TextFile creates its own using the 'open()' builtin.

       The options are all boolean, and affect the value returned by
       'readline()':

         strip_comments [default: true]
           strip from "#" to end-of-line, as well as any whitespace
           leading up to the "#" -- unless it is escaped by a backslash

         lstrip_ws [default: false]
           strip leading whitespace from each line before returning it

         rstrip_ws [default: true]
           strip trailing whitespace (including line terminator!) from
           each line before returning it

         skip_blanks [default: true]
           skip lines that are empty *after* stripping comments and
           whitespace.  (If both lstrip_ws and rstrip_ws are true, then
           some lines may consist of solely whitespace: these will *not*
           be skipped, even if 'skip_blanks' is true.)

         join_lines [default: false]
           if a backslash is the last non-newline character on a line
           after stripping comments and whitespace, join the following
           line to it to form one "logical line"; if N consecutive lines
           end with a backslash, then N+1 physical lines will be joined
           to form one logical line.

         collapse_ws [default: false]
           after stripping comments and whitespace and joining physical
           lines into logical lines, all internal whitespace (strings of
           whitespace surrounded by non-whitespace characters, and not at
           the beginning or end of the logical line) will be collapsed to
           a single space.

       Note that since 'rstrip_ws' can strip the trailing newline, the
       semantics of 'readline()' must differ from those of the builtin
       file object's 'readline()' method!  In particular, 'readline()'
       returns None for end-of-file: an empty string might just be a
       blank line (or an all-whitespace line), if 'rstrip_ws' is true but
       'skip_blanks' is not."""

Interested in having something like this in the core? Adding more options is possible, but the code is already on the hairy side to support all of these. And I'm not a big fan of the subtle difference in semantics with file objects, but honestly couldn't think of a better way at the time. If you're interested, you can download it from http://www.mems-exchange.org/exchange/software/python/text_file/ or just use the version in the Distutils CVS tree. Greg From mal at lemburg.com Tue Mar 7 15:38:09 2000 From: mal at lemburg.com (M.-A.
Lemburg) Date: Tue, 07 Mar 2000 15:38:09 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects References: <38C4CA97.5D0AA9D@lemburg.com> <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> Message-ID: <38C51451.D38B21FE@lemburg.com> Fredrik Lundh wrote: > > > Unicode objects string objects > > expandtabs > > yes. > > I'm pretty sure there's "expandtabs" code in the > strop module. maybe barry missed it? > > > center > > ljust > > rjust > > probably. > > the implementation is trivial, and ljust/rjust are > somewhat useful, so you might as well add them > all (just cut and paste from the unicode class). > > what about rguido and lguido, btw? Ooops, forgot those, thanks :-) > > zfill > > no. Why not ? Since the string implementation had all of the above marked as TBD, I added all four. What about the other new methods (.isXXX() and .splitlines()) ? .isXXX() are mostly needed due to the extended character properties in Unicode. They would be new to the string object world. .splitlines() is Unicode aware and also treats CR/LF combinations across platforms:

S.splitlines([maxsplit]) -> list of strings

Return a list of the lines in S, breaking at line boundaries. If maxsplit is given, at most maxsplit splits are done. Line breaks are not included in the resulting list.

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Tue Mar 7 16:38:18 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 07 Mar 2000 10:38:18 -0500 Subject: [Python-Dev] Adding Unicode methods to string objects In-Reply-To: Your message of "Tue, 07 Mar 2000 15:38:09 +0100." <38C51451.D38B21FE@lemburg.com> References: <38C4CA97.5D0AA9D@lemburg.com> <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> <38C51451.D38B21FE@lemburg.com> Message-ID: <200003071538.KAA13977@eric.cnri.reston.va.us>
Zfill is (or ought to be) deprecated. It stems from times before we had things like "%08d" % x and no longer serves a useful purpose. I doubt anyone would miss it. (Of course, now /F will claim that PIL will break in 27 places because of this. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Tue Mar 7 18:07:40 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 7 Mar 2000 12:07:40 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <200003071352.IAA13571@eric.cnri.reston.va.us> Message-ID: <000701bf8857$a56ed660$a72d153f@tim> [Guido] > ... > Conclusion > ---------- > > Are the Java rules complex? Yes. Are there better rules possible? I'm > not so sure, given the requirement of allowing concurrent incremental > garbage collection algorithms that haven't even been invented > yet. Guy Steele worked his ass off on Java's rules. He had as much real-world experience with implementing GC as anyone, via his long & deep Lisp implementation background (both SW & HW), and indeed invented several key techniques in high-performance GC. But he had no background in GC with user-defined finalizers -- and it shows! > (Plus the implied requirement that finalizers in trash cycles > should be invoked.) Are the Java rules difficult for the user? Only > for users who think they can trick finalizers into doing things for > them that they were not designed to do. This is so implementation-centric it's hard to know what to say <0.5 wink>. The Java rules weren't designed to do much of anything except guarantee that Java (1) would eventually reclaim all unreachable objects, and (2) wouldn't expose dangling pointers to user finalizers, or chase any itself. Whatever *useful* finalizer semantics may remain are those that just happened to survive. > ... > Unlike Scheme guardians or the proposed __cleanup__ mechanism, you > don't have to know whether your object is involved in a cycle -- your > finalizer will still be called. 
This is like saying a user doesn't have to know whether the new drug prescribed for them by their doctor has potentially fatal side effects -- they'll be forced to take it regardless . > ... > Final note: the semantics "__del__ is called whenever the reference > count reaches zero" cannot be defended in the light of a migration to > different forms of garbage collection (e.g. JPython). There may not > be a reference count. 1. I don't know why JPython doesn't execute __del__ methods at all now, but have to suspect that the Java rules imply an implementation so grossly inefficient in the presence of __del__ that Barry simply doesn't want to endure the speed complaints. The Java spec itself urges implementations to special-case the snot out of classes that don't override the default do-nothing finalizer, for "go fast" reasons too. 2. The "refcount reaches 0" rule in CPython is merely a wonderfully concrete way to get across the idea of "destruction occurs in an order consistent with a topological sort of the points-to graph". The latter is explicit in the BDW collector, which has no refcounts; the topsort concept is applicable and thoroughly natural in all languages; refcounts in CPython give an exploitable hint about *when* collection will occur, but add no purely semantic constraint beyond the topsort requirement (they neatly *imply* the topsort requirement). There is no topsort in the presence of cycles, so cycles create problems in all languages. The same "throw 'em back at the user" approach makes just as much sense from the topsort view as the RC view; it doesn't rely on RC at all. stop-the-insanity-ly y'rs - tim From guido at python.org Tue Mar 7 18:33:31 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 07 Mar 2000 12:33:31 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Tue, 07 Mar 2000 12:07:40 EST." 
<000701bf8857$a56ed660$a72d153f@tim> References: <000701bf8857$a56ed660$a72d153f@tim> Message-ID: <200003071733.MAA14926@eric.cnri.reston.va.us> [Tim tells Guido again that he finds the Java rules bad, slinging some mud at Guy Steele, but without explaining what the problem with them is, and then asks:] > 1. I don't know why JPython doesn't execute __del__ methods at all now, but > have to suspect that the Java rules imply an implementation so grossly > inefficient in the presence of __del__ that Barry simply doesn't want to > endure the speed complaints. The Java spec itself urges implementations to > special-case the snot out of classes that don't override the default > do-nothing finalizer, for "go fast" reasons too. Something like that, yes, although it was Jim Hugunin. I have a feeling it has to do with the dynamic nature of __del__ -- this would imply that *all* Python class instances would appear to Java to have a finalizer -- just in most cases it would do a failing lookup of __del__ and bail out quickly. Maybe some source code or class analysis looking for a __del__ could fix this, at the cost of not allowing one to patch __del__ into an existing class after instances have already been created. I don't find that breach of dynamism a big deal -- e.g. CPython keeps copies of __getattr__, __setattr__ and __delattr__ in the class for similar reasons. > 2. The "refcount reaches 0" rule in CPython is merely a wonderfully concrete > way to get across the idea of "destruction occurs in an order consistent > with a topological sort of the points-to graph". The latter is explicit in > the BDW collector, which has no refcounts; the topsort concept is applicable > and thoroughly natural in all languages; refcounts in CPython give an > exploitable hint about *when* collection will occur, but add no purely > semantic constraint beyond the topsort requirement (they neatly *imply* the > topsort requirement).
There is no topsort in the presence of cycles, so > cycles create problems in all languages. The same "throw 'em back at the > user" approach makes just as much sense from the topsort view as the RC > view; it doesn't rely on RC at all. Indeed. I propose to throw it back at the user by calling __del__. The typical user defines __del__ because they want to close a file, say goodbye nicely on a socket connection, or delete a temp file. That sort of thing. This is what finalizers are *for*. As an author of this kind of finalizer, I don't see why I need to know whether I'm involved in a cycle or not. I want my finalizer called when my object goes away, and I don't want my object kept alive by unreachable cycles. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Tue Mar 7 18:39:15 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 07 Mar 2000 18:39:15 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects References: <38C4CA97.5D0AA9D@lemburg.com> <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> <38C51451.D38B21FE@lemburg.com> Message-ID: <38C53EC3.5292ECF@lemburg.com> I've ported most of the Unicode methods to strings now. Here's the new table:

Unicode objects     string objects
------------------------------------------------------------
capitalize          capitalize
center              center
count               count
encode
endswith            endswith
expandtabs          expandtabs
find                find
index               index
isdecimal
isdigit             isdigit
islower             islower
isnumeric
isspace             isspace
istitle             istitle
isupper             isupper
join                join
ljust               ljust
lower               lower
lstrip              lstrip
replace             replace
rfind               rfind
rindex              rindex
rjust               rjust
rstrip              rstrip
split               split
splitlines          splitlines
startswith          startswith
strip               strip
swapcase            swapcase
title               title
translate           translate
upper               upper
zfill               zfill

I don't think that .isdecimal() and .isnumeric() are needed for strings since most of the added mappings refer to Unicode code points.
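The behavior of the methods listed above can be exercised directly; here is a minimal sketch in modern Python, where str eventually grew nearly all of these methods (and where zfill survived after all):

```python
s = "spam\r\nham\ngreen eggs"

# splitlines breaks at line boundaries, handling CR/LF across platforms;
# the line breaks themselves are not included in the result
assert s.splitlines() == ["spam", "ham", "green eggs"]

# zfill pads with leading zeros, much like "%08d" % x does for ints
assert "42".zfill(8) == "00000042"
assert "%08d" % 42 == "00000042"

# the .isXXX() predicates reflect extended Unicode character properties
assert "123".isdigit() and not "12.3".isdigit()
# U+2155 VULGAR FRACTION ONE FIFTH is numeric but not decimal
assert "\u2155".isnumeric() and not "\u2155".isdecimal()
```

The isdecimal/isnumeric distinction only makes sense for characters with Unicode numeric-value properties, which is why the table above lists them for Unicode objects only.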
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Mar 7 18:42:53 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 07 Mar 2000 18:42:53 +0100 Subject: [Python-Dev] Adding Unicode methods to string objects References: <38C4CA97.5D0AA9D@lemburg.com> <001001bf882b$f6004f90$f29b12c2@secret.pythonware.com> <38C51451.D38B21FE@lemburg.com> <200003071538.KAA13977@eric.cnri.reston.va.us> Message-ID: <38C53F9D.44C3A0F3@lemburg.com> Guido van Rossum wrote: > > > > > zfill > > > > > > no. > > > > Why not ? > > Zfill is (or ought to be) deprecated. It stems from times before we > had things like "%08d" % x and no longer serves a useful purpose. > I doubt anyone would miss it. > > (Of course, now /F will claim that PIL will break in 27 places because > of this. :-) Ok, I'll remove it from both implementations again... (there was some email overlap). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From bwarsaw at cnri.reston.va.us Tue Mar 7 20:24:39 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 7 Mar 2000 14:24:39 -0500 (EST) Subject: [Python-Dev] finalization again References: <200003071352.IAA13571@eric.cnri.reston.va.us> <000701bf8857$a56ed660$a72d153f@tim> Message-ID: <14533.22391.447739.901802@anthem.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: TP> 1. I don't know why JPython doesn't execute __del__ methods at TP> all now, but have to suspect that the Java rules imply an TP> implementation so grossly inefficient in the presence of TP> __del__ that Barry simply doesn't want to endure the speed TP> complaints. Actually, it was JimH that discovered this performance gotcha. 
The problem is that if you want to support __del__, you've got to take the finalize() hit for every instance (i.e. PyInstance object) and it's just not worth it. I just realized that it would be relatively trivial to add a subclass of PyInstance differing only in that it has a finalize() method which would invoke __del__(). Now when the class gets defined, the __del__() would be mined and cached and we'd look at that cache when creating an instance. If there's a function there, we create a PyFinalizableInstance, otherwise we create a PyInstance. The cache means you couldn't dynamically add a __del__ later, but I don't think that's a big deal. It wouldn't be hard to look up the __del__ every time, but that'd be a hit for every instance creation (as opposed to class creation), so again, it's probably not worth it. I just did a quick and dirty hack and it seems at first blush to work. I'm sure there's something I'm missing :). For those of you who don't care about JPython, you can skip the rest. Okay, first the Python script to exercise this, then the PyFinalizableInstance.java file, and then the diffs to PyClass.java. JPython-devers, is it worth adding this?

-------------------- snip snip -------------------- del.py
class B:
    def __del__(self):
        print 'In my __del__'

b = B()
del b

from java.lang import System
System.gc()

-------------------- snip snip -------------------- PyFinalizableInstance.java
// Copyright © Corporation for National Research Initiatives

// These are just like normal instances, except that their classes included
// a definition for __del__(), i.e. Python's finalizer.  These two instance
// types have to be separated due to Java performance issues.

package org.python.core;

public class PyFinalizableInstance extends PyInstance {
    public PyFinalizableInstance(PyClass iclass) {
        super(iclass);
    }

    // __del__ method is invoked upon object finalization.
    protected void finalize() {
        __class__.__del__.__call__(this);
    }
}

-------------------- snip snip --------------------

Index: PyClass.java
===================================================================
RCS file: /projects/cvsroot/jpython/dist/org/python/core/PyClass.java,v
retrieving revision 2.8
diff -c -r2.8 PyClass.java
*** PyClass.java    1999/10/04 20:44:28    2.8
--- PyClass.java    2000/03/07 19:02:29
***************
*** 21,27 ****
      // Store these methods for performance optimization
      // These are only used by PyInstance
!     PyObject __getattr__, __setattr__, __delattr__, __tojava__;

      // Holds the classes for which this is a proxy
      // Only used when subclassing from a Java class
--- 21,27 ----
      // Store these methods for performance optimization
      // These are only used by PyInstance
!     PyObject __getattr__, __setattr__, __delattr__, __tojava__, __del__;

      // Holds the classes for which this is a proxy
      // Only used when subclassing from a Java class
***************
*** 111,116 ****
--- 111,117 ----
          __setattr__ = lookup("__setattr__", false);
          __delattr__ = lookup("__delattr__", false);
          __tojava__ = lookup("__tojava__", false);
+         __del__ = lookup("__del__", false);
      }

      protected void findModule(PyObject dict) {
***************
*** 182,188 ****
      }

      public PyObject __call__(PyObject[] args, String[] keywords) {
!         PyInstance inst = new PyInstance(this);
          inst.__init__(args, keywords);
          return inst;
      }
--- 183,194 ----
      }

      public PyObject __call__(PyObject[] args, String[] keywords) {
!         PyInstance inst;
!         if (__del__ == null)
!             inst = new PyInstance(this);
!         else
!             // the class defined an __del__ method
!             inst = new PyFinalizableInstance(this);
          inst.__init__(args, keywords);
          return inst;
      }

From bwarsaw at cnri.reston.va.us Tue Mar 7 20:35:44 2000 From: bwarsaw at cnri.reston.va.us (Barry A.
Warsaw) Date: Tue, 7 Mar 2000 14:35:44 -0500 (EST) Subject: [Python-Dev] finalization again References: <000701bf8857$a56ed660$a72d153f@tim> <200003071733.MAA14926@eric.cnri.reston.va.us> Message-ID: <14533.23056.517661.633574@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> Maybe some source code or class analysis looking for a GvR> __del__ could fix this, at the cost of not allowing one to GvR> patch __del__ into an existing class after instances have GvR> already been created. I don't find that breach of dynamicism GvR> a big deal -- e.g. CPython keeps copies of __getattr__, GvR> __setattr__ and __delattr__ in the class for similar reasons. For those of you who enter the "Being Guido van Rossum" door like I just did, please keep in mind that it dumps you out not on the NJ Turnpike, but in the little ditch back behind CNRI. Stop by and say hi after you brush yourself off. -Barry From Tim_Peters at Dragonsys.com Tue Mar 7 23:30:16 2000 From: Tim_Peters at Dragonsys.com (Tim_Peters at Dragonsys.com) Date: Tue, 7 Mar 2000 17:30:16 -0500 Subject: [Python-Dev] finalization again Message-ID: <8525689B.007AB2BA.00@notes-mta.dragonsys.com> [Guido] > Tim tells Guido again that he finds the Java rules bad, slinging some > mud at Guy Steele, but without explaining what the problem with them > is ... Slinging mud? Let's back off here. You've read the Java spec and were impressed. That's fine -- it is impressive . But go on from there and see where it leads in practice. That Java's GC model did a masterful job but includes a finalization model users dislike is really just conventional wisdom in the Java world. My sketch of Guy Steele's involvement was an attempt to explain why both halves of that are valid. I didn't think "explaining the problem" was necessary, as it's been covered in depth multiple times in c.l.py threads, by Java programmers as well as by me. 
Searching the web for articles about this turns up many; the first one I hit is typical: http://www.quoininc.com/quoininc/Design_Java0197.html eventually concludes Consequently we recommend that [Java] programmers support but do not rely on finalization. That is, place all finalization semantics in finalize() methods, but call those methods explicitly and in the order required. The points below provide more detail. That's par for the Java course: advice to write finalizers to survive being called multiple times, call them explicitly, and do all you can to ensure that the "by magic" call is a nop. The lack of ordering rules in the language forces people to "do it by hand" (as the Java spec acknowledges: "It is straightforward to implement a Java class that will cause a set of finalizer-like methods to be invoked in a specified order for a set of objects when all the objects become unreachable. Defining such a class is left as an exercise for the reader." But from what I've seen, that exercise is beyond the imagination of most Java programmers! The perceived need for ordering is not.). It's fine that you want to restrict finalizers to "simple" cases; it's not so fine if the language can't ensure that simple cases are the only ones the user can write, & can neither detect & complain at runtime about cases it didn't intend to support. The Java spec is unhelpful here too: Therefore, we recommend that the design of finalize methods be kept simple and that they be programmed defensively, so that they will work in all cases. Mom and apple pie, but what does it mean, exactly? The spec realizes that you're going to be tempted to try things that won't work, but can't really explain what those are in terms simpler than the full set of implementation consequences. As a result, users hate it -- but don't take my word for that! 
If you look & don't find that Java's finalization rules are widely viewed as "a problem to be wormed around" by serious Java programmers, fine -- then you've got a much better search engine than mine . As for why I claim following topsort rules is very likely to work out better, they follow from the nature of the problem, and can be explained as such, independent of implementation details. See the Boehm reference for more about topsort. will-personally-use-python-regardless-ly y'rs - tim From guido at python.org Wed Mar 8 01:50:38 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 07 Mar 2000 19:50:38 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Tue, 07 Mar 2000 17:30:16 EST." <8525689B.007AB2BA.00@notes-mta.dragonsys.com> References: <8525689B.007AB2BA.00@notes-mta.dragonsys.com> Message-ID: <200003080050.TAA19264@eric.cnri.reston.va.us> > [Guido] > > Tim tells Guido again that he finds the Java rules bad, slinging some > > mud at Guy Steele, but without explaining what the problem with them > > is ... > > Slinging mud? Let's back off here. You've read the Java spec and were > impressed. That's fine -- it is impressive . But go on from > there and see where it leads in practice. That Java's GC model did a > masterful job but includes a finalization model users dislike is really > just conventional wisdom in the Java world. My sketch of Guy Steele's > involvement was an attempt to explain why both halves of that are valid. Granted. I can read Java code and sometimes I write some, but I'm not a Java programmer by any measure, and I wasn't aware that finalize() has a general bad rep. > I didn't think "explaining the problem" was necessary, as it's been > covered in depth multiple times in c.l.py threads, by Java programmers > as well as by me. 
Searching the web for articles about this turns up > many; the first one I hit is typical: > > http://www.quoininc.com/quoininc/Design_Java0197.html > > eventually concludes > > Consequently we recommend that [Java] programmers support but do > not rely on finalization. That is, place all finalization semantics > in finalize() methods, but call those methods explicitly and in the > order required. The points below provide more detail. > > That's par for the Java course: advice to write finalizers to survive > being called multiple times, call them explicitly, and do all you can > to ensure that the "by magic" call is a nop. It seems the authors make one big mistake: they recommend calling finalize() explicitly. This may be par for the Java course: the quality of the materials is often poor, and that has to be taken into account when certain features have gotten a bad rep. (These authors also go on at length about the problems of GC in a real-time situation -- attempts to use Java in situations for which it is inappropriate are also par for the course, inspired by all the hype.) Note that e.g. Bruce Eckel in "Thinking in Java" makes it clear that you should never call finalize() explicitly (except that you should always call super.finalize() in your finalize() method). (Bruce goes on at length explaining that there aren't a lot of things you should use finalize() for -- except to observe the garbage collector. :-) > The lack of ordering > rules in the language forces people to "do it by hand" (as the Java > spec acknowledges: "It is straightforward to implement a Java class > that will cause a set of finalizer-like methods to be invoked in a > specified order for a set of objects when all the objects become > unreachable. Defining such a class is left as an exercise for the > reader." But from what I've seen, that exercise is beyond the > imagination of most Java programmers! The perceived need for ordering > is not.).
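The "exercise for the reader" quoted above is indeed straightforward to sketch. A hypothetical Python version (the class name and API here are invented for illustration) that runs registered finalizer-like callables in a specified order:

```python
class OrderedFinalizers:
    """Invoke registered finalizer-like callables in a specified order:
    last registered, first run (the usual destruction order)."""

    def __init__(self):
        self._pending = []

    def register(self, func, *args):
        self._pending.append((func, args))

    def finalize(self):
        # Run in reverse registration order; popping as we go makes
        # repeated finalize() calls harmless.
        while self._pending:
            func, args = self._pending.pop()
            func(*args)

order = []
f = OrderedFinalizers()
f.register(order.append, "opened first, closed last")
f.register(order.append, "opened last, closed first")
f.finalize()
f.finalize()  # second call is a nop
print(order)  # ['opened last, closed first', 'opened first, closed last']
```

The point of contention is not whether such a class can be written, but that the language's "by magic" finalization gives no ordering at all, so anyone who needs ordering must fall back on explicit machinery like this.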
True, but note that Python won't have the ordering problem, at least not as long as we stick to reference counting as the primary means of GC. The ordering problem in Python will only happen when there are cycles, and there you really can't blame the poor GC design! > It's fine that you want to restrict finalizers to "simple" cases; it's > not so fine if the language can't ensure that simple cases are the only > ones the user can write, & can neither detect & complain at runtime > about cases it didn't intend to support. The Java spec is unhelpful > here too: > > Therefore, we recommend that the design of finalize methods be kept > simple and that they be programmed defensively, so that they will > work in all cases. > > Mom and apple pie, but what does it mean, exactly? The spec realizes > that you're going to be tempted to try things that won't work, but > can't really explain what those are in terms simpler than the full set > of implementation consequences. As a result, users hate it -- but > don't take my word for that! If you look & don't find that Java's > finalization rules are widely viewed as "a problem to be wormed around" > by serious Java programmers, fine -- then you've got a much better > search engine than mine . Hm. Of course programmers hate finalizers. They hate GC as well. But they hate even more not to have it (witness the relentless complaints about Python's "lack of GC" -- and Java's GC is often touted as one of the reasons for its superiority over C++). I think this stuff is just hard! (Otherwise why would we be here having this argument?) > As for why I claim following topsort rules is very likely to work out > better, they follow from the nature of the problem, and can be > explained as such, independent of implementation details. See the > Boehm reference for more about topsort. Maybe we have a disconnect? We *are* using topsort -- for non-cyclical data structures. Reference counting ensures that. Nothing in my design changes that.
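The non-cyclic case can be seen directly; a small sketch (CPython-specific: it relies on refcounts dropping to zero immediately, exactly the "exploitable hint" under discussion):

```python
order = []

class Node:
    def __init__(self, name, child=None):
        self.name = name
        self.child = child

    def __del__(self):
        order.append(self.name)

b = Node("b")
a = Node("a", child=b)
del a, b  # a dies first; only then does its reference to b disappear

# Destruction order is consistent with a topological sort of the
# points-to graph: the referrer is finalized before the referent.
print(order)  # ['a', 'b'] in CPython
```

With a cycle (a.child = b; b.parent = a) no such order exists, which is the whole argument.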
The issue at hand is what to do with *cyclical* data structures, where topsort doesn't help. Boehm, on http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html, says: "Cycles involving one or more finalizable objects are never finalized." The question remains, what to do with trash cycles? I find having a separate __cleanup__ protocol cumbersome. I think that the "finalizer only called once by magic" rule is reasonable. I believe that the ordering problems will be much less than in Java, because we use topsort whenever we can. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Wed Mar 8 07:25:56 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 8 Mar 2000 01:25:56 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <200003080050.TAA19264@eric.cnri.reston.va.us> Message-ID: <001401bf88c7$29f2a320$452d153f@tim> [Guido] > Granted. I can read Java code and sometimes I write some, but I'm not > a Java programmer by any measure, and I wasn't aware that finalize() > has a general bad rep. It does, albeit often for bad reasons. 1. C++ programmers seeking to emulate techniques based on C++'s rigid specification of the order and timing of destruction of autos. 2. People pushing the limits (as in the URL I happened to post). 3. People trying to do anything . Java's finalization semantics are very weak, and s-l-o-w too (under most current implementations). Now I haven't used Java for real in about two years, and avoided finalizers completely when I did use it. I can't recall any essential use of __del__ I make in Python code, either. So what Python does here makes no personal difference to me. However, I frequently respond to complaints & questions on c.l.py, and don't want to get stuck trying to justify Java's uniquely baroque rules outside of comp.lang.java <0.9 wink>. 
>> [Tim, passes on the first relevant URL he finds: >> http://www.quoininc.com/quoininc/Design_Java0197.html] > It seems the authors make one big mistake: they recommend calling > finalize() explicitly. This may be par for the Java course: the > quality of the materials is often poor, and that has to be taken into > account when certain features have gotten a bad rep. Well, in "The Java Programming Language", Gosling recommends that you: a) Add a method called close(), that tolerates being called multiple times. b) Write a finalize() method whose body calls close(). People tended to do that at first, but used a bunch of names other than "close" too. I guess people eventually got weary of having two methods that did the same thing, so decided to just use the single name Java guaranteed would make sense. > (These authors also go on at length about the problems of GC in a real- > time situation -- attempts to use Java in situations for which it is > inappropriate are also par for the course, inspired by all the hype.) I could have picked any number of other URLs, but don't regret picking this one: you can't judge a ship in smooth waters, and people will push *all* features beyond their original intents. Doing so exposes weaknesses. Besides, Sun won't come out & say Java is unsuitable for real-time, no matter how obvious it is . > Note that e.g. Bruce Eckel in "Thinking in Java" makes it clear that > you should never call finalize() explicitly (except that you should > always call super.finalize() in your finalize() method). You'll find lots of conflicting advice here, be it about Java or C++. Java may be unique, though, in the universality of the conclusion Bruce draws here: > (Bruce goes on at length explaining that there aren't a lot of things > you should use finalize() for -- except to observe the garbage collector. :-) Frankly, I think Java would be better off without finalizers.
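Gosling's close()/finalize() recipe translates directly to __del__. A hedged sketch (the Resource class is invented here for illustration; the pattern, not the class, is the point):

```python
class Resource:
    """close() tolerates multiple calls; __del__ is only a safety net."""

    def __init__(self):
        self.closed = False

    def close(self):
        if self.closed:   # survive being called more than once
            return
        self.closed = True
        # ... release the external resource here ...

    def __del__(self):
        # The "by magic" call amounts to a nop whenever the user
        # already called close() explicitly, as recommended.
        self.close()

r = Resource()
r.close()
r.close()          # second explicit call is harmless
assert r.closed
```

All the real finalization semantics live in close(), called explicitly and in the order the program requires; the finalizer exists only to catch the cases the programmer forgot.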
Python could do fine without __del__ too -- if you and I were the only users <0.6 wink>. [on Java's lack of ordering promises] > True, but note that Python won't have the ordering problem, at least > not as long as we stick to reference counting as the primary means of > GC. The ordering problem in Python will only happen when there are > cycles, and there you really can't blame the poor GC design! I cannot. Nor do I intend to. The cyclic ordering problem isn't GC's fault, it's the program's; but GC's *response* to it is entirely GC's responsibility. >> ... The Java spec is unhelpful here too: >> >> Therefore, we recommend that the design of finalize methods be kept >> simple and that they be programmed defensively, so that they will >> work in all cases. >> >> Mom and apple pie, but what does it mean, exactly? The spec realizes >> that you're going to be tempted to try things that won't work, but >> can't really explain what those are in terms simpler than the full set >> of implementation consequences. As a result, users hate it -- but >> don't take my word for that! If you look & don't find that Java's >> finalization rules are widely viewed as "a problem to be wormed around" >> by serious Java programmers, fine -- then you've got a much better >> search engine than mine . > Hm. Of course programmers hate finalizers. Oh no! C++ programmers *love* destructors! I mean it, they're absolutely gaga over them. I haven't detected signs that CPython programmers hate __del__ either, except at shutdown time. Regardless of language, they love them when they're predictable and work as expected, they hate them when they're unpredictable and confusing. C++ auto destructors are extremely predictable (e.g., after "{SomeClass a, b; ...}", b is destructed before a, and both destructions are guaranteed before leaving the block they're declared in, regardless of whether via return, exception, goto or falling off the end). 
CPython's __del__ is largely predictable (modulo shutdown, cycles, and sometimes exceptions). The unhappiness in the Java world comes from Java finalizers' unpredictability and consequent all-around uselessness in messy real life. > They hate GC as well. Yes, when it's unpredictable and confusing . > But they hate even more not to have it (witness the relentless > complaints about Python's "lack of GC" -- and Java's GC is often > touted as one of the reasons for its superiority over C++). Back when JimF & I were looking at gc, we may have talked each other into really believing that paying careful attention to RC issues leads to cleaner and more robust designs. In fact, I still believe that, and have never clamored for "real gc" in Python. Jim now may even be opposed to "real gc". But Jim and I and you all think a lot about the art of programming, and most users just don't have time or inclination for that -- the slowly changing nature of c.l.py is also clear evidence of this. I'm afraid this makes growing "real GC" a genuine necessity for Python's continued growth. It's not a *bad* thing in any case. Think of it as a marketing requirement <0.7 wink>. > I think this stuff is just hard! (Otherwise why would we be here > having this argument?) Honest to Guido, I think it's because you're sorely tempted to go down an un-Pythonic path here, and I'm fighting that. I said early on there are no thoroughly good answers (yes, it's hard), but that's nothing new for Python! We're having this argument solely because you're confusing Python with some other language . [a 2nd or 3rd plug for taking topsort seriously] > Maybe we have a disconnect? Not in the technical analysis, but in what conclusions to take from it. > We *are* using topsort -- for non-cyclical data structures. Reference > counting ensure that. Nothing in my design changes that. And it's great! 
Everyone understands the RC rules pretty quickly, lots of people like them a whole lot, and if it weren't for cyclic trash everything would be peachy. > The issue at hand is what to do with *cyclical* data structures, where > topsort doesn't help. Boehm, on > http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html, > says: "Cycles involving one or more finalizable objects are never > finalized." This is like some weird echo chamber, where the third time I shout something the first one comes back without any distortion at all . Yes, Boehm's first rule is "Do No Harm". It's a great rule. Python follows the same rule all over the place; e.g., when you see x = "4" + 2 you can't possibly know what was intended, so you refuse to guess: you would rather *kill* the program than make a blind guess! I see cycles with finalizers as much the same: it's plain wrong to guess when you can't possibly know what was intended. Because topsort is the only principled way to decide order of finalization, and they've *created* a situation where a topsort doesn't exist, what they're handing you is no less ambiguous than in trying to add a string to an int. This isn't the time to abandon topsort as inconvenient, it's the time to defend it as inviolate principle! The only thoroughly rational response is "you know, this doesn't make sense -- since I can't know what you want here, I refuse to pretend that I can". Since that's "the right" response everywhere else in Python, what the heck is so special about this case? It's like you decided Python *had* to allow adding strings to ints, and now we're going to argue about whether Perl, Awk or Tcl makes the best unprincipled guess . > The question remains, what to do with trash cycles? A trash cycle without a finalizer isn't a problem, right? In that case, topsort rules have no visible consequence so it doesn't matter in what order you merely reclaim the memory.
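Why a cycle defeats both refcounting and topsort can be shown in a few lines. This sketch uses modern CPython's gc module and its DEBUG_SAVEALL flag (a debugging aid, not the semantics under discussion) to make the collected trash inspectable rather than silently reclaimed:

```python
import gc

class C:
    pass

gc.set_debug(gc.DEBUG_SAVEALL)  # keep collected objects in gc.garbage

a, b = C(), C()
a.ref, b.ref = b, a   # a points to b and b points to a: no topsort exists
del a, b              # refcounts never reach zero on their own

gc.collect()
gc.set_debug(0)

# The cycle's objects were found unreachable and are held for inspection,
# which is the "let it leak, but visibly" option described above.
assert any(isinstance(obj, C) for obj in gc.garbage)
```

Historically this is roughly where CPython landed for cycles whose objects had __del__ methods: leave them in gc.garbage for the programmer to examine and break by hand (until PEP 442 changed the rules much later).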
If it has an object with a finalizer, though, at the very worst you can let it leak, and make the collection of leaked objects available for inspection. Even that much is a *huge* "improvement" over what they have today: most cycles won't have a finalizer and so will get reclaimed, and for the rest they'll finally have a simple way to identify exactly where the problem is, and a simple criterion for predicting when it will happen. If that's not "good enough", then without abandoning principle the user needs to have some way to reduce such a cycle *to* a topsort case themself. > I find having a separate __cleanup__ protocol cumbersome. Same here, but if you're not comfortable leaking, and you agree Python is not in the business of guesing in inherently ambiguous situations, maybe that's what it takes! MAL and GregS both gravitated to this kind of thing at once, and that's at least suggestive; and MAL has actually been using his approach. It's explicit, and that's Pythonic on the face of it. > I think that the "finalizer only called once by magic" rule is reasonable. If it weren't for its specific use in emulating Java's scheme, would you still be in favor of that? It's a little suspicious that it never came up before . > I believe that the ordering problems will be much less than in Java, because > we use topsort whenever we can. No argument here, except that I believe there's never sufficient reason to abandon topsort ordering. Note that BDW's adamant refusal to yield on this hasn't stopped "why doesn't Python use BDW?" from becoming a FAQ . a-case-where-i-expect-adhering-to-principle-is-more-pragmatic- in-the-end-ly y'rs - tim From tim_one at email.msn.com Wed Mar 8 08:48:24 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 8 Mar 2000 02:48:24 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? Message-ID: <001801bf88d2$af0037c0$452d153f@tim> Mike has a darned good point here. 
Anyone have a darned good answer ? -----Original Message----- From: python-list-admin at python.org [mailto:python-list-admin at python.org] On Behalf Of Mike Fletcher Sent: Tuesday, March 07, 2000 2:08 PM To: Python Listserv (E-mail) Subject: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? Just reading one more post (and a FAQ) on the win32 pipe breakage (sewage all over the hard-disk, traffic rerouted through the bit-bucket, you aren't getting to work anytime soon Mrs. Programmer) and wondering why we have a FAQ instead of having the win32pipe stuff rolled into the os module to fix it. Is there some incompatibility? Is there a licensing problem? Ideas? Mike __________________________________ Mike C. Fletcher Designer, VR Plumber http://members.home.com/mcfletch -- http://www.python.org/mailman/listinfo/python-list From mal at lemburg.com Wed Mar 8 09:36:57 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 09:36:57 +0100 Subject: [Python-Dev] finalization again References: <8525689B.007AB2BA.00@notes-mta.dragonsys.com> <200003080050.TAA19264@eric.cnri.reston.va.us> Message-ID: <38C61129.2F8C9E95@lemburg.com> > [Guido] > The question remains, what to do with trash cycles? I find having a > separate __cleanup__ protocol cumbersome. I think that the "finalizer > only called once by magic" rule is reasonable. I believe that the > ordering problems will be much less than in Java, because we use > topsort whenever we can. Note that the __cleanup__ protocol is intended to break cycles *before* calling the garbage collector. After those cycles are broken, ordering is not a problem anymore and because __cleanup__ can do its task on a per-object basis all magic is left in the hands of the programmer. The __cleanup__ protocol as I use it is designed to be called in situations where the system knows that all references into a cycle are about to be dropped (I typically use small cyclish object systems in my application, e.g. 
ones that create and reference namespaces which include a reference to the hosting object itself). In my application that is done by using mxProxies at places where I know these cyclic object subsystems are being referenced. In Python the same could be done whenever the interpreter knows that a certain object is about to be deleted, e.g. during shutdown (important for embedding Python in other applications such as Apache) or some other major subsystem finalization, e.g. unload of a module or killing of a thread (yes, I know these are no-nos, but they could be useful, esp. the thread kill operation in multi-threaded servers).

After __cleanup__ has done its thing, the finalizer can either choose to leave all remaining cycles in memory (and leak) or apply its own magic to complete the task. In any case, __del__ should be called when the refcount reaches 0. (I find it somewhat strange that people are arguing to keep external resources alive even though there is a chance of freeing them.)

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

From mal at lemburg.com Wed Mar 8 09:46:14 2000
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 08 Mar 2000 09:46:14 +0100
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
References: <001801bf88d2$af0037c0$452d153f@tim>
Message-ID: <38C61356.E0598DBF@lemburg.com>

Tim Peters wrote:
>
> Mike has a darned good point here. Anyone have a darned good answer ?
>
> -----Original Message-----
> From: python-list-admin at python.org [mailto:python-list-admin at python.org]
> On Behalf Of Mike Fletcher
> Sent: Tuesday, March 07, 2000 2:08 PM
> To: Python Listserv (E-mail)
> Subject: Fixing os.popen on Win32 => is the win32pipe stuff going to be
> adopted?
> > Just reading one more post (and a FAQ) on the win32 pipe breakage (sewage > all over the hard-disk, traffic rerouted through the bit-bucket, you aren't > getting to work anytime soon Mrs. Programmer) and wondering why we have a > FAQ instead of having the win32pipe stuff rolled into the os module to fix > it. Is there some incompatibility? Is there a licensing problem? > > Ideas? I'd suggest moving the popen from the C modules into os.py as Python API and then applying all necessary magic to either use the win32pipe implementation (if available) or the native C one from the posix module in os.py. Unless, of course, the win32 stuff (or some of it) makes it into the core. I'm mostly interested in this for my platform.py module... BTW, is there any interest of moving it into the core ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Wed Mar 8 13:10:53 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 08 Mar 2000 07:10:53 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: Your message of "Wed, 08 Mar 2000 09:46:14 +0100." <38C61356.E0598DBF@lemburg.com> References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> Message-ID: <200003081210.HAA19931@eric.cnri.reston.va.us> > Tim Peters wrote: > > > > Mike has a darned good point here. Anyone have a darned good answer ? > > Subject: Fixing os.popen on Win32 => is the win32pipe stuff going to be > > adopted? > > > > Just reading one more post (and a FAQ) on the win32 pipe breakage (sewage > > all over the hard-disk, traffic rerouted through the bit-bucket, you aren't > > getting to work anytime soon Mrs. Programmer) and wondering why we have a > > FAQ instead of having the win32pipe stuff rolled into the os module to fix > > it. Is there some incompatibility? 
Is there a licensing problem? MAL: > I'd suggest moving the popen from the C modules into os.py > as Python API and then applying all necessary magic to either > use the win32pipe implementation (if available) or the native > C one from the posix module in os.py. > > Unless, of course, the win32 stuff (or some of it) makes it into > the core. No concrete plans -- except that I think the registry access is supposed to go in. Haven't seen the code on patches at python.org yet though. > I'm mostly interested in this for my platform.py module... > BTW, is there any interest of moving it into the core ? "it" == platform.py? Little interest from me personally; I suppose it could go in Tools/scripts/... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Mar 8 15:06:53 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 08 Mar 2000 09:06:53 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Wed, 08 Mar 2000 01:25:56 EST." <001401bf88c7$29f2a320$452d153f@tim> References: <001401bf88c7$29f2a320$452d153f@tim> Message-ID: <200003081406.JAA20033@eric.cnri.reston.va.us> > A trash cycle without a finalizer isn't a problem, right? In that case, > topsort rules have no visible consquence so it doesn't matter in what order > you merely reclaim the memory. When we have a pile of garbage, we don't know whether it's all connected or whether it's lots of little cycles. So if we find [objects with -- I'm going to omit this] finalizers, we have to put those on a third list and put everything reachable from them on that list as well (the algorithm I described before). What's left on the first list then consists of finalizer-free garbage. We dispose of this garbage by clearing dicts and lists. Hopefully this makes the refcount of some of the finalizers go to zero -- those are finalized in the normal way. And now we have to deal with the inevitable: finalizers that are part of cycles. 
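[The partition Guido describes above -- objects with finalizers plus everything reachable from them on one list, finalizer-free trash on the other -- can be sketched like this. This is a toy model with hypothetical `edges` and `has_finalizer` inputs, not the collector's real data structures:]

```python
# Toy model of the partition: given the set of garbage objects and
# their reference edges, separate finalizer-free garbage (safe to
# clear immediately) from anything reachable from a finalizer.
def partition(garbage, edges, has_finalizer):
    reachable_from_finalizer = set()
    stack = [o for o in garbage if has_finalizer(o)]
    while stack:
        obj = stack.pop()
        if obj not in reachable_from_finalizer:
            reachable_from_finalizer.add(obj)
            # Follow references, but stay inside the garbage set.
            stack.extend(o for o in edges[obj] if o in garbage)
    safe = [o for o in garbage if o not in reachable_from_finalizer]
    return safe, reachable_from_finalizer

# Example: 'f' has a finalizer and references 'x'; 'y' is a self-cycle.
edges = {'f': ['x'], 'x': [], 'y': ['y']}
safe, tricky = partition({'f', 'x', 'y'}, edges, lambda o: o == 'f')
print(sorted(safe), sorted(tricky))   # ['y'] ['f', 'x']
```

The "safe" list is what Guido disposes of by clearing dicts and lists; the rest is the hard case discussed next.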
It makes sense to reduce the graph of objects to a graph of finalizers only. Example:

    A <=> b -> C <=> d

A and C have finalizers. C is part of a cycle (C-d) that contains no other finalizers, but C is also reachable from A. A is part of a cycle (A-b) that keeps it alive. The interesting thing here is that if we only look at the finalizers, there are no cycles! If we reduce the graph to only finalizers (setting aside for now the problem of how to do that -- we may need to allocate more memory to hold the reduced graph), we get:

    A -> C

We can now finalize A (even though its refcount is nonzero!). And that's really all we can do! A could break its own cycle, thereby disposing of itself and b. It could also break C's cycle, disposing of C and d. It could do nothing. Or it could resurrect A, thereby resurrecting all of A, b, C, and d. This leads to (there's that weird echo again :-) Boehm's solution: Call A's finalizer and leave the rest to the next time the garbage collection runs.

Note that we're now calling finalizers on objects with a non-zero refcount. At some point (probably as a result of finalizing A) its refcount will go to zero. We should not finalize it again -- this would serve no purpose. Possible solution:

    INCREF(A);
    A->__del__();
    if (A->ob_refcnt == 1)
        A->__class__ = NULL; /* Make A finalizer-less */
    DECREF(A);

This avoids finalizing twice if the first finalization broke all cycles in which A is involved. But if it doesn't, A is still cyclical garbage with a finalizer! Even if it didn't resurrect itself. Instead of the code fragment above, we could mark A as "just finalized" and when it shows up at the head of the tree (of finalizers in cyclical trash) again on the next garbage collection, discard it without calling the finalizer again (because this clearly means that it didn't resurrect itself -- at least not for a very long time).
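[The reduction above can be sketched on Guido's own example graph. This is a toy model using strings for nodes; the real collector would operate on objects:]

```python
# Guido's example:  A <=> b -> C <=> d, where A and C have finalizers.
edges = {
    'A': ['b'], 'b': ['A', 'C'],   # A <=> b, and b -> C
    'C': ['d'], 'd': ['C'],        # C <=> d
}
finalizers = ['A', 'C']

def reachable(start):
    """Every node reachable by following edges out of start."""
    seen, stack = set(), list(edges[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges[node])
    return seen

# Reduced graph: an edge F -> G whenever finalizer G is reachable from F.
reduced = {f: [g for g in finalizers if g != f and g in reachable(f)]
           for f in finalizers}
print(reduced)   # {'A': ['C'], 'C': []} -- no cycle among the finalizers
```

The reduced graph is acyclic even though the object graph is full of cycles, which is exactly why A can be finalized first.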
I would be happier if we could still have a rule that says that a finalizer is called only once by magic -- even if we have two forms of magic: refcount zero or root of the tree. Tim: I don't know if you object to this rule as a matter of principle (for the sake of finalizers that resurrect the object) or if your objection is really against the unordered calling of finalizers legitimized by Java's rules. I hope the latter, since I think that this rule (__del__ called only once by magic) by itself is easy to understand and easy to deal with, and I believe it may be necessary to guarantee progress for the garbage collector.

The problem is that the collector can't easily tell whether A has resurrected itself. Sure, if the refcount is 1 after the finalizer ran, I know it didn't resurrect itself. But even if it's higher than before, that doesn't mean it's resurrected: it could have linked to itself. Without doing a full collection I can't tell the difference. If I wait until a full collection happens again naturally, and look at the "just finalized" flag, I can't tell the difference between the case whereby the object resurrected itself but died again before the next collection, and the case where it was dead already. So I don't know how many times it was expecting the "last rites" to be performed, and the object can't know whether to expect them again or not. This seems worse than the only-once rule to me.

Even if someone once found a good use for resurrecting inside __del__, against all recommendations, I don't mind breaking their code, if it's for a good cause. The Java rules aren't a good cause. But top-sorted finalizer calls seem a worthy cause.

So now we get to discuss what to do with multi-finalizer cycles, like:

    A <=> b <=> C

Here the reduced graph is:

    A <=> C

About this case you say:
Even that much is a *huge* "improvement" over what they have > today: most cycles won't have a finalizer and so will get reclaimed, and > for the rest they'll finally have a simple way to identify exactly where the > problem is, and a simple criterion for predicting when it will happen. If > that's not "good enough", then without abandoning principle the user needs > to have some way to reduce such a cycle *to* a topsort case themself. > > > I find having a separate __cleanup__ protocol cumbersome. > > Same here, but if you're not comfortable leaking, and you agree Python is > not in the business of guesing in inherently ambiguous situations, maybe > that's what it takes! MAL and GregS both gravitated to this kind of thing > at once, and that's at least suggestive; and MAL has actually been using his > approach. It's explicit, and that's Pythonic on the face of it. > > > I think that the "finalizer only called once by magic" rule is reasonable. > > If it weren't for its specific use in emulating Java's scheme, would you > still be in favor of that? It's a little suspicious that it never came up > before . Suspicious or not, it still comes up. I still like it. I still think that playing games with resurrection is evil. (Maybe my spiritual beliefs shine through here -- I'm a convinced atheist. :-) Anyway, once-only rule aside, we still need a protocol to deal with cyclical dependencies between finalizers. The __cleanup__ approach is one solution, but it also has a problem: we have a set of finalizers. Whose __cleanup__ do we call? Any? All? Suggestions? Note that I'd like some implementation freedom: I may not want to bother with the graph reduction algorithm at first (which seems very hairy) so I'd like to have the right to use the __cleanup__ API as soon as I see finalizers in cyclical trash. 
I don't mind disposing of finalizer-free cycles first, but once I have more than one finalizer left in the remaining cycles, I'd like the right not to reduce the graph for topsort reasons -- that algorithm seems hard.

So we're back to the __cleanup__ design. Strawman proposal: for all finalizers in a trash cycle, call their __cleanup__ method, in arbitrary order. After all __cleanup__ calls are done, if the objects haven't all disposed of themselves, they are all garbage-collected without calling __del__. (This seems to require another garbage collection cycle -- so perhaps there should also be a once-only rule for __cleanup__?)

Separate question: what if there is no __cleanup__? This should probably be reported: "You have cycles with finalizers, buddy! What do you want to do about them?" This same warning could be given when there is a __cleanup__ but it doesn't break all cycles.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From mal at lemburg.com Wed Mar 8 14:34:06 2000
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 08 Mar 2000 14:34:06 +0100
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us>
Message-ID: <38C656CE.B0ACFF35@lemburg.com>

Guido van Rossum wrote:
> > Tim Peters wrote:
> > >
> > > Mike has a darned good point here. Anyone have a darned good answer ?
> > > Subject: Fixing os.popen on Win32 => is the win32pipe stuff going to be
> > > adopted?
> > >
> > > Just reading one more post (and a FAQ) on the win32 pipe breakage (sewage
> > > all over the hard-disk, traffic rerouted through the bit-bucket, you aren't
> > > getting to work anytime soon Mrs. Programmer) and wondering why we have a
> > > FAQ instead of having the win32pipe stuff rolled into the os module to fix
> > > it. Is there some incompatibility? Is there a licensing problem?
> > MAL: > > I'd suggest moving the popen from the C modules into os.py > > as Python API and then applying all necessary magic to either > > use the win32pipe implementation (if available) or the native > > C one from the posix module in os.py. > > > > Unless, of course, the win32 stuff (or some of it) makes it into > > the core. > > No concrete plans -- except that I think the registry access is > supposed to go in. Haven't seen the code on patches at python.org yet > though. Ok, what about the optional "use win32pipe if available" idea then ? > > I'm mostly interested in this for my platform.py module... > > BTW, is there any interest of moving it into the core ? > > "it" == platform.py? Right. > Little interest from me personally; I suppose it > could go in Tools/scripts/... Hmm, it wouldn't help much in there I guess... after all, it defines APIs which are to be queried by other scripts. The default action to print the platform information to stdout is just a useful addition. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Wed Mar 8 15:33:53 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 08 Mar 2000 09:33:53 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: Your message of "Wed, 08 Mar 2000 14:34:06 +0100." <38C656CE.B0ACFF35@lemburg.com> References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> <38C656CE.B0ACFF35@lemburg.com> Message-ID: <200003081433.JAA20177@eric.cnri.reston.va.us> > > MAL: > > > I'd suggest moving the popen from the C modules into os.py > > > as Python API and then applying all necessary magic to either > > > use the win32pipe implementation (if available) or the native > > > C one from the posix module in os.py. 
> > > > > > Unless, of course, the win32 stuff (or some of it) makes it into > > > the core. [Guido] > > No concrete plans -- except that I think the registry access is > > supposed to go in. Haven't seen the code on patches at python.org yet > > though. > > Ok, what about the optional "use win32pipe if available" idea then ? Sorry, I meant please send me the patch! --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Wed Mar 8 15:59:46 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 8 Mar 2000 09:59:46 -0500 (EST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <200003081210.HAA19931@eric.cnri.reston.va.us> References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> Message-ID: <14534.27362.139106.701784@weyr.cnri.reston.va.us> Guido van Rossum writes: > "it" == platform.py? Little interest from me personally; I suppose it > could go in Tools/scripts/... I think platform.py is pretty nifty, but I'm not entirely sure how it's expected to be used. Perhaps Marc-Andre could explain further the motivation behind the module? My biggest requirement is that it be accompanied by documentation. The coolness factor and shared use of hackerly knowledge would probably get *me* to put it in, but there are a lot of things about which I'll disagree with Guido just to hear his (well-considered) thoughts on the matter. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal at lemburg.com Wed Mar 8 18:37:43 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 18:37:43 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 ... code for thought. 
References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> <38C656CE.B0ACFF35@lemburg.com> <200003081433.JAA20177@eric.cnri.reston.va.us>
Message-ID: <38C68FE7.63943C5C@lemburg.com>

Guido van Rossum wrote:
> > > MAL:
> > > > I'd suggest moving the popen from the C modules into os.py
> > > > as Python API and then applying all necessary magic to either
> > > > use the win32pipe implementation (if available) or the native
> > > > C one from the posix module in os.py.
> > > >
> > > > Unless, of course, the win32 stuff (or some of it) makes it into
> > > > the core.
> [Guido]
> > > No concrete plans -- except that I think the registry access is
> > > supposed to go in. Haven't seen the code on patches at python.org yet
> > > though.
> >
> > Ok, what about the optional "use win32pipe if available" idea then ?
>
> Sorry, I meant please send me the patch!

Here's the popen() interface I use in platform.py. It should serve well as a basis for an os.popen patch... (don't have time to do it myself right now):

class _popen:

    """ Fairly portable (alternative) popen implementation.

        This is mostly needed in case os.popen() is not available, or
        doesn't work as advertised, e.g. in Win9X GUI programs like
        PythonWin or IDLE.

        XXX Writing to the pipe is currently not supported.

    """
    tmpfile = ''
    pipe = None
    bufsize = None
    mode = 'r'

    def __init__(self, cmd, mode='r', bufsize=None):
        if mode != 'r':
            raise ValueError, 'popen()-emulation only supports read mode'
        import tempfile
        self.tmpfile = tmpfile = tempfile.mktemp()
        os.system(cmd + ' > %s' % tmpfile)
        self.pipe = open(tmpfile, 'rb')
        self.bufsize = bufsize
        self.mode = mode

    def read(self):
        return self.pipe.read()

    def readlines(self):
        if self.bufsize is not None:
            return self.pipe.readlines()

    def close(self, remove=os.unlink, error=os.error):
        if self.pipe:
            rc = self.pipe.close()
        else:
            rc = 255
        if self.tmpfile:
            try:
                remove(self.tmpfile)
            except error:
                pass
        return rc

    # Alias
    __del__ = close

def popen(cmd, mode='r', bufsize=None):

    """ Portable popen() interface.
    """
    # Find a working popen implementation preferring win32pipe.popen
    # over os.popen over _popen
    popen = None
    if os.environ.get('OS', '') == 'Windows_NT':
        # On NT win32pipe should work; on Win9x it hangs due to bugs
        # in the MS C lib (see MS KnowledgeBase article Q150956)
        try:
            import win32pipe
        except ImportError:
            pass
        else:
            popen = win32pipe.popen
    if popen is None:
        if hasattr(os, 'popen'):
            popen = os.popen
            # Check whether it works... it doesn't in GUI programs
            # on Windows platforms
            if sys.platform == 'win32':  # XXX Others too ?
                try:
                    popen('')
                except os.error:
                    popen = _popen
        else:
            popen = _popen
    if bufsize is None:
        return popen(cmd, mode)
    else:
        return popen(cmd, mode, bufsize)

if __name__ == '__main__':
    print """
I confirm that, to the best of my knowledge and belief, this
contribution is free of any claims of third parties under
copyright, patent or other rights or interests ("claims").
To the extent that I have any such claims, I hereby grant to CNRI a nonexclusive, irrevocable, royalty-free, worldwide license to reproduce, distribute, perform and/or display publicly, prepare derivative versions, and otherwise use this contribution as part of the Python software and its related documentation, or any derivative versions thereof, at no cost to CNRI or its licensed users, and to authorize others to do so. I acknowledge that CNRI may, at its sole discretion, decide whether or not to incorporate this contribution in the Python software and its related documentation. I further grant CNRI permission to use my name and other identifying information provided to CNRI by me for use in connection with the Python software and its related documentation. """ -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Mar 8 18:44:59 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 08 Mar 2000 18:44:59 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <001801bf88d2$af0037c0$452d153f@tim> <38C61356.E0598DBF@lemburg.com> <200003081210.HAA19931@eric.cnri.reston.va.us> <14534.27362.139106.701784@weyr.cnri.reston.va.us> Message-ID: <38C6919B.EA3EE2E7@lemburg.com> "Fred L. Drake, Jr." wrote: > > Guido van Rossum writes: > > "it" == platform.py? Little interest from me personally; I suppose it > > could go in Tools/scripts/... > > I think platform.py is pretty nifty, but I'm not entirely sure how > it's expected to be used. Perhaps Marc-Andre could explain further > the motivation behind the module? It was first intended to provide a way to format a platform identifying file name for the mxCGIPython project and then quickly moved on to provide many different APIs to query platform specific information. 
architecture(executable='/usr/local/bin/python', bits='', linkage='') :

    Queries the given executable (defaults to the Python interpreter
    binary) for various architecture information.

    Returns a tuple (bits, linkage) which contains information about
    the bit architecture and the linkage format used for the
    executable. Both values are returned as strings.

    Values that cannot be determined are returned as given by the
    parameter presets. If bits is given as '', the sizeof(long) is
    used as an indicator for the supported pointer size.

    The function relies on the system's "file" command to do the
    actual work. This is available on most if not all Unix platforms.
    On some non-Unix platforms, and then only if the executable points
    to the Python interpreter, defaults from _default_architecture are
    used.

dist(distname='', version='', id='') :

    Tries to determine the name of the OS distribution.

    The function first looks for a distribution release file in /etc
    and then reverts to _dist_try_harder() in case no suitable files
    are found.

    Returns a tuple (distname, version, id) which defaults to the args
    given as parameters.

java_ver(release='', vendor='', vminfo=('', '', ''), osinfo=('', '', '')) :

    Version interface for JPython.

    Returns a tuple (release, vendor, vminfo, osinfo) with vminfo being
    a tuple (vm_name, vm_release, vm_vendor) and osinfo being a tuple
    (os_name, os_version, os_arch).

    Values which cannot be determined are set to the defaults given as
    parameters (which all default to '').

libc_ver(executable='/usr/local/bin/python', lib='', version='') :

    Tries to determine the libc version against which the file
    executable (defaults to the Python interpreter) is linked.

    Returns a tuple of strings (lib, version) which default to the
    given parameters in case the lookup fails.

    Note that the function has intimate knowledge of how different
    libc versions add symbols to the executable and is probably only
    usable for executables compiled using gcc.

    The file is read and scanned in chunks of chunksize bytes.
mac_ver(release='', versioninfo=('', '', ''), machine='') :

    Get MacOS version information and return it as tuple (release,
    versioninfo, machine) with versioninfo being a tuple (version,
    dev_stage, non_release_version).

    Entries which cannot be determined are set to ''. All tuple
    entries are strings.

    Thanks to Mark R. Levinson for mailing documentation links and
    code examples for this function. Documentation for the gestalt()
    API is available online at: http://www.rgaros.nl/gestalt/

machine() :

    Returns the machine type, e.g. 'i386'.

    An empty string is returned if the value cannot be determined.

node() :

    Returns the computer's network name (may not be fully qualified !)

    An empty string is returned if the value cannot be determined.

platform(aliased=0, terse=0) :

    Returns a single string identifying the underlying platform with
    as much useful information as possible (but no more :).

    The output is intended to be human readable rather than machine
    parseable. It may look different on different platforms and this
    is intended.

    If "aliased" is true, the function will use aliases for various
    platforms that report system names which differ from their common
    names, e.g. SunOS will be reported as Solaris. The system_alias()
    function is used to implement this.

    Setting terse to true causes the function to return only the
    absolute minimum information needed to identify the platform.

processor() :

    Returns the (true) processor name, e.g. 'amdk6'.

    An empty string is returned if the value cannot be determined.
    Note that many platforms do not provide this information or simply
    return the same value as for machine(), e.g. NetBSD does this.

release() :

    Returns the system's release, e.g. '2.2.0' or 'NT'.

    An empty string is returned if the value cannot be determined.

system() :

    Returns the system/OS name, e.g. 'Linux', 'Windows' or 'Java'.

    An empty string is returned if the value cannot be determined.
system_alias(system, release, version) :

    Returns (system, release, version) aliased to common marketing
    names used for some systems.

    It also does some reordering of the information in some cases
    where it would otherwise cause confusion.

uname() :

    Fairly portable uname interface. Returns a tuple of strings
    (system, node, release, version, machine, processor) identifying
    the underlying platform.

    Note that unlike the os.uname function, this also returns possible
    processor information as an additional tuple entry.

    Entries which cannot be determined are set to ''.

version() :

    Returns the system's release version, e.g. '#3 on degas'.

    An empty string is returned if the value cannot be determined.

win32_ver(release='', version='', csd='', ptype='') :

    Get additional version information from the Windows Registry and
    return a tuple (version, csd, ptype) referring to version number,
    CSD level and OS type (multi/single processor).

    As a hint: ptype returns 'Uniprocessor Free' on single processor
    NT machines and 'Multiprocessor Free' on multi processor machines.
    The 'Free' refers to the OS version being free of debugging code.
    It could also state 'Checked' which means the OS version uses
    debugging code, i.e. code that checks arguments, ranges, etc.
    (Thomas Heller).

    Note: this function only works if Mark Hammond's win32 package is
    installed and obviously only runs on Win32 compatible platforms.

    XXX Is there any way to find out the processor type on WinXX ?
    XXX Is win32 available on Windows CE ?

    Adapted from code posted by Karl Putland to comp.lang.python.

> My biggest requirement is that it be accompanied by documentation.
> The coolness factor and shared use of hackerly knowledge would
> probably get *me* to put it in, but there are a lot of things about
> which I'll disagree with Guido just to hear his (well-considered)
> thoughts on the matter. ;)

The module is doc-string documented (see above). This should serve well as a basis for the LaTeX docs.
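[An editorial aside: platform.py did eventually land in the standard library, as the `platform` module in Python 2.3, so the APIs documented above can be exercised directly today. Actual values vary by host:]

```python
# Exercising a few of the APIs documented above via the stdlib
# `platform` module (the descendant of MAL's platform.py).
import platform

print(platform.system())           # e.g. 'Linux', 'Windows', 'Darwin'
print(platform.machine())          # e.g. 'x86_64'
print(platform.platform(terse=1))  # minimal human-readable platform string

bits, linkage = platform.architecture()
print(bits)                        # e.g. '64bit'
```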
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

From DavidA at ActiveState.com Wed Mar 8 19:36:01 2000
From: DavidA at ActiveState.com (David Ascher)
Date: Wed, 8 Mar 2000 10:36:01 -0800
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
In-Reply-To: <200003081210.HAA19931@eric.cnri.reston.va.us>
Message-ID:

> "it" == platform.py? Little interest from me personally; I suppose it
> could go in Tools/scripts/...

FWIW, I think it belongs in the standard path. It allows one to do the equivalent of

    if os.platform == '...'

but in a much more useful way.

--david

From mhammond at skippinet.com.au Wed Mar 8 22:36:12 2000
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Thu, 9 Mar 2000 08:36:12 +1100
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
In-Reply-To: <200003081210.HAA19931@eric.cnri.reston.va.us>
Message-ID:

> No concrete plans -- except that I think the registry access is
> supposed to go in. Haven't seen the code on patches at python.org yet
> though.

FYI, that is off with Trent who is supposed to be testing it on the Alpha.

Re win32pipe - I responded to that post suggesting that we do with os.pipe and win32pipe what was done with os.path.abspath/win32api - optionally try to import the win32-specific module and use it.

My only "concern" is that this then becomes more code for Guido to maintain in the core, even though Guido has expressed a desire to get out of the installers business. Assuming the longer-term plan is for other people to put together installation packages, and that these people are free to redistribute win32api/win32pipe, I'm wondering if it is worth bothering with?

Mark.
From trentm at ActiveState.com Wed Mar 8 15:42:06 2000
From: trentm at ActiveState.com (Trent Mick)
Date: Wed, 8 Mar 2000 14:42:06 -0000
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
In-Reply-To: <38C6919B.EA3EE2E7@lemburg.com>
Message-ID:

MAL:
> architecture(executable='/usr/local/bin/python', bits='',
> linkage='') :
>
> Values that cannot be determined are returned as given by the
> parameter presets. If bits is given as '', the sizeof(long) is
> used as indicator for the supported pointer size.

Just a heads up, using sizeof(long) will not work on the forthcoming Win64 (LLP64 data model) to determine the supported pointer size. You would want to use the 'P' struct format specifier instead, I think (I am speaking in relative ignorance). However, the docs say that a PyInt is used to store a 'P'-specified value, which, as a C long, will not hold a pointer on LLP64. Hmmmm. The keyword perhaps is "forthcoming".

This is the code in question in platform.py:

    # Use the sizeof(long) as default number of bits if nothing
    # else is given as default.
    if not bits:
        import struct
        bits = str(struct.calcsize('l')*8) + 'bit'

Guido:
> > No concrete plans -- except that I think the registry access is
> > supposed to go in. Haven't seen the code on patches at python.org yet
> > though.

Mark Hammond:
> FYI, that is off with Trent who is supposed to be testing it on the Alpha.

My Alpha is in pieces right now! I will get to it soon. I will try it on Win64 as well, if I can.

Trent

Trent Mick trentm at activestate.com

From guido at python.org Thu Mar 9 03:59:51 2000
From: guido at python.org (Guido van Rossum)
Date: Wed, 08 Mar 2000 21:59:51 -0500
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
In-Reply-To: Your message of "Thu, 09 Mar 2000 08:36:12 +1100."
References: Message-ID: <200003090259.VAA20928@eric.cnri.reston.va.us> > My only "concern" is that this then becomes more code for Guido to maintain > in the core, even though Guido has expressed a desire to get out of the > installers business. Theoretically, it shouldn't need much maintenance. I'm more concerned that it will have different semantics than on Unix so that in practice you'd need to know about the platform anyway (apart from the fact that the installed commands are different, of course). > Assuming the longer term plan is for other people to put together > installation packages, and that these people are free to redistribute > win32api/win32pipe, Im wondering if it is worth bothering with? So that everybody could use os.popen() regardless of whether they're on Windows or Unix. --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond at skippinet.com.au Thu Mar 9 04:31:21 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 9 Mar 2000 14:31:21 +1100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <200003090259.VAA20928@eric.cnri.reston.va.us> Message-ID: [Me] > > Assuming the longer term plan is for other people to put together > > installation packages, and that these people are free to redistribute > > win32api/win32pipe, Im wondering if it is worth bothering with? [Guido] > So that everybody could use os.popen() regardless of whether they're > on Windows or Unix. Sure. But what I meant was "should win32pipe code move into the core, or should os.pipe() just auto-detect and redirect to win32pipe if installed?" 
I was suggesting that over the longer term, it may be reasonable to assume that win32pipe _will_ be installed, as everyone who releases installers for Python should include it :-) It could also be written in such a way that it prints a warning message when win32pipe doesn't exist, so in 99% of cases, it will answer the FAQ before they have had a chance to ask it :-)

It also should be noted that the win32pipe support for popen on Windows 95/98 includes a small, dedicated .exe - this just adds to the maintenance burden.

But it doesn't worry me at all what happens - I was just trying to save you work. Anyone is free to take win32pipe and move the relevant code into the core anytime they like, with my and Bill's blessing. It quite suits me that people have to download win32all to get this working, so I doubt I will get around to it any time soon :-)

Mark.

From tim_one at email.msn.com Thu Mar 9 04:52:58 2000
From: tim_one at email.msn.com (Tim Peters)
Date: Wed, 8 Mar 2000 22:52:58 -0500
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
In-Reply-To: Message-ID: <000401bf897a$f5a7e620$0d2d153f@tim>

I had another take on all this, which I'll now share since nobody seems inclined to fold in the Win32 popen: perhaps os.popen should not be supported at all under Windows!

The current function is a mystery wrapped in an enigma -- sometimes it works, sometimes it doesn't, and I've never been able to outguess which one will obtain (there's more to it than just whether a console window is attached). If it's not reliable (it's not), and we can't document the conditions under which it can be used safely (I can't), Python shouldn't expose it.

Failing that, the os.popen docs should caution it's "use at your own risk" under Windows, and that this is directly inherited from MS's popen implementation.
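The optional-import fallback Mark describes above (the same trick used for os.path.abspath/win32api) might look like the following sketch. This is a hypothetical fragment, not code from win32all or the core; `win32pipe` is only importable on Windows with the win32 extensions installed:

```python
import os

# Prefer the win32pipe implementation when the win32 extensions are
# installed (Windows only); otherwise fall back to the libc-based
# os.popen that ships with the core.
try:
    import win32pipe
    popen = win32pipe.popen
except ImportError:
    popen = os.popen
```

Callers would then use `popen(cmd)` without caring which implementation they got.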
From tim_one at email.msn.com Thu Mar 9 10:40:26 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 04:40:26 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <200003081406.JAA20033@eric.cnri.reston.va.us> Message-ID: <000701bf89ab$80cb8e20$0d2d153f@tim> [Guido, with some implementation details and nice examples] Normally I'd eat this up -- today I'm gasping for air trying to stay afloat. I'll have to settle for sketching the high-level approach I've had in the back of my mind. I start with the pile of incestuous stuff Toby/Neil discovered have no external references. It consists of dead cycles, and perhaps also non-cycles reachable only from dead cycles. 1. The "points to" relation on this pile defines a graph G. 2. From any graph G, we can derive a related graph G' consisting of the maximal strongly connected components (SCCs) of G. Each (super)node of G' is an SCC of G, where (super)node A' of G' points to (super)node B' of G' iff there exists a node A in A' that points to (wrt G) some node B in B'. It's not obvious, but the SCCs can be found in linear time (via Tarjan's algorithm, which is simple but subtle; Cyclops.py uses a much dumber brute-force approach, which is nevertheless perfectly adequate in the absence of massively large cycles -- premature optimization is the root etc <0.5 wink>). 3. G' is necessarily a DAG. For if distinct A' and B' are both reachable from each other in G', then every pair of A in A' and B in B' are reachable from each other in G, contradicting that A' and B' are distinct maximal SCCs (that is, the union of A' and B' is also an SCC). 4. The point to all this: Every DAG can be topsorted. Start with the nodes of G' without predecessors. There must be at least one, because G' is a DAG. 5. For every node A' in G' without predecessors (wrt G'), it either does or does not contain an object with a potentially dangerous finalizer. If it does not, let's call it a safe node. 
If there are no safe nodes without predecessors, GC is stuck, and for good reason: every object in the whole pile is reachable from an object with a finalizer, which could change the topology in near-arbitrary ways. The unsafe nodes without predecessors (and again, by #4, there must be at least one) are the heart of the problem, and this scheme identifies them precisely. 6. Else there is a safe node A'. For each A in A', reclaim it, following the normal refcount rules (or in an implementation w/o RC, by following a topsort of "points to" in the original G). This *may* cause reclamation of an object X with a finalizer outside of A'. But doing so cannot cause resurrection of anything in A' (X is reachable from A' else cleaning up A' couldn't have affected X, and if anything in A' were also reachable from X, X would have been in A' to begin with (SCC!), contradicting that A' is safe). So the objects in A' can get reclaimed without difficulty. 7. The simplest thing to do now is just stop: rebuild it from scratch the next time the scheme is invoked. If it was *possible* to make progress without guessing, we did; and if it was impossible, we identified the precise SCC(s) that stopped us. Anything beyond that is optimization <0.6 wink>. Seems the most valuable optimization would be to keep track of whether an object with a finalizer gets reclaimed in step 6 (so long as that doesn't happen, the mutations that can occur to the structure of G' seem nicely behaved enough that it should be possible to loop back to step #5 without crushing pain). On to Guido's msg: [Guido] > When we have a pile of garbage, we don't know whether it's all > connected or whether it's lots of little cycles. So if we find > [objects with -- I'm going to omit this] finalizers, we have to put > those on a third list and put everything reachable from them on that > list as well (the algorithm I described before). SCC determination gives precise answers to all that. 
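Steps 1-4 above can be made concrete. The following is a hypothetical helper (not the Cyclops.py code) implementing Tarjan's linear-time SCC algorithm on a graph given as an adjacency dict:

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm: the maximal SCCs of a directed graph.

    `graph` maps each node to a list of successors.  Runs in
    O(nodes + edges) time; the SCCs are emitted in reverse topological
    order of the condensation G', so sources of G' come out last.
    """
    index = {}              # discovery order of each visited node
    lowlink = {}            # smallest discovery index reachable
    stack, on_stack = [], set()
    sccs = []

    def visit(v):
        index[v] = lowlink[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:      # v roots an SCC: pop it off
            scc = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.add(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in list(graph):
        if v not in index:
            visit(v)
    return sccs

# Guido's example A <=> b -> C <=> d condenses to two supernodes,
# {A, b} -> {C, d}; {C, d} is emitted first (reverse topsort).
g = {'A': ['b'], 'b': ['A', 'C'], 'C': ['d'], 'd': ['C']}
sccs = strongly_connected_components(g)
```

The reverse-topological output order is convenient here: a simple reversal gives the topsort of G' that steps 4-6 walk.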
> What's left on the first list then consists of finalizer-free garbage.
> We dispose of this garbage by clearing dicts and lists. Hopefully
> this makes the refcount of some of the finalizers go to zero -- those
> are finalized in the normal way.

In Python it's even possible for a finalizer to *install* a __del__ method that didn't previously exist, into the class of one of the objects on your "first list". The scheme above is meant to be bulletproof in the face of abuses even I can't conceive of.

More mundanely, clearing an item on your first list can cause a chain of events that runs a finalizer, which in turn can resurrect one of the objects on your first list (and so it should *not* get reclaimed). Without doing the SCC bit, I don't think you can out-think that (the reasoning above showed that the finalizer can't resurrect something in the *same* SCC as the object that started it all, but that argument cannot be extended to objects in other safe SCCs: they're vulnerable).

> And now we have to deal with the inevitable: finalizers that are part
> of cycles. It makes sense to reduce the graph of objects to a graph
> of finalizers only. Example:
>
> A <=> b -> C <=> d
>
> A and C have finalizers. C is part of a cycle (C-d) that contains no
> other finalizers, but C is also reachable from A. A is part of a
> cycle (A-b) that keeps it alive. The interesting thing here is that
> if we only look at the finalizers, there are no cycles!

The scheme above derives G': A' -> C' where A' consists of the A<=>b cycle and C' the C<=>d cycle. That there are no cycles in G' isn't surprising, it's just the natural consequence of doing the natural analysis. The scheme above refuses to do anything here, because the only node in G' without a predecessor (namely A') isn't "safe".
> If we reduce the graph to only finalizers (setting aside for now the
> problem of how to do that -- we may need to allocate more memory to
> hold the reduced graph), we get:
>
> A -> C

You should really have self-loops on both A and C, right? (because A is reachable from itself via chasing pointers; ditto for C)

> We can now finalize A (even though its refcount is nonzero!). And
> that's really all we can do! A could break its own cycle, thereby
> disposing of itself and b. It could also break C's cycle, disposing
> of C and d. It could do nothing. Or it could resurrect A, thereby
> resurrecting all of A, b, C, and d.
>
> This leads to (there's that weird echo again :-) Boehm's solution:
> Call A's finalizer and leave the rest to the next time the garbage
> collection runs.

This time the echo came back distorted:

[Boehm] Cycles involving one or more finalizable objects are never finalized.

A<=>b is "a cycle involving one or more finalizable objects", so he won't touch it. The scheme at the top doesn't either. If you handed him your *derived* graph (but also without the self-loops), he would; me too. KISS!

> Note that we're now calling finalizers on objects with a non-zero
> refcount.

I don't know why you want to do this. As the next several paragraphs confirm, it creates real headaches for the implementation, and I'm unclear on what it buys in return. Is "we'll do something by magic for cycles with no more than one finalizer" a major gain for the user over "we'll do something by magic for cycles with no finalizer"? 0, 1 and infinity *are* the only interesting numbers, but the difference between 0 and 1 *here* doesn't seem to me worth signing up for any pain at all.

> At some point (probably as a result of finalizing A) its
> refcount will go to zero. We should not finalize it again -- this
> would serve no purpose.
I don't believe BDW (or the scheme at the top) has this problem (simply because the only way to run a finalizer in a cycle under them is for the user to break the cycle explicitly -- so if an object's finalizer gets run, the user caused it directly, and so can never claim surprise).

> Possible solution:
>
>     INCREF(A);
>     A->__del__();
>     if (A->ob_refcnt == 1)
>         A->__class__ = NULL; /* Make a finalizer-less */
>     DECREF(A);
>
> This avoids finalizing twice if the first finalization broke all
> cycles in which A is involved. But if it doesn't, A is still cyclical
> garbage with a finalizer! Even if it didn't resurrect itself.
>
> Instead of the code fragment above, we could mark A as "just
> finalized" and when it shows up at the head of the tree (of finalizers
> in cyclical trash) again on the next garbage collection, to discard it
> without calling the finalizer again (because this clearly means that
> it didn't resurrect itself -- at least not for a very long time).

I don't think you need to do any of this -- unless you think you need to do the thing that created the need for this, which I didn't think you needed to do either.

> I would be happier if we could still have a rule that says that a
> finalizer is called only once by magic -- even if we have two forms of
> magic: refcount zero or root of the tree. Tim: I don't know if you
> object against this rule as a matter of principle (for the sake of
> finalizers that resurrect the object) or if your objection is really
> against the unordered calling of finalizers legitimized by Java's
> rules. I hope the latter, since I think that this rule (__del__
> called only once by magic) by itself is easy to understand and easy to
> deal with, and I believe it may be necessary to guarantee progress for
> the garbage collector.

My objections to Java's rules have been repeated enough. I would have no objection to "__del__ called only once" if it weren't for that Python currently does something different.
I don't know whether people rely on that now; if they do, it's a much more dangerous thing to change than adding a new keyword (the compiler gives automatic 100% coverage of the latter; but nothing mechanical can help people track down reliance-- whether deliberate or accidental --on the former).

My best *guess* is that __del__ is used rarely; e.g., there are no more than 40 instances of it in the whole CVS tree, including demo directories; and they all look benign (at least three have bodies consisting of "pass"!). The most complicated one I found in my own code is:

    def __del__(self):
        self.break_cycles()

    def break_cycles(self):
        for rule in self.rules:
            if rule is not None:
                rule.cleanse()

But none of this self-sampling is going to comfort some guy in France who has a megaline of code relying on it. Good *bet*, though.

> [and another cogent explanation of why breaking the "leave cycles with
> finalizers" alone injunction creates headaches]
> ...
> Even if someone once found a good use for resurrecting inside __del__,
> against all recommendations, I don't mind breaking their code, if it's
> for a good cause. The Java rules aren't a good cause. But top-sorted
> finalizer calls seem a worthy cause.

They do to me too, except that I say even a cycle involving but a single object (w/ finalizer) looping on itself is the user's problem.

> So now we get to discuss what to do with multi-finalizer cycles, like:
>
> A <=> b <=> C
>
> Here the reduced graph is:
>
> A <=> C

The SCC reduction is simply to A and, right, the scheme at the top punts.

> [more on the once-only rule chopped]
> ...
> Anyway, once-only rule aside, we still need a protocol to deal with
> cyclical dependencies between finalizers. The __cleanup__ approach is
> one solution, but it also has a problem: we have a set of finalizers.
> Whose __cleanup__ do we call? Any? All? Suggestions?
This is why a variant of guardians was more appealing to me at first: I could ask a guardian for the entire SCC, so I get the *context* of the problem as well as the final microscopic symptom. I see Marc-Andre already declined to get sucked into the magical part of this. Greg should speak for his scheme, and I haven't made time to understand it fully; my best guess is to call x.__cleanup__ for every object in the SCC (but there's no clear way to decide which order to call them in, and unless they're more restricted than __del__ methods they can create all the same problems __del__ methods can!).

> Note that I'd like some implementation freedom: I may not want to
> bother with the graph reduction algorithm at first (which seems very
> hairy) so I'd like to have the right to use the __cleanup__ API
> as soon as I see finalizers in cyclical trash. I don't mind disposing
> of finalizer-free cycles first, but once I have more than one
> finalizer left in the remaining cycles, I'd like the right not to
> reduce the graph for topsort reasons -- that algorithm seems hard.

I hate to be realistic, but modern GC algorithms are among the hardest you'll ever see in any field; even the outer limits of what we've talked about here is baby stuff. Sun's Java group (the one in Chelmsford, MA, down the road from me) had a group of 4+ people (incl. the venerable Mr. Steele) working full-time for over a year on the last iteration of Java's GC. The simpler BDW is a megabyte of code spread over 100+ files. Etc -- state of the art GC can be crushingly hard.

So I've got nothing against taking shortcuts at first -- there's actually no realistic alternative. I think we're overlooking the obvious one, though: if any finalizer appears in any trash cycle, tough luck. Python 3000 -- which may be a spelling of 1.7, but doesn't *need* to be a spelling of 1.6.

> So we're back to the __cleanup__ design.
Strawman proposal: for all > finalizers in a trash cycle, call their __cleanup__ method, in > arbitrary order. After all __cleanup__ calls are done, if the objects > haven't all disposed of themselves, they are all garbage-collected > without calling __del__. (This seems to require another garbage > collection cycle -- so perhaps there should also be a once-only rule > for __cleanup__?) > > Separate question: what if there is no __cleanup__? This should > probably be reported: "You have cycles with finalizers, buddy! What > do you want to do about them?" This same warning could be given when > there is a __cleanup__ but it doesn't break all cycles.

If I *ever* have a trash cycle with a finalizer in my code (> 0 -- "exactly 1" isn't special to me), I will consider it to be a bug. So I want a way to get it back from gc, so I can see what the heck it is, so I can fix my code (or harass whoever did it to me). __cleanup__ suffices for that, so the very act of calling it is all I'm really after ("Python invoked __cleanup__ == Tim has a bug").

But after I outgrow that, I'll certainly want the option to get another kind of complaint if __cleanup__ doesn't break the cycles, and after *that* I couldn't care less. I've given you many gracious invitations to say that you don't mind leaking in the face of a buggy program, but as you've declined so far, I take it that never hearing another gripe about leaking is a Primary Life Goal. So collection without calling __del__ is fine -- but so is collection with calling it!

If we're going to (at least implicitly) approve of this stuff, it's probably better *to* call __del__, if for no other reason than to catch your case of some poor innocent object caught in a cycle not of its making that expects its __del__ to abort starting World War III if it becomes unreachable.

whatever-we-don't-call-a-mistake-is-a-feature-ly y'rs - tim

From fdrake at acm.org Thu Mar 9 15:25:35 2000
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 9 Mar 2000 09:25:35 -0500 (EST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <000401bf897a$f5a7e620$0d2d153f@tim> References: <000401bf897a$f5a7e620$0d2d153f@tim> Message-ID: <14535.46175.991970.135642@weyr.cnri.reston.va.us> Tim Peters writes: > Failing that, the os.popen docs should caution it's "use at your own risk" > under Windows, and that this is directly inherited from MS's popen > implementation. Tim (& others), Would this additional text be sufficient for the os.popen() documentation? \strong{Note:} This function behaves unreliably under Windows due to the native implementation of \cfunction{popen()}. If someone cares to explain what's weird about it, that might be appropriate as well, but I've never used this under Windows. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal at lemburg.com Thu Mar 9 15:42:37 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 09 Mar 2000 15:42:37 +0100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: Message-ID: <38C7B85D.E6090670@lemburg.com> Trent Mick wrote: > > MAL: > > architecture(executable='/usr/local/bin/python', bits='', > > linkage='') : > > > > Values that cannot be determined are returned as given by the > > parameter presets. If bits is given as '', the sizeof(long) is > > used as indicator for the supported pointer size. > > Just a heads up, using sizeof(long) will not work on forthcoming WIN64 > (LLP64 data model) to determine the supported pointer size. You would want > to use the 'P' struct format specifier instead, I think (I am speaking in > relative ignorance). However, the docs say that a PyInt is used to store 'P' > specified value, which as a C long, will not hold a pointer on LLP64. Hmmmm. > The keyword perhaps is "forthcoming". 
> > This is the code in question in platform.py:
> >
> >     # Use the sizeof(long) as default number of bits if nothing
> >     # else is given as default.
> >     if not bits:
> >         import struct
> >         bits = str(struct.calcsize('l')*8) + 'bit'

Python < 1.5.2 doesn't support 'P', but anyway, I'll change those lines according to your suggestion.

Does struct.calcsize('P')*8 return 64 on 64bit-platforms as it should (probably ;) ?

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

From jim at interet.com Thu Mar 9 16:45:54 2000
From: jim at interet.com (James C. Ahlstrom)
Date: Thu, 09 Mar 2000 10:45:54 -0500
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
References: <000401bf897a$f5a7e620$0d2d153f@tim>
Message-ID: <38C7C732.D9086C34@interet.com>

Tim Peters wrote:
>
> I had another take on all this, which I'll now share since nobody
> seems inclined to fold in the Win32 popen: perhaps os.popen should not be
> supported at all under Windows!
>
> The current function is a mystery wrapped in an enigma -- sometimes it
> works, sometimes it doesn't, and I've never been able to outguess which one
> will obtain (there's more to it than just whether a console window is
> attached). If it's not reliable (it's not), and we can't document the
> conditions under which it can be used safely (I can't), Python shouldn't
> expose it.

OK, I admit I don't understand this either, but here goes...

It looks like Python popen() uses the Windows _popen() function. The _popen() docs say that it creates a spawned copy of the command processor (shell) with the given string argument. It further states that it does NOT work in a Windows program and ONLY works when called from a Windows Console program.
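Trent's 'P' suggestion above amounts to a couple of lines. This is a sketch, not the actual platform.py patch; `struct.calcsize('P')` is sizeof(void *), which stays correct on LLP64 platforms where sizeof(long) does not:

```python
import struct

def pointer_bits(bits=''):
    # Sketch of the platform.py default: probe the pointer size via the
    # 'P' (void *) format code instead of 'l' (long), since on LLP64
    # systems such as Win64 a long is 4 bytes but a pointer is 8.
    if not bits:
        bits = str(struct.calcsize('P') * 8) + 'bit'
    return bits
```

On a 64-bit build this yields '64bit'; an explicit preset is passed through unchanged, matching the "parameter presets" behaviour MAL describes.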
From tim_one at email.msn.com Thu Mar 9 18:14:17 2000
From: tim_one at email.msn.com (Tim Peters)
Date: Thu, 9 Mar 2000 12:14:17 -0500
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
In-Reply-To: <38C7C732.D9086C34@interet.com>
Message-ID: <000401bf89ea$e6e54180$79a0143f@tim>

[James C. Ahlstrom]
> OK, I admit I don't understand this either, but here goes...
>
> It looks like Python popen() uses the Windows _popen() function.
> The _popen() docs say ...

Screw the docs. Pretend you're a newbie and *try* it. Here:

    import os
    p = os.popen("dir")
    while 1:
        line = p.readline()
        if not line:
            break
        print line

Type that in by hand, or stick it in a file & run it from a cmdline python.exe (which is a Windows console program). Under Win95 the process freezes solid, and even trying to close the DOS box doesn't work. You have to bring up the task manager and kill it that way. I once traced this under the debugger -- it's hung inside an MS DLL. "dir" is not entirely arbitrary here: for *some* cmds it works fine, for others not. The set of which work appears to vary across Windows flavors. Sometimes you can worm around it by wrapping "a bad" cmd in a .bat file, and popen'ing the latter instead; but sometimes not. After hours of poke-&-hope (in the past), as I said, I've never been able to predict which cases will work.

> ...
> It further states that it does NOT work in a Windows program and ONLY
> works when called from a Windows Console program.

The latter is a necessary condition but not sufficient; don't know what *is* sufficient, and AFAIK nobody else does either.

> From this I assume that popen() works from python.exe (it is a Console
> app) if the command can be directly executed by the shell (like "dir"),

See above for a counterexample to both. I actually have much better luck with cmds command.com *doesn't* know anything about. So this appears to vary by shell too.

> ...
> If there is something wrong with _popen() then the way to fix it is
> to avoid using it and create the pipes directly.

libc pipes are as flaky as libc popen under Windows, Jim! MarkH has the only versions of these things that come close to working under Windows (he wraps the native Win32 spellings of these things; MS's libc entry points (which Python uses now) are much worse).

> ...
> Of course, the strength of Python is portable code. popen() should be
> fixed the right way.

pipes too, but users get baffled by popen much more often simply because they try popen much more often.

there's-no-question-about-whether-it-works-right-it-doesn't-ly y'rs - tim

From gstein at lyra.org Thu Mar 9 18:47:23 2000
From: gstein at lyra.org (Greg Stein)
Date: Thu, 9 Mar 2000 09:47:23 -0800 (PST)
Subject: [Python-Dev] platform.py (was: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?)
In-Reply-To: <38C7B85D.E6090670@lemburg.com>
Message-ID:

On Thu, 9 Mar 2000, M.-A. Lemburg wrote:
>...
> Python < 1.5.2 doesn't support 'P', but anyway, I'll change
> those lines according to your suggestion.
>
> Does struct.calcsize('P')*8 return 64 on 64bit-platforms as
> it should (probably ;) ?

Yes. It returns sizeof(void *).

Cheers, -g

-- Greg Stein, http://www.lyra.org/

From mal at lemburg.com Thu Mar 9 15:55:36 2000
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 09 Mar 2000 15:55:36 +0100
Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted?
References: <000401bf897a$f5a7e620$0d2d153f@tim> <14535.46175.991970.135642@weyr.cnri.reston.va.us>
Message-ID: <38C7BB68.9FAE3BE9@lemburg.com>

"Fred L. Drake, Jr." wrote:
>
> Tim Peters writes:
> > Failing that, the os.popen docs should caution it's "use at your own risk"
> > under Windows, and that this is directly inherited from MS's popen
> > implementation.
>
> Tim (& others),
> Would this additional text be sufficient for the os.popen()
> documentation?
> > \strong{Note:} This function behaves unreliably under Windows > due to the native implementation of \cfunction{popen()}. > > If someone cares to explain what's weird about it, that might be > appropriate as well, but I've never used this under Windows. Ehm, hasn't anyone looked at the code I posted yesterday ? It goes a long way to deal with these inconsistencies... even though its not perfect (yet ;). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake at acm.org Thu Mar 9 19:52:40 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 9 Mar 2000 13:52:40 -0500 (EST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C7BB68.9FAE3BE9@lemburg.com> References: <000401bf897a$f5a7e620$0d2d153f@tim> <14535.46175.991970.135642@weyr.cnri.reston.va.us> <38C7BB68.9FAE3BE9@lemburg.com> Message-ID: <14535.62200.158087.102380@weyr.cnri.reston.va.us> M.-A. Lemburg writes: > Ehm, hasn't anyone looked at the code I posted yesterday ? > It goes a long way to deal with these inconsistencies... even > though its not perfect (yet ;). I probably sent that before I'd read everything, and I'm not the one to change the popen() implementation. At this point, I'm waiting for someone who understands the details to decide what happens (if anything) to the implementation before I check in any changes to the docs. My inclination is to fix popen() on Windows to do the right thing, but I don't know enough about pipes & process management on Windows to get into that fray. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives

From nascheme at enme.ucalgary.ca Thu Mar 9 20:37:31 2000
From: nascheme at enme.ucalgary.ca (nascheme at enme.ucalgary.ca)
Date: Thu, 9 Mar 2000 12:37:31 -0700
Subject: [Python-Dev] finalization again
Message-ID: <20000309123731.A3664@acs.ucalgary.ca>

[Tim, explaining something I was thinking about more clearly than I ever could]
>It's not obvious, but the SCCs can be found in linear time (via Tarjan's
>algorithm, which is simple but subtle;

Wow, it seems like it should be more expensive than that. What are the space requirements? Also, does the simple algorithm you used in Cyclops have a name?

>If there are no safe nodes without predecessors, GC is stuck,
>and for good reason: every object in the whole pile is reachable
>from an object with a finalizer, which could change the topology
>in near-arbitrary ways. The unsafe nodes without predecessors
>(and again, by #4, there must be at least one) are the heart of
>the problem, and this scheme identifies them precisely.

Exactly. What is our policy on these unsafe nodes? Guido seems to feel that it is okay for the programmer to create them and Python should have a way of collecting them. Tim seems to feel that the programmer should not create them in the first place. I agree with Tim. If topological finalization is used, it is possible for the programmer to design their classes so that this problem does not happen. This is explained on Hans Boehm's finalization web page.

If the programmer cannot or does not redesign their classes, I don't think it is unreasonable to leak memory. We can link these cycles to a global list of garbage or print a debugging message. This is a large improvement over the current situation (i.e. leaking memory with no debugging even for cycles without finalizers).

Neil

-- "If you're a great programmer, you make all the routines depend on each other, so little mistakes can really hurt you." -- Bill Gates, ca. 1985.
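The "link these cycles to a global list of garbage" idea Neil describes is roughly what Python's gc module later shipped as gc.garbage. As a present-day illustration (not code that existed at the time of this thread), gc.DEBUG_SAVEALL parks everything the collector finds unreachable on that list for inspection instead of freeing it silently:

```python
import gc

class Node:
    def __init__(self):
        self.ref = None

gc.collect()                        # flush any pre-existing trash first
gc.set_debug(gc.DEBUG_SAVEALL)      # park unreachable objects in gc.garbage

a, b = Node(), Node()
a.ref, b.ref = b, a                 # build an a <=> b cycle
del a, b                            # drop the last external references

found = gc.collect()                # the cycle is found, not silently leaked
cycle_nodes = [o for o in gc.garbage if isinstance(o, Node)]

gc.set_debug(0)                     # restore normal collection
gc.garbage.clear()
```

A program (or debugger) can then walk the saved list to see exactly which cycles it created, which is the debugging improvement Neil is after.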
From gstein at lyra.org Thu Mar 9 20:50:29 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 11:50:29 -0800 (PST) Subject: [Python-Dev] finalization again In-Reply-To: <20000309123731.A3664@acs.ucalgary.ca> Message-ID: On Thu, 9 Mar 2000 nascheme at enme.ucalgary.ca wrote: >... > If the programmer can or does not redesign their classes I don't > think it is unreasonable to leak memory. We can link these > cycles to a global list of garbage or print a debugging message. > This is a large improvement over the current situation (ie. > leaking memory with no debugging even for cycles without > finalizers). I think we throw an error (as a subclass of MemoryError). As an alternative, is it possible to move those cycles to the garbage list and then never look at them again? That would speed up future collection processing. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Thu Mar 9 20:51:46 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 09 Mar 2000 14:51:46 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Thu, 09 Mar 2000 11:50:29 PST." References: Message-ID: <200003091951.OAA26184@eric.cnri.reston.va.us> > As an alternative, is it possible to move those cycles to the garbage list > and then never look at them again? That would speed up future collection > processing. With the current approach, that's almost automatic :-) I'd rather reclaim the memory too. --Guido van Rossum (home page: http://www.python.org/~guido/) From gmcm at hypernet.com Thu Mar 9 20:54:16 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Thu, 9 Mar 2000 14:54:16 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <000401bf89ea$e6e54180$79a0143f@tim> References: <38C7C732.D9086C34@interet.com> Message-ID: <1259490837-400325@hypernet.com> [Tim re popen on Windows] ... > the debugger -- it's hung inside an MS DLL. 
"dir" is not entirely arbitrary > here: for *some* cmds it works fine, for others not. The set of which work > appears to vary across Windows flavors. Sometimes you can worm around it by > wrapping "a bad" cmd in a .bat file, and popen'ing the latter instead; but > sometimes not. It doesn't work for commands builtin to whatever "shell" you're using. That's different between cmd and command, and the various flavors, versions and extensions thereof. FWIW, I gave up a long time ago. I use redirection and a tempfile. The few times I've wanted "interactive" control, I've used Win32Process, dup'ed, inherited handles... the whole 9 yards. Why? Look at all the questions about popen and child processes in general, on platforms where it *works*, (if it weren't for Donn Cave, nobody'd get it to work anywhere ). To reiterate Tim's point: *none* of the c runtime routines for process control on Windows are adequate (beyond os.system and living with a DOS box popping up). The raw Win32 CreateProcess does everything you could possibly want, but takes a week or more to understand, (if this arg is a that, then that arg is a whatsit, and the next is limited to the values X and Z unless...). your-brain-on-Windows-ly y'rs - Gordon From guido at python.org Thu Mar 9 20:55:23 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 09 Mar 2000 14:55:23 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Thu, 09 Mar 2000 04:40:26 EST." <000701bf89ab$80cb8e20$0d2d153f@tim> References: <000701bf89ab$80cb8e20$0d2d153f@tim> Message-ID: <200003091955.OAA26217@eric.cnri.reston.va.us> [Tim describes a more formal approach based on maximal strongly connected components (SCCs).] I like the SCC approach -- it's what I was struggling to invent but came short of discovering. However: [me] > > What's left on the first list then consists of finalizer-free garbage. > > We dispose of this garbage by clearing dicts and lists. 
Hopefully > > this makes the refcount of some of the finalizers go to zero -- those > > are finalized in the normal way. [Tim] > In Python it's even possible for a finalizer to *install* a __del__ method > that didn't previously exist, into the class of one of the objects on your > "first list". The scheme above is meant to be bulletproof in the face of > abuses even I can't conceive of . Are you *sure* your scheme deals with this? Let's look at an example. (Again, lowercase nodes have no finalizers.) Take G: a <=> b -> C This is G' (a and b are strongly connected): a' -> C' C is not reachable from any root node. We decide to clear a and b. Let's suppose we happen to clear b first. This removes the last reference to C, C's finalizer runs, and it installs a finalizer on a.__class__. So now a' has turned into A', and we're halfway committing a crime we said we would never commit (touching cyclical trash with finalizers). I propose to disregard this absurd possibility, except to the extent that Python shouldn't crash -- but we make no guarantees to the user. > More mundanely, clearing an item on your first list can cause a chain of > events that runs a finalizer, which in turn can resurrect one of the objects > on your first list (and so it should *not* get reclaimed). Without doing > the SCC bit, I don't think you can out-think that (the reasoning above > showed that the finalizer can't resurrect something in the *same* SCC as the > object that started it all, but that argument cannot be extended to objects > in other safe SCCs: they're vulnerable). I don't think so. While my poor wording ("finalizer-free garbage") didn't make this clear, my references to earlier algorithms were intended to imply that this is garbage that consists of truly unreachable objects. I have three lists: let's call them T(rash), R(oot-reachable), and F(inalizer-reachable). The Schemenauer c.s. algorithm moves all reachable nodes to R. 
I then propose to move all finalizers to F, and to run another pass of Schemenauer c.s. to also move all finalizer-reachable (but not root-reachable) nodes to F. I truly believe that (barring the absurdity of installing a new __del__) the objects on T at this point cannot be resurrected by a finalizer that runs, since they aren't reachable from any finalizers: by virtue of Schemenauer c.s. (which computes a reachability closure given some roots) anything reachable from a finalizer is on F by now (if it isn't on R -- again, nothing on T is reachable from R, because R is calculated as a closure). So, unless there's still a bug in my thinking here, I think that as long as we only want to clear SCCs with 0 finalizers, T is exactly the set of nodes we're looking for. > This time the echo came back distorted : > > [Boehm] > Cycles involving one or more finalizable objects are never finalized. > > A<=>b is "a cycle involving one or more finalizable objects", so he won't > touch it. The scheme at the top doesn't either. If you handed him your > *derived* graph (but also without the self-loops), he would; me too. KISS! > > > Note that we're now calling finalizers on objects with a non-zero > > refcount. > > I don't know why you want to do this. As the next several paragraphs > confirm, it creates real headaches for the implementation, and I'm unclear > on what it buys in return. Is "we'll do something by magic for cycles with > no more than one finalizer" a major gain for the user over "we'll do > something by magic for cycles with no finalizer"? 0, 1 and infinity *are* > the only interesting numbers , but the difference between 0 and 1 > *here* doesn't seem to me worth signing up for any pain at all. I do have a reason: if a maximal SCC has only one finalizer, there can be no question about the ordering between finalizer calls. And isn't the whole point of this discussion to have predictable ordering of finalizer calls in the light of trash recycling? 
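[Editorial aside: the T/R/F split just described can be sketched abstractly. This is an illustration only -- the real collector works on refcounts inside the interpreter, and `successors`, `has_finalizer` and the explicit root set here are invented for the example.]

```python
def partition(objects, successors, roots, has_finalizer):
    """Split candidate trash into R (root-reachable), F (finalizer-
    reachable) and T (truly unreachable), per the scheme above."""
    def closure(seeds):
        # reachability closure restricted to the candidate set
        seen = set(seeds)
        todo = list(seeds)
        while todo:
            v = todo.pop()
            for w in successors(v):
                if w in objects and w not in seen:
                    seen.add(w)
                    todo.append(w)
        return seen

    R = closure(set(roots) & set(objects))
    F = closure({o for o in objects if has_finalizer(o) and o not in R})
    T = set(objects) - R - F
    return R, F, T

# Guido's example graph: a <=> b -> C, where only C has a finalizer
# and nothing is reachable from a root.
succ = {'a': ['b'], 'b': ['a', 'C'], 'C': []}
R, F, T = partition({'a', 'b', 'C'}, lambda v: succ[v],
                    roots=[], has_finalizer=lambda v: v == 'C')
```

Here T comes out as {a, b} and F as {C}: everything landing on T is reachable from neither roots nor finalizers, so clearing T can neither run nor be observed by any __del__ -- which is the claim being defended above.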
> I would have no objection to "__del__ called only once" if it weren't for > that Python currently does something different. I don't know whether people > rely on that now; if they do, it's a much more dangerous thing to change > than adding a new keyword (the compiler gives automatic 100% coverage of the > latter; but nothing mechanical can help people track down reliance-- whether > deliberate or accidental --on the former). [...] > But none of this self-sampling is going to comfort some guy in France who > has a megaline of code relying on it. Good *bet*, though . OK -- so your objection is purely about backwards compatibility. Apart from that, I strongly feel that the only-once rule is a good one. And I don't think that the compatibility issue weighs very strongly here (given all the other problems that typically exist with __del__). > I see Marc-Andre already declined to get sucked into the magical part of > this . Greg should speak for his scheme, and I haven't made time to > understand it fully; my best guess is to call x.__cleanup__ for every object > in the SCC (but there's no clear way to decide which order to call them in, > and unless they're more restricted than __del__ methods they can create all > the same problems __del__ methods can!). Yes, but at least since we're defining a new API (in a reserved portion of the method namespace) there are no previous assumptions to battle. > > Note that I'd like some implementation freedom: I may not want to > > bother with the graph reduction algorithm at first (which seems very > > hairy) so I'd like to have the right to use the __cleanup__ API > > as soon as I see finalizers in cyclical trash. I don't mind disposing > > of finalizer-free cycles first, but once I have more than one > > finalizer left in the remaining cycles, I'd like the right not to > > reduce the graph for topsort reasons -- that algorithm seems hard. 
> > I hate to be realistic , but modern GC algorithms are among the > hardest you'll ever see in any field; even the outer limits of what we've > talked about here is baby stuff. Sun's Java group (the one in Chelmsford, > MA, down the road from me) had a group of 4+ people (incl. the venerable Mr. > Steele) working full-time for over a year on the last iteration of Java's > GC. The simpler BDW is a megabyte of code spread over 100+ files. Etc -- > state of the art GC can be crushingly hard. > > So I've got nothing against taking shortcuts at first -- there's actually no > realistic alternative. I think we're overlooking the obvious one, though: > if any finalizer appears in any trash cycle, tough luck. Python 3000 -- > which may be a spelling of 1.7 , but doesn't *need* to be a spelling > of 1.6. Kind of sad though -- finally knowing about cycles and then not being able to do anything about them. > > So we're back to the __cleanup__ design. Strawman proposal: for all > > finalizers in a trash cycle, call their __cleanup__ method, in > > arbitrary order. After all __cleanup__ calls are done, if the objects > > haven't all disposed of themselves, they are all garbage-collected > > without calling __del__. (This seems to require another garbage > > collection cycle -- so perhaps there should also be a once-only rule > > for __cleanup__?) > > > > Separate question: what if there is no __cleanup__? This should > > probably be reported: "You have cycles with finalizers, buddy! What > > do you want to do about them?" This same warning could be given when > > there is a __cleanup__ but it doesn't break all cycles. > > If I *ever* have a trash cycle with a finalizer in my code (> 0 -- "exactly > 1" isn't special to me), I will consider it to be a bug. So I want a way to > get it back from gc, so I can see what the heck it is, so I can fix my code > (or harass whoever did it to me). 
__cleanup__ suffices for that, so the > very act of calling it is all I'm really after ("Python invoked __cleanup__ > == Tim has a bug"). > > But after I outgrow that , I'll certainly want the option to get > another kind of complaint if __cleanup__ doesn't break the cycles, and after > *that* I couldn't care less. I've given you many gracious invitations to > say that you don't mind leaking in the face of a buggy program , but > as you've declined so far, I take it that never hearing another gripe about > leaking is a Primary Life Goal. So collection without calling __del__ is > fine -- but so is collection with calling it! If we're going to (at least > implicitly) approve of this stuff, it's probably better *to* call __del__, > if for no other reason than to catch your case of some poor innocent object > caught in a cycle not of its making that expects its __del__ to abort > starting World War III if it becomes unreachable . I suppose we can print some obnoxious message to stderr like """Your program has created cyclical trash involving one or more objects with a __del__ method; calling their __cleanup__ method didn't resolve the cycle(s). I'm going to call the __del__ method(s) but I can't guarantee that they will be called in a meaningful order, because of the cyclical dependencies.""" But I'd still like to reclaim the memory. If this is some long-running server process that is executing arbitrary Python commands sent to it by clients, it's not nice to leak, period. (Because of this, I will also need to trace functions, methods and modules -- these create massive cycles that currently require painful cleanup. Of course I also need to track down all the roots then... 
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Thu Mar 9 20:59:48 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 11:59:48 -0800 (PST) Subject: [Python-Dev] finalization again In-Reply-To: <200003091951.OAA26184@eric.cnri.reston.va.us> Message-ID: On Thu, 9 Mar 2000, Guido van Rossum wrote: > > As an alternative, is it possible to move those cycles to the garbage list > > and then never look at them again? That would speed up future collection > > processing. > > With the current approach, that's almost automatic :-) > > I'd rather reclaim the memory too. Well, yah. I would too :-) I'm at ApacheCon right now, so haven't read the thread in detail, but it seems that people saw my algorithm as a bit too complex. Bah. IMO, it's a pretty straightforward way for the interpreter to get cycles cleaned up. (whether the objects in the cycles are lists/dicts, class instances, or extension types!) Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Thu Mar 9 21:18:06 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 12:18:06 -0800 (PST) Subject: [Python-Dev] finalization again In-Reply-To: <200003091955.OAA26217@eric.cnri.reston.va.us> Message-ID: On Thu, 9 Mar 2000, Guido van Rossum wrote: >... > I don't think so. While my poor wording ("finalizer-free garbage") > didn't make this clear, my references to earlier algorithms were > intended to imply that this is garbage that consists of truly > unreachable objects. I have three lists: let's call them T(rash), > R(oot-reachable), and F(inalizer-reachable). The Schemenauer > c.s. algorithm moves all reachable nodes to R. I then propose to move > all finalizers to F, and to run another pass of Schemenauer c.s. to > also move all finalizer-reachable (but not root-reachable) nodes to F. >... > [Tim Peters] > > I see Marc-Andre already declined to get sucked into the magical part of > > this . 
Greg should speak for his scheme, and I haven't made time to > > understand it fully; my best guess is to call x.__cleanup__ for every object > > in the SCC (but there's no clear way to decide which order to call them in, > > and unless they're more restricted than __del__ methods they can create all > > the same problems __del__ methods can!). My scheme was to identify objects in F, but only those with a finalizer (not the closure). Then call __cleanup__ on each of them, in arbitrary order. If any are left after the sequence of __cleanup__ calls, then I call it an error. [ note that my proposal defined checking for a finalizer by calling tp_clean(TPCLEAN_CARE_CHECK); this accounts for class instances and for extension types with "heavy" processing in tp_dealloc ] The third step was to use tp_clean to try and clean all other objects in a safe fashion. Specifically: the objects have no finalizers, so there is no special care needed in finalizing, so this third step should nuke references that are stored in the object. This means object pointers are still valid (we haven't dealloc'd), but the insides have been emptied. If the third step does not remove all cycles, then one of the PyType objects did not remove all references during the tp_clean call. >... > > If I *ever* have a trash cycle with a finalizer in my code (> 0 -- "exactly > > 1" isn't special to me), I will consider it to be a bug. So I want a way to > > get it back from gc, so I can see what the heck it is, so I can fix my code > > (or harass whoever did it to me). __cleanup__ suffices for that, so the > > very act of calling it is all I'm really after ("Python invoked __cleanup__ > > == Tim has a bug"). Agreed. >... > I suppose we can print some obnoxious message to stderr like A valid alternative to raising an exception, but it falls into the whole trap of "where does stderr go?" >... > But I'd still like to reclaim the memory. 
If this is some long-running server process that is executing arbitrary Python commands sent to it by clients, it's not nice to leak, period. If an exception is raised, the top-level server loop can catch it, log the error, and keep going. But yes: it will leak. > (Because of this, I will also need to trace functions, methods and > modules -- these create massive cycles that currently require painful > cleanup. Of course I also need to track down all the roots > then... :-) Yes. It would be nice to have these participate in the "cleanup protocol" that I've described. It should help a lot at Python finalization time, effectively moving some special casing from import.c to the objects themselves. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jim at interet.com Thu Mar 9 21:20:23 2000 From: jim at interet.com (James C. Ahlstrom) Date: Thu, 09 Mar 2000 15:20:23 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <000401bf89ea$e6e54180$79a0143f@tim> Message-ID: <38C80787.7791A1A6@interet.com> Tim Peters wrote: > Screw the docs. Pretend you're a newbie and *try* it. I did try it. > > import os > p = os.popen("dir") > while 1: > line = p.readline() > if not line: > break > print line > > Type that in by hand, or stick it in a file & run it from a cmdline > python.exe (which is a Windows console program). Under Win95 the process > freezes solid, and even trying to close the DOS box doesn't work. You have > to bring up the task manager and kill it that way. I once traced this under > the debugger -- it's hung inside an MS DLL. Point on the curve: This program works perfectly on my machine running NT. > libc pipes are as flaky as libc popen under Windows, Jim! MarkH has the > only versions of these things that come close to working under Windows (he > wraps the native Win32 spellings of these things; MS's libc entry points > (which Python uses now) are much worse). I believe you when you say popen() is flakey. 
It is a little harder to believe it is not possible to write a _popen() replacement using pipes which works. Of course I wanted you to do it instead of me! Well, if I get any time before 1.6 comes out... JimA From gstein at lyra.org Thu Mar 9 21:31:38 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 9 Mar 2000 12:31:38 -0800 (PST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C80787.7791A1A6@interet.com> Message-ID: On Thu, 9 Mar 2000, James C. Ahlstrom wrote: >... > > libc pipes are as flaky as libc popen under Windows, Jim! MarkH has the > > only versions of these things that come close to working under Windows (he > > wraps the native Win32 spellings of these things; MS's libc entry points > > (which Python uses now) are much worse). > > I believe you when you say popen() is flakey. It is a little > harder to believe it is not possible to write a _popen() > replacement using pipes which works. > > Of course I wanted you to do it instead of me! Well, if > I get any time before 1.6 comes out... It *has* been done. Bill Tutt did it a long time ago. That's what win32pipe is all about. -g -- Greg Stein, http://www.lyra.org/ From jim at interet.com Thu Mar 9 22:04:59 2000 From: jim at interet.com (James C. Ahlstrom) Date: Thu, 09 Mar 2000 16:04:59 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: Message-ID: <38C811FB.B6096FA4@interet.com> Greg Stein wrote: > > On Thu, 9 Mar 2000, James C. Ahlstrom wrote: > > Of course I wanted you to do it instead of me! Well, if > > I get any time before 1.6 comes out... > > It *has* been done. Bill Tutt did it a long time ago. That's what > win32pipe is all about. Thanks for the heads up! Unfortunately, win32pipe is not in the core, and probably covers more ground than just popen() and so might be a maintenance problem. And popen() is not written in it anyway. So we are Not There Yet (TM). 
Which I guess was Tim's original point. JimA From mhammond at skippinet.com.au Thu Mar 9 22:36:14 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 10 Mar 2000 08:36:14 +1100 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C80787.7791A1A6@interet.com> Message-ID: > Point on the curve: This program works perfectly on my > machine running NT. And running from Python.exe. I bet you didn't try it from a GUI. The situation is worse WRT Windows 95. MS has a knowledge base article describing the bug, and telling you how to work around it by using a dedicated .EXE. So, out of the box, popen works only on NT from a console - pretty sorry state of affairs :-( > I believe you when you say popen() is flakey. It is a little > harder to believe it is not possible to write a _popen() > replacement using pipes which works. Which is what I believe win32pipe.popen* are. Mark. From guido at python.org Fri Mar 10 02:13:51 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 09 Mar 2000 20:13:51 -0500 Subject: [Python-Dev] writelines() not thread-safe Message-ID: <200003100113.UAA27337@eric.cnri.reston.va.us> Christian Tismer just did an exhaustive search for thread unsafe use of Python operations, and found two weaknesses. One is posix.listdir(), which I had already found; the other is file.writelines(). Here's a program that demonstrates the bug; basically, while writelines is walking down the list, another thread could truncate the list, causing PyList_GetItem() to fail or a string object to be deallocated while writelines is using it. On my Solaris 7 system it typically crashes in the first or second iteration. It's easy to fix: just don't release the interpreter lock (get rid of Py_BEGIN_ALLOW_THREADS c.s.). This would however prevent other threads from doing any work while this thread may be blocked for I/O. 
An alternative solution is to put Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS just around the fwrite() call. This is safe, but would require a lot of lock operations and would probably slow things down too much. Ideas? --Guido van Rossum (home page: http://www.python.org/~guido/)

import os
import sys
import thread
import random
import time
import tempfile

def good_guy(fp, list):
    t0 = time.time()
    fp.seek(0)
    fp.writelines(list)
    t1 = time.time()
    print fp.tell(), "bytes written"
    return t1-t0

def bad_guy(dt, list):
    time.sleep(random.random() * dt)
    del list[:]

def main():
    infn = "/usr/dict/words"
    if sys.argv[1:]:
        infn = sys.argv[1]
    print "reading %s..." % infn
    fp = open(infn)
    list = fp.readlines()
    fp.close()
    print "read %d lines" % len(list)
    tfn = tempfile.mktemp()
    fp = None
    try:
        fp = open(tfn, "w")
        print "calibrating..."
        dt = 0.0
        n = 3
        for i in range(n):
            dt = dt + good_guy(fp, list)
        dt = dt / n  # average time it took to write the list to disk
        print "dt =", round(dt, 3)
        i = 0
        while 1:
            i = i+1
            print "test", i
            copy = map(lambda x: x[1:], list)
            thread.start_new_thread(bad_guy, (dt, copy))
            good_guy(fp, copy)
    finally:
        if fp:
            fp.close()
        try:
            os.unlink(tfn)
        except os.error:
            pass

main()

From tim_one at email.msn.com Fri Mar 10 03:13:51 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 21:13:51 -0500 Subject: [Python-Dev] writelines() not thread-safe In-Reply-To: <200003100113.UAA27337@eric.cnri.reston.va.us> Message-ID: <000601bf8a36$46ebf880$58a2143f@tim> [Guido van Rossum] > Christian Tismer just did an exhaustive search for thread unsafe use > of Python operations, and found two weaknesses. One is > posix.listdir(), which I had already found; the other is > file.writelines(). Here's a program that demonstrates the bug; > basically, while writelines is walking down the list, another thread > could truncate the list, causing PyList_GetItem() to fail or a string > object to be deallocated while writelines is using it. 
On my Solaris > 7 system it typically crashes in the first or second iteration. > > It's easy to fix: just don't release the interpreter lock (get rid > of Py_BEGIN_ALLOW_THREADS c.s.). This would however prevent other > threads from doing any work while this thread may be blocked for I/O. > > An alternative solution is to put Py_BEGIN_ALLOW_THREADS and > Py_END_ALLOW_THREADS just around the fwrite() call. This is safe, but > would require a lot of lock operations and would probably slow things > down too much. > > Ideas? 2.5: 1: Before releasing the lock, make a shallow copy of the list. 1.5: As in #1, but iteratively peeling off "the next N" values, for some N balancing the number of lock operations against the memory burden (I don't care about the speed of a shallow copy here ...). 2. Pull the same trick list.sort() uses: make the list object immutable for the duration (I know you think that's a hack, and it is , but it costs virtually nothing and would raise an appropriate error when they attempted the insane mutation). I actually like #2 best now, but won't in the future, because file_writelines() should really accept an argument of any sequence type. This makes 1.5 a better long-term hack. although-adding-1.5-to-1.6-is-confusing-ly y'rs - tim From tim_one at email.msn.com Fri Mar 10 03:52:26 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 21:52:26 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <1259490837-400325@hypernet.com> Message-ID: <000901bf8a3b$ab314660$58a2143f@tim> [Gordon McM, aspires to make sense of the mess] > It doesn't work for commands builtin to whatever "shell" you're > using. That's different between cmd and command, and the > various flavors, versions and extensions thereof. It's not that simple, either; e.g., old apps invoking the 16-bit subsystem can screw up too. 
Look at Tcl's man page for "exec" and just *try* to wrap your brain around all the caveats they were left with after throwing a few thousand lines of C at this under their Windows port . > FWIW, I gave up a long time ago. I use redirection and a > tempfile. The few times I've wanted "interactive" control, I've > used Win32Process, dup'ed, inherited handles... the whole 9 > yards. Why? Look at all the questions about popen and child > processes in general, on platforms where it *works*, (if it > weren't for Donn Cave, nobody'd get it to work anywhere ). Donn is downright scary that way. I stopped using 'em too, of course. > To reiterate Tim's point: *none* of the c runtime routines for > process control on Windows are adequate (beyond os.system > and living with a DOS box popping up). No, os.system is a problem under command.com flavors of Windows too, as system spawns a new shell and command.com's exit code is *always* 0. So Python's os.system returns 0 no matter what app the user *thinks* they were running, and whether it worked or set the baby on fire. > The raw Win32 CreateProcess does everything you could possibly want, but > takes a week or more to understand, (if this arg is a that, then that arg > is a whatsit, and the next is limited to the values X and Z unless...). Except that CreateProcess doesn't handle shell metacharacters, right? Tcl is the only language I've seen that really works hard at making cmdline-style process control portable. so-all-we-need-to-do-is-a-single-createprocess-to-invoke-tcl-ly y'rs - tim From tim_one at email.msn.com Fri Mar 10 03:52:24 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 21:52:24 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <14535.46175.991970.135642@weyr.cnri.reston.va.us> Message-ID: <000801bf8a3b$aa0c4e60$58a2143f@tim> [Fred L. Drake, Jr.] 
> Tim (& others), > Would this additional text be sufficient for the os.popen() > documentation? > > \strong{Note:} This function behaves unreliably under Windows > due to the native implementation of \cfunction{popen()}. Yes, that's good! If Mark/Bill's alternatives don't make it in, would also be good to point to the PythonWin extensions (although MarkH will have to give us the Official Name for that). > If someone cares to explain what's weird about it, that might be > appropriate as well, but I've never used this under Windows. As the rest of this thread should have made abundantly clear by now <0.9 wink>, it's such a mess across various Windows flavors that nobody can explain it. From tim_one at email.msn.com Fri Mar 10 04:15:18 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 9 Mar 2000 22:15:18 -0500 Subject: [Python-Dev] RE: finalization again In-Reply-To: <20000309123731.A3664@acs.ucalgary.ca> Message-ID: <000a01bf8a3e$dc8878c0$58a2143f@tim> Quickie: [Tim] >> It's not obvious, but the SCCs can be found in linear time (via Tarjan's >> algorithm, which is simple but subtle; [NeilS] > Wow, it seems like it should be more expensive than that. Oh yes! Many bright people failed to discover the trick; Tarjan didn't discover it until (IIRC) the early 70's, and it was a surprise. It's just a few lines of simple code added to an ordinary depth-first search. However, while the code is simple, a correctness proof is not. BTW, if it wasn't clear, when talking about graph algorithms "linear" is usually taken to mean "in the sum of the number of nodes and edges". Cyclops.py finds all the cycles in linear time in that sense, too (but does not find the SCCs in linear time, at least not in theory -- in practice you can't tell the difference ). > What are the space requirements? Same as depth-first search, plus a way to associate an SCC id with each node, plus a single global "id" vrbl. So it's worst-case linear (in the number of nodes) space. 
See, e.g., any of the books in Sedgewick's "Algorithms in [Language du Jour]" series for working code. > Also, does the simple algorithm you used in Cyclops have a name? Not officially, but it answers to "hey, dumb-ass!" . then-again-so-do-i-so-make-eye-contact-ly y'rs - tim From bwarsaw at cnri.reston.va.us Fri Mar 10 05:21:46 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 9 Mar 2000 23:21:46 -0500 (EST) Subject: [Python-Dev] finalization again References: <000701bf89ab$80cb8e20$0d2d153f@tim> <200003091955.OAA26217@eric.cnri.reston.va.us> Message-ID: <14536.30810.720836.886023@anthem.cnri.reston.va.us> Okay, I had a flash of inspiration on the way home from my gig tonight. Of course, I'm also really tired so I'm sure Tim will shoot this down in his usual witty but humbling way. I just had to get this out or I wouldn't sleep tonight. What if you timestamp instances when you create them? Then when you have trash cycles with finalizers, you sort them and finalize in chronological order. The nice thing here is that the user can have complete control over finalization order by controlling object creation order. Some random thoughts: - Finalization order of cyclic finalizable trash is completely deterministic. - Given sufficient resolution of your system clock, you should never have two objects with the same timestamp. - You could reduce the memory footprint by only including a timestamp for objects whose classes have __del__'s at instance creation time. Sticking an __del__ into your class dynamically would have no effect on objects that are already created (and I wouldn't poke you with a pointy stick if even post-twiddle instances didn't get timestamped). Thus, such objects would never be finalized -- tough luck. - FIFO order /seems/ more natural to me than FILO, but then I rarely create cyclic objects, and almost never use __del__, so this whole argument has been somewhat academic to me :). 
- The rule seems easy enough to implement, describe, and understand. I think I came up with a few more points on the drive home, but my post jam, post lightbulb endorphodrenalin rush is quickly subsiding, so I leave the rest until tomorrow. its-simply-a-matter-of-time-ly y'rs, -Barry From moshez at math.huji.ac.il Fri Mar 10 06:32:41 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 10 Mar 2000 07:32:41 +0200 (IST) Subject: [Python-Dev] finalization again In-Reply-To: Message-ID: On Thu, 9 Mar 2000, Greg Stein wrote: > > But I'd still like to reclaim the memory. If this is some > > long-running server process that is executing arbitrary Python > > commands sent to it by clients, it's not nice to leak, period. > > If an exception is raised, the top-level server loop can catch it, log the > error, and keep going. But yes: it will leak. And Tim's version stops the leaking if the server is smart enough: occasionally, it will call gc.get_dangerous_cycles(), and nuke everything it finds there. (E.g., clean up dicts and lists). Some destructor raises an exception? Ignore it (or whatever). And no willy-nilly "but I'm using a silly OS which has hardly any concept of stderr" problems! If the server wants, it can just send a message to the log. rooting-for-tim-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From tim_one at email.msn.com Fri Mar 10 09:18:29 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 10 Mar 2000 03:18:29 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <200003091955.OAA26217@eric.cnri.reston.va.us> Message-ID: <000001bf8a69$37d57b40$812d153f@tim> This is getting to be fun, but afraid I can only make time for the first easy one tonight: [Tim, conjures a horrid vision of finalizers installing new __del__ methods, then sez ... ] > The scheme above is meant to be bulletproof in the face of abuses even > I can't conceive of . [Guido] > Are you *sure* your scheme deals with this? 
Never said it did -- only that it *meant* to . Ya, you got me. The things I thought I had *proved* I put in the numbered list, and in a rush put the speculative stuff in the reply body. One practical thing I think I can prove today: after finding SCCs, and identifying the safe nodes without predecessors, all such nodes S1, S2, ... can be cleaned up without fear of resurrection, or of cleaning something in Si causing anything in Sj (i!=j) to get reclaimed either (at the time I wrote it, I could only prove that cleaning *one* Si was non-problematic). Barring, of course, this "__del__ from hell" pathology. Also suspect that this claim is isomorphic to your later elaboration on why the objects on T at this point cannot be resurrected by a finalizer that runs, since they aren't reachable from any finalizers. That is, exactly the same is true of "the safe (SCC super)nodes without predecessors", so I expect we've just got two ways of identifying the same set here. Perhaps yours is bigger, though (I realize that isn't clear; later).

> Let's look at an example.
> (Again, lowercase nodes have no finalizers.) Take G:
>
>     a <=> b -> C
>
> [and cleaning b can trigger C.__del__ which can create
> a.__class__.__del__ before a is decref'ed ...]
>
> ... and we're halfway committing a crime we said we would never commit
> (touching cyclical trash with finalizers).

Wholly agreed.

> I propose to disregard this absurd possibility,

How come you never propose to just shoot people <0.9 wink>?

> except to the extent that Python shouldn't crash -- but we make no
> guarantees to the user.

"Shouldn't crash" is essential, sure. Carry it another step: after C is finalized, we get back to the loop clearing b.__dict__, and the refcount on "a" falls to 0 next. So the new a.__del__ gets called.
Since b was visible to a, it's possible for a.__del__ to resurrect b, which latter is now in some bizarre (from the programmer's POV) cleared state (or even in the bit bucket, if we optimistically reclaim b's memory "early"!). I can't (well, don't want to ) believe it will be hard to stop this. It's just irksome to need to think about it at all. making-java's-gc-look-easy?-ly y'rs - tim From guido at python.org Fri Mar 10 14:46:43 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Mar 2000 08:46:43 -0500 Subject: [Python-Dev] finalization again In-Reply-To: Your message of "Thu, 09 Mar 2000 23:21:46 EST." <14536.30810.720836.886023@anthem.cnri.reston.va.us> References: <000701bf89ab$80cb8e20$0d2d153f@tim> <200003091955.OAA26217@eric.cnri.reston.va.us> <14536.30810.720836.886023@anthem.cnri.reston.va.us> Message-ID: <200003101346.IAA27847@eric.cnri.reston.va.us> > What if you timestamp instances when you create them? Then when you > have trash cycles with finalizers, you sort them and finalize in > chronological order. The nice thing here is that the user can have > complete control over finalization order by controlling object > creation order. > > Some random thoughts: > > - Finalization order of cyclic finalizable trash is completely > deterministic. > > - Given sufficient resolution of your system clock, you should never > have two objects with the same timestamp. Forget the clock -- just use a counter that is incremented on each allocation. > - You could reduce the memory footprint by only including a timestamp > for objects whose classes have __del__'s at instance creation time. > Sticking an __del__ into your class dynamically would have no effect > on objects that are already created (and I wouldn't poke you with a > pointy stick if even post-twiddle instances didn't get > timestamped). Thus, such objects would never be finalized -- tough > luck. 
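Barry's timestamp idea from earlier in the thread (with the allocation-counter refinement Guido suggests in his reply, further down) amounts to something like the following sketch. All names here are hypothetical illustrations, not anything Python actually had:

```python
import itertools

# Hypothetical sketch of the scheme: stamp each finalizable instance
# with a monotonically increasing allocation number (a counter rather
# than a clock, so resolution is never an issue).
_alloc_counter = itertools.count()

class Finalizable:
    def __init__(self, name):
        self.name = name
        self._stamp = next(_alloc_counter)

def finalization_order(trash_cycle):
    # The FIFO rule: finalize in creation order, oldest object first.
    return [obj.name for obj in sorted(trash_cycle, key=lambda o: o._stamp)]

a = Finalizable("a")
b = Finalizable("b")
c = Finalizable("c")
print(finalization_order([c, a, b]))  # ['a', 'b', 'c']
```

As Guido's reply points out, operations like reparenting break the correspondence between creation order and the order the application actually needs, which is the weak spot of any scheme along these lines.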
> > - FIFO order /seems/ more natural to me than FILO, but then I rarely > create cyclic objects, and almost never use __del__, so this whole > argument has been somewhat academic to me :). Ai, there's the rub. Suppose I have a tree with parent and child links. And suppose I have a rule that children need to be finalized before their parents (maybe they represent a Unix directory tree, where you must rm the files before you can rmdir the directory). This suggests that we should choose LIFO: you must create the parents first (you have to create a directory before you can create files in it). However, now we add operations to move nodes around in the tree. Suddenly you can have a child that is older than its parent! Conclusion: the creation time is useless; the application logic and actual link relationships are needed. > - The rule seems easy enough to implement, describe, and understand. > > I think I came up with a few more points on the drive home, but my > post jam, post lightbulb endorphodrenalin rush is quickly subsiding, > so I leave the rest until tomorrow. > > its-simply-a-matter-of-time-ly y'rs, > -Barry Time flies like an arrow -- fruit flies like a banana. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Mar 10 16:06:48 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Mar 2000 10:06:48 -0500 Subject: [Python-Dev] writelines() not thread-safe In-Reply-To: Your message of "Thu, 09 Mar 2000 21:13:51 EST." <000601bf8a36$46ebf880$58a2143f@tim> References: <000601bf8a36$46ebf880$58a2143f@tim> Message-ID: <200003101506.KAA28358@eric.cnri.reston.va.us> OK, here's a patch for writelines() that supports arbitrary sequences and fixes the lock problem using Tim's solution #1.5 (slicing 1000 items at a time). It contains a fast path for when the argument is a list, using PyList_GetSlice; otherwise it uses PyObject_GetItem and a fixed list. Please have a good look at this; I've only tested it lightly. 
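The strategy of the C patch below -- slurp a fixed-size chunk, type-check it, write it, come back for more -- can be restated as a rough Python sketch. This is an illustration only, not the actual implementation: it uses slicing for brevity where the patch indexes item-by-item for non-list sequences, and of course the interpreter-lock handling has no Python-level counterpart.

```python
import io

CHUNKSIZE = 1000  # mirrors the constant in the patch

def writelines_chunked(f, seq):
    # Take CHUNKSIZE items at a time, check that each is a string,
    # then write the chunk (the C version releases the interpreter
    # lock for the write phase of each chunk).
    index = 0
    while True:
        chunk = list(seq[index:index + CHUNKSIZE])
        if not chunk:
            break
        for line in chunk:
            if not isinstance(line, str):
                raise TypeError("writelines() requires sequences of strings")
        for line in chunk:
            f.write(line)
        if len(chunk) < CHUNKSIZE:
            break
        index += CHUNKSIZE

buf = io.StringIO()
writelines_chunked(buf, ["spam\n", "eggs\n"] * 1500)
print(buf.getvalue().count("spam\n"))  # 1500
```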
--Guido van Rossum (home page: http://www.python.org/~guido/)

Index: fileobject.c
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Objects/fileobject.c,v
retrieving revision 2.70
diff -c -r2.70 fileobject.c
*** fileobject.c	2000/02/29 13:59:28	2.70
--- fileobject.c	2000/03/10 14:55:47
***************
*** 884,923 ****
  	PyFileObject *f;
  	PyObject *args;
  {
! 	int i, n;
  	if (f->f_fp == NULL)
  		return err_closed();
! 	if (args == NULL || !PyList_Check(args)) {
  		PyErr_SetString(PyExc_TypeError,
! 			   "writelines() requires list of strings");
  		return NULL;
  	}
! 	n = PyList_Size(args);
! 	f->f_softspace = 0;
! 	Py_BEGIN_ALLOW_THREADS
! 	errno = 0;
! 	for (i = 0; i < n; i++) {
! 		PyObject *line = PyList_GetItem(args, i);
! 		int len;
! 		int nwritten;
! 		if (!PyString_Check(line)) {
! 			Py_BLOCK_THREADS
! 			PyErr_SetString(PyExc_TypeError,
! 				   "writelines() requires list of strings");
  			return NULL;
  		}
! 		len = PyString_Size(line);
! 		nwritten = fwrite(PyString_AsString(line), 1, len, f->f_fp);
! 		if (nwritten != len) {
! 			Py_BLOCK_THREADS
! 			PyErr_SetFromErrno(PyExc_IOError);
! 			clearerr(f->f_fp);
! 			return NULL;
  		}
  	}
! 	Py_END_ALLOW_THREADS
  	Py_INCREF(Py_None);
! 	return Py_None;
  }

  static PyMethodDef file_methods[] = {
--- 884,975 ----
  	PyFileObject *f;
  	PyObject *args;
  {
! #define CHUNKSIZE 1000
! 	PyObject *list, *line;
! 	PyObject *result;
! 	int i, j, index, len, nwritten, islist;
!
  	if (f->f_fp == NULL)
  		return err_closed();
! 	if (args == NULL || !PySequence_Check(args)) {
  		PyErr_SetString(PyExc_TypeError,
! 			   "writelines() requires sequence of strings");
  		return NULL;
  	}
! 	islist = PyList_Check(args);
!
! 	/* Strategy: slurp CHUNKSIZE lines into a private list,
! 	   checking that they are all strings, then write that list
! 	   without holding the interpreter lock, then come back for more. */
! 	index = 0;
! 	if (islist)
! 		list = NULL;
! 	else {
! 		list = PyList_New(CHUNKSIZE);
! 		if (list == NULL)
  			return NULL;
+ 	}
+ 	result = NULL;
+
+ 	for (;;) {
+ 		if (islist) {
+ 			Py_XDECREF(list);
+ 			list = PyList_GetSlice(args, index, index+CHUNKSIZE);
+ 			if (list == NULL)
+ 				return NULL;
+ 			j = PyList_GET_SIZE(list);
  		}
! 		else {
! 			for (j = 0; j < CHUNKSIZE; j++) {
! 				line = PySequence_GetItem(args, index+j);
! 				if (line == NULL) {
! 					if (PyErr_ExceptionMatches(PyExc_IndexError)) {
! 						PyErr_Clear();
! 						break;
! 					}
! 					/* Some other error occurred.
! 					   Note that we may lose some output. */
! 					goto error;
! 				}
! 				if (!PyString_Check(line)) {
! 					PyErr_SetString(PyExc_TypeError,
! 					"writelines() requires sequences of strings");
! 					goto error;
! 				}
! 				PyList_SetItem(list, j, line);
! 			}
! 		}
! 		if (j == 0)
! 			break;
!
! 		Py_BEGIN_ALLOW_THREADS
! 		f->f_softspace = 0;
! 		errno = 0;
! 		for (i = 0; i < j; i++) {
! 			line = PyList_GET_ITEM(list, i);
! 			len = PyString_GET_SIZE(line);
! 			nwritten = fwrite(PyString_AS_STRING(line),
! 					  1, len, f->f_fp);
! 			if (nwritten != len) {
! 				Py_BLOCK_THREADS
! 				PyErr_SetFromErrno(PyExc_IOError);
! 				clearerr(f->f_fp);
! 				Py_DECREF(list);
! 				return NULL;
! 			}
  		}
+ 		Py_END_ALLOW_THREADS
+
+ 		if (j < CHUNKSIZE)
+ 			break;
+ 		index += CHUNKSIZE;
  	}
! 	Py_INCREF(Py_None);
! 	result = Py_None;
!   error:
! 	Py_XDECREF(list);
! 	return result;
  }

  static PyMethodDef file_methods[] = {

From skip at mojam.com Fri Mar 10 16:28:13 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 10 Mar 2000 09:28:13 -0600 Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler Message-ID: <200003101528.JAA15951@beluga.mojam.com> Consider the following snippet of code from MySQLdb.py:

    try:
        self._query(query % escape_row(args, qc))
    except TypeError:
        self._query(query % escape_dict(args, qc))

It's not quite right. There are at least four reasons I can think of why the % operator might raise a TypeError:

1. query has not enough format specifiers
2. query has too many format specifiers
3. argument type mismatch between individual format specifier and corresponding argument
4.
query expects dict-style interpolation

The except clause only handles the last case. That leaves the other three cases mishandled. The above construct pretends that all TypeErrors possible are handled by calling escape_dict() instead of escape_row(). I stumbled on case 2 yesterday and got a fairly useless error message when the code in the except clause also bombed. Took me a few minutes of head scratching to see that I had an extra %s in my format string. A note to Andy Dustman, MySQLdb's author, yielded the following modified version:

    try:
        self._query(query % escape_row(args, qc))
    except TypeError, m:
        if m.args[0] == "not enough arguments for format string":
            raise
        if m.args[0] == "not all arguments converted":
            raise
        self._query(query % escape_dict(args, qc))

This will do the trick for me for the time being. Note, however, that the only way for Andy to decide which of the cases occurred (case 3 still isn't handled above, but should occur very rarely in MySQLdb since it only uses the more accommodating %s as a format specifier) is to compare the string value of the message to see which of the four cases was raised. This strong coupling via the error message text between the exception being raised (in C code, in this case) and the place where it's caught seems bad to me and encourages authors to either not recover from errors or to recover from them in the crudest fashion. If Guido decides to tweak the TypeError message in any fashion, perhaps to include the count of arguments in the format string and argument tuple, this code will break. It makes me wonder if there's not a better mechanism waiting to be discovered. Would it be possible to publish an interface of some sort via the exceptions module that would allow symbolic names or dictionary references to be used to decide which case is being handled? I envision something like the following in exceptions.py:

    UNKNOWN_ERROR_CATEGORY = 0
    TYP_SHORT_FORMAT = 1
    TYP_LONG_FORMAT = 2
    ...
    IND_BAD_RANGE = 1

    message_map = {
        # leave
        (TypeError, ("not enough arguments for format string",)): TYP_SHORT_FORMAT,
        (TypeError, ("not all arguments converted",)): TYP_LONG_FORMAT,
        ...
        (IndexError, ("list index out of range",)): IND_BAD_RANGE,
        ...
        }

This would isolate the raw text of exception strings to just a single place (well, just one place on the exception handling side of things). It would be used something like

    try:
        self._query(query % escape_row(args, qc))
    except TypeError, m:
        from exceptions import *
        exc_case = message_map.get((TypeError, m.args),
                                   UNKNOWN_ERROR_CATEGORY)
        if exc_case in [UNKNOWN_ERROR_CATEGORY, TYP_SHORT_FORMAT,
                        TYP_LONG_FORMAT]:
            raise
        self._query(query % escape_dict(args, qc))

This could be added to exceptions.py without breaking existing code. Does this (or something like it) seem like a reasonable enhancement for Py2K? If we can narrow things down to an implementable solution I'll create a patch. Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From guido at python.org Fri Mar 10 17:17:56 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Mar 2000 11:17:56 -0500 Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler In-Reply-To: Your message of "Fri, 10 Mar 2000 09:28:13 CST." <200003101528.JAA15951@beluga.mojam.com> References: <200003101528.JAA15951@beluga.mojam.com> Message-ID: <200003101617.LAA28722@eric.cnri.reston.va.us>

> Consider the following snippet of code from MySQLdb.py:

Skip, I'm not familiar with MySQLdb.py, and I have no idea what your example is about. From the rest of the message I feel it's not about MySQLdb at all, but about string formatting, but the point escapes me because you never quite show what's in the format string and what error that gives. Could you give some examples based on first principles? A simple interactive session showing the various errors would be helpful...
--Guido van Rossum (home page: http://www.python.org/~guido/) From gward at cnri.reston.va.us Fri Mar 10 20:05:04 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Fri, 10 Mar 2000 14:05:04 -0500 Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler In-Reply-To: <200003101617.LAA28722@eric.cnri.reston.va.us>; from guido@python.org on Fri, Mar 10, 2000 at 11:17:56AM -0500 References: <200003101528.JAA15951@beluga.mojam.com> <200003101617.LAA28722@eric.cnri.reston.va.us> Message-ID: <20000310140503.A8619@cnri.reston.va.us> On 10 March 2000, Guido van Rossum said:

> Skip, I'm not familiar with MySQLdb.py, and I have no idea what your
> example is about. From the rest of the message I feel it's not about
> MySQLdb at all, but about string formatting, but the point escapes me
> because you never quite show what's in the format string and what
> error that gives. Could you give some examples based on first
> principles? A simple interactive session showing the various errors
> would be helpful...

I think Skip's point was just this: "TypeError" isn't expressive enough. If you catch TypeError on a statement with multiple possible type errors, you don't know which one you caught. Same holds for any exception type, really: a given statement could blow up with ValueError for any number of reasons. Etc., etc. One possible solution, and I think this is what Skip was getting at, is to add an "error code" to the exception object that identifies the error more reliably than examining the error message. It's just the errno/strerror dichotomy: strerror is for users, errno is for code. I think Skip is just saying that Python exception objects need an errno (although it doesn't have to be a number). It would probably only make sense to define error codes for exceptions that can be raised by Python itself, though.
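Greg's errno/strerror split could look something like the following sketch. Everything here is hypothetical (no such class existed in the stdlib); the re-raising wrapper stands in for what would be C-level code raising coded errors directly, and it still matches on message text internally, which a real implementation would avoid:

```python
class FormatError(TypeError):
    # Hypothetical: a TypeError that also carries a stable error code,
    # so handlers compare codes instead of message strings. Handlers
    # that catch plain TypeError keep working, since it's a subclass.
    NOT_ENOUGH_ARGS = "not_enough_args"
    NOT_ALL_CONVERTED = "not_all_converted"

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code  # errno-like, for programs; str(self) is the strerror

def render(fmt, values):
    # Toy stand-in for a '%' operator that raises coded errors.
    try:
        return fmt % values
    except TypeError as e:
        msg = str(e)
        if "not all arguments converted" in msg:
            raise FormatError(FormatError.NOT_ALL_CONVERTED, msg) from None
        if "not enough arguments" in msg:
            raise FormatError(FormatError.NOT_ENOUGH_ARGS, msg) from None
        raise

try:
    render("%s", ("a", "b"))
except FormatError as e:
    print(e.code)  # not_all_converted
```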
Greg From skip at mojam.com Fri Mar 10 21:17:30 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 10 Mar 2000 14:17:30 -0600 (CST) Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler In-Reply-To: <200003101617.LAA28722@eric.cnri.reston.va.us> References: <200003101528.JAA15951@beluga.mojam.com> <200003101617.LAA28722@eric.cnri.reston.va.us> Message-ID: <14537.22618.656740.296408@beluga.mojam.com>

Guido> Skip, I'm not familiar with MySQLdb.py, and I have no idea what
Guido> your example is about. From the rest of the message I feel it's
Guido> not about MySQLdb at all, but about string formatting,

My apologies. You're correct, it's really not about MySQLdb. It's about handling multiple cases raised by the same exception. First, a more concrete example that just uses simple string formats:

    code                    exception
    "%s" % ("a", "b")       TypeError: 'not all arguments converted'
    "%s %s" % "a"           TypeError: 'not enough arguments for format string'
    "%(a)s" % ("a",)        TypeError: 'format requires a mapping'
    "%d" % {"a": 1}         TypeError: 'illegal argument type for built-in operation'

Let's presume hypothetically that it's possible to recover from some subset of the TypeErrors that are raised, but not all of them. Now, also presume that the format strings and the tuple, string or dict literals I've given above can be stored in variables (which they can). If we wrap the code in a try/except statement, we can catch the TypeError exception and try to do something sensible. This is precisely the trick that Andy Dustman uses in MySQLdb: first try expanding the format string using a tuple as the RH operand, then try with a dict if that fails. Unfortunately, as you can see from the above examples, there are four cases that need to be handled. To distinguish them currently, you have to compare the message you get with the exception to string literals that are generally defined in C code in the interpreter.
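Skip's table can be reproduced interactively. Under a modern CPython the message texts have drifted a little from the wording quoted above -- which is itself a demonstration of how fragile matching on them is. (The fourth case's wording varies enough by version that it is left out of this sketch.)

```python
# Reproduce the first three TypeError cases from the table above.
cases = [
    ("one %s, two values", lambda: "%s" % ("a", "b")),
    ("two %s, one value",  lambda: "%s %s" % "a"),
    ("%(a)s with a tuple", lambda: "%(a)s" % ("a",)),
]

for label, thunk in cases:
    try:
        thunk()
    except TypeError as e:
        print("%s -> TypeError: %s" % (label, e))
```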
Here's what Andy's original code looked like stripped of the MySQLdb-ese:

    try:
        x = format % tuple_generating_function(...)
    except TypeError:
        x = format % dict_generating_function(...)

That doesn't handle the first two cases above. You have to inspect the message that raise sends out:

    try:
        x = format % tuple_generating_function(...)
    except TypeError, m:
        if m.args[0] == "not all arguments converted":
            raise
        if m.args[0] == "not enough arguments for format string":
            raise
        x = format % dict_generating_function(...)

This comparison of except arguments with hard-coded strings (especially ones the programmer has no direct control over) seems fragile to me. If you decide to reword the error message strings, you break someone's code. In my previous message I suggested collecting this fragility in the exceptions module where it can be better isolated. My solution is a bit cumbersome, but could probably be cleaned up somewhat, but basically looks like

    try:
        x = format % tuple_generating_function(...)
    except TypeError, m:
        import exceptions
        msg_case = exceptions.message_map.get((TypeError, m.args),
                                              exceptions.UNKNOWN_ERROR_CATEGORY)
        # punt on the cases we can't recover from
        if msg_case == exceptions.TYP_SHORT_FORMAT: raise
        if msg_case == exceptions.TYP_LONG_FORMAT: raise
        if msg_case == exceptions.UNKNOWN_ERROR_CATEGORY: raise
        # handle the one we can
        x = format % dict_generating_function(...)

In private email that crossed my original message, Andy suggested defining more standard exceptions, e.g.:

    class FormatError(TypeError): pass
    class TooManyElements(FormatError): pass
    class TooFewElements(FormatError): pass

then raising the appropriate error based on the circumstance. Code that catches TypeError exceptions would still work. So there are two possible changes on the table:

1. define more standard exceptions so you can distinguish classes of errors on a more fine-grained basis using just the first argument of the except clause.

2.
provide some machinery in exceptions.py to allow programmers a measure of uncoupling from using hard-coded strings to distinguish cases. Skip From skip at mojam.com Fri Mar 10 21:21:11 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 10 Mar 2000 14:21:11 -0600 (CST) Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler In-Reply-To: <20000310140503.A8619@cnri.reston.va.us> References: <200003101528.JAA15951@beluga.mojam.com> <200003101617.LAA28722@eric.cnri.reston.va.us> <20000310140503.A8619@cnri.reston.va.us> Message-ID: <14537.22839.664131.373727@beluga.mojam.com>

Greg> One possible solution, and I think this is what Skip was getting
Greg> at, is to add an "error code" to the exception object that
Greg> identifies the error more reliably than examining the error
Greg> message. It's just the errno/strerror dichotomy: strerror is for
Greg> users, errno is for code. I think Skip is just saying that
Greg> Python exception objects need an errno (although it doesn't have
Greg> to be a number). It would probably only make sense to define
Greg> error codes for exceptions that can be raised by Python itself,
Greg> though.

I'm actually allowing the string to be used as the error code. If you raise TypeError with "not all arguments converted" as the argument, then that string literal will appear in the definition of exceptions.message_map as part of a key. The programmer would only refer to the args attribute of the object being raised.
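For what it's worth, Skip's message_map lookup works mechanically when transliterated to modern syntax -- with the caveat, central to this thread, that the keys below have to use CPython 3's message texts, which differ from the 1.5-era strings quoted in the proposal:

```python
UNKNOWN_ERROR_CATEGORY = 0
TYP_SHORT_FORMAT = 1
TYP_LONG_FORMAT = 2

# Keys are (exception class, exc.args). The strings are CPython 3's
# wording, not the 1.5-era wording used in the proposal above -- the
# very drift the proposal worries about.
message_map = {
    (TypeError, ("not enough arguments for format string",)): TYP_SHORT_FORMAT,
    (TypeError, ("not all arguments converted during string formatting",)): TYP_LONG_FORMAT,
}

def classify(exc):
    return message_map.get((type(exc), exc.args), UNKNOWN_ERROR_CATEGORY)

try:
    "%s %s" % ("a",)
except TypeError as e:
    print(classify(e))  # 1
```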
either-or-makes-no-real-difference-to-me-ly y'rs, Skip From bwarsaw at cnri.reston.va.us Fri Mar 10 21:56:45 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 10 Mar 2000 15:56:45 -0500 (EST) Subject: [Python-Dev] finalization again References: <000701bf89ab$80cb8e20$0d2d153f@tim> <200003091955.OAA26217@eric.cnri.reston.va.us> <14536.30810.720836.886023@anthem.cnri.reston.va.us> <200003101346.IAA27847@eric.cnri.reston.va.us> Message-ID: <14537.24973.579056.533282@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: >> Given sufficient resolution of your system >> clock, you should never have two objects with the same >> timestamp. GvR> Forget the clock -- just use a counter that is incremented on GvR> each allocation. Good idea. GvR> Suppose I have a tree with parent and child links. And GvR> suppose I have a rule that children need to be finalized GvR> before their parents (maybe they represent a Unix directory GvR> tree, where you must rm the files before you can rmdir the GvR> directory). This suggests that we should choose LIFO: you GvR> must create the parents first (you have to create a directory GvR> before you can create files in it). However, now we add GvR> operations to move nodes around in the tree. Suddenly you GvR> can have a child that is older than its parent! Conclusion: GvR> the creation time is useless; the application logic and GvR> actual link relationships are needed. One potential way to solve this is to provide an interface for refreshing the counter; for discussion purposes, I'll call this sys.gcrefresh(obj). Throws a TypeError if obj isn't a finalizable instance. Otherwise, it sets the "timestamp" to the current counter value and increments the counter. Thus, in your example, when the child node is reparented, you sys.gcrefresh(child) and now the parent is automatically older. Of course, what if the child has its own children? 
You've now got an age graph like this

    parent > child < grandchild

with the wrong age relationship between the parent and grandchild. So when you refresh, you've got to walk down the containment tree making sure your grandkids are "younger" than yourself. E.g.:

    class Node:
        ...
        def __del__(self):
            ...
        def reparent(self, node):
            self.parent = node
            self.refresh()
        def refresh(self):
            sys.gcrefresh(self)
            for c in self.children:
                c.refresh()

The point to all this is that it gives explicit control of the finalizable cycle reclamation order to the user, via a fairly easy to understand, and manipulate mechanism. twas-only-a-flesh-wound-but-waiting-for-the-next-stroke-ly y'rs, -Barry From jim at interet.com Fri Mar 10 22:14:45 2000 From: jim at interet.com (James C. Ahlstrom) Date: Fri, 10 Mar 2000 16:14:45 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? References: <000801bf8a3b$aa0c4e60$58a2143f@tim> Message-ID: <38C965C4.B164C2D5@interet.com> Tim Peters wrote:
>
> [Fred L. Drake, Jr.]
> > Tim (& others),
> > Would this additional text be sufficient for the os.popen()
> > documentation?
> >
> > \strong{Note:} This function behaves unreliably under Windows
> > due to the native implementation of \cfunction{popen()}.
>
> Yes, that's good! If Mark/Bill's alternatives don't make it in, would also
> be good to point to the PythonWin extensions (although MarkH will have to
> give us the Official Name for that).

Well, it looks like this thread has fizzled out. But what did we decide? Changing the docs to say popen() "doesn't work reliably" is a little weak. Maybe removing popen() is better, and demanding that Windows users use win32pipe. I played around with a patch to posixmodule.c which eliminates _popen() and implements os.popen() using CreatePipe(). It sort of works on NT and fails on 95. Anyway, I am stuck on how to make a Python file object from a pipe handle.
Would it be a good idea to extract the Wisdom from win32pipe and re-implement os.popen() either in C or by using win32pipe directly? Using C is simple and to the point. I feel Tim's original complaint that popen() is a Problem still hasn't been fixed. JimA From moshez at math.huji.ac.il Fri Mar 10 22:29:05 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 10 Mar 2000 23:29:05 +0200 (IST) Subject: [Python-Dev] finalization again In-Reply-To: <14537.24973.579056.533282@anthem.cnri.reston.va.us> Message-ID: On Fri, 10 Mar 2000 bwarsaw at cnri.reston.va.us wrote:

> One potential way to solve this is to provide an interface for
> refreshing the counter; for discussion purposes, I'll call this
> sys.gcrefresh(obj).

Barry, there are other problems with your scheme, but I won't even try to point those out: having to call a function whose purpose can only be described in terms of a concrete implementation of a garbage collection scheme is simply unacceptable. I can almost see you shouting "Come back here, I'll bite your legs off" .

> The point to all this is that it gives explicit control of the
> finalizable cycle reclamation order to the user, via a fairly easy to
> understand, and manipulate mechanism.

Oh? This sounds like the most horrendous mechanism alive.... you-probably-jammed-a-*little*-too-loud-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From bwarsaw at cnri.reston.va.us Fri Mar 10 23:15:27 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 10 Mar 2000 17:15:27 -0500 (EST) Subject: [Python-Dev] finalization again References: <14537.24973.579056.533282@anthem.cnri.reston.va.us> Message-ID: <14537.29695.532507.197580@anthem.cnri.reston.va.us> Just throwing out ideas.
From DavidA at ActiveState.com Fri Mar 10 23:20:45 2000 From: DavidA at ActiveState.com (David Ascher) Date: Fri, 10 Mar 2000 14:20:45 -0800 Subject: [Python-Dev] finalization again In-Reply-To: Message-ID: Moshe, some _arguments_ backing your feelings might give them more weight... As they stand, they're just insults, and if I were Barry I'd ignore them. --david ascher Moshe Zadka: > Barry, there are other problems with your scheme, but I won't even try to > point those out: having to call a function whose purpose can only be > described in terms of a concrete implementation of a garbage collection > scheme is simply unacceptable. I can almost see you shouting "Come back > here, I'll bite your legs off" . > [...] > Oh? This sounds like the most horrendus mechanism alive.... From skip at mojam.com Fri Mar 10 23:40:02 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 10 Mar 2000 16:40:02 -0600 Subject: [Python-Dev] on the suitability of ideas tossed out to python-dev Message-ID: <200003102240.QAA07881@beluga.mojam.com> Folks, let's not forget that python-dev is a place where oftentimes half-baked ideas will get advanced. I came up with an idea about decoupling error handling from exception message strings. I don't expect my idea to be adopted as is. Similarly, Barry's ideas about object timestamps were admittedly conceived late at night in the thrill following an apparently good gig. (I like the idea that every object has a modtime, but for other reasons than Barry suggested.) My feeling is that bad ideas will get winnowed out or drastically modified quickly enough anyway. Think of these early ideas as little more than brainstorms. 
A lot of times if I have an idea, I feel I need to put it down on my virtual whiteboard quickly, because a) I often don't have a lot of time to pursue stuff (do it now or it won't get done), b) because bad ideas can be the catalyst for better ideas, and c) if I don't do it immediately, I'll probably forget the idea altogether, thus missing the opportunity for reason b altogether. Try and collect a bunch of ideas before shooting any down and see what falls out. The best ideas will survive. When people start proving things and using fancy diagrams like "a <=> b -> C", then go ahead and get picky... ;-) Have a relaxing, thought provoking weekend. I'm going to go see a movie this evening with my wife and youngest son, appropriately enough titled, "My Dog Skip". Enough Pythoneering for one day... bow-wow-ly y'rs, Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From guido at python.org Sat Mar 11 01:20:01 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Mar 2000 19:20:01 -0500 Subject: [Python-Dev] Unicode patches checked in Message-ID: <200003110020.TAA17777@eric.cnri.reston.va.us> I've just checked in a massive patch from Marc-Andre Lemburg which adds Unicode support to Python. This work was financially supported by Hewlett-Packard. Marc-Andre has done a tremendous amount of work, for which I cannot thank him enough. We're still awaiting some more things: Marc-Andre gave me documentation patches which will be reviewed by Fred Drake before they are checked in; Fredrik Lundh has developed a new regular expression which is Unicode-aware and which should be checked in real soon now. Also, the documentation is probably incomplete and will be updated, and of course there may be bugs -- this should be considered alpha software. However, I believe it is quite good already, otherwise I wouldn't have checked it in! 
I'd like to invite everyone with an interest in Unicode or Python 1.6 to check out this new Unicode-aware Python, so that we can ensure a robust code base by the time Python 1.6 is released (planned release date: June 1, 2000). The download links are below. Links: http://www.python.org/download/cvs.html Instructions on how to get access to the CVS version. (David Ascher is making nightly tarballs of the CVS version available at http://starship.python.net/crew/da/pythondists/) http://starship.python.net/crew/lemburg/unicode-proposal.txt The latest version of the specification on which the Marc has based his implementation. http://www.python.org/sigs/i18n-sig/ Home page of the i18n-sig (Internationalization SIG), which has lots of other links about this and related issues. http://www.python.org/search/search_bugs.html The Python Bugs List. Use this for all bug reports. Note that next Tuesday I'm going on a 10-day trip, with limited time to read email and no time to solve problems. The usual crowd will take care of urgent updates. See you at the Intel Computing Continuum Conference in San Francisco or at the Python Track at Software Development 2000 in San Jose! --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Sat Mar 11 03:03:47 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 10 Mar 2000 21:03:47 -0500 Subject: [Python-Dev] Finalization in Eiffel Message-ID: <000701bf8afe$0a0fd800$a42d153f@tim> Eiffel is Bertrand Meyer's "design by contract" OO language. Meyer took extreme care in its design, and has written extensively and articulately about the design -- agree with him or not, he's always worth reading! I used Eiffel briefly a few years ago, just out of curiosity. I didn't recall even bumping into a notion of destructors. Turns out it does have them, but they're appallingly (whether relative to Eiffel's usual clarity, or even relative to C++'s usual lack thereof <0.9 wink>) ill-specified. 
An Eiffel class can register a destructor by inheriting from the system MEMORY class and overriding the latter's "dispose()". This appears to be viewed as a low-level facility, and neither OOSC (2nd ed) nor "Eiffel: The Language" say much about its semantics. Within dispose, you're explicitly discouraged from invoking methods on *any* other object, and resurrection is right out the window. But the language doesn't appear to check for any of that, which is extremely un-Eiffel-like. Many msgs on comp.lang.eiffel from people who should know suggest that all but one Eiffel implementation pay no attention at all to reachability during gc, and that none support resurrection. If you need ordering during finalization, the advice is to write that part in C/C++. Violations of the vague rules appear to lead to random system damage(!). Looking at various Eiffel pkgs on the web, the sole use of dispose was in one-line bodies that released external resources (like memory & db connections) via calling an external C/C++ function. jealous-&-appalled-at-the-same-time-ly y'rs - tim From tim_one at email.msn.com Sat Mar 11 03:03:50 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 10 Mar 2000 21:03:50 -0500 Subject: [Python-Dev] Conventional wisdom on finalization Message-ID: <000801bf8afe$0b3df7c0$a42d153f@tim> David Chase maintains a well-regarded GC FAQ, at http://www.iecc.com/gclist/GC-faq.html Interested folks should look it up. A couple highlights: On cycles with finalizers: In theory, of course, a cycle in the graph of objects to be finalized will prevent a topological sort from succeeding. In practice, the "right" thing to do appears to be to signal an error (at least when debugging) and let the programmer clean this up. 
People with experience on large systems report that such cycles are in fact exceedingly rare (note, however, that some languages define "finalizers" for almost every object, and that was not the case for the large systems studied -- there, finalizers were not too common). On Java's "finalizer called only once" rule: if an object is revived in finalization, that is fine, but its finalizer will not run a second time. (It isn't clear if this is a matter of design, or merely an accident of the first implementation of the language, but it is in the specification now. Obviously, this encourages careful use of finalization, in much the same way that driving without seatbelts encourages careful driving.) Until today, I had no idea I was so resolutely conventional . seems-we're-trying-to-do-more-than-anyone-other-than-us-expects-ly y'rs - tim From shichang at icubed.com Fri Mar 10 23:33:11 2000 From: shichang at icubed.com (Shichang Zhao) Date: Fri, 10 Mar 2000 22:33:11 -0000 Subject: [Python-Dev] RE: Unicode patches checked in Message-ID: <01BF8AE0.9E911980.shichang@icubed.com> I would love to test the Python 1.6 (Unicode support) in Chinese language aspect, but I don't know where I can get a copy of OS that supports Chinese. Anyone can point me a direction? -----Original Message----- From: Guido van Rossum [SMTP:guido at python.org] Sent: Saturday, March 11, 2000 12:20 AM To: Python mailing list; python-announce at python.org; python-dev at python.org; i18n-sig at python.org; string-sig at python.org Cc: Marc-Andre Lemburg Subject: Unicode patches checked in I've just checked in a massive patch from Marc-Andre Lemburg which adds Unicode support to Python. This work was financially supported by Hewlett-Packard. Marc-Andre has done a tremendous amount of work, for which I cannot thank him enough. 
We're still awaiting some more things: Marc-Andre gave me documentation patches which will be reviewed by Fred Drake before they are checked in; Fredrik Lundh has developed a new regular expression engine which is Unicode-aware and which should be checked in real soon now. Also, the documentation is probably incomplete and will be updated, and of course there may be bugs -- this should be considered alpha software. However, I believe it is quite good already, otherwise I wouldn't have checked it in! I'd like to invite everyone with an interest in Unicode or Python 1.6 to check out this new Unicode-aware Python, so that we can ensure a robust code base by the time Python 1.6 is released (planned release date: June 1, 2000). The download links are below. Links: http://www.python.org/download/cvs.html Instructions on how to get access to the CVS version. (David Ascher is making nightly tarballs of the CVS version available at http://starship.python.net/crew/da/pythondists/) http://starship.python.net/crew/lemburg/unicode-proposal.txt The latest version of the specification on which Marc has based his implementation. http://www.python.org/sigs/i18n-sig/ Home page of the i18n-sig (Internationalization SIG), which has lots of other links about this and related issues. http://www.python.org/search/search_bugs.html The Python Bugs List. Use this for all bug reports. Note that next Tuesday I'm going on a 10-day trip, with limited time to read email and no time to solve problems. The usual crowd will take care of urgent updates. See you at the Intel Computing Continuum Conference in San Francisco or at the Python Track at Software Development 2000 in San Jose!
--Guido van Rossum (home page: http://www.python.org/~guido/) -- http://www.python.org/mailman/listinfo/python-list From moshez at math.huji.ac.il Sat Mar 11 10:10:12 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 11 Mar 2000 11:10:12 +0200 (IST) Subject: [Python-Dev] Unicode: When Things Get Hairy Message-ID: The following "problem" is easy to fix. However, what I wanted to know is if people (Skip and Guido most importantly) think it is a problem: >>> "a" in u"bbba" 1 >>> u"a" in "bbba" Traceback (innermost last): File "", line 1, in ? TypeError: string member test needs char left operand Suggested fix: in stringobject.c, explicitly allow a unicode char left operand. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From mal at lemburg.com Sat Mar 11 11:24:26 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 11 Mar 2000 11:24:26 +0100 Subject: [Python-Dev] Unicode: When Things Get Hairy References: Message-ID: <38CA1EDA.423F8A2C@lemburg.com> Moshe Zadka wrote: > > The following "problem" is easy to fix. However, what I wanted to know is > if people (Skip and Guido most importantly) think it is a problem: > > >>> "a" in u"bbba" > 1 > >>> u"a" in "bbba" > Traceback (innermost last): > File "", line 1, in ? > TypeError: string member test needs char left operand > > Suggested fix: in stringobject.c, explicitly allow a unicode char left > operand. Hmm, this must have been introduced by your contains code... it did work before. The normal action taken by the Unicode and the string code in these mixed type situations is to first convert everything to Unicode and then retry the operation. Strings are interpreted as UTF-8 during this conversion. To simplify this task, I added method APIs to the Unicode object which do the conversion for you (they apply all the necessary coercion business to all arguments).
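The coercion rule MAL describes (convert both operands to Unicode, decoding 8-bit strings as UTF-8, then do the one-character membership test) can be sketched in Python. This is only a sketch in modern Python 3 spelling, with `bytes` standing in for the old 8-bit string type; `unicode_contains` is a hypothetical name, not the C-level API under discussion:

```python
def unicode_contains(container, element):
    """Sketch of the mixed-type 'in' rule: coerce, then test."""
    # Coerce both operands to the text type; 8-bit strings are
    # decoded as UTF-8, per the rule described above.
    if isinstance(container, bytes):
        container = container.decode("utf-8")
    if isinstance(element, bytes):
        element = element.decode("utf-8")
    # The 1.6-era membership test required a single-character
    # left operand.
    if len(element) != 1:
        raise TypeError("string member test needs char left operand")
    return element in container
```

With this rule all four string/Unicode combinations behave the same way, which is what the C-level special-casing discussed later in the thread arranges.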
I guess adding another PyUnicode_Contains() wouldn't hurt :-) Perhaps I should also add a tp_contains slot to the Unicode object which then uses the above API as well. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From moshez at math.huji.ac.il Sat Mar 11 12:05:48 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 11 Mar 2000 13:05:48 +0200 (IST) Subject: [Python-Dev] Unicode: When Things Get Hairy In-Reply-To: <38CA1EDA.423F8A2C@lemburg.com> Message-ID: On Sat, 11 Mar 2000, M.-A. Lemburg wrote: > Hmm, this must have been introduced by your contains code... > it did work before. Nope: the string "in" semantics were forever special-cased. Guido beat me soundly for trying to change the semantics... > The normal action taken by the Unicode and the string > code in these mixed type situations is to first > convert everything to Unicode and then retry the operation. > Strings are interpreted as UTF-8 during this conversion. Hmmm....PySequence_Contains doesn't do any conversion of the arguments. Should it? (Again, it didn't before). If it does, then the order of testing for seq_contains and seq_getitem and conversions > Perhaps I should also add a tp_contains slot to the > Unicode object which then uses the above API as well. But that wouldn't help at all for u"a" in "abbbb" PySequence_Contains only dispatches on the container argument :-( (BTW: I discovered it while contemplating adding a seq_contains (not tp_contains) to unicode objects to optimize the searching for a bit.) PS: MAL: thanks for a great birthday present! I'm enjoying the unicode patch a lot. -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html From guido at python.org Sat Mar 11 13:16:06 2000 From: guido at python.org (Guido van Rossum) Date: Sat, 11 Mar 2000 07:16:06 -0500 Subject: [Python-Dev] Unicode: When Things Get Hairy In-Reply-To: Your message of "Sat, 11 Mar 2000 13:05:48 +0200." References: Message-ID: <200003111216.HAA12651@eric.cnri.reston.va.us> [Moshe discovers that u"a" in "bbba" raises TypeError] [Marc-Andre] > > Hmm, this must have been introduced by your contains code... > > it did work before. > > Nope: the string "in" semantics were forever special-cased. Guido beat me > soundly for trying to change the semantics... But I believe that Marc-Andre added a special case for Unicode in PySequence_Contains. I looked for evidence, but the last snapshot that I actually saved and built before Moshe's code was checked in is from 2/18 and it isn't in there. Yet I believe Marc-Andre. The special case needs to be added back to string_contains in stringobject.c. > > The normal action taken by the Unicode and the string > > code in these mixed type situations is to first > > convert everything to Unicode and then retry the operation. > > Strings are interpreted as UTF-8 during this conversion. > > Hmmm....PySequence_Contains doesn't do any conversion of the arguments. > Should it? (Again, it didn't before). If it does, then the order of > testing for seq_contains and seq_getitem and conversions Or it could be done this way. > > Perhaps I should also add a tp_contains slot to the > > Unicode object which then uses the above API as well. Yes. > But that wouldn't help at all for > > u"a" in "abbbb" It could if PySequence_Contains would first look for a string and a unicode argument (in either order) and in that case convert the string to unicode. > PySequence_Contains only dispatches on the container argument :-( > > (BTW: I discovered it while contemplating adding a seq_contains (not > tp_contains) to unicode objects to optimize the searching for a bit.)
You may beat Marc-Andre to it, but I'll have to let him look at the code anyway -- I'm not sufficiently familiar with the Unicode stuff myself yet. BTW, I added a tag "pre-unicode" to the CVS tree for the revisions before the Unicode changes were made. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Sat Mar 11 14:32:57 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 11 Mar 2000 14:32:57 +0100 Subject: [Python-Dev] Unicode: When Things Get Hairy References: <200003111216.HAA12651@eric.cnri.reston.va.us> Message-ID: <38CA4B08.7B13438D@lemburg.com> Guido van Rossum wrote: > > [Moshe discovers that u"a" in "bbba" raises TypeError] > > [Marc-Andre] > > > Hmm, this must have been introduced by your contains code... > > > it did work before. > > > > Nope: the string "in" semantics were forever special-cased. Guido beat me > > soundly for trying to change the semantics... > > But I believe that Marc-Andre added a special case for Unicode in > PySequence_Contains. I looked for evidence, but the last snapshot that > I actually saved and built before Moshe's code was checked in is from > 2/18 and it isn't in there. Yet I believe Marc-Andre. The special > case needs to be added back to string_contains in stringobject.c. Moshe was right: I had probably not checked the code because the obvious combinations worked out of the box... the only combination which doesn't work is "unicode in string". I'll fix it next week. BTW, there's a good chance that the string/Unicode integration is not complete yet: just keep looking for them. > > > The normal action taken by the Unicode and the string > > > code in these mixed type situations is to first > > > convert everything to Unicode and then retry the operation. > > > Strings are interpreted as UTF-8 during this conversion. > > > > Hmmm....PySequence_Contains doesn't do any conversion of the arguments. > > Should it? (Again, it didn't before).
If it does, then the order of > > testing for seq_contains and seq_getitem and conversions > > Or it could be done this way. > > > > Perhaps I should also add a tp_contains slot to the > > > Unicode object which then uses the above API as well. > > Yes. > > > But that wouldn't help at all for > > > > u"a" in "abbbb" > > It could if PySequence_Contains would first look for a string and a > unicode argument (in either order) and in that case convert the string > to unicode. I think the right way to do this is to add a special case to seq_contains in the string implementation. That's how most other auto-coercions work too. Instead of raising an error, the implementation would then delegate the work to PyUnicode_Contains(). > > PySequence_Contains only dispatches on the container argument :-( > > > > (BTW: I discovered it while contemplating adding a seq_contains (not > > tp_contains) to unicode objects to optimize the searching for a bit.) > > You may beat Marc-Andre to it, but I'll have to let him look at the > code anyway -- I'm not sufficiently familiar with the Unicode stuff > myself yet. I'll add that one too. BTW, Happy Birthday, Moshe :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Sat Mar 11 14:57:34 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 11 Mar 2000 14:57:34 +0100 Subject: [Python-Dev] Unicode: When Things Get Hairy References: <200003111216.HAA12651@eric.cnri.reston.va.us> <38CA4B08.7B13438D@lemburg.com> Message-ID: <38CA50CE.BEEFAB5E@lemburg.com> I couldn't resist :-) Here's the patch... BTW, how should we proceed with future patches ? Should I wrap them together about once a week, or send them as soon as they are done ?
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ -------------- next part -------------- diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Include/unicodeobject.h Python+Unicode/Include/unicodeobject.h --- CVS-Python/Include/unicodeobject.h Fri Mar 10 23:33:05 2000 +++ Python+Unicode/Include/unicodeobject.h Sat Mar 11 14:45:59 2000 @@ -683,6 +683,17 @@ PyObject *args /* Argument tuple or dictionary */ ); +/* Checks whether element is contained in container and return 1/0 + accordingly. + + element has to coerce to an one element Unicode string. -1 is + returned in case of an error. */ + +extern DL_IMPORT(int) PyUnicode_Contains( + PyObject *container, /* Container string */ + PyObject *element /* Element string */ + ); + /* === Characters Type APIs =============================================== */ /* These should not be used directly. 
Use the Py_UNICODE_IS* and diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_unicode.py Python+Unicode/Lib/test/test_unicode.py --- CVS-Python/Lib/test/test_unicode.py Sat Mar 11 00:23:20 2000 +++ Python+Unicode/Lib/test/test_unicode.py Sat Mar 11 14:52:29 2000 @@ -219,6 +219,19 @@ test('translate', u"abababc", u'iiic', {ord('a'):None, ord('b'):ord('i')}) test('translate', u"abababc", u'iiix', {ord('a'):None, ord('b'):ord('i'), ord('c'):u'x'}) +# Contains: +print 'Testing Unicode contains method...', +assert ('a' in 'abdb') == 1 +assert ('a' in 'bdab') == 1 +assert ('a' in 'bdaba') == 1 +assert ('a' in 'bdba') == 1 +assert ('a' in u'bdba') == 1 +assert (u'a' in u'bdba') == 1 +assert (u'a' in u'bdb') == 0 +assert (u'a' in 'bdb') == 0 +assert (u'a' in 'bdba') == 1 +print 'done.' + # Formatting: print 'Testing Unicode formatting strings...', assert u"%s, %s" % (u"abc", "abc") == u'abc, abc' diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Misc/unicode.txt Python+Unicode/Misc/unicode.txt --- CVS-Python/Misc/unicode.txt Sat Mar 11 00:14:11 2000 +++ Python+Unicode/Misc/unicode.txt Sat Mar 11 14:53:37 2000 @@ -743,8 +743,9 @@ stream codecs as available through the codecs module should be used. -XXX There should be a short-cut open(filename,mode,encoding) available which - also assures that mode contains the 'b' character when needed. +The codecs module should provide a short-cut open(filename,mode,encoding) +available which also assures that mode contains the 'b' character when +needed. 
File/Stream Input: @@ -810,6 +811,10 @@ Introduction to Unicode (a little outdated by still nice to read): http://www.nada.kth.se/i18n/ucs/unicode-iso10646-oview.html +For comparison: + Introducing Unicode to ECMAScript -- + http://www-4.ibm.com/software/developer/library/internationalization-support.html + Encodings: Overview: @@ -832,7 +837,7 @@ History of this Proposal: ------------------------- -1.2: +1.2: Removed POD about codecs.open() 1.1: Added note about comparisons and hash values. Added note about case mapping algorithms. Changed stream codecs .read() and .write() method to match the standard file-like object methods diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Objects/stringobject.c Python+Unicode/Objects/stringobject.c --- CVS-Python/Objects/stringobject.c Sat Mar 11 10:55:09 2000 +++ Python+Unicode/Objects/stringobject.c Sat Mar 11 14:47:45 2000 @@ -389,7 +389,9 @@ { register char *s, *end; register char c; - if (!PyString_Check(el) || PyString_Size(el) != 1) { + if (!PyString_Check(el)) + return PyUnicode_Contains(a, el); + if (PyString_Size(el) != 1) { PyErr_SetString(PyExc_TypeError, "string member test needs char left operand"); return -1; diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Objects/unicodeobject.c Python+Unicode/Objects/unicodeobject.c --- CVS-Python/Objects/unicodeobject.c Fri Mar 10 23:53:23 2000 +++ Python+Unicode/Objects/unicodeobject.c Sat Mar 11 14:48:52 2000 @@ -2737,6 +2737,49 @@ return -1; } +int PyUnicode_Contains(PyObject *container, + PyObject *element) +{ + PyUnicodeObject *u = NULL, *v = NULL; + int result; + register const Py_UNICODE 
*p, *e; + register Py_UNICODE ch; + + /* Coerce the two arguments */ + u = (PyUnicodeObject *)PyUnicode_FromObject(container); + if (u == NULL) + goto onError; + v = (PyUnicodeObject *)PyUnicode_FromObject(element); + if (v == NULL) + goto onError; + + /* Check v in u */ + if (PyUnicode_GET_SIZE(v) != 1) { + PyErr_SetString(PyExc_TypeError, + "string member test needs char left operand"); + goto onError; + } + ch = *PyUnicode_AS_UNICODE(v); + p = PyUnicode_AS_UNICODE(u); + e = p + PyUnicode_GET_SIZE(u); + result = 0; + while (p < e) { + if (*p++ == ch) { + result = 1; + break; + } + } + + Py_DECREF(u); + Py_DECREF(v); + return result; + +onError: + Py_XDECREF(u); + Py_XDECREF(v); + return -1; +} + /* Concat to string or Unicode object giving a new Unicode object. */ PyObject *PyUnicode_Concat(PyObject *left, @@ -3817,6 +3860,7 @@ (intintargfunc) unicode_slice, /* sq_slice */ 0, /* sq_ass_item */ 0, /* sq_ass_slice */ + (objobjproc)PyUnicode_Contains, /*sq_contains*/ }; static int From tim_one at email.msn.com Sat Mar 11 21:10:23 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 11 Mar 2000 15:10:23 -0500 Subject: [Python-Dev] finalization again In-Reply-To: <14536.30810.720836.886023@anthem.cnri.reston.va.us> Message-ID: <000e01bf8b95$d52939e0$c72d153f@tim> [Barry A. Warsaw, jamming after hours] > ... > What if you timestamp instances when you create them? Then when you > have trash cycles with finalizers, you sort them and finalize in > chronological order. Well, I strongly agree that would be better than finalizing them in increasing order of storage address . > ... > - FIFO order /seems/ more natural to me than FILO, Forget cycles for a moment, and consider just programs that manipulate *immutable* containers (the simplest kind to think about): at the time you create an immutable container, everything *contained* must already be in existence, so every pointer goes from a newer object (container) to an older one (containee). 
This is the "deep" reason for why, e.g., you can't build a cycle out of pure tuples in Python (if every pointer goes new->old, you can't get a loop, else each node in the loop would be (transitively) older than itself!). Then, since a finalizer can see objects pointed *to*, a finalizer can see only older objects. Since it's desirable that a finalizer see only wholly intact (unfinalized) objects, it is in fact the oldest object ("first in") that needs to be cleaned up last ("last out"). So, under the assumption of immutability, FILO is sufficient, but FIFO dangerous. So your muse inflamed you with an interesting tune, but you fingered the riff backwards . One problem is that it all goes out the window as soon as mutation is allowed. It's *still* desirable that a finalizer see only unfinalized objects, but in the presence of mutation that no longer bears any relationship to relative creation time. Another problem is in Guido's directory example, which we can twist to view as an "immutable container" problem that builds its image of the directory bottom-up, and where a finalizer on each node tries to remove the file (or delete the directory, whichever the node represents). In this case the physical remove/delete/unlink operations have to follow a *postorder* traversal of the container tree, so that "finalizer sees only unfinalized objects" is the opposite of what the app needs! The lesson to take from that is that the implementation can't possibly guess what ordering an app may need in a fancy finalizer. 
At best it can promise to follow a "natural" ordering based on the points-to relationship, and while "finalizer sees only unfinalized objects" is at least clear, it's quite possibly unhelpful (in Guido's particular case, it *can* be exploited, though, by adding a postorder remove/delete/unlink method to nodes, and explicitly calling it from __del__ -- "the rules" guarantee that the root of the tree will get finalized first, and the code can rely on that in its own explicit postorder traversal). > but then I rarely create cyclic objects, and almost never use __del__, > so this whole argument has been somewhat academic to me :). Well, not a one of us creates cycles often in CPython today, simply because we don't want to track down leaks <0.5 wink>. It seems that nobody here uses __del__ much, either; indeed, my primary use of __del__ is simply to call an explicit break_cycles() function from the header node of a graph! The need for that goes away as soon as Python reclaims cycles by itself, and I may never use __del__ at all then in the vast bulk of my code. It's because we've seen no evidence here (and also that I've seen none elsewhere either) that *anyone* is keen on mixing cycles with finalizers that I've been so persistent in saying "screw it -- let it leak, but let the user get at it if they insist on doing it". Seems we're trying to provide slick support for something nobody wants to do. If it happens by accident anyway, well, people sometimes divide by 0 by accident too <0.0 wink>: give them a way to know about it, but don't move heaven & earth trying to treat it like a normal case. 
if-it-were-easy-to-implement-i-wouldn't-care-ly y'rs - tim From moshez at math.huji.ac.il Sat Mar 11 21:35:43 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 11 Mar 2000 22:35:43 +0200 (IST) Subject: [Python-Dev] finalization again In-Reply-To: <000e01bf8b95$d52939e0$c72d153f@tim> Message-ID: In a continuation (yes, a dangerous word in these parts) of the timbot's looks at the way other languages handle finalization, let me add something from the Sather manual I'm now reading (when I'm done with it, you'll see me begging for iterators here, and having some weird ideas in the types-sig): =============================== Finalization will only occur once, even if new references are created to the object during finalization. Because few guarantees can be made about the environment in which finalization occurs, finalization is considered dangerous and should only be used in the rare cases that conventional coding will not suffice. =============================== (Sather is garbage-collected, BTW) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html From tim_one at email.msn.com Sat Mar 11 21:51:47 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 11 Mar 2000 15:51:47 -0500 Subject: [Python-Dev] Py3K: indirect coupling between raise and exception handler In-Reply-To: <200003101528.JAA15951@beluga.mojam.com> Message-ID: <001001bf8b9b$9e09d720$c72d153f@tim> [Skip Montanaro, with an expression that may raise TypeError for any of several distinct reasons, and wants to figure out which one after the fact] The existing exception machinery is sufficiently powerful for building a solution, so nothing new is needed in the language. What you really need here is an exhaustive list of all exceptions the language can raise, and when, and why, and a formally supported "detail" field (whether numeric id or string or whatever) that you can rely on to tell them apart at runtime. 
There are at least a thousand cases that need to be so documented and formalized. That's why not a one of them is now <0.9 wink>. If P3K is a rewrite from scratch, a rational scheme could be built in from the start. Else it would seem to require a volunteer with even less of a life than us . From tim_one at email.msn.com Sat Mar 11 21:51:49 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 11 Mar 2000 15:51:49 -0500 Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <38C965C4.B164C2D5@interet.com> Message-ID: <001101bf8b9b$9f37f6e0$c72d153f@tim> [James C. Ahlstrom] > Well, it looks like this thread has fizzled out. But what did we > decide? Far as I could tell, nothing specific. > ... > I feel Tim's original complaint that popen() is a Problem > still hasn't been fixed. I was passing it on from MikeF's c.l.py posting. This isn't a new problem, of course, it just drags on year after year -- which is the heart of MikeF's gripe. People have code that *does* work, but for whatever reasons it never gets moved to the core. In the meantime, the Library Ref implies the broken code that is in the core does work. One or the other has to change, and it looks most likely to me that Fred will change the docs for 1.6. While not ideal, that would be a huge improvement over the status quo. luckily-few-people-expect-windows-to-work-anyway<0.9-wink>-ly y'rs - tim From mhammond at skippinet.com.au Mon Mar 13 04:50:35 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon, 13 Mar 2000 14:50:35 +1100 Subject: [Python-Dev] string.replace behaviour change since Unicode patch. 
Message-ID: Hi, After applying the Unicode changes string.replace() seems to have changed its behaviour: Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> import string >>> string.replace("foo\nbar", "\n", "") 'foobar' >>> But since the Unicode update: Python 1.5.2+ (#0, Feb 2 2000, 16:46:55) [MSC 32 bit (Intel)] on win32 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> import string >>> string.replace("foo\nbar", "\n", "") Traceback (innermost last): File "", line 1, in ? File "L:\src\python-cvs\lib\string.py", line 407, in replace return s.replace(old, new, maxsplit) ValueError: empty replacement string >>> The offending check is stringmodule.c, line 1578: if (repl_len <= 0) { PyErr_SetString(PyExc_ValueError, "empty replacement string"); return NULL; } Changing the check to "< 0" fixes the immediate problem, but it is unclear why the check was added at all, so I didnt bother submitting a patch... Mark. From mal at lemburg.com Mon Mar 13 10:13:50 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 13 Mar 2000 10:13:50 +0100 Subject: [Python-Dev] string.replace behaviour change since Unicode patch. References: Message-ID: <38CCB14D.C07ACC26@lemburg.com> Mark Hammond wrote: > > Hi, > After applying the Unicode changes string.replace() seems to have changed > its behaviour: > > Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> import string > >>> string.replace("foo\nbar", "\n", "") > 'foobar' > >>> > > But since the Unicode update: > > Python 1.5.2+ (#0, Feb 2 2000, 16:46:55) [MSC 32 bit (Intel)] on win32 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> import string > >>> string.replace("foo\nbar", "\n", "") > Traceback (innermost last): > File "", line 1, in ? 
> File "L:\src\python-cvs\lib\string.py", line 407, in replace > return s.replace(old, new, maxsplit) > ValueError: empty replacement string > >>> > > The offending check is stringmodule.c, line 1578: > if (repl_len <= 0) { > PyErr_SetString(PyExc_ValueError, "empty replacement string"); > return NULL; > } > > Changing the check to "< 0" fixes the immediate problem, but it is unclear > why the check was added at all, so I didnt bother submitting a patch... Dang. Must have been my mistake -- it should read: if (sub_len <= 0) { PyErr_SetString(PyExc_ValueError, "empty pattern string"); return NULL; } Thanks for reporting this... I'll include the fix in the next patch set. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake at acm.org Mon Mar 13 16:43:09 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 13 Mar 2000 10:43:09 -0500 (EST) Subject: [Python-Dev] FW: Fixing os.popen on Win32 => is the win32pipe stuff going to be adopted? In-Reply-To: <001101bf8b9b$9f37f6e0$c72d153f@tim> References: <38C965C4.B164C2D5@interet.com> <001101bf8b9b$9f37f6e0$c72d153f@tim> Message-ID: <14541.3213.590243.359394@weyr.cnri.reston.va.us> Tim Peters writes: > code that is in the core does work. One or the other has to change, and it > looks most likely to me that Fred will change the docs for 1.6. While not > ideal, that would be a huge improvement over the status quo. Actually, I just checked in my proposed change for the 1.5.2 doc update that I'm releasing soon. I'd like to remove it for 1.6, if the appropriate implementation is moved into the core. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From gvwilson at nevex.com Mon Mar 13 22:10:52 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Mon, 13 Mar 2000 16:10:52 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request Message-ID: Once 1.6 is out the door, would people be willing to consider extending Python's token set to make HTML/XML-ish spellings using entity references legal? This would make the following 100% legal Python: i = 0 while i &lt; 10: print i &amp; 1 i = i + 1 which would in turn make it easier to embed Python in XML such as config-files-for-whatever-Software-Carpentry-produces-to-replace-make, PMZ, and so on. Greg From skip at mojam.com Mon Mar 13 22:23:17 2000 From: skip at mojam.com (Skip Montanaro) Date: Mon, 13 Mar 2000 15:23:17 -0600 (CST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: References: Message-ID: <14541.23621.89087.357783@beluga.mojam.com> Greg> Once 1.6 is out the door, would people be willing to consider Greg> extending Python's token set to make HTML/XML-ish spellings using Greg> entity references legal? This would make the following 100% legal Greg> Python: Greg> i = 0 Greg> while i &lt; 10: Greg> print i &amp; 1 Greg> i = i + 1 What makes it difficult to pump your Python code through cgi.escape when embedding it? There doesn't seem to be an inverse function to cgi.escape (at least not in the cgi module), but I suspect it could rather easily be written. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From akuchlin at mems-exchange.org Mon Mar 13 22:23:29 2000 From: akuchlin at mems-exchange.org (Andrew M.
Kuchling) Date: Mon, 13 Mar 2000 16:23:29 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: References: Message-ID: <14541.23633.873411.86833@amarok.cnri.reston.va.us> gvwilson at nevex.com writes: >Once 1.6 is out the door, would people be willing to consider extending >Python's token set to make HTML/XML-ish spellings using entity references >legal? This would make the following 100% legal Python: > >i = 0 >while i &lt; 10: > print i &amp; 1 > i = i + 1 I don't think that would be sufficient. What about user-defined entities, as in r&eacute;sultat = max(a,b)? (résultat, in French.) Would Python have to also parse a DTD from somewhere? What about other places when Python and XML syntax collide, as in this contrived example: <![CDATA[ # Python code starts here if a[index[1]]>b: print ... ]]> Oops! The ]]> looks like the end of the CDATA section, but it's legal Python code. IMHO whatever tool is outputting the XML should handle escaping wacky characters in the Python code, which will be undone by the parser when the XML gets parsed. Users certainly won't be writing this XML by hand; writing 'if (i &lt; 10)' is very strange. -- A.M. Kuchling http://starship.python.net/crew/amk/ Art history is the nightmare from which art is struggling to awake. -- Robert Fulford From gvwilson at nevex.com Mon Mar 13 22:58:27 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Mon, 13 Mar 2000 16:58:27 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: <14541.23633.873411.86833@amarok.cnri.reston.va.us> Message-ID: > >Greg Wilson wrote: > >...would people be willing to consider extending > >Python's token set to make HTML/XML-ish spellings using entity references > >legal? > > > >i = 0 > >while i &lt; 10: > > print i &amp; 1 > > i = i + 1 > Skip Montanaro wrote: > What makes it difficult to pump your Python code through cgi.escape when > embedding it? Most non-programmers use WYSIWYG editors, and many of these are moving toward XML-compliant formats.
Parsing the standard character entities seemed like a good first step toward catering to this (large) audience. > Andrew Kuchling wrote: > I don't think that would be sufficient. What about user-defined > entities, as in r&eacute;sultat = max(a,b)? (résultat, in French.) > Would Python have to also parse a DTD from somewhere? Longer term, I believe that someone is going to come out with a programming language that (finally) leaves the flat-ASCII world behind, and lets people use the structuring mechanisms (e.g. XML) that we have developed for everyone else's data. I think it would be to Python's advantage to be first, and if I'm wrong, there's little harm done. User-defined entities, DTD's, and the like are probably part of that, but I don't think I know enough to know what to ask for. Escaping the standard entities seems like an easy start. > Andrew Kuchling also wrote: > What about other places where Python and XML syntax collide, as in this > contrived example: > > # Python code starts here > if a[index[1]]>b: > print ... > > Oops! The ]]> looks like the end of the CDATA section, but it's legal > Python code. Yup; that's one of the reasons I'd like to be able to write:

# Python code starts here
if a[index[1]]&gt;b:
    print ...

> Users certainly won't be writing this XML by hand; writing 'if (i &lt; > 10)' is very strange. I'd expect my editor to put '&lt;' in the file when I press the '<' key, and to display '<' on the screen when viewing the file. thanks, Greg From beazley at rustler.cs.uchicago.edu Mon Mar 13 23:35:24 2000 From: beazley at rustler.cs.uchicago.edu (David M. Beazley) Date: Mon, 13 Mar 2000 16:35:24 -0600 (CST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: References: Message-ID: <200003132235.QAA08031@rustler.cs.uchicago.edu> gvwilson at nevex.com writes: > Once 1.6 is out the door, would people be willing to consider extending > Python's token set to make HTML/XML-ish spellings using entity references > legal?
This would make the following 100% legal Python: > > i = 0 > while i &lt; 10: > print i &amp; 1 > i = i + 1 > > which would in turn make it easier to embed Python in XML such as > config-files-for-whatever-Software-Carpentry-produces-to-replace-make, > PMZ, and so on. > Sure, and while we're at it, maybe we can add support for C trigraph sequences as well. Maybe I'm missing the point, but why can't you just use a filter (cgi.escape() or something comparable)? I for one, am *NOT* in favor of complicating the Python parser in this most bogus manner. Furthermore, with respect to the editor argument, I can't think of a single reason why any sane programmer would be writing programs in Microsoft Word or whatever it is that you're talking about. Therefore, I don't think that the Python parser should be modified in any way to account for XML tags, entities, or other extraneous markup that's not part of the core language. I know that I, for one, would be extremely pissed if I fired up emacs and had to maintain someone else's code that had all of this garbage in it. Just my 0.02. -- Dave From gvwilson at nevex.com Mon Mar 13 23:48:33 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Mon, 13 Mar 2000 17:48:33 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: <200003132235.QAA08031@rustler.cs.uchicago.edu> Message-ID: > David M. Beazley wrote: > ...and while we're at it, maybe we can add support for C trigraph > sequences as well. I don't know of any mass-market editors that generate C trigraphs. > ...I can't think of a single reason why any sane programmer would be > writing programs in Microsoft Word or whatever it is that you're > talking about. 'S funny --- my non-programmer friends can't figure out why any sane person would use a glorified glass TTY like emacs... or why they should have to, just to program...
I just think that someone's going to do this for some language, some time soon, and I'd rather Python be in the lead than play catch-up. Thanks, Greg From effbot at telia.com Tue Mar 14 00:16:41 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 14 Mar 2000 00:16:41 +0100 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <00ca01bf8d42$6a154500$34aab5d4@hagrid> Greg wrote: > > ...I can't think of a single reason why any sane programmer would be > > writing programs in Microsoft Word or whatever it is that you're > > talking about. > > 'S funny --- my non-programmer friends can't figure out why any sane > person would use a glorified glass TTY like emacs... or why they should > have to, just to program... I just think that someone's going to do this > for some language, some time soon, and I'd rather Python be in the lead > than play catch-up. I don't get it. the XML specification contains a lot of stuff, and I completely fail to see how adding support for a very small part of XML would make it possible to use XML editors to write Python code. what am I missing? From DavidA at ActiveState.com Tue Mar 14 00:15:25 2000 From: DavidA at ActiveState.com (David Ascher) Date: Mon, 13 Mar 2000 15:15:25 -0800 Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: > 'S funny --- my non-programmer friends can't figure out why any sane > person would use a glorified glass TTY like emacs... or why they should > have to, just to program... I just think that someone's going to do this > for some language, some time soon, and I'd rather Python be in the lead > than play catch-up. But the scheme you put forth causes major problems for current Python users who *are* using glass TTYs, so I don't think it'll fly for very basic political reasons nicely illustrated by Dave-the-diplomat's response. 
While storage of Python files in XML documents is a good thing, it's hard to see why XML should be viewed as the only storage format for Python files. I think a much richer XML schema could be useful in some distant future: ... What might be more useful in the short term IMO is to define a _standard_ mechanism for Python-in-XML encoding/decoding, so that all code which encodes Python in XML is done the same way, and so that XML editors can figure out once and for all how to decode Python-in-CDATA. Strawman Encoding # 1: replace < with &lt; and > with &gt; when not in strings, and vice versa on the decoding side. Strawman Encoding # 2: - do Strawman 1, AND - replace space-determined indentation with { and } tokens or other INDENT and DEDENT markers using some rare Unicode characters to work around inevitable bugs in whitespace handling of XML processors. --david From gvwilson at nevex.com Tue Mar 14 00:14:43 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Mon, 13 Mar 2000 18:14:43 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: > David Ascher wrote: > But the scheme you put forth causes major problems for current Python > users who *are* using glass TTYs, so I don't think it'll fly for very > basic political reasons nicely illustrated by Dave's response. Understood. I thought that handling standard entities might be a useful first step toward storage of Python as XML, which in turn would help make Python more accessible to people who don't want to switch editors just to program. I felt that an all-or-nothing approach would be even less likely to get a favorable response than handling entities... :-) Greg From beazley at rustler.cs.uchicago.edu Tue Mar 14 00:12:55 2000 From: beazley at rustler.cs.uchicago.edu (David M.
Beazley) Date: Mon, 13 Mar 2000 17:12:55 -0600 (CST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: References: <200003132235.QAA08031@rustler.cs.uchicago.edu> Message-ID: <200003132312.RAA08107@rustler.cs.uchicago.edu> gvwilson at nevex.com writes: > > 'S funny --- my non-programmer friends can't figure out why any sane > person would use a glorified glass TTY like emacs... or why they should > have to, just to program... Look, I'm all for CP4E and making programming more accessible to the masses, but as a professional programmer, I frankly do not care what non-programmers think about the tools that I (and most of the programming world) use to write software. Furthermore, if all of your non-programmer friends don't want to care about the underlying details, they certainly won't care how programs are represented---including a nice and *simple* text representation without markup, entities, and other syntax that is not an essential part of the language. However, as a professional, I most certainly DO care about how programs are represented--specifically, I want to be able to move them around between machines. Edit them with essentially any editor, transform them as I see fit, and be able to easily read them and have a sense of what is going on. Markup is just going to make this a huge pain in the butt. No, I'm not for this idea one bit. Sorry. > I just think that someone's going to do this > for some language, some time soon, and I'd rather Python be in the lead > than play catch-up. What gives you the idea that Python is behind? What is it playing catch up to? 
-- Dave From DavidA at ActiveState.com Tue Mar 14 00:36:54 2000 From: DavidA at ActiveState.com (David Ascher) Date: Mon, 13 Mar 2000 15:36:54 -0800 Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: > > David Ascher wrote: > > But the scheme you put forth causes major problems for current Python > > users who *are* using glass TTYs, so I don't think it'll fly for very > > basic political reasons nicely illustrated by Dave's response. > > Understood. I thought that handling standard entities might be a > useful first step toward storage of Python as XML, which in turn would > help make Python more accessible to people who don't want to switch > editors just to program. I felt that an all-or-nothing approach would be > even less likely to get a favorable response than handling entities... :-) > > Greg If you propose a transformation between Python Syntax and XML, then you potentially have something which all parties can agree to as being a good thing. Forcing one into the other is denying the history and current practices of both domains and user populations. You cannot ignore the fact that "I can read anyone's Python" is a key selling point of Python among its current practitioners, or that its cleanliness and lack of magic characters ($ is usually invoked, but < is just as magic/ugly) are part of its appeal/success. No XML editor is going to edit all XML documents without custom editors anyway! I certainly don't expect to be drawing SVG diagrams with a keyboard! That's what schemas and custom editors are for. Define a schema for 'encoded Python' (well, first, find a schema notation that will survive), write a plugin to your favorite XML editor, and then your (theoretical? =) users can use the same 'editor' to edit PythonXML or any other XML. Most XML probably won't be edited with a keyboard but with a pointing device or a speech recognizer anyway... 
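Ascher's Strawman Encoding #1 -- entity-escape < and > when not in strings, and reverse on decode -- is small enough to sketch. This is a deliberately naive illustration: it only recognizes simple single- and double-quoted strings, and ignores triple quotes, backslash escapes and comments:

```python
ENT = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}

def encode(src):
    # Strawman #1: escape &, <, > outside string literals only
    out = []
    i, n = 0, len(src)
    while i < n:
        c = src[i]
        if c in "'\"":
            j = src.find(c, i + 1)      # naive: no \' escapes, no triple quotes
            if j < 0:
                j = n - 1
            out.append(src[i:j + 1])    # string literal copied verbatim
            i = j + 1
        else:
            out.append(ENT.get(c, c))
            i += 1
    return "".join(out)

def decode(text):
    # inverse for the non-string parts; "&amp;" last so escaped entities
    # round-trip (a full inverse would skip string literals too, mirroring encode)
    return text.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&")

src = 'if a[index[1]]>b: print "<done>"'
assert encode(src) == 'if a[index[1]]&gt;b: print "<done>"'
assert decode(encode(src)) == src
```

Note how the encoded form of Andrew's contrived example no longer contains the CDATA-terminating ]]> sequence.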
IMO, you're being seduced by the apparent closeness between XML and Python-in-ASCII. It's only superficial... Think of Python-in-ASCII as a rendering of Python-in-XML, Dave will think of Python-in-XML as a rendering of Python-in-ASCII, and everyone will be happy (as long as everyone agrees on the one-to-one transformation). --david From paul at prescod.net Tue Mar 14 00:43:48 2000 From: paul at prescod.net (Paul Prescod) Date: Mon, 13 Mar 2000 15:43:48 -0800 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <38CD7D34.6569C1AA@prescod.net> You should use your entities in the XML files, and then whatever application actually launches Python (PMZ, your make engine, XMetaL) could decode the data and launch Python. This is already how it works in XMetaL. I've just reinstalled recently so I don't have my macro file. Therefore, please excuse the Javascript (not Python) example. This is in "journalist.mcr" in the "Macros" folder of XMetaL. This already works fine for Python. You change lang="Python" and thanks to the benevolence of Bill Gates and the hard work of Mark Hammond, you can use Python for XMetaL macros. It doesn't work perfectly: exceptions crash XMetaL, last I tried. As long as you don't make mistakes, everything works nicely. :) You can write XMetaL macros in Python and the whole thing is stored as XML. Still, XMetaL is not very friendly as a Python editor. It doesn't have nice whitespace handling! -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Out of timber so crooked as that which man is made nothing entirely straight can be built.
- Immanuel Kant From paul at prescod.net Tue Mar 14 00:59:23 2000 From: paul at prescod.net (Paul Prescod) Date: Mon, 13 Mar 2000 15:59:23 -0800 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <38CD80DB.39150F33@prescod.net> gvwilson at nevex.com wrote: > > 'S funny --- my non-programmer friends can't figure out why any sane > person would use a glorified glass TTY like emacs... or why they should > have to, just to program... I just think that someone's going to do this > for some language, some time soon, and I'd rather Python be in the lead > than play catch-up. Your goal is worth pursuing but I agree with the others that the syntax change is not the right way. It _is_ possible to teach XMetaL to edit Python programs -- structurally -- just as it does XML. What you do is hook into the macro engine (which already supports Python) and use the Python tokenizer to build a parse tree. You copy that into a DOM using the same elements and attributes you would use if you were doing some kind of batch conversion. Then on "save" you reverse the process. Implementation time: ~3 days. The XMetaL competitor, Documentor has an API specifically designed to make this sort of thing easy. Making either of them into a friendly programmer's editor is a much larger task. I think this is where the majority of the R&D should occur, not at the syntax level. If one invents a fundamentally better way of working with the structures behind Python code, then it would be relatively easy to write code that maps that to today's Python syntax. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Out of timber so crooked as that which man is made nothing entirely straight can be built. 
- Immanuel Kant From moshez at math.huji.ac.il Tue Mar 14 02:14:09 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 14 Mar 2000 03:14:09 +0200 (IST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: On Mon, 13 Mar 2000 gvwilson at nevex.com wrote: > Once 1.6 is out the door, would people be willing to consider extending > Python's token set to make HTML/XML-ish spellings using entity references > legal? This would make the following 100% legal Python: > > i = 0 > while i &lt; 10: > print i &amp; 1 > i = i + 1 > > which would in turn make it easier to embed Python in XML such as > config-files-for-whatever-Software-Carpentry-produces-to-replace-make, > PMZ, and so on. Why? Whatever XML parser you use will output "i&lt;1" as "i<1", so the Python that comes out of the XML parser is quite all right. Why change Python to do an XML parser job? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mhammond at skippinet.com.au Tue Mar 14 02:18:45 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 14 Mar 2000 12:18:45 +1100 Subject: [Python-Dev] unicode objects and C++ Message-ID: I struck a bit of a snag with the Unicode support when trying to use the most recent update in a C++ source file. The problem turned out to be that unicodeobject.h did a #include "wchar.h", but did it while an 'extern "C"' block was open. This upset the MSVC6 wchar.h, as it has special C++ support. Attached below is a patch I made to unicodeobject.h that solved my problem and allowed my compilations to succeed.
Theoretically the same problem could exist for wctype.h, and probably lots of other headers, but this is the immediate problem :-) An alternative patch would be to #include "wchar.h" in PC\config.h outside of any 'extern "C"' blocks - wchar.h on Windows has guards that allow for multiple includes, so the unicodeobject.h include of that file will succeed, but not have the side-effect it has now. Im not sure what the preferred solution is - quite possibly the PC\config.h change, but Ive included the unicodeobject.h patch anyway :-) Mark.

*** unicodeobject.h	2000/03/13 23:22:24	2.2
--- unicodeobject.h	2000/03/14 01:06:57
***************
*** 85,91 ****
--- 85,101 ----
  #endif

  #ifdef HAVE_WCHAR_H
+
+ #ifdef __cplusplus
+ } /* Close the 'extern "C"' before bringing in system headers */
+ #endif
+
  # include "wchar.h"
+
+ #ifdef __cplusplus
+ extern "C" {
+ #endif
+
  #endif

  #ifdef HAVE_USABLE_WCHAR_T

From mal at lemburg.com Tue Mar 14 00:31:30 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 14 Mar 2000 00:31:30 +0100 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <38CD7A52.5709DF5F@lemburg.com> gvwilson at nevex.com wrote: > > > David Ascher wrote: > > But the scheme you put forth causes major problems for current Python > > users who *are* using glass TTYs, so I don't think it'll fly for very > > basic political reasons nicely illustrated by Dave's response. > > Understood. I thought that handling standard entities might be a > useful first step toward storage of Python as XML, which in turn would > help make Python more accessible to people who don't want to switch > editors just to program. I felt that an all-or-nothing approach would be > even less likely to get a favorable response than handling entities... :-) This should be easy to implement provided a hook for compile() is added to e.g. the sys-module which then gets used instead of calling the byte code compiler directly...
Then you could redirect the compile() arguments to whatever codec you wish (e.g. a SGML entity codec) and the builtin compiler would only see the output of that codec. Well, just a thought... I don't think encoding programs would make life as a programmer easier, but instead harder. It adds one more level of confusion on top of it all. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Mar 14 10:45:49 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 14 Mar 2000 10:45:49 +0100 Subject: [Python-Dev] unicode objects and C++ References: Message-ID: <38CE0A4D.1209B830@lemburg.com> Mark Hammond wrote: > > I struck a bit of a snag with the Unicode support when trying to use the > most recent update in a C++ source file. > > The problem turned out to be that unicodeobject.h did a #include "wchar.h", > but did it while an 'extern "C"' block was open. This upset the MSVC6 > wchar.h, as it has special C++ support. Thanks for reporting this. > Attached below is a patch I made to unicodeobject.h that solved my problem > and allowed my compilations to succeed. Theoretically the same problem > could exist for wctype.h, and probably lots of other headers, but this is > the immediate problem :-) > > An alternative patch would be to #include "wchar.h" in PC\config.h outside > of any 'extern "C"' blocks - wchar.h on Windows has guards that allow for > multiple includes, so the unicodeobject.h include of that file will succeed, > but not have the side-effect it has now. > > Im not sure what the preferred solution is - quite possibly the PC\config.h > change, but Ive included the unicodeobject.h patch anyway :-) > > Mark.
>
> *** unicodeobject.h	2000/03/13 23:22:24	2.2
> --- unicodeobject.h	2000/03/14 01:06:57
> ***************
> *** 85,91 ****
> --- 85,101 ----
>   #endif
>
>   #ifdef HAVE_WCHAR_H
> +
> + #ifdef __cplusplus
> + } /* Close the 'extern "C"' before bringing in system headers */
> + #endif
> +
>   # include "wchar.h"
> +
> + #ifdef __cplusplus
> + extern "C" {
> + #endif
> +
>   #endif
>
>   #ifdef HAVE_USABLE_WCHAR_T

I've included this patch (should solve the problem for all included system header files, since it wraps only the Unicode APIs in extern "C"):

--- /home/lemburg/clients/cnri/CVS-Python/Include/unicodeobject.h	Fri Mar 10 23:33:05 2000
+++ unicodeobject.h	Tue Mar 14 10:38:08 2000
@@ -1,10 +1,7 @@
 #ifndef Py_UNICODEOBJECT_H
 #define Py_UNICODEOBJECT_H

-#ifdef __cplusplus
-extern "C" {
-#endif

 /* Unicode implementation based on original code by Fredrik Lundh,
    modified by Marc-Andre Lemburg (mal at lemburg.com) according to the
@@ -167,10 +165,14 @@
 typedef unsigned short Py_UNICODE;
 #define Py_UNICODE_MATCH(string, offset, substring)\
     (!memcmp((string)->str + (offset), (substring)->str,\
              (substring)->length*sizeof(Py_UNICODE)))

+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* --- Unicode Type ------------------------------------------------------- */

 typedef struct {
     PyObject_HEAD
     int length;			/* Length of raw Unicode data in buffer */

I'll post a complete Unicode update patch by the end of the week for inclusion in CVS. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From ping at lfw.org Tue Mar 14 12:19:59 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 14 Mar 2000 06:19:59 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: On Tue, 14 Mar 2000, Moshe Zadka wrote: > On Mon, 13 Mar 2000 gvwilson at nevex.com wrote: > > legal?
This would make the following 100% legal Python: > > > > i = 0 > > while i &lt; 10: > > print i &amp; 1 > > i = i + 1 > > Why? Whatever XML parser you use will output "i&lt;1" as "i<1", so > the Python that comes out of the XML parser is quite all right. Why change > Python to do an XML parser job? I totally agree. To me, this is the key issue: it is NOT the responsibility of the programming language to accommodate any particular encoding format. While we're at it, why don't we change Python to accept quoted-printable source code? Or base64-encoded source code? XML already defines a perfectly reasonable mechanism for escaping a plain stream of text -- adding this processing to Python adds nothing but confusion. The possible useful benefit from adding the proposed "feature" is exactly zero. -- ?!ng "This code is better than any code that doesn't work has any right to be." -- Roger Gregory, on Xanadu From ping at lfw.org Tue Mar 14 12:21:59 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 14 Mar 2000 06:21:59 -0500 (EST) Subject: [Python-Dev] Python 1.7 tokenization feature request In-Reply-To: Message-ID: On Mon, 13 Mar 2000, David Ascher wrote: > > If you propose a transformation between Python Syntax and XML, then you > potentially have something which all parties can agree to as being a good > thing. Indeed. I know that i wouldn't have any use for it at the moment, but i can see the potential for usefulness of a structured representation for Python source code (like an AST in XML) which could be directly edited in an XML editor, and processed (by an XSL stylesheet?) to produce actual runnable Python. But attempting to mix the two doesn't get you anywhere. -- ?!ng "This code is better than any code that doesn't work has any right to be."
-- Roger Gregory, on Xanadu From effbot at telia.com Tue Mar 14 16:41:01 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 14 Mar 2000 16:41:01 +0100 Subject: [Python-Dev] Python 1.7 tokenization feature request References: Message-ID: <002201bf8dcb$ba9a11c0$34aab5d4@hagrid> Greg: > Understood. I thought that handling standard entities might be a > useful first step toward storage of Python as XML, which in turn would > help make Python more accessible to people who don't want to switch > editors just to program. I felt that an all-or-nothing approach would be > even less likely to get a favorable response than handling entities... :-) well, I would find it easier to support a more aggressive proposal: make sure Python 1.7 can deal with source code written in Unicode, using any supported encoding. with that in place, you can plug in your favourite unicode encoding via the Unicode framework. From effbot at telia.com Tue Mar 14 23:21:38 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 14 Mar 2000 23:21:38 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> Message-ID: <000901bf8e03$abf88420$34aab5d4@hagrid> > I've just checked in a massive patch from Marc-Andre Lemburg which > adds Unicode support to Python. massive, indeed. didn't notice this before, but I just realized that after the latest round of patches, the python15.dll is now 700k larger than it was for 1.5.2 (more than twice the size). my original unicode DLL was 13k. hmm... From akuchlin at mems-exchange.org Tue Mar 14 23:19:44 2000 From: akuchlin at mems-exchange.org (Andrew M. 
Kuchling) Date: Tue, 14 Mar 2000 17:19:44 -0500 (EST) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <000901bf8e03$abf88420$34aab5d4@hagrid> References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> Message-ID: <14542.47872.184978.985612@amarok.cnri.reston.va.us> Fredrik Lundh writes: >didn't notice this before, but I just realized that after the >latest round of patches, the python15.dll is now 700k larger >than it was for 1.5.2 (more than twice the size). Most of that is due to Modules/unicodedata.c, which is 2.1Mb of source code, and produces a 632168-byte .o file on my Sparc. (Will some compiler systems choke on a file that large? Could we read database info from a file instead, or mmap it into memory?) -- A.M. Kuchling http://starship.python.net/crew/amk/ "Are you OK, dressed like that? You don't seem to notice the cold." "I haven't come ten thousand miles to discuss the weather, Mr Moberly." -- Moberly and the Doctor, in "The Seeds of Doom" From mal at lemburg.com Wed Mar 15 09:32:29 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 09:32:29 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> Message-ID: <38CF4A9D.13A0080@lemburg.com> "Andrew M. Kuchling" wrote: > > Fredrik Lundh writes: > >didn't notice this before, but I just realized that after the > >latest round of patches, the python15.dll is now 700k larger > >than it was for 1.5.2 (more than twice the size). > > Most of that is due to Modules/unicodedata.c, which is 2.1Mb of source > code, and produces a 632168-byte .o file on my Sparc. (Will some > compiler systems choke on a file that large? Could we read database > info from a file instead, or mmap it into memory?) That is due to the unicodedata module being compiled into the DLL statically.
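Kuchling's parenthetical alternative -- shipping the tables as a data file and mmap'ing them on demand, so the pages are shared between processes and only paged in when touched -- might look roughly like this sketch. The record layout here is made up for illustration; the real unicodedata tables are static C arrays, not this file format:

```python
import mmap
import os
import struct
import tempfile

# hypothetical fixed-size record: codepoint, category code, digit value
RECORD = struct.Struct("<IBB")

def write_db(path, records):
    # build the database file once, at release time
    f = open(path, "wb")
    for rec in records:
        f.write(RECORD.pack(*rec))
    f.close()

def open_db(path):
    # map the file read-only; the OS shares these pages between processes
    f = open(path, "rb")
    return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

def lookup(db, index):
    # fixed-size records make random access a single slice of the mapping
    off = index * RECORD.size
    return RECORD.unpack(db[off:off + RECORD.size])

path = os.path.join(tempfile.mkdtemp(), "unicodedata.db")
write_db(path, [(0x41, 1, 0), (0x42, 1, 0)])
db = open_db(path)
assert lookup(db, 1) == (0x42, 1, 0)
```

The trade-off against a shared module is mostly packaging: a data file needs to be found at runtime, whereas a shared library reuses the existing module-loading machinery.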
On Unix you can build it shared too -- there are no direct references to it in the implementation. I suppose that on Windows the same should be done... the question really is whether this is intended or not -- moving the module into a DLL is at least technically no problem (someone would have to supply a patch for the MSVC project files though). Note that unicodedata is only needed by programs which do a lot of Unicode manipulations and in the future probably by some codecs too. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From pf at artcom-gmbh.de Wed Mar 15 11:42:26 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 15 Mar 2000 11:42:26 +0100 (MET) Subject: [Python-Dev] Unicode in Python and Tcl/Tk compared (was Unicode patches checked in...) In-Reply-To: <38CF4A9D.13A0080@lemburg.com> from "M.-A. Lemburg" at "Mar 15, 2000 9:32:29 am" Message-ID: Hi! > > Fredrik Lundh writes: > > >didn't notice this before, but I just realized that after the > > >latest round of patches, the python15.dll is now 700k larger > > >than it was for 1.5.2 (more than twice the size). > > > "Andrew M. Kuchling" wrote: > > Most of that is due to Modules/unicodedata.c, which is 2.1Mb of source > > code, and produces a 632168-byte .o file on my Sparc. (Will some > > compiler systems choke on a file that large? Could we read database > > info from a file instead, or mmap it into memory?) > M.-A. Lemburg wrote: > That is due to the unicodedata module being compiled > into the DLL statically. On Unix you can build it shared too > -- there are no direct references to it in the implementation. > I suppose that on Windows the same should be done... the > question really is whether this is intended or not -- moving > the module into a DLL is at least technically no problem > (someone would have to supply a patch for the MSVC project > files though).
> > Note that unicodedata is only needed by programs which do > a lot of Unicode manipulations and in the future probably > by some codecs too. Now as the unicode patches were checked in and as Fredrik Lundh noticed a considerable increase of the size of the python-DLL, which was obviously mostly caused by those tables, I had some fear that a Python/Tcl/Tk based application could eat up much more memory, if we update from Python1.5.2 and Tcl/Tk 8.0.5 to Python 1.6 and Tcl/Tk 8.3.0. As some of you certainly know, some kind of unicode support has also been added to Tcl/Tk since 8.1. So I did some research and would like to share what I have found out so far: Here are the compared sizes of the tcl/tk shared libs on Linux:

old:                  | new:                  | bloat increase in %:
----------------------+-----------------------+---------------------
libtcl8.0.so  533414  | libtcl8.3.so  610241  | 14.4 %
libtk8.0.so   714908  | libtk8.3.so   811916  | 13.6 %

The addition of unicode wasn't the only change to TclTk. So this seems reasonable. Unfortunately there is no python shared library, so a direct comparison of increased memory consumption is impossible. Nevertheless I've the following figures (stripped binary sizes of the Python interpreter):

1.5.2         382616
CVS_10-02-00  393668  (a month before unicode)
CVS_12-03-00  507448  (just after unicode)

That is an increase of "only" 111 kBytes. Not so bad but nevertheless a "bloat increase" of 32.6 %. And additionally there is now

unicodedata.so    634940
_codecsmodule.so   38955

which (I guess) will also be loaded if the application starts using some of the new features. Since I didn't take care of unicode in the past, I feel unable to compare the implementations of unicode in both systems and what impact they will have on the real memory performance and even more important on the functionality of the combined use of both packages together with Tkinter.
Tcl/Tk keeps around a sub-directory called 'encoding', which --I guess-- contains information somehow similar or related to that in 'unicodedata.so', but separated into several files? So below I included a shortened excerpts from the 200k+ tcl8.3.0/changes and the tk8.3.0/changes files about unicode. May be someone else more involved with unicode can shed some light on this topic? Do we need some changes to Tkinter.py or _tkinter or both? ---- 8< ---- 8< ---- cut here ---- 8< ---- schnipp ---- 8< ---- schnapp ---- [...] ======== Changes for 8.1 go below this line ======== 6/18/97 (new feature) Tcl now supports international character sets: - All C APIs now accept UTF-8 strings instead of iso8859-1 strings, wherever you see "char *", unless explicitly noted otherwise. - All Tcl strings represented in UTF-8, which is a convenient multi-byte encoding of Unicode. Variable names, procedure names, and all other values in Tcl may include arbitrary Unicode characters. For example, the Tcl command "string length" returns how many Unicode characters are in the argument string. - For Java compatibility, embedded null bytes in C strings are represented as \xC080 in UTF-8 strings, but the null byte at the end of a UTF-8 string remains \0. Thus Tcl strings once again do not contain null bytes, except for termination bytes. - For Java compatibility, "\uXXXX" is used in Tcl to enter a Unicode character. "\u0000" through "\uffff" are acceptable Unicode characters. - "\xXX" is used to enter a small Unicode character (between 0 and 255) in Tcl. - Tcl automatically translates between UTF-8 and the normal encoding for the platform during interactions with the system. - The fconfigure command now supports a -encoding option for specifying the encoding of an open file or socket. Tcl will automatically translate between the specified encoding and UTF-8 during I/O. 
See the directory library/encoding to find out what encodings are supported (eventually there will be an "encoding" command that makes this information more accessible). - There are several new C APIs that support UTF-8 and various encodings. See Utf.3 for procedures that translate between Unicode and UTF-8 and manipulate UTF-8 strings. See Encoding.3 for procedures that create new encodings and translate between encodings. See ToUpper.3 for procedures that perform case conversions on UTF-8 strings. [...] 1/16/98 (new feature) Tk now supports international characters sets: - Font display mechanism overhauled to display Unicode strings containing full set of international characters. You do not need Unicode fonts on your system in order to use tk or see international characters. For those familiar with the Japanese or Chinese patches, there is no "-kanjifont" option. Characters from any available fonts will automatically be used if the widget's originally selected font is not capable of displaying a given character. - Textual widgets are international aware. For instance, cursor positioning commands would now move the cursor forwards/back by 1 international character, not by 1 byte. - Input Method Editors (IMEs) work on Mac and Windows. Unix is still in progress. [...] 10/15/98 (bug fix) Changed regexp and string commands to properly handle case folding according to the Unicode character tables. (stanton) 10/21/98 (new feature) Added an "encoding" command to facilitate translations of strings between different character encodings. See the encoding.n manual entry for more details. (stanton) 11/3/98 (bug fix) The regular expression character classification syntax now includes Unicode characters in the supported classes. (stanton) [...] 11/17/98 (bug fix) "scan" now correctly handles Unicode characters. (stanton) [...] 11/19/98 (bug fix) Fixed menus and titles so they properly display Unicode characters under Windows. [Bug: 819] (stanton) [...] 
4/2/99 (new apis) Made various Unicode utility functions public. Tcl_UtfToUniCharDString, Tcl_UniCharToUtfDString, Tcl_UniCharLen, Tcl_UniCharNcmp, Tcl_UniCharIsAlnum, Tcl_UniCharIsAlpha, Tcl_UniCharIsDigit, Tcl_UniCharIsLower, Tcl_UniCharIsSpace, Tcl_UniCharIsUpper, Tcl_UniCharIsWordChar, Tcl_WinUtfToTChar, Tcl_WinTCharToUtf (stanton) [...] 4/5/99 (bug fix) Fixed handling of Unicode in text searches. The -count option was returning byte counts instead of character counts. [...] 5/18/99 (bug fix) Fixed clipboard code so it handles Unicode data properly on Windows NT and 95. [Bug: 1791] (stanton) [...] 6/3/99 (bug fix) Fixed selection code to handle Unicode data in COMPOUND_TEXT and STRING selections. [Bug: 1791] (stanton) [...] 6/7/99 (new feature) Optimized string index, length, range, and append commands. Added a new Unicode object type. (hershey) [...] 6/14/99 (new feature) Merged string and Unicode object types. Added new public Tcl API functions: Tcl_NewUnicodeObj, Tcl_SetUnicodeObj, Tcl_GetUnicode, Tcl_GetUniChar, Tcl_GetCharLength, Tcl_GetRange, Tcl_AppendUnicodeToObj. (hershey) [...] 6/23/99 (new feature) Updated Unicode character tables to reflect Unicode 2.1 data. (stanton) [...] --- Released 8.3.0, February 10, 2000 --- See ChangeLog for details --- ---- 8< ---- 8< ---- cut here ---- 8< ---- schnipp ---- 8< ---- schnapp ---- Sorry if this was boring old stuff for some of you. Best Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From marangoz at python.inrialpes.fr Wed Mar 15 12:40:21 2000 From: marangoz at python.inrialpes.fr (Vladimir Marangozov) Date: Wed, 15 Mar 2000 12:40:21 +0100 (CET) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38CF4A9D.13A0080@lemburg.com> from "M.-A. Lemburg" at Mar 15, 2000 09:32:29 AM Message-ID: <200003151140.MAA30301@python.inrialpes.fr> M.-A. 
Lemburg wrote: > > Note that unicodedata is only needed by programs which do > a lot of Unicode manipulations and in the future probably > by some codecs too. Perhaps it would make sense to move the Unicode database to the Python side (write it in Python)? Or init the database dynamically in the unicodedata module on import? It's quite big, so if it's possible to avoid the static declaration (and if the unicodedata module is enabled by default), I'd vote for a dynamic initialization of the database from reference (Python ?) file(s). M-A, is something in this spirit doable? -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer at tismer.com Wed Mar 15 13:57:04 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 15 Mar 2000 13:57:04 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> Message-ID: <38CF88A0.CF876A74@tismer.com> "M.-A. Lemburg" wrote: ... > Note that unicodedata is only needed by programs which do > a lot of Unicode manipulations and in the future probably > by some codecs too. Would it be possible to make the Unicode support configurable? My problem is that patches in the CVS are of different kinds. Some are error corrections and enhancements which I would definitely like to use. Others are brand new features like the Unicode support. Absolutely great stuff! But this will most probably change a number of times again, and I think it is a bad idea to include it in my Stackless distribution. I'd appreciate it very much if I could use the same CVS tree for testing new stuff, and to build my distribution, with new features switched off. Please :-) ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From jim at digicool.com Wed Mar 15 14:35:48 2000 From: jim at digicool.com (Jim Fulton) Date: Wed, 15 Mar 2000 08:35:48 -0500 Subject: [Python-Dev] Finalizers considered questionable ;) Message-ID: <38CF91B4.A36C8C5@digicool.com> Here's my $0.02. I agree with the sentiments that use of finalizers should be discouraged. They are extremely helpful in cases like tempfile.TemporaryFileWrapper, so I think that they should be supported. I do think that the language should not promise a high level of service. Some observations: - I spent a little bit of time on the ANSI Smalltalk committee, where I naively advocated adding finalizers to the language. I was resoundingly told no. :) - Most of the Python objects I deal with these days are persistent. Their lifetimes are a lot more complicated than those of most Python objects. They get created once, but they get loaded into and out of memory many times. In fact, they can be in memory many times simultaneously. :) A couple of years ago I realized that it only made sense to call __init__ when an object was first created, not when it is subsequently (re)loaded into memory. This led to a change in Python pickling semantics and the deprecation of the loathsome __getinitargs__ protocol. :) For me, a similar case can be made against the use of __del__ for persistent objects. For persistent objects, a __del__ method should only be used for cleaning up the most volatile of resources. A persistent object's __del__ should not perform any semantically meaningful operations because __del__ has no semantic meaning. - Zope has a few uses of __del__. These are all for non-persistent objects. Interestingly, in grepping for __del__, I found a lot of cases where __del__ was used and then commented out.
Finalizers seem to be the sort of thing that people want initially and then get over. I'm inclined to essentially keep the current rules and simply not promise that __del__ will be able to run correctly. That is, Python should call __del__ and ignore exceptions raised (or provide some *optional* logging or other debugging facility). There is no reason for __del__ to fail unless it depends on cyclicly-related objects, which should be viewed as a design mistake. OTOH, __del__ should never fail because module globals go away. IMO, the current circular references involving module globals are unnecessary, but that's a different topic. ;) Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From mal at lemburg.com Wed Mar 15 16:00:14 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 16:00:14 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> Message-ID: <38CFA57E.21A3B3EF@lemburg.com> Christian Tismer wrote: > > "M.-A. Lemburg" wrote: > ... > > > Note that unicodedata is only needed by programs which do > > a lot of Unicode manipulations and in the future probably > > by some codecs too. > > Would it be possible to make the Unicode support configurable? This is currently not planned as the Unicode integration touches many different parts of the interpreter to enhance string/Unicode integration... sorry. 
Also, I'm not sure whether adding #ifdefs throughout the code would increase its elegance ;-) > My problem is that patches in the CVS are of different kinds. > Some are error corrections and enhancements which I would > definitely like to use. > Others are brand new features like the Unicode support. > Absolutely great stuff! But this will most probably change > a number of times again, and I think it is a bad idea when > I include it into my Stackless distribution. Why not? All you have to do is rebuild the distribution every time you push a new version -- just like I did for the Unicode version before the CVS checkin was done. > I'd appreciate it very much if I could use the same CVS tree > for testing new stuff, and to build my distribution, with > new features switched off. Please :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Mar 15 15:57:13 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 15:57:13 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003151140.MAA30301@python.inrialpes.fr> Message-ID: <38CFA4C9.E6B8EB5D@lemburg.com> Vladimir Marangozov wrote: > > M.-A. Lemburg wrote: > > > > Note that unicodedata is only needed by programs which do > > a lot of Unicode manipulations and in the future probably > > by some codecs too. > > Perhaps it would make sense to move the Unicode database to the > Python side (write it in Python)? Or init the database dynamically > in the unicodedata module on import? It's quite big, so if it's > possible to avoid the static declaration (and if the unicodedata module > is enabled by default), I'd vote for a dynamic initialization of the > database from reference (Python ?) file(s). The unicodedatabase module contains the Unicode database as static C data - this makes it shareable among (Python) processes.
Python modules don't provide this feature: instead a dictionary would have to be built on import, which would increase the heap size considerably. Those dicts would *not* be shareable. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tismer at tismer.com Wed Mar 15 16:20:06 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 15 Mar 2000 16:20:06 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> Message-ID: <38CFAA26.2B2F0D01@tismer.com> "M.-A. Lemburg" wrote: > > Christian Tismer wrote: ... > > Absolutely great stuff! But this will most probably change > > a number of times again, and I think it is a bad idea when > > I include it into my Stackless distribution. > > Why not ? All you have to do is rebuild the distribution > every time you push a new version -- just like I did > for the Unicode version before the CVS checkin was done. But how can I then publish my source code, when I always pull Unicode into it? I don't like to be exposed to side effects like 700kb of code bloat, just by chance, since it is in the dist right now (and will vanish again). I don't say there must be #ifdefs all and everywhere, but can I build without *using* Unicode? I don't want to introduce something new to my users that they didn't ask for. And I don't want to take care of their installations. Finally, I will for sure not replace a 500k DLL by a 1.2M monster, so this is definitely not what I want at the moment. How do I build a dist that doesn't need to change a lot of stuff in the user's installation? Note that Stackless Python is a drop-in replacement, not a Python distribution. Or should it be?
ciao - chris (who really wants to get SLP 1.1 out) -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From effbot at telia.com Wed Mar 15 17:04:54 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 15 Mar 2000 17:04:54 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> Message-ID: <014001bf8e98$35644480$34aab5d4@hagrid> CT: > How do I build a dist that doesn't need to change a lot of > stuff in the user's installation? somewhere in this thread, Guido wrote: > BTW, I added a tag "pre-unicode" to the CVS tree to the revisions > before the Unicode changes were made. maybe you could base SLP on that one? From marangoz at python.inrialpes.fr Wed Mar 15 17:27:36 2000 From: marangoz at python.inrialpes.fr (Vladimir Marangozov) Date: Wed, 15 Mar 2000 17:27:36 +0100 (CET) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38CFA4C9.E6B8EB5D@lemburg.com> from "M.-A. Lemburg" at Mar 15, 2000 03:57:13 PM Message-ID: <200003151627.RAA32543@python.inrialpes.fr> > [me] > > > > Perhaps it would make sense to move the Unicode database on the > > Python side (write it in Python)? Or init the database dynamically > > in the unicodedata module on import? It's quite big, so if it's > > possible to avoid the static declaration (and if the unicodata module > > is enabled by default), I'd vote for a dynamic initialization of the > > database from reference (Python ?) file(s). 
[Marc-Andre] > > The unicodedatabase module contains the Unicode database > as static C data - this makes it shareable among (Python) > processes. The static data is shared if the module is a shared object (.so). If unicodedata is not a .so, then you'll have a separate copy of the database in each process. > > Python modules don't provide this feature: instead a dictionary > would have to be built on import which would increase the heap > size considerably. Those dicts would *not* be shareable. I haven't mentioned dicts, have I? I suggested that the entries in the C version of the database be rewritten in Python (or a text file). The unicodedata module would, in its init function, allocate memory for the database and would populate it before returning "import okay" to Python -- this is one way to init the db dynamically, among others. As to sharing the database among different processes, this is a classic IPC problem, which has nothing to do with the static C declaration of the db. Or, hmmm, one of us is royally confused. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer at tismer.com Wed Mar 15 17:22:42 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 15 Mar 2000 17:22:42 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> <014001bf8e98$35644480$34aab5d4@hagrid> Message-ID: <38CFB8D2.537FCAD9@tismer.com> Fredrik Lundh wrote: > > CT: > > How do I build a dist that doesn't need to change a lot of > > stuff in the user's installation? > > somewhere in this thread, Guido wrote: > > > BTW, I added a tag "pre-unicode" to the CVS tree to the revisions > > before the Unicode changes were made.
> > maybe you could base SLP on that one? I have no idea how this works. Would this mean that I cannot get patches which come after unicode? Meanwhile, I've looked into the sources. It is easy for me to get rid of the problem by supplying my own unicodedata.c, where I replace all functions by some unimplemented exception. Furthermore, I wondered about the data format. Is the unicode database used in your package as well? Otherwise, I see only references from unicodedata.c, and that means the data structure can be massively enhanced. At the moment, that baby is 64k entries long, with four bytes and an optional string. This is a big waste. The strings are almost all some distinct prefixes, together with a list of hex smallwords. This is done as strings; probably this makes 80 percent of the space. The only function that uses the "decomposition" field (namely the string) is unicodedata_decomposition. It does nothing more than wrap it into a PyObject. We can do a little better here. I guess I can bring it down to a third of this space without much effort, just by using
- binary encoding for the tags as enumeration
- binary encoding of the hexed entries
- omission of the spaces
Instead of 64k of structures which contain pointers anyway, I can use a 64k pointer array with offsets into one packed table. The unicodedata access functions would change *slightly*, just building some hex strings and so on. I guess this is not a time critical section? Should I try this evening? :-) cheers - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From mal at lemburg.com Wed Mar 15 17:04:43 2000 From: mal at lemburg.com (M.-A.
Lemburg) Date: Wed, 15 Mar 2000 17:04:43 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> Message-ID: <38CFB49B.885B8B16@lemburg.com> Christian Tismer wrote: > > "M.-A. Lemburg" wrote: > > > > Christian Tismer wrote: > ... > > > Absolutely great stuff! But this will most probably change > > > a number of times again, and I think it is a bad idea when > > > I include it into my Stackless distribution. > > > > Why not ? All you have to do is rebuild the distribution > > every time you push a new version -- just like I did > > for the Unicode version before the CVS checkin was done. > > But how can I then publish my source code, when I always > pull Unicode into it. I don't like to be exposed to > side effects like 700kb code bloat, just by chance, since it > is in the dist right now (and will vanish again). All you have to do is build the unicodedata module shared and not statically bound into python.dll. This one module causes most of the code bloat... > I don't say there must be #ifdefs all and everywhere, but > can I build without *using* Unicode? I don't want to > introduce something new to my users what they didn't ask for. > And I don't want to take care about their installations. > Finally I will for sure not replace a 500k DLL by a 1.2M > monster, so this is definately not what I want at the moment. > > How do I build a dist that doesn't need to change a lot of > stuff in the user's installation? I don't think that the Unicode stuff will disable the running environment... (haven't tried this though). 
The unicodedata module is not used by the interpreter and the rest is imported on-the-fly, not during init time, so at least in theory, not using Unicode will result in Python not looking for e.g. the encodings package. > Note that Stackless Python is a drop-in replacement, > not a Python distribution. Or should it be? Probably... I think it's simply easier to install and probably also easier to maintain because it doesn't cause dependencies on other "default" installations. The user will then explicitly know that she is installing something a little different from the default distribution... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Mar 15 18:26:15 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 18:26:15 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> <014001bf8e98$35644480$34aab5d4@hagrid> <38CFB8D2.537FCAD9@tismer.com> Message-ID: <38CFC7B7.A1ABD51C@lemburg.com> Christian Tismer wrote: > > Fredrik Lundh wrote: > > > > CT: > > > How do I build a dist that doesn't need to change a lot of > > > stuff in the user's installation? > > > > somewhere in this thread, Guido wrote: > > > > > BTW, I added a tag "pre-unicode" to the CVS tree to the revisions > > > before the Unicode changes were made. > > > > maybe you could base SLP on that one? > > I have no idea how this works. Would this mean that I cannot > get patctes which come after unicode? > > Meanwhile, I've looked into the sources. 
It is easy for me > to get rid of the problem by supplying my own unicodedata.c, > where I replace all functions by some unimplemented exception. No need (see my other posting): simply disable the module altogether... this shouldn't hurt any part of the interpreter as the module is a user-land only module. > Furthermore, I wondered about the data format. Is the unicode > database used in your package as well? Otherwise, I see > only references from unicodedata.c, and that means the data > structure can be massively enhanced. > At the moment, that baby is 64k entries long, with four bytes > and an optional string. > This is a big waste. The strings are almost all some distinct > prefixes, together with a list of hex smallwords. This > is done as strings, probably this makes 80 percent of the space. I have made no attempt to optimize the structure... (due to lack of time mostly) the current implementation is really not much different from a rewrite of the UnicodeData.txt file available at the unicode.org site. If you want to, I can mail you the marshalled Python dict version of that database to play with. > The only function that uses the "decomposition" field (namely > the string) is unicodedata_decomposition. It does nothing > more than to wrap it into a PyObject. > We can do a little better here. I guess I can bring it down > to a third of this space without much effort, just by using > - binary encoding for the tags as enumeration > - binary encoding of the hexed entries > - omission of the spaces > Instead of a 64 k of structures which contain pointers anyway, > I can use a 64k pointer array with offsets into one packed > table. > > The unicodedata access functions would change *slightly*, > just building some hex strings and so on. I guess this > is not a time critical section? It may be if these functions are used in codecs, so you should pay attention to speed too... > Should I try this evening? :-) Sure :-) go ahead...
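[For readers following the thread: a rough sketch of what Christian's packing scheme might look like, in Python rather than C. This is an invented layout, not the actual unicodedata implementation -- each code point gets one offset into a single packed table, the decomposition tag becomes a small enumerated integer, and the hex words are stored as two-byte values instead of text:]

```python
# Hypothetical packing of decomposition records: per-code-point offsets into
# one packed byte table, tags as small integers, code points as 2-byte values.
TAGS = ["", "<compat>", "<font>", "<noBreak>", "<super>", "<fraction>"]

def pack(decomps):
    """decomps maps code point -> decomposition string as found in
    UnicodeData.txt, e.g. '<fraction> 0031 2044 0032'."""
    offsets, blob = {}, bytearray()
    for cp, text in decomps.items():
        parts = text.split()
        tag = TAGS.index(parts[0]) if parts[0].startswith("<") else 0
        codes = [int(p, 16) for p in (parts[1:] if tag else parts)]
        offsets[cp] = len(blob)
        blob.append(tag)             # enumerated tag, one byte
        blob.append(len(codes))      # number of code points that follow
        for c in codes:
            blob += c.to_bytes(2, "big")   # two bytes per BMP code point
    return offsets, bytes(blob)

def unpack(offsets, blob, cp):
    """Rebuild the original decomposition string, as an accessor would."""
    i = offsets[cp]
    tag, n = blob[i], blob[i + 1]
    codes = ["%04X" % int.from_bytes(blob[i + 2 + 2 * j:i + 4 + 2 * j], "big")
             for j in range(n)]
    return " ".join(([TAGS[tag]] if tag else []) + codes)

offsets, blob = pack({0x00C4: "0041 0308", 0x00BD: "<fraction> 0031 2044 0032"})
print(unpack(offsets, blob, 0x00C4))   # -> 0041 0308
```

The decomposition strings for U+00C4 and U+00BD above are real UnicodeData.txt entries; everything else (TAGS, the record layout) is made up for illustration.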
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Mar 15 18:39:14 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 15 Mar 2000 18:39:14 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003151627.RAA32543@python.inrialpes.fr> Message-ID: <38CFCAC2.7690DF55@lemburg.com> Vladimir Marangozov wrote: > > > [me] > > > > > > Perhaps it would make sense to move the Unicode database on the > > > Python side (write it in Python)? Or init the database dynamically > > > in the unicodedata module on import? It's quite big, so if it's > > > possible to avoid the static declaration (and if the unicodedata module > > > is enabled by default), I'd vote for a dynamic initialization of the > > > database from reference (Python ?) file(s). > > [Marc-Andre] > > > > The unicodedatabase module contains the Unicode database > > as static C data - this makes it shareable among (Python) > > processes. > > The static data is shared if the module is a shared object (.so). > If unicodedata is not a .so, then you'll have a separate copy of the > database in each process. Uhm, comparing the two versions Python 1.5 and the current CVS Python I get these figures on Linux:

Executing : ./python -i -c '1/0'
Python 1.5: 1208kB / 728 kB (resident/shared)
Python CVS: 1280kB / 808 kB ("/")

Not much of a change if you ask me, and the CVS version has the unicodedata module linked statically... so there's got to be some sharing and load-on-demand going on behind the scenes: this is what I was referring to when I mentioned static C data. The OS can deal with these sharing techniques and delayed loads much better than anything we could implement on top of it in C or Python. But perhaps this is Linux-specific...
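[For anyone who wants to reproduce this kind of measurement today: the resident/shared numbers can be read from Linux's /proc filesystem. A small hedged helper -- the statm field layout is Linux-specific, and the function simply returns None where a procfs is unavailable:]

```python
import os

def resident_shared_kb(pid="self"):
    """Return (resident_kB, shared_kB) for a process from /proc/<pid>/statm,
    or None on platforms without a Linux-style procfs.
    statm fields are: size resident shared text lib data dt (in pages)."""
    try:
        with open("/proc/%s/statm" % pid) as f:
            fields = f.read().split()
    except OSError:
        return None
    page_kb = os.sysconf("SC_PAGE_SIZE") // 1024
    return (int(fields[1]) * page_kb, int(fields[2]) * page_kb)

print(resident_shared_kb())   # (resident_kB, shared_kB), or None
```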
> > Python modules don't provide this feature: instead a dictionary > would have to be built on import which would increase the heap > size considerably. Those dicts would *not* be shareable. > > I haven't mentioned dicts, have I? I suggested that the entries in the > C version of the database be rewritten in Python (or a text file). > The unicodedata module would, in its init function, allocate memory > for the database and would populate it before returning "import okay" > to Python -- this is one way to init the db dynamically, among others. I'm leaving this as an exercise to the interested reader ;-) Really, if you have better ideas for the unicodedata module, please go ahead. > As to sharing the database among different processes, this is a classic > IPC problem, which has nothing to do with the static C declaration of the db. > Or, hmmm, one of us is royally confused. Could you check this on other platforms? Perhaps Linux is doing more than other OSes are in this field. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From effbot at telia.com Wed Mar 15 19:23:59 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 15 Mar 2000 19:23:59 +0100 Subject: [Python-Dev] first public SRE snapshot now available! References: <200003151627.RAA32543@python.inrialpes.fr> <38CFCAC2.7690DF55@lemburg.com> Message-ID: <01f901bf8eab$a353e780$34aab5d4@hagrid> I just uploaded the first public SRE snapshot to: http://w1.132.telia.com/~u13208596/sre.htm
-- this kit contains Windows binaries only (make sure you have built the interpreter from a recent CVS version)
-- the engine fully supports unicode target strings. (not sure about the pattern compiler, though...)
-- it's probably buggy as hell.
for things I'm working on at this very moment, see: http://w1.132.telia.com/~u13208596/sre/status.htm I hope to get around to fixing the core dump (it crashes halfway through sre_fulltest.py for no apparent reason) and the backreferencing problem later today. stay tuned. PS. note that "public" doesn't really mean "suitable for the c.l.python crowd", or "suitable for production use". in other words, let's keep this one on this list for now. thanks! From tismer at tismer.com Wed Mar 15 19:15:27 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 15 Mar 2000 19:15:27 +0100 Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> <014001bf8e98$35644480$34aab5d4@hagrid> <38CFB8D2.537FCAD9@tismer.com> <38CFC7B7.A1ABD51C@lemburg.com> Message-ID: <38CFD33F.3C02BF43@tismer.com> "M.-A. Lemburg" wrote: > > Christian Tismer wrote: [the old data compression guy has been reanimated] > If you want to, I can mail you the marshalled Python dict version of > that database to play with. ... > > Should I try this evening? :-) > > Sure :-) go ahead... Thank you. Meanwhile I've heard that there is some well-known bot working on that under the hood, with a much better approach than mine. So I'll take your advice, and continue to write silly stackless enhancements. They say this is my destiny :-) ciao - continuous -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From DavidA at ActiveState.com Wed Mar 15 19:21:40 2000 From: DavidA at ActiveState.com (David Ascher) Date: Wed, 15 Mar 2000 10:21:40 -0800 Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38CFA4C9.E6B8EB5D@lemburg.com> Message-ID: > The unicodedatabase module contains the Unicode database > as static C data - this makes it shareable among (Python) > processes. > > Python modules don't provide this feature: instead a dictionary > would have to be built on import which would increase the heap > size considerably. Those dicts would *not* be shareable. I know it's complicating things, but wouldn't an mmap'ed buffer allow inter-process sharing while keeping DLL size down and everything on-disk until needed? Yes, I know, mmap calls aren't uniform across platforms and isn't supported on all platforms -- I still think that it's silly not to use it on those platforms where it is available, and I'd like to see mmap unification move forward, so this is as good a motivation as any to bite the bullet. Just a thought, --david From jim at digicool.com Wed Mar 15 19:24:53 2000 From: jim at digicool.com (Jim Fulton) Date: Wed, 15 Mar 2000 13:24:53 -0500 Subject: [Python-Dev] Allowing multiple socket maps in asyncore (and asynchat) Message-ID: <38CFD575.A0536439@digicool.com> I find asyncore to be quite useful, however, it is currently geared to having a single main loop. It uses a global socket map that all asyncore dispatchers register with. I have an application in which I want to have multiple socket maps. I propose that we start moving toward a model in which selection of a socket map and control of the asyncore loop is a bit more explicit. If no one objects, I'll work up some initial patches. Who should I submit these to? Sam? 
Should the medusa public CVS form the basis? Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From jcw at equi4.com Wed Mar 15 20:39:37 2000 From: jcw at equi4.com (Jean-Claude Wippler) Date: Wed, 15 Mar 2000 20:39:37 +0100 Subject: [Python-Dev] Unicode patches checked in References: Message-ID: <38CFE6F9.3E8E9385@equi4.com> David Ascher wrote: [shareable unicodedatabase] > I know it's complicating things, but wouldn't an mmap'ed buffer allow > inter-process sharing while keeping DLL size down and everything > on-disk until needed? AFAIK, on platforms which support mmap, static data already gets mmap'ed in by the OS (just like all code), so this might have little effect. I'm more concerned by the distribution size increase. -jcw From bwarsaw at cnri.reston.va.us Wed Mar 15 19:41:00 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 15 Mar 2000 13:41:00 -0500 (EST) Subject: [Python-Dev] Unicode patches checked in References: <200003110020.TAA17777@eric.cnri.reston.va.us> <000901bf8e03$abf88420$34aab5d4@hagrid> <14542.47872.184978.985612@amarok.cnri.reston.va.us> <38CF4A9D.13A0080@lemburg.com> <38CF88A0.CF876A74@tismer.com> <38CFA57E.21A3B3EF@lemburg.com> <38CFAA26.2B2F0D01@tismer.com> <014001bf8e98$35644480$34aab5d4@hagrid> Message-ID: <14543.55612.969101.206695@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> somewhere in this thread, Guido wrote: >> BTW, I added a tag "pre-unicode" to the CVS tree to the >> revisions before the Unicode changes were made. FL> maybe you could base SLP on that one? /F's got it exactly right. 
Check out a new directory using a stable tag (maybe you want to base your changes on pre-unicode tag, or python 1.52?). Patch in that subtree and then eventually you'll have to merge your changes into the head of the branch. -Barry From rushing at nightmare.com Thu Mar 16 02:52:22 2000 From: rushing at nightmare.com (Sam Rushing) Date: Wed, 15 Mar 2000 17:52:22 -0800 (PST) Subject: [Python-Dev] Allowing multiple socket maps in asyncore (and asynchat) In-Reply-To: <38CFD575.A0536439@digicool.com> References: <38CFD575.A0536439@digicool.com> Message-ID: <14544.15958.546712.466506@seattle.nightmare.com> Jim Fulton writes: > I find asyncore to be quite useful, however, it is currently > geared to having a single main loop. It uses a global socket > map that all asyncore dispatchers register with. > > I have an application in which I want to have multiple > socket maps. But still only a single event loop, yes? Why do you need multiple maps? For a priority system of some kind? > I propose that we start moving toward a model in which selection of > a socket map and control of the asyncore loop is a bit more > explicit. > > If no one objects, I'll work up some initial patches. If it can be done in a backward-compatible fashion, that sounds fine; but it sounds tricky. Even the simple {:object...} change broke so many things that we're still using the old stuff at eGroups. > Who should I submit these to? Sam? > Should the medusa public CVS form the basis? Yup, yup. -Sam From tim_one at email.msn.com Thu Mar 16 08:06:23 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 16 Mar 2000 02:06:23 -0500 Subject: [Python-Dev] Finalizers considered questionable ;) In-Reply-To: <38CF91B4.A36C8C5@digicool.com> Message-ID: <000201bf8f16$237e5e80$662d153f@tim> [Jim Fulton] > ... > There is no reason for __del__ to fail unless it depends on > cyclicly-related objects, which should be viewed as a design > mistake. > > OTOH, __del__ should never fail because module globals go away. 
> IMO, the current circular references involving module globals are > unnecessary, but that's a different topic. ;) IOW, you view "the current circular references involving module globals" as "a design mistake" . And perhaps they are! I wouldn't call it a different topic, though: so long as people are *viewing* shutdown __del__ problems as just another instance of finalizers in cyclic trash, it makes the latter *seem* inescapably "normal", and so something that has to be catered to. If you have a way to take the shutdown problems out of the discussion, it would help clarify both topics, at the very least by deconflating them. it's-a-mailing-list-so-no-need-to-stay-on-topic-ly y'rs - tim From gstein at lyra.org Thu Mar 16 13:01:36 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 16 Mar 2000 04:01:36 -0800 (PST) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38CF88A0.CF876A74@tismer.com> Message-ID: On Wed, 15 Mar 2000, Christian Tismer wrote: >... > Would it be possible to make the Unicode support configurable? This might be interesting from the standpoint of those guys who are doing the tiny Python interpreter thingy for embedded systems. > My problem is that patches in the CVS are of different kinds. > Some are error corrections and enhancements which I would > definately like to use. > Others are brand new features like the Unicode support. > Absolutely great stuff! But this will most probably change > a number of times again, and I think it is a bad idea when > I include it into my Stackless distribution. > > I'd appreciate it very much if I could use the same CVS tree > for testing new stuff, and to build my distribution, with > new features switched off. Please :-) But! I find this reason completely off the mark. In essence, you're arguing that we should not put *any* new feature into the CVS repository because it might mess up what *you* are doing. Sorry, but that just irks me. If you want a stable Python, then don't use the CVS version. 
Or base it off a specific tag in CVS. Or something. Just don't ask for development to be stopped. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Thu Mar 16 13:08:43 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 16 Mar 2000 04:08:43 -0800 (PST) Subject: [Python-Dev] const data (was: Unicode patches checked in) In-Reply-To: <200003151627.RAA32543@python.inrialpes.fr> Message-ID: On Wed, 15 Mar 2000, Vladimir Marangozov wrote: > > [me] > > > > > > Perhaps it would make sense to move the Unicode database on the > > > Python side (write it in Python)? Or init the database dynamically > > > in the unicodedata module on import? It's quite big, so if it's > > > possible to avoid the static declaration (and if the unicodata module > > > is enabled by default), I'd vote for a dynamic initialization of the > > > database from reference (Python ?) file(s). > > [Marc-Andre] > > > > The unicodedatabase module contains the Unicode database > > as static C data - this makes it shareable among (Python) > > processes. > > The static data is shared if the module is a shared object (.so). > If unicodedata is not a .so, then you'll have a seperate copy of the > database in each process. Nope. A shared module means that multiple executables can share the code. Whether the const data resides in an executable or a .so, the OS will map it into readonly memory and share it across all procsses. > > Python modules don't provide this feature: instead a dictionary > > would have to be built on import which would increase the heap > > size considerably. Those dicts would *not* be shareable. > > I haven't mentioned dicts, have I? I suggested that the entries in the > C version of the database be rewritten in Python (or a text file) > The unicodedata module would, in it's init function, allocate memory > for the database and would populate it before returning "import okay" > to Python -- this is one way to init the db dynamically, among others. 
This would place all that data into the per-process heap. Definitely not shared, and definitely a big hit for each Python process. > As to sharing the database among different processes, this is a classic > IPC pb, which has nothing to do with the static C declaration of the db. > Or, hmmm, one of us is royally confused . This isn't IPC. It is sharing of some constant data. The most effective way to manage this is through const C data. The OS will properly manage it. And sorry, David, but mmap'ing a file will simply add complexity. As jcw mentioned, the OS is pretty much doing this anyhow when it deals with a const data segment in your executable. I don't believe this is Linux specific. This kind of stuff has been done for a *long* time on the platforms, too. Side note: the most effective way of exposing this const data up to Python (without shoving it onto the heap) is through buffers created via: PyBuffer_FromMemory(ptr, size) This allows the data to reside in const, shared memory while it is also exposed up to Python. Cheers, -g -- Greg Stein, http://www.lyra.org/ From marangoz at python.inrialpes.fr Thu Mar 16 13:39:42 2000 From: marangoz at python.inrialpes.fr (Vladimir Marangozov) Date: Thu, 16 Mar 2000 13:39:42 +0100 (CET) Subject: [Python-Dev] const data (was: Unicode patches checked in) In-Reply-To: from "Greg Stein" at Mar 16, 2000 04:08:43 AM Message-ID: <200003161239.NAA01671@python.inrialpes.fr> Greg Stein wrote: > > [me] > > The static data is shared if the module is a shared object (.so). > > If unicodedata is not a .so, then you'll have a seperate copy of the > > database in each process. > > Nope. A shared module means that multiple executables can share the code. > Whether the const data resides in an executable or a .so, the OS will map > it into readonly memory and share it across all procsses. I must have been drunk yesterday. You're right. > I don't believe this is Linux specific. 
This kind of stuff has been done > for a *long* time on the platforms, too. Yes. > > Side note: the most effective way of exposing this const data up to Python > (without shoving it onto the heap) is through buffers created via: > PyBuffer_FromMemory(ptr, size) > This allows the data to reside in const, shared memory while it is also > exposed up to Python. And to avoid the size increase of the Python library, perhaps unicodedata needs to be uncommented by default in Setup.in (for the release, not now). As M-A pointed out, the module isn't necessary for the normal operation of the interpreter. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gstein at lyra.org Thu Mar 16 13:56:21 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 16 Mar 2000 04:56:21 -0800 (PST) Subject: [Python-Dev] Finalizers considered questionable ;) In-Reply-To: <000201bf8f16$237e5e80$662d153f@tim> Message-ID: On Thu, 16 Mar 2000, Tim Peters wrote: >... > IOW, you view "the current circular references involving module globals" as > "a design mistake" . And perhaps they are! I wouldn't call it a > different topic, though: so long as people are *viewing* shutdown __del__ > problems as just another instance of finalizers in cyclic trash, it makes > the latter *seem* inescapably "normal", and so something that has to be > catered to. If you have a way to take the shutdown problems out of the > discussion, it would help clarify both topics, at the very least by > deconflating them. Bah. Module globals are easy. My tp_clean suggestion handles them quite easily at shutdown. No more special-code in import.c.
Cheers, -g -- Greg Stein, http://www.lyra.org/ From tismer at tismer.com Thu Mar 16 13:53:46 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 16 Mar 2000 13:53:46 +0100 Subject: [Python-Dev] Unicode patches checked in References: Message-ID: <38D0D95A.B13EC17E@tismer.com> Greg Stein wrote: > > On Wed, 15 Mar 2000, Christian Tismer wrote: > >... > > Would it be possible to make the Unicode support configurable? > > This might be interesting from the standpoint of those guys who are doing > the tiny Python interpreter thingy for embedded systems. > > > My problem is that patches in the CVS are of different kinds. > > Some are error corrections and enhancements which I would > > definately like to use. > > Others are brand new features like the Unicode support. > > Absolutely great stuff! But this will most probably change > > a number of times again, and I think it is a bad idea when > > I include it into my Stackless distribution. > > > > I'd appreciate it very much if I could use the same CVS tree > > for testing new stuff, and to build my distribution, with > > new features switched off. Please :-) > > But! I find this reason completely off the mark. In essence, you're > arguing that we should not put *any* new feature into the CVS repository > because it might mess up what *you* are doing. No, this is your interpretation, and a reduction which I can't follow. There are inprovements and features in the CVS version which I need. I prefer to build against it, instead of the old 1.5.2. What's wrong with that? I want to find a way that gives me the least trouble in doing so. > Sorry, but that just irks me. If you want a stable Python, then don't use > the CVS version. Or base it off a specific tag in CVS. Or something. Just > don't ask for development to be stopped. No, I ask for development to be stopped. Code freeze until Y3k :-) Why are you trying to put such a nonsense into my mouth? You know that I know that you know better. 
ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From tismer at tismer.com Thu Mar 16 14:25:48 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 16 Mar 2000 14:25:48 +0100 Subject: [Python-Dev] const data (was: Unicode patches checked in) References: <200003161239.NAA01671@python.inrialpes.fr> Message-ID: <38D0E0DC.B997F836@tismer.com> Vladimir Marangozov wrote: > > Greg Stein wrote: > > Side note: the most effective way of exposing this const data up to Python > > (without shoving it onto the heap) is through buffers created via: > > PyBuffer_FromMemory(ptr, size) > > This allows the data to reside in const, shared memory while it is also > > exposed up to Python. > > And to avoid the size increase of the Python library, perhaps unicodedata > needs to be uncommented by default in Setup.in (for the release, not now). > As M-A pointed out, the module isn't isn't necessary for the normal > operation of the interpreter. Sounds like a familiar idea. :-) BTW., yesterday evening I wrote an analysis script, to see how far this data is compactable without going into real compression, just redundancy folding and byte/short indexing was used. If I'm not wrong, this reduces the size of the database to less than 25kb. That small amount of extra data would make the uncommenting feature quite unimportant, except for the issue of building tiny Pythons. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From gstein at lyra.org Thu Mar 16 14:06:46 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 16 Mar 2000 05:06:46 -0800 (PST) Subject: [Python-Dev] Unicode patches checked in In-Reply-To: <38D0D95A.B13EC17E@tismer.com> Message-ID: On Thu, 16 Mar 2000, Christian Tismer wrote: > Greg Stein wrote: >... > > Sorry, but that just irks me. If you want a stable Python, then don't use > > the CVS version. Or base it off a specific tag in CVS. Or something. Just > > don't ask for development to be stopped. > > No, I ask for development to be stopped. Code freeze until Y3k :-) > Why are you trying to put such a nonsense into my mouth? > You know that I know that you know better. Simply because that is what it sounds like on this side of my monitor :-) I'm seeing your request as asking for people to make special considerations in their patches for your custom distribution. While I don't have a problem with making Python more flexible to distro maintainers, it seemed like you were approaching it from the "wrong" angle. Like I said, making Unicode optional for the embedded space makes sense; making it optional so it doesn't bloat your distro didn't :-) Not a big deal... it is mostly a perception on my part. I also tend to dislike things that hold development back. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Fri Mar 17 19:53:39 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 17 Mar 2000 19:53:39 +0100 Subject: [Python-Dev] Unicode Update 2000-03-17 Message-ID: <38D27F33.4055A942@lemburg.com> Attached you find an update of the Unicode implementation. The patch is against the current CVS version. I would appreciate if someone with CVS checkin permissions could check the changes in. 
The patch contains all bugs and patches sent this week and also fixes a leak in the codecs code and a bug in the free list code for Unicode objects (which only shows up when compiling Python with Py_DEBUG; thanks to MarkH for spotting this one). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ -------------- next part -------------- Only in CVS-Python/Doc/tools: anno-api.py diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Include/unicodeobject.h Python+Unicode/Include/unicodeobject.h --- CVS-Python/Include/unicodeobject.h Fri Mar 17 15:24:30 2000 +++ Python+Unicode/Include/unicodeobject.h Tue Mar 14 10:38:08 2000 @@ -1,8 +1,5 @@ #ifndef Py_UNICODEOBJECT_H #define Py_UNICODEOBJECT_H -#ifdef __cplusplus -extern "C" { -#endif /* @@ -109,8 +106,9 @@ /* --- Internal Unicode Operations ---------------------------------------- */ /* If you want Python to use the compiler's wctype.h functions instead - of the ones supplied with Python, define WANT_WCTYPE_FUNCTIONS. - This reduces the interpreter's code size. */ + of the ones supplied with Python, define WANT_WCTYPE_FUNCTIONS or + configure Python using --with-ctype-functions. This reduces the + interpreter's code size. */ #if defined(HAVE_USABLE_WCHAR_T) && defined(WANT_WCTYPE_FUNCTIONS) @@ -169,6 +167,10 @@ (!memcmp((string)->str + (offset), (substring)->str,\ (substring)->length*sizeof(Py_UNICODE))) +#ifdef __cplusplus +extern "C" { +#endif + /* --- Unicode Type ------------------------------------------------------- */ typedef struct { @@ -647,7 +649,7 @@ int direction /* Find direction: +1 forward, -1 backward */ ); -/* Count the number of occurances of substr in str[start:end]. 
*/ +/* Count the number of occurrences of substr in str[start:end]. */ extern DL_IMPORT(int) PyUnicode_Count( PyObject *str, /* String */ @@ -656,7 +658,7 @@ int end /* Stop index */ ); -/* Replace at most maxcount occurances of substr in str with replstr +/* Replace at most maxcount occurrences of substr in str with replstr and return the resulting Unicode object. */ extern DL_IMPORT(PyObject *) PyUnicode_Replace( diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/codecs.py Python+Unicode/Lib/codecs.py --- CVS-Python/Lib/codecs.py Sat Mar 11 00:20:43 2000 +++ Python+Unicode/Lib/codecs.py Mon Mar 13 14:33:54 2000 @@ -55,7 +55,7 @@ """ def encode(self,input,errors='strict'): - """ Encodes the object intput and returns a tuple (output + """ Encodes the object input and returns a tuple (output object, length consumed). errors defines the error handling to apply. 
It defaults to diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/encodings/__init__.py Python+Unicode/Lib/encodings/__init__.py --- CVS-Python/Lib/encodings/__init__.py Sat Mar 11 00:17:18 2000 +++ Python+Unicode/Lib/encodings/__init__.py Mon Mar 13 14:30:33 2000 @@ -30,13 +30,13 @@ import string,codecs,aliases _cache = {} -_unkown = '--unkown--' +_unknown = '--unknown--' def search_function(encoding): # Cache lookup - entry = _cache.get(encoding,_unkown) - if entry is not _unkown: + entry = _cache.get(encoding,_unknown) + if entry is not _unknown: return entry # Import the module diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_string.py Python+Unicode/Lib/test/test_string.py --- CVS-Python/Lib/test/test_string.py Sat Mar 11 10:52:43 2000 +++ Python+Unicode/Lib/test/test_string.py Mon Mar 13 10:12:46 2000 @@ -143,6 +143,7 @@ test('translate', 'xyz', 'xyz', table) test('replace', 'one!two!three!', 'one at two!three!', '!', '@', 1) +test('replace', 'one!two!three!', 'onetwothree', '!', '') test('replace', 'one!two!three!', 'one at two@three!', '!', '@', 2) test('replace', 'one!two!three!', 'one at two@three@', '!', '@', 3) test('replace', 'one!two!three!', 'one at two@three@', '!', '@', 4) diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_unicode.py Python+Unicode/Lib/test/test_unicode.py --- CVS-Python/Lib/test/test_unicode.py Fri Mar 17 15:24:31 2000 +++ 
Python+Unicode/Lib/test/test_unicode.py Mon Mar 13 10:13:05 2000 @@ -108,6 +108,7 @@ test('translate', u'xyz', u'xyz', table) test('replace', u'one!two!three!', u'one at two!three!', u'!', u'@', 1) +test('replace', u'one!two!three!', u'onetwothree', '!', '') test('replace', u'one!two!three!', u'one at two@three!', u'!', u'@', 2) test('replace', u'one!two!three!', u'one at two@three@', u'!', u'@', 3) test('replace', u'one!two!three!', u'one at two@three@', u'!', u'@', 4) diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Misc/unicode.txt Python+Unicode/Misc/unicode.txt --- CVS-Python/Misc/unicode.txt Sat Mar 11 00:14:11 2000 +++ Python+Unicode/Misc/unicode.txt Fri Mar 17 16:55:11 2000 @@ -743,8 +743,9 @@ stream codecs as available through the codecs module should be used. -XXX There should be a short-cut open(filename,mode,encoding) available which - also assures that mode contains the 'b' character when needed. +The codecs module should provide a short-cut open(filename,mode,encoding) +available which also assures that mode contains the 'b' character when +needed. File/Stream Input: @@ -810,6 +811,10 @@ Introduction to Unicode (a little outdated by still nice to read): http://www.nada.kth.se/i18n/ucs/unicode-iso10646-oview.html +For comparison: + Introducing Unicode to ECMAScript -- + http://www-4.ibm.com/software/developer/library/internationalization-support.html + Encodings: Overview: @@ -832,7 +837,7 @@ History of this Proposal: ------------------------- -1.2: +1.2: Removed POD about codecs.open() 1.1: Added note about comparisons and hash values. Added note about case mapping algorithms. 
Changed stream codecs .read() and .write() method to match the standard file-like object methods diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Modules/stropmodule.c Python+Unicode/Modules/stropmodule.c --- CVS-Python/Modules/stropmodule.c Wed Mar 1 10:22:53 2000 +++ Python+Unicode/Modules/stropmodule.c Mon Mar 13 14:33:23 2000 @@ -1054,7 +1054,7 @@ strstr replacement for arbitrary blocks of memory. - Locates the first occurance in the memory pointed to by MEM of the + Locates the first occurrence in the memory pointed to by MEM of the contents of memory pointed to by PAT. Returns the index into MEM if found, or -1 if not found. If len of PAT is greater than length of MEM, the function returns -1. diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Objects/stringobject.c Python+Unicode/Objects/stringobject.c --- CVS-Python/Objects/stringobject.c Tue Mar 14 00:14:17 2000 +++ Python+Unicode/Objects/stringobject.c Mon Mar 13 14:33:24 2000 @@ -1395,7 +1395,7 @@ strstr replacement for arbitrary blocks of memory. - Locates the first occurance in the memory pointed to by MEM of the + Locates the first occurrence in the memory pointed to by MEM of the contents of memory pointed to by PAT. Returns the index into MEM if found, or -1 if not found. If len of PAT is greater than length of MEM, the function returns -1. 
@@ -1578,7 +1578,7 @@ return NULL; if (sub_len <= 0) { - PyErr_SetString(PyExc_ValueError, "empty replacement string"); + PyErr_SetString(PyExc_ValueError, "empty pattern string"); return NULL; } new_s = mymemreplace(str,len,sub,sub_len,repl,repl_len,count,&out_len); Only in CVS-Python/Objects: stringobject.c.orig diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Objects/unicodeobject.c Python+Unicode/Objects/unicodeobject.c --- CVS-Python/Objects/unicodeobject.c Tue Mar 14 00:14:17 2000 +++ Python+Unicode/Objects/unicodeobject.c Wed Mar 15 10:49:19 2000 @@ -83,7 +83,7 @@ all objects on the free list having a size less than this limit. This reduces malloc() overhead for small Unicode objects. - At worse this will result in MAX_UNICODE_FREELIST_SIZE * + At worst this will result in MAX_UNICODE_FREELIST_SIZE * (sizeof(PyUnicodeObject) + STAYALIVE_SIZE_LIMIT + malloc()-overhead) bytes of unused garbage. 
@@ -180,7 +180,7 @@ unicode_freelist = *(PyUnicodeObject **)unicode_freelist; unicode_freelist_size--; unicode->ob_type = &PyUnicode_Type; - _Py_NewReference(unicode); + _Py_NewReference((PyObject *)unicode); if (unicode->str) { if (unicode->length < length && _PyUnicode_Resize(unicode, length)) { @@ -199,16 +199,19 @@ unicode->str = PyMem_NEW(Py_UNICODE, length + 1); } - if (!unicode->str) { - PyMem_DEL(unicode); - PyErr_NoMemory(); - return NULL; - } + if (!unicode->str) + goto onError; unicode->str[length] = 0; unicode->length = length; unicode->hash = -1; unicode->utf8str = NULL; return unicode; + + onError: + _Py_ForgetReference((PyObject *)unicode); + PyMem_DEL(unicode); + PyErr_NoMemory(); + return NULL; } static @@ -224,7 +227,6 @@ *(PyUnicodeObject **)unicode = unicode_freelist; unicode_freelist = unicode; unicode_freelist_size++; - _Py_ForgetReference(unicode); } else { free(unicode->str); @@ -489,7 +491,7 @@ } else { PyErr_Format(PyExc_ValueError, - "UTF-8 decoding error; unkown error handling code: %s", + "UTF-8 decoding error; unknown error handling code: %s", errors); return -1; } @@ -611,7 +613,7 @@ else { PyErr_Format(PyExc_ValueError, "UTF-8 encoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -733,7 +735,7 @@ } else { PyErr_Format(PyExc_ValueError, - "UTF-16 decoding error; unkown error handling code: %s", + "UTF-16 decoding error; unknown error handling code: %s", errors); return -1; } @@ -921,7 +923,7 @@ else { PyErr_Format(PyExc_ValueError, "Unicode-Escape decoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1051,6 +1053,10 @@ */ +static const Py_UNICODE *findchar(const Py_UNICODE *s, + int size, + Py_UNICODE ch); + static PyObject *unicodeescape_string(const Py_UNICODE *s, int size, @@ -1069,9 +1075,6 @@ p = q = PyString_AS_STRING(repr); if (quotes) { - static const Py_UNICODE *findchar(const Py_UNICODE *s, - int size, 
- Py_UNICODE ch); *p++ = 'u'; *p++ = (findchar(s, size, '\'') && !findchar(s, size, '"')) ? '"' : '\''; @@ -1298,7 +1301,7 @@ else { PyErr_Format(PyExc_ValueError, "Latin-1 encoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1369,7 +1372,7 @@ else { PyErr_Format(PyExc_ValueError, "ASCII decoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1431,7 +1434,7 @@ else { PyErr_Format(PyExc_ValueError, "ASCII encoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1502,7 +1505,7 @@ else { PyErr_Format(PyExc_ValueError, "charmap decoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1618,7 +1621,7 @@ else { PyErr_Format(PyExc_ValueError, "charmap encoding error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } @@ -1750,7 +1753,7 @@ else { PyErr_Format(PyExc_ValueError, "translate error; " - "unkown error handling code: %s", + "unknown error handling code: %s", errors); return -1; } diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Python/codecs.c Python+Unicode/Python/codecs.c --- CVS-Python/Python/codecs.c Fri Mar 10 23:57:27 2000 +++ Python+Unicode/Python/codecs.c Wed Mar 15 11:27:54 2000 @@ -93,9 +93,14 @@ PyObject *_PyCodec_Lookup(const char *encoding) { - PyObject *result, *args = NULL, *v; + PyObject *result, *args = NULL, *v = NULL; int i, len; + if (_PyCodec_SearchCache == NULL || _PyCodec_SearchPath == NULL) { + PyErr_SetString(PyExc_SystemError, + "codec module not properly initialized"); + goto onError; + } if (!import_encodings_called) import_encodings(); @@ -109,6 +114,7 @@ 
result = PyDict_GetItem(_PyCodec_SearchCache, v); if (result != NULL) { Py_INCREF(result); + Py_DECREF(v); return result; } @@ -121,6 +127,7 @@ if (args == NULL) goto onError; PyTuple_SET_ITEM(args,0,v); + v = NULL; for (i = 0; i < len; i++) { PyObject *func; @@ -146,7 +153,7 @@ if (i == len) { /* XXX Perhaps we should cache misses too ? */ PyErr_SetString(PyExc_LookupError, - "unkown encoding"); + "unknown encoding"); goto onError; } @@ -156,6 +163,7 @@ return result; onError: + Py_XDECREF(v); Py_XDECREF(args); return NULL; } @@ -378,5 +386,7 @@ void _PyCodecRegistry_Fini() { Py_XDECREF(_PyCodec_SearchPath); + _PyCodec_SearchPath = NULL; Py_XDECREF(_PyCodec_SearchCache); + _PyCodec_SearchCache = NULL; } From bwarsaw at cnri.reston.va.us Fri Mar 17 20:16:02 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 17 Mar 2000 14:16:02 -0500 (EST) Subject: [Python-Dev] Unicode Update 2000-03-17 References: <38D27F33.4055A942@lemburg.com> Message-ID: <14546.33906.771022.916209@anthem.cnri.reston.va.us> >>>>> "M" == M writes: M> The patch is against the current CVS version. I would M> appreciate if someone with CVS checkin permissions could check M> the changes in. Hi MAL, I just tried to apply your patch against the tree, however patch complains that the Lib/codecs.py patch is reversed. I haven't looked closely at it, but do you have any ideas? Or why don't you just send me Lib/codecs.py and I'll drop it in place. Everything else patched cleanly. -Barry From ping at lfw.org Fri Mar 17 15:06:13 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 08:06:13 -0600 (CST) Subject: [Python-Dev] Boolean type for Py3K? Message-ID: I wondered to myself today while reading through the Python tutorial whether it would be a good idea to have a separate boolean type in Py3K. Would this help catch common mistakes? 
I won't presume to truly understand the new-to-Python experience, but
one might *guess* that

    >>> 5 > 3
    true

would make a little more sense to a beginner than

    >>> 5 > 3
    1

Of course this means introducing "true" and "false" as keywords (or
built-in values like None -- perhaps they should be spelled True and
False?) and completely changing the way a lot of code runs by
introducing a bunch of type checking, so it may be too radical a
change, but --

And i don't know if it's already been discussed a lot, but --

I thought it wouldn't hurt just to raise the question.

-- ?!ng

From ping at lfw.org Fri Mar 17 15:06:55 2000
From: ping at lfw.org (Ka-Ping Yee)
Date: Fri, 17 Mar 2000 08:06:55 -0600 (CST)
Subject: [Python-Dev] Should None be a keyword?
Message-ID: 

Related to my last message: should None become a keyword in Py3K?

-- ?!ng

From bwarsaw at cnri.reston.va.us Fri Mar 17 21:49:24 2000
From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw)
Date: Fri, 17 Mar 2000 15:49:24 -0500 (EST)
Subject: [Python-Dev] Boolean type for Py3K?
References: 
Message-ID: <14546.39508.312796.221069@anthem.cnri.reston.va.us>

>>>>> "KY" == Ka-Ping Yee writes:

    KY> I wondered to myself today while reading through the Python
    KY> tutorial whether it would be a good idea to have a separate
    KY> boolean type in Py3K.  Would this help catch common mistakes?

Almost a year ago, I mused about a boolean type in c.l.py, and came up
with this prototype in Python.
-------------------- snip snip --------------------
class Boolean:
    def __init__(self, flag=0):
        self.__flag = not not flag

    def __str__(self):
        return self.__flag and 'true' or 'false'

    def __repr__(self):
        return self.__str__()

    def __nonzero__(self):
        return self.__flag == 1

    def __cmp__(self, other):
        if (self.__flag and other) or (not self.__flag and not other):
            return 0
        else:
            return 1

    def __rcmp__(self, other):
        return -self.__cmp__(other)

true = Boolean(1)
false = Boolean()
-------------------- snip snip --------------------

I think it makes sense to augment Python's current truth rules with a
built-in boolean type and True and False values.  But unless it's tied
in more deeply (e.g. comparisons return one of these instead of
integers -- and what are the implications of that?) then it's pretty
much just syntactic sugar <0.75 lick>.

-Barry

From bwarsaw at cnri.reston.va.us Fri Mar 17 21:50:00 2000
From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw)
Date: Fri, 17 Mar 2000 15:50:00 -0500 (EST)
Subject: [Python-Dev] Should None be a keyword?
References: 
Message-ID: <14546.39544.673335.378797@anthem.cnri.reston.va.us>

>>>>> "KY" == Ka-Ping Yee writes:

    KY> Related to my last message: should None become a keyword in
    KY> Py3K?

Why?  Just to reserve it?

-Barry

From moshez at math.huji.ac.il Fri Mar 17 21:52:29 2000
From: moshez at math.huji.ac.il (Moshe Zadka)
Date: Fri, 17 Mar 2000 22:52:29 +0200 (IST)
Subject: [Python-Dev] Boolean type for Py3K?
In-Reply-To: <14546.39508.312796.221069@anthem.cnri.reston.va.us>
Message-ID: 

On Fri, 17 Mar 2000, Barry A. Warsaw wrote:

> Almost a year ago, I mused about a boolean type in c.l.py, and came up
> with this prototype in Python.

Cool prototype! However, I think I have a problem with the proposed
semantics:

>     def __cmp__(self, other):
>         if (self.__flag and other) or (not self.__flag and not other):
>             return 0
>         else:
>             return 1

This means:

    true == 1
    true == 2

But 1 != 2

I have some difficulty with == not being an equivalence relation...
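Moshe's objection can be checked mechanically. Below is a minimal sketch of the prototype's comparison rule, ported to Python 3 so it runs (`__eq__` stands in for the original `__cmp__`/`__rcmp__`, `__bool__` for `__nonzero__`); the port is illustrative, not code from the thread:

```python
class Boolean:
    """Sketch of the prototype above, ported to Python 3:
    __eq__ stands in for __cmp__/__rcmp__, __bool__ for __nonzero__."""
    def __init__(self, flag=0):
        self._flag = not not flag

    def __repr__(self):
        return self._flag and 'true' or 'false'

    def __bool__(self):
        return self._flag

    def __eq__(self, other):
        # the original __cmp__ compared by truth value only
        return bool(self._flag) == bool(other)

true = Boolean(1)

# Moshe's point, as assertions: == is no longer an equivalence relation
assert true == 1
assert true == 2   # true equals two different integers...
assert 1 != 2      # ...which are not equal to each other
```

Because equality is defined by truth value alone, `true` compares equal to every nonzero number, which is exactly the loss of transitivity Moshe describes.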
> I think it makes sense to augment Python's current truth rules with a
> built-in boolean type and True and False values.

Right on! Except for the built-in... why not have it like exceptions.py,
Python code necessary for the interpreter? Languages which compile
themselves are not unheard of

> But unless it's tied in more deeply (e.g. comparisons return one of
> these instead of integers -- and what are the implications of that?)

Breaking loads of horrible code. Unacceptable for the 1.x series, but
perfectly fine in Py3K

-- 
Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com

From effbot at telia.com Fri Mar 17 22:12:15 2000
From: effbot at telia.com (Fredrik Lundh)
Date: Fri, 17 Mar 2000 22:12:15 +0100
Subject: [Python-Dev] Should None be a keyword?
References: <14546.39544.673335.378797@anthem.cnri.reston.va.us>
Message-ID: <004e01bf9055$79012000$34aab5d4@hagrid>

Barry A. Warsaw wrote:
> >>>>> "KY" == Ka-Ping Yee writes:
>
>     KY> Related to my last message: should None become a keyword in
>     KY> Py3K?
>
> Why?  Just to reserve it?

to avoid stuff like:

    def foo():
        result = None
        # two screenfuls of code
        None, a, b = mytuple # perlish unpacking

which gives an interesting error on the first line, instead
of a syntax error on the last.

From guido at python.org Fri Mar 17 22:20:05 2000
From: guido at python.org (Guido van Rossum)
Date: Fri, 17 Mar 2000 16:20:05 -0500
Subject: [Python-Dev] Should None be a keyword?
In-Reply-To: Your message of "Fri, 17 Mar 2000 08:06:55 CST."
References: 
Message-ID: <200003172120.QAA09045@eric.cnri.reston.va.us>

Yes.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Fri Mar 17 22:20:36 2000
From: guido at python.org (Guido van Rossum)
Date: Fri, 17 Mar 2000 16:20:36 -0500
Subject: [Python-Dev] Boolean type for Py3K?
In-Reply-To: Your message of "Fri, 17 Mar 2000 08:06:13 CST."
References: 
Message-ID: <200003172120.QAA09115@eric.cnri.reston.va.us>

Yes.  True and False make sense.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From pf at artcom-gmbh.de Fri Mar 17 22:17:06 2000
From: pf at artcom-gmbh.de (Peter Funk)
Date: Fri, 17 Mar 2000 22:17:06 +0100 (MET)
Subject: [Python-Dev] Should None be a keyword?
In-Reply-To: <14546.39544.673335.378797@anthem.cnri.reston.va.us> from "Barry A. Warsaw" at "Mar 17, 2000 3:50: 0 pm"
Message-ID: 

> >>>>> "KY" == Ka-Ping Yee writes:
>
>     KY> Related to my last message: should None become a keyword in
>     KY> Py3K?

Barry A. Warsaw schrieb:
> Why?  Just to reserve it?

This is related to the general type checking discussion.  IMO the
suggested

    >>> 1 > 0
    True

wouldn't buy us much, as long as the following behaviour stays in Py3K:

    >>> a = '2' ; b = 3
    >>> a < b
    0
    >>> a > b
    1

This is irritating to newcomers (at least judging from my rather short
experience as a member of python-help)!  And it is especially
irritating, since you can't do

    >>> c = a + b
    Traceback (innermost last):
      File "", line 1, in ?
    TypeError: illegal argument type for built-in operation

IMO this difference is far more difficult to catch for newcomers than
the far more often discussed 5/3 == 1 behaviour.

Have a nice weekend and don't forget to hunt for remaining bugs in
Fred's upcoming 1.5.2p2 docs ;-),

Peter.
-- 
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen)

From ping at lfw.org Fri Mar 17 16:53:38 2000
From: ping at lfw.org (Ka-Ping Yee)
Date: Fri, 17 Mar 2000 09:53:38 -0600 (CST)
Subject: [Python-Dev] list.shift()
Message-ID: 

Has list.shift() been proposed?

    # pretend lists are implemented in Python and 'self' is a list
    def shift(self):
        item = self[0]
        del self[:1]
        return item

This would make queues read nicely... use "append" and "pop" for
a stack, "append" and "shift" for a queue.
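As an aside, both readings can already be sketched with plain lists; `pop()` with an index plays the role of the proposed `shift()` (illustration only, not code from the thread):

```python
# Stack: append and pop() operate at the same (right) end
stack = []
stack.append(1)
stack.append(2)
assert stack.pop() == 2      # LIFO

# Queue: append at the back, pop(0) at the front -- the role of shift()
queue = []
queue.append('a')
queue.append('b')
assert queue.pop(0) == 'a'   # FIFO

# shift() itself, as defined in the message above, in standalone form
def shift(lst):
    item = lst[0]
    del lst[:1]
    return item

q = ['x', 'y']
assert shift(q) == 'x'
assert q == ['y']
```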
(This is while on the thought-train of "making built-in types do more, rather than introducing more special types", as you'll see in my next message.) -- ?!ng From gvanrossum at beopen.com Fri Mar 17 23:00:18 2000 From: gvanrossum at beopen.com (Guido van Rossum) Date: Fri, 17 Mar 2000 17:00:18 -0500 Subject: [Python-Dev] list.shift() References: Message-ID: <38D2AAF2.CFBF3A2@beopen.com> Ka-Ping Yee wrote: > > Has list.shift() been proposed? > > # pretend lists are implemented in Python and 'self' is a list > def shift(self): > item = self[0] > del self[:1] > return item > > This would make queues read nicely... use "append" and "pop" for > a stack, "append" and "shift" for a queue. > > (This is while on the thought-train of "making built-in types do > more, rather than introducing more special types", as you'll see > in my next message.) You can do this using list.pop(0). I don't think the name "shift" is very intuitive (smells of sh and Perl :-). Do we need a new function? --Guido From ping at lfw.org Fri Mar 17 17:08:37 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 10:08:37 -0600 (CST) Subject: [Python-Dev] Using lists as sets Message-ID: A different way to provide sets in Python, which occurred to me on Wednesday at Guido's talk in Mountain View (hi Guido!), is to just make lists work better. Someone asked Guido a question about the ugliness of using dicts in a certain way, and it was clear that what he wanted was a real set. Guido's objection to introducing more core data types is that it makes it more difficult to choose which data type to use, and opens the possibility of using entirely the wrong one -- a very well-taken point, i thought. (That recently-mentioned study of scripting vs. 
system language performance seems relevant here: a few of the C programs submitted were much *slower* than the ones in Python or Perl just because people had to choose and implement their own data structures, and so they were able to completely shoot themselves in both feet and lose a leg or two in the process.) So... Hypothesis: The only real reason people might want a separate set type, or have to use dicts as sets, is that linear search on a list is too slow. Therefore: All we have to do is speed up "in" on lists, and now we have a set type that is nice to read and write, and already has nice spellings for set semantics like "in". Implementation possibilities: + Whip up a hash table behind the scenes if "in" gets used a lot on a particular list and all its members are hashable. This makes "in" no longer O(n), which is most of the battle. remove() can also be cheap -- though you have to do a little more bookkeeping to take care of multiple copies of elements. + Or, add a couple of methods, e.g. take() appends an item to a list if it's not there already, drop() removes all copies of an item from a list. These tip us off: the first time one of these methods gets used, we make the hash table then. I think the semantics would be pretty understandable and simple to explain, which is the main thing. Any thoughts? -- ?!ng From ping at lfw.org Fri Mar 17 17:12:22 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 10:12:22 -0600 (CST) Subject: [Python-Dev] list.shift() In-Reply-To: <38D2AAF2.CFBF3A2@beopen.com> Message-ID: On Fri, 17 Mar 2000, Guido van Rossum wrote: > You can do this using list.pop(0). I don't think the name "shift" is very > intuitive (smells of sh and Perl :-). Do we need a new function? Oh -- sorry, that's my ignorance showing. I didn't know pop() took an argument (of course it would -- duh...). No need to add anything more, then, i think. Sorry! Fred et al. 
on doc-sig: it would be really good for the tutorial to show a queue
example and a stack example in the section where list methods are
introduced.

-- ?!ng

From ping at lfw.org Fri Mar 17 17:13:44 2000
From: ping at lfw.org (Ka-Ping Yee)
Date: Fri, 17 Mar 2000 10:13:44 -0600 (CST)
Subject: [Python-Dev] Boolean type for Py3K?
In-Reply-To: <200003172120.QAA09115@eric.cnri.reston.va.us>
Message-ID: 

Guido: (re None being a keyword)
> Yes.

Guido: (re booleans)
> Yes.  True and False make sense.

Astounding.  I don't think i've ever seen such quick agreement on
anything!  And twice in one day!

I think i'm going to go lie down.

:) :)

-- ?!ng

From DavidA at ActiveState.com Fri Mar 17 23:23:53 2000
From: DavidA at ActiveState.com (David Ascher)
Date: Fri, 17 Mar 2000 14:23:53 -0800
Subject: [Python-Dev] Using lists as sets
In-Reply-To: 
Message-ID: 

> I think the semantics would be pretty understandable and simple to
> explain, which is the main thing.
>
> Any thoughts?

Would

    (a,b) in Set

return true if (a,b) was a subset of Set, or if (a,b) was an element of Set?

--david

From mal at lemburg.com Fri Mar 17 23:41:46 2000
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 17 Mar 2000 23:41:46 +0100
Subject: [Python-Dev] Boolean type for Py3K?
References: <200003172120.QAA09115@eric.cnri.reston.va.us>
Message-ID: <38D2B4AA.2EE933BD@lemburg.com>

Guido van Rossum wrote:
>
> Yes.  True and False make sense.

mx.Tools defines these as new builtins... and they correspond to the
C level singletons Py_True and Py_False.

    # Truth constants
    True = (1==1)
    False = (1==0)

I'm not sure whether breaking the idiom of True == 1 and False == 0
(or in other words: truth values are integers) would be such a good
idea.  Nothing against adding name bindings in __builtins__ though...
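The idiom MAL wants to keep can be stated as executable assertions. A sketch (the trailing underscores only sidestep later Pythons where True and False are keywords; in the Python of this thread the names would be True and False themselves):

```python
# Truth constants in the style of the snippet above.
True_ = (1 == 1)
False_ = (1 == 0)

# the idiom MAL wants preserved: truth values behave as the integers 1 and 0
assert True_ == 1
assert False_ == 0
assert True_ + True_ == 2    # arithmetic on truth values keeps working
```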
-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/

From ping at lfw.org Fri Mar 17 17:53:12 2000
From: ping at lfw.org (Ka-Ping Yee)
Date: Fri, 17 Mar 2000 10:53:12 -0600 (CST)
Subject: [Python-Dev] Boolean type for Py3K?
In-Reply-To: <14546.39508.312796.221069@anthem.cnri.reston.va.us>
Message-ID: 

On Fri, 17 Mar 2000, Barry A. Warsaw wrote:
> Almost a year ago, I mused about a boolean type in c.l.py, and came up
> with this prototype in Python.
>
> -------------------- snip snip --------------------
> class Boolean:
[...]
>
> I think it makes sense to augment Python's current truth rules with a
> built-in boolean type and True and False values.  But unless it's tied
> in more deeply (e.g. comparisons return one of these instead of
> integers -- and what are the implications of that?) then it's pretty
> much just syntactic sugar <0.75 lick>.

Yeah, and the whole point *is* the change in semantics, not the
syntactic sugar.  I'm hoping we can gain some safety from the type
checking... though i can't seem to think of a good example off the
top of my head.

It's easier to think of examples if things like 'if', 'and', 'or',
etc. only accept booleans as conditional arguments -- but i can't
imagine going that far, as that would just be really annoying.

Let's see.  Specifically, the following would probably return booleans:

    magnitude comparisons:       <, >, <=, >=  (and __cmp__)
    value equality comparisons:  ==, !=
    identity comparisons:        is, is not
    containment tests:           in, not in  (and __contains__)

... and booleans would be different from integers in that arithmetic
would be illegal... but that's about it.  (?)

Booleans are still storable immutable values; they could be keys to
dicts but not lists; i don't know what else.

Maybe this wouldn't actually buy us anything except for the nicer
spelling of "True" and "False", which might not be worth it.

... Hmm.
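One way to make that list concrete is a tiny prototype of a boolean that supports truth testing, equality, and dict keys but rejects arithmetic. This is a hypothetical sketch, not something proposed verbatim in the thread:

```python
class StrictBool:
    """Hypothetical boolean following the rules above: truth testing,
    equality, and dict keying work; arithmetic raises TypeError."""
    def __init__(self, flag):
        self._flag = bool(flag)

    def __bool__(self):
        return self._flag

    def __eq__(self, other):
        return isinstance(other, StrictBool) and self._flag == other._flag

    def __hash__(self):
        return hash(self._flag)

    def _no_arithmetic(self, other):
        raise TypeError("arithmetic on booleans is illegal")
    __add__ = __radd__ = __sub__ = __mul__ = _no_arithmetic

T, F = StrictBool(1), StrictBool(0)

assert T and not F               # usable in conditions
assert {T: 'yes'}[T] == 'yes'    # usable as a dict key
try:
    T + 1                        # arithmetic is rejected, as proposed
    raised = False
except TypeError:
    raised = True
assert raised
```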
Can anyone think of common cases where this could help?

-- ?!ng

From ping at lfw.org Fri Mar 17 17:59:17 2000
From: ping at lfw.org (Ka-Ping Yee)
Date: Fri, 17 Mar 2000 10:59:17 -0600 (CST)
Subject: [Python-Dev] Using lists as sets
In-Reply-To: 
Message-ID: 

On Fri, 17 Mar 2000, David Ascher wrote:
> > I think the semantics would be pretty understandable and simple to
> > explain, which is the main thing.
> >
> > Any thoughts?
>
> Would
>
>     (a,b) in Set
>
> return true of (a,b) was a subset of Set, or if (a,b) was an element of Set?

This would return true if (a, b) was an element of the set --
exactly the same semantics as we currently have for lists.

Ideally it would also be kind of nice to use < > <= >= as
subset/superset operators, but that requires revising the way we do
comparisons, and you know, it might not really be used all that often
anyway.

-, |, and & could operate on lists sensibly when we use them as sets --
just define a few simple rules for ordering and you should be fine.
e.g.

    c = a - b     is equivalent to    c = a
                                      for item in b: c.drop(item)

    c = a | b     is equivalent to    c = a
                                      for item in b: c.take(item)

    c = a & b     is equivalent to    c = []
                                      for item in a:
                                          if item in b: c.take(item)

where

    c.take(item)  is equivalent to    if item not in c: c.append(item)

    c.drop(item)  is equivalent to    while item in c: c.remove(item)

The above is all just semantics, of course, to make the point that
the semantics can be simple.  The implementation could do different
things that are much faster when there's a hash table helping out.

-- ?!ng

From gvwilson at nevex.com Sat Mar 18 00:28:05 2000
From: gvwilson at nevex.com (gvwilson at nevex.com)
Date: Fri, 17 Mar 2000 18:28:05 -0500 (EST)
Subject: [Python-Dev] Boolean type for Py3K?
In-Reply-To: 
Message-ID: 

> Guido: (re None being a keyword)
> > Yes.
> Guido: (re booleans)
> > Yes.  True and False make sense.
> Ka-Ping:
> Astounding.  I don't think i've ever seen such quick agreement on
> anything!  And twice in one day!
I'm think i'm going to go lie down. No, no, keep going --- you're on a roll. Greg From ping at lfw.org Fri Mar 17 18:49:18 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 11:49:18 -0600 (CST) Subject: [Python-Dev] Using lists as sets In-Reply-To: Message-ID: On Fri, 17 Mar 2000, Ka-Ping Yee wrote: > > c.take(item) is equivalent to > > if item not in c: c.append(item) > > c.drop(item) is equivalent to > > while item in c: c.remove(item) I think i've decided that i like the verb "include" much better than the rather vague word "take". Perhaps this also suggests "exclude" instead of "drop". -- ?!ng From klm at digicool.com Sat Mar 18 01:32:56 2000 From: klm at digicool.com (Ken Manheimer) Date: Fri, 17 Mar 2000 19:32:56 -0500 (EST) Subject: [Python-Dev] Using lists as sets In-Reply-To: Message-ID: On Fri, 17 Mar 2000, Ka-Ping Yee wrote: > On Fri, 17 Mar 2000, David Ascher wrote: > > > I think the semantics would be pretty understandable and simple to > > > explain, which is the main thing. > > > > > > Any thoughts? > > > > Would > > > > (a,b) in Set > > > > return true of (a,b) was a subset of Set, or if (a,b) was an element of Set? > > This would return true if (a, b) was an element of the set -- > exactly the same semantics as we currently have for lists. I really like the idea of using dynamically-tuned lists provide set functionality! I often wind up needing something like set functionality, and implementing little convenience routines (unique, difference, etc) repeatedly. I don't mind that so much, but the frequency signifies that i, at least, would benefit from built-in support for sets... I guess the question is whether it's practical to come up with a reasonably adequate, reasonably general dynamic optimization strategy. Seems like an interesting challenge - is there prior art? As ping says, maintaining the existing list semantics handily answers challenges like david's question. 
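For reference, Ping's take/drop equivalences from the earlier message can be written out and exercised directly. Since take and drop are proposed names, not real list methods, standalone functions stand in here, and copies are taken so the operands survive (a sketch):

```python
def take(c, item):      # proposed list.take() / include()
    if item not in c:
        c.append(item)

def drop(c, item):      # proposed list.drop() / exclude()
    while item in c:
        c.remove(item)

def union(a, b):        # the proposed  c = a | b
    c = list(a)
    for item in b:
        take(c, item)
    return c

def difference(a, b):   # the proposed  c = a - b
    c = list(a)
    for item in b:
        drop(c, item)
    return c

def intersection(a, b): # the proposed  c = a & b
    c = []
    for item in a:
        if item in b:
            take(c, item)
    return c

assert union([1, 2], [2, 3]) == [1, 2, 3]
assert difference([1, 2, 2, 3], [2]) == [1, 3]
assert intersection([1, 2, 3], [2, 3, 4]) == [2, 3]
```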
New methods, like [].subset('a', 'b'), could provide the desired additional functionality - and contribute to biasing the object towards set optimization, etc. Neato! Ken klm at digicool.com From ping at lfw.org Fri Mar 17 20:02:13 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 17 Mar 2000 13:02:13 -0600 (CST) Subject: [Python-Dev] Using lists as sets In-Reply-To: Message-ID: On Fri, 17 Mar 2000, Ken Manheimer wrote: > > I really like the idea of using dynamically-tuned lists provide set > functionality! I often wind up needing something like set functionality, > and implementing little convenience routines (unique, difference, etc) > repeatedly. I don't mind that so much, but the frequency signifies that > i, at least, would benefit from built-in support for sets... Greg asked about how to ensure that a given item only appears once in each list when used as a set, and whether i would flag the list as "i'm now operating as a set". My answer is no -- i don't want there to be any visible state on the list. (It can internally decide to optimize its behaviour for a particular purpose, but in no event should this decision ever affect the semantics of its manifested behaviour.) Externally visible state puts us back right where we started -- now the user has to decide what type of thing she wants to use, and that's more decisions and loaded guns pointing at feet that we were trying to avoid in the first place. There's something very nice about there being just two mutable container types in Python. As Guido said, the first two types you learn are lists and dicts, and it's pretty obvious which one to pick for your purposes, and you can't really go wrong. I'd like to copy my reply to Greg here because it exposes some of the philosophy i'm attempting with this proposal: You'd trust the client to use take() (or should i say include()) instead of append(). But, in the end, this wouldn't make any difference to the result of "in". 
In fact, you could do multisets since lists already have count(). What i'm trying to do is to put together a few very simple pieces to get all the behaviour necessary to work with sets, if you want it. I don't want the object itself to have any state that manifests itself as "now i'm a set", or "now i'm a list". You just pick the methods you want to use. It's just like stacks and queues. There's no state on the list that says "now i'm a stack, so read from the end" or "now i'm a queue, so read from the front". You decide where you want to read items by picking the appropriate method, and this lets you get the best of both worlds -- flexibility and simplicity. Back to Ken: > I guess the question is whether it's practical to come up with a > reasonably adequate, reasonably general dynamic optimization strategy. > Seems like an interesting challenge - is there prior art? I'd be quite happy with just turning on set optimization when include() and exclude() get used (nice and predictable). Maybe you could provide a set() built-in that would construct you a list with set optimization turned on, but i'm not too sure if we really want to expose it that way. -- ?!ng From moshez at math.huji.ac.il Sat Mar 18 06:27:13 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 18 Mar 2000 07:27:13 +0200 (IST) Subject: [Python-Dev] list.shift() In-Reply-To: Message-ID: On Fri, 17 Mar 2000, Ka-Ping Yee wrote: > > Has list.shift() been proposed? > > # pretend lists are implemented in Python and 'self' is a list > def shift(self): > item = self[0] > del self[:1] > return item > > This would make queues read nicely... use "append" and "pop" for > a stack, "append" and "shift" for a queue. Actually, I once thought about writing a Deque in Python for a couple of hours (I later wrote it, and then threw it away because I had nothing to do with it, but that isn't my point). So I did write "shift" (though I'm certain I didn't call it that). 
It's not as easy to write a maintainable yet efficient "shift": I got
stuck with a pointer to the beginning of the "real list" which I
incremented on a "shift", and a complex heuristic for when lists de-
and re-allocate.  I think the tradeoffs are shaky enough that it is
better to write it in pure Python rather than having more functions
in C (in an old builtin type or a new one).  Anyone needing to treat
a list as a Deque would just construct one:

    l = Deque(l)

built-in-functions:-just-say-no-ly y'rs, Z.
-- 
Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com

From artcom0!pf at artcom-gmbh.de Fri Mar 17 23:43:35 2000
From: artcom0!pf at artcom-gmbh.de (artcom0!pf at artcom-gmbh.de)
Date: Fri, 17 Mar 2000 23:43:35 +0100 (MET)
Subject: [Python-Dev] dict.supplement() (was Re: list.shift())
In-Reply-To: <38D2AAF2.CFBF3A2@beopen.com> from Guido van Rossum at "Mar 17, 2000 5: 0:18 pm"
Message-ID: 

Ka-Ping Yee wrote:
[...]
> > # pretend lists are implemented in Python and 'self' is a list
> > def shift(self):
> >     item = self[0]
> >     del self[:1]
> >     return item
[...]

Guido van Rossum:
> You can do this using list.pop(0).  I don't think the name "shift" is
> very intuitive (smells of sh and Perl :-).  Do we need a new function?

I think no.  But what about this one?:

    # pretend self and dict are dictionaries:
    def supplement(self, dict):
        for k, v in dict.items():
            if not self.data.has_key(k):
                self.data[k] = v

Note the similarities to {}.update(dict), but update replaces existing
entries in self, which is sometimes not desired.  I know that
supplement can also be simulated with:

    tmp = dict.copy()
    tmp.update(self)
    self.data = d

But this is still a little ugly.  IMO a builtin method to supplement
(complete?) a dictionary with default values from another dictionary
would sometimes be a useful tool.
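Peter's supplement() amounts to "insert only the missing keys". With dict.setdefault() available, the same behaviour can be sketched as a plain function (an illustrative helper, not his actual class):

```python
def supplement(target, defaults):
    """Fill in missing keys of target from defaults; never overwrite."""
    for k, v in defaults.items():
        target.setdefault(k, v)
    return target

config = {'host': 'example.org'}
supplement(config, {'host': 'localhost', 'port': 8080})
# existing 'host' survives, missing 'port' is filled in
assert config == {'host': 'example.org', 'port': 8080}
```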
Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From ping at lfw.org Sat Mar 18 19:48:10 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sat, 18 Mar 2000 10:48:10 -0800 (PST) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: Message-ID: On Fri, 17 Mar 2000 artcom0!pf at artcom-gmbh.de wrote: > > I think no. But what about this one?: > > # pretend self and dict are dictionaries: > def supplement(self, dict): > for k, v in dict.items(): > if not self.data.has_key(k): > self.data[k] = v I'd go for that. It would be nice to have a non-overwriting update(). The only issue is the choice of verb; "supplement" sounds pretty reasonable to me. -- ?!ng "If I have not seen as far as others, it is because giants were standing on my shoulders." -- Hal Abelson From pf at artcom-gmbh.de Sat Mar 18 20:23:37 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Sat, 18 Mar 2000 20:23:37 +0100 (MET) Subject: [Python-Dev] dict.supplement() In-Reply-To: from Ka-Ping Yee at "Mar 18, 2000 10:48:10 am" Message-ID: Hi! > > # pretend self and dict are dictionaries: > > def supplement(self, dict): > > for k, v in dict.items(): > > if not self.data.has_key(k): > > self.data[k] = v Ka-Ping Yee schrieb: > I'd go for that. It would be nice to have a non-overwriting update(). > The only issue is the choice of verb; "supplement" sounds pretty > reasonable to me. In German we have the verb "erg?nzen" which translates either into "supplement" or "complete" (from my dictionary). "supplement" has the disadvantage of being rather long for the name of a builtin method. Nevertheless I've used this in my class derived from UserDict.UserDict. Now let's witch topic to the recent discussion about Set type: you all certainly know, that something similar has been done before by Aaron Watters? 
see: Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From gvwilson at nevex.com Mon Mar 20 15:52:12 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Mon, 20 Mar 2000 09:52:12 -0500 (EST) Subject: [Python-Dev] re: Using lists as sets Message-ID: [After discussion with Ping, and weekend thought] I would like to vote against using lists as sets: 1. It blurs Python's categorization of containers. The rest of the world thinks of sets as unordered, associative, and binary-valued (a term I just made up to mean "containing 0 or 1 instance of X"). Lists, on the other hand, are ordered, positionally-indexed, and multi-valued. While a list is always a legal queue or stack (although lists permit state transitions that are illegal for queues or stacks), most lists are not legal sets. 2. Python has, in dictionaries, a much more logical starting point for sets. A set is exactly a dictionary whose keys matter, and whose values don't. Adding operations to dictionaries to insert keys, etc., without having to supply a value, naively appears no harder than adding operations to lists, and would probably be much easier to explain when teaching a class. 3. (Long-term speculation) Even if P3K isn't written in C++, many modules for it will be. It would therefore seem sensible to design P3K in a C++-friendly way --- in particular, to align Python's container hierarchy with that used in the Standard Template Library. Using lists as a basis for sets would give Python a very different container type hierarchy than the STL, which could make it difficult for automatic tools like SWIG to map STL-based things to Python and vice versa. Using dictionaries as a basis for sets would seem to be less problematic. 
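Greg's point 2 is easy to try out: a set is just a dictionary whose keys matter and whose values don't. A minimal sketch (this Set class is illustrative, not a concrete proposal from the thread):

```python
class Set:
    """Illustrative set-as-dictionary: the keys carry the members,
    the values are a dummy placeholder."""
    def __init__(self, items=()):
        self._d = {}
        for x in items:
            self._d[x] = 1

    def add(self, x):
        self._d[x] = 1

    def discard(self, x):
        if x in self._d:
            del self._d[x]

    def __contains__(self, x):
        return x in self._d

    def __len__(self):
        return len(self._d)

s = Set([1, 2, 2, 3])
assert 2 in s and len(s) == 3   # duplicates collapse; membership is a hash lookup
s.discard(2)
assert 2 not in s
```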
(Note that if Wadler et al's Generic Java proposal becomes part of
that language, an STL clone will almost certainly become part of that
language, and require JPython interfacing.)

On a semi-related note, can someone explain why programs are not
allowed to iterate directly through the elements of a dictionary:

    for (key, value) in dict:
        ...body...

Thanks,
Greg

"No XML entities were harmed in the production of this message."

From moshez at math.huji.ac.il Mon Mar 20 16:03:47 2000
From: moshez at math.huji.ac.il (Moshe Zadka)
Date: Mon, 20 Mar 2000 17:03:47 +0200 (IST)
Subject: [Python-Dev] re: Using lists as sets
In-Reply-To: 
Message-ID: 

On Mon, 20 Mar 2000 gvwilson at nevex.com wrote:

> [After discussion with Ping, and weekend thought]
>
> I would like to vote against using lists as sets:

I'd like to object too, but for slightly different reasons:
20-something lines of Python can implement a set (I just checked it)
with the new __contains__.  We can just supply it in the standard
library (Set module?) and be over and done with.

Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com

From jcw at equi4.com Mon Mar 20 16:37:19 2000
From: jcw at equi4.com (Jean-Claude Wippler)
Date: Mon, 20 Mar 2000 16:37:19 +0100
Subject: [Python-Dev] re: Using lists as sets
References: 
Message-ID: <38D645AF.661CA335@equi4.com>

gvwilson at nevex.com wrote:
>
> [After discussion with Ping, and weekend thought]

[good stuff]

Allow me to offer yet another perspective on this.  I'll keep it short.

Python has sequences (indexable collections) and maps (associative
collections).  C++'s STL has vectors, sets, multi-sets, maps, and
multi-maps.
I find the distinction between these puzzling, and hereby offer another, somewhat relational-database minded, categorization as food for thought: - collections consist of objects, each of them with attributes - the first N attributes form the "key", the rest is the "residue" - there is also an implicit position attribute, which I'll call "#" - so an object consists of attributes: (K1,K2,...KN,#,R1,R2,...,RM) - one more bit of specification is needed: whether # is part of the key Let me mark the position between key attributes and residue with ":", so everything before the colon marks the uniquely identifying attributes. A vector (sequence) is: #:R1,R2,...,RM A set is: K1,K2,...KN: A multi-set is: K1,K2,...KN,#: A map is: K1,K2,...KN:#,R1,R2,...,RM A multi-map is: K1,K2,...KN,#:R1,R2,...,RM And a somewhat esoteric member of this classification: A singleton is: :R1,R2,...,RM I have no idea what this means for Python, but merely wanted to show how a relational, eh, "view" on all this might perhaps simplify the issues. -jcw From fdrake at acm.org Mon Mar 20 17:55:59 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 20 Mar 2000 11:55:59 -0500 (EST) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: References: <38D2AAF2.CFBF3A2@beopen.com> Message-ID: <14550.22559.550660.403909@weyr.cnri.reston.va.us> artcom0!pf at artcom-gmbh.de writes: > Note the similarities to {}.update(dict), but update replaces existing > entries in self, which is sometimes not desired. I know, that supplement > can also simulated with: Peter, I like this! > tmp = dict.copy() > tmp.update(self) > self.data = d I presume you mean "self.data = tmp"; "self.data.update(tmp)" would be just a little more robust, at the cost of an additional update. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From tismer at tismer.com Mon Mar 20 18:10:34 2000 From: tismer at tismer.com (Christian Tismer) Date: Mon, 20 Mar 2000 18:10:34 +0100 Subject: [Python-Dev] re: Using lists as sets References: <38D645AF.661CA335@equi4.com> Message-ID: <38D65B8A.50B81D08@tismer.com> Jean-Claude Wippler wrote: [relational notation] > A vector (sequence) is: #:R1,R2,...,RM > A set is: K1,K2,...KN: > A multi-set is: K1,K2,...KN,#: > A map is: K1,K2,...KN:#,R1,R2,...,RM > A multi-map is: K1,K2,...KN,#:R1,R2,...,RM This is a nice classification! To my understanding, why not A map is: K1,K2,...KN:R1,R2,...,RM Where is a # in a map? And what do you mean by N and M? Is K1..KN one key, mae up of N sub keys, or do you mean the whole set of keys, where each one is mapped somehow. I guess not, the notation looks like I should think of tuples. No, that would imply that N and M were fixed, but they are not. But you say "- collections consist of objects, each of them with attributes". Ok, N and M seem to be individual for each object, right? But when defining a map for instance, and we're talking of the objects, then the map is the set of these objects, and I have to think of K[0]..K(N(o)):R[0]..R(M(o)) where N and M are functions of the individual object o, right? Isn't it then better to think different of these objects, saying they can produce some key object and some value object of any shape, and a position, where each of these can be missing? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From jeremy at cnri.reston.va.us Mon Mar 20 18:28:28 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Mon, 20 Mar 2000 12:28:28 -0500 (EST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: References: Message-ID: <14550.24508.341533.908941@goon.cnri.reston.va.us> >>>>> "GVW" == gvwilson writes: GVW> On a semi-related note, can someone explain why programs are GVW> not allowed to iterate directly through the elements of a GVW> dictionary: GVW> for (key, value) in dict: ...body... Pythonic design rules #2: Explicit is better than implicit. There are at least three "natural" ways to interpret "for ... in dict:" In addition to the version that strikes you as most natural, some people also imagine that a for loop should iterate over the keys or the values. Instead of guessing, Python provides explicit methods for each possibility: items, keys, values. Yet another possibility, implemented in early versions of JPython and later removed, was to treat a dictionary exactly like a list: Call __getitem__(0), then 1, ..., until a KeyError was raised. In other words, a dictionary could behave like a list provided that it had integer keys. Jeremy From jcw at equi4.com Mon Mar 20 18:56:44 2000 From: jcw at equi4.com (Jean-Claude Wippler) Date: Mon, 20 Mar 2000 18:56:44 +0100 Subject: [Python-Dev] re: Using lists as sets References: <38D645AF.661CA335@equi4.com> <38D65B8A.50B81D08@tismer.com> Message-ID: <38D6665C.ECDE09DE@equi4.com> Christian, > A map is: K1,K2,...KN:R1,R2,...,RM Yes, my list was inconsistent. > Is K1..KN one key, made up of N sub keys, or do you mean the > whole set of keys, where each one is mapped somehow. [...] > Ok, N and M seem to be individual for each object, right? [...] 
> Isn't it then better to think different of these objects, saying
> they can produce some key object and some value object of any
> shape, and a position, where each of these can be missing?

Depends on your perspective. In the relational world, the (K1,...,KN) attributes identify the object, but they are not themselves considered an object. In OO-land, (K1,...,KN) is an object, and a map takes such an object as input and delivers (R1,...,RM) as result. This tension shows the boundary of both relational and OO models, IMO. I wish it'd be possible to unify them, but I haven't figured it out.

-jcw, concept maverick / fool on the hill - pick one :)

From pf at artcom-gmbh.de Mon Mar 20 19:28:17 2000
From: pf at artcom-gmbh.de (Peter Funk)
Date: Mon, 20 Mar 2000 19:28:17 +0100 (MET)
Subject: [Python-Dev] dict.supplement() (was Re: list.shift())
In-Reply-To: <14550.22559.550660.403909@weyr.cnri.reston.va.us> from "Fred L. Drake, Jr." at "Mar 20, 2000 11:55:59 am"
Message-ID:

I wrote:
> > Note the similarities to {}.update(dict), but update replaces existing
> > entries in self, which is sometimes not desired. I know that supplement
> > can also be simulated with:

Fred L. Drake, Jr.:
> Peter,
> I like this!
>
> > tmp = dict.copy()
> > tmp.update(self)
> > self.data = d
>
> I presume you mean "self.data = tmp"; "self.data.update(tmp)" would
> be just a little more robust, at the cost of an additional update.

Ouppss... I should have tested this before posting. But currently I use the more explicit (and probably slower version) in my code:

    class ConfigDict(UserDict.UserDict):
        def supplement(self, defaults):
            for k, v in defaults.items():
                if not self.data.has_key(k):
                    self.data[k] = v

Works fine so far, although it usually requires an additional copy operation. Consider another example, where arbitrary instance attributes should be specified as keyword arguments to the constructor:

    >>> class Example:
    ...     _defaults = {'a': 1, 'b': 2}
    ...     _config = _defaults
    ...
def __init__(self, **kw): ... if kw: ... self._config = self._defaults.copy() ... self._config.update(kw) ... >>> A = Example(a=12345) >>> A._config {'b': 2, 'a': 12345} >>> B = Example(c=3) >>> B._config {'b': 2, 'c': 3, 'a': 1} If 'supplement' were a dictionary builtin method, this would become simply: kw.supplement(self._defaults) self._config = kw Unfortunately this can't be achieved using a wrapper class like UserDict, since the **kw argument is always a builtin dictionary object. Regards, Peter -- Peter Funk, Oldenburger Str.86, 27777 Ganderkesee, Tel: 04222 9502 70, Fax: -60 From ping at lfw.org Mon Mar 20 13:36:34 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 20 Mar 2000 06:36:34 -0600 (CST) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: Message-ID: On Mon, 20 Mar 2000, Peter Funk wrote: > Consider another example, where arbitrary instance attributes should be > specified as keyword arguments to the constructor: > > >>> class Example: > ... _defaults = {'a': 1, 'b': 2} > ... _config = _defaults > ... def __init__(self, **kw): > ... if kw: > ... self._config = self._defaults.copy() > ... self._config.update(kw) Yes! I do this all the time. I wrote a user-interface module to take care of exactly this kind of hassle when creating lots of UI components. When you're making UI, you can easily drown in keyword arguments and default values if you're not careful. -- ?!ng From fdrake at acm.org Mon Mar 20 20:02:48 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 20 Mar 2000 14:02:48 -0500 (EST) Subject: [Python-Dev] dict.supplement() (was Re: list.shift()) In-Reply-To: References: <14550.22559.550660.403909@weyr.cnri.reston.va.us> Message-ID: <14550.30168.129259.356581@weyr.cnri.reston.va.us> Peter Funk writes: > Ouppss... I should have tested this before posting. 
> But currently I use the more explicit (and probably slower version) in my code:

The performance is based entirely on the size of each; in the (probably typical) case of smallish dictionaries (<50 entries), it's probably cheaper to use a temporary dict and do the update. For large dicts (on the defaults side), it may make more sense to reduce the number of objects that need to be created:

    target = ...
    has_key = target.has_key
    for key in defaults.keys():
        if not has_key(key):
            target[key] = defaults[key]

This saves the construction of len(defaults) 2-tuples.

-Fred
-- Fred L. Drake, Jr. Corporation for National Research Initiatives

From moshez at math.huji.ac.il Mon Mar 20 20:23:01 2000
From: moshez at math.huji.ac.il (Moshe Zadka)
Date: Mon, 20 Mar 2000 21:23:01 +0200 (IST)
Subject: [Python-Dev] re: Using lists as sets
In-Reply-To: <14550.24508.341533.908941@goon.cnri.reston.va.us>
Message-ID:

On Mon, 20 Mar 2000, Jeremy Hylton wrote:
> Yet another possibility, implemented in early versions of JPython and
> later removed, was to treat a dictionary exactly like a list: Call
> __getitem__(0), then 1, ..., until a KeyError was raised. In other
> words, a dictionary could behave like a list provided that it had
> integer keys.

Two remarks: Jeremy meant "consecutive natural keys starting with 0" (yes, I've managed to learn mind-reading from the timbot), and that the following is considered a misfeature:

    import UserDict
    a = UserDict.UserDict()
    a[0] = "hello"
    a[1] = "world"
    for word in a: print word

Will print "hello", "world", and then die with a KeyError. I realize why this is happening, and realize it could only be fixed in Py3K. However, a temporary (though not 100% backwards compatible) fix is that "for" will catch LookupError, rather than IndexError. Any comments?

-- Moshe Zadka .
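[Editor's note: the two "supplement" spellings traded back and forth in this thread can be checked side by side. A minimal sketch in modern Python — the variable names are mine, not from the thread:]

```python
# The "supplement" semantics under discussion: fill in defaults without
# overwriting keys the target already has.
defaults = {"a": 1, "b": 2}
config = {"b": 20}

# 1. Fred's temporary-dict idiom: copy defaults, overlay the target.
tmp = defaults.copy()
tmp.update(config)
config = tmp

# 2. Peter's explicit loop: no temporary dict, no 2-tuples constructed.
config2 = {"b": 20}
for key in defaults:
    if key not in config2:
        config2[key] = defaults[key]

assert config == config2 == {"a": 1, "b": 20}
```

Both spellings agree: existing entries ("b": 20) survive, missing ones ("a": 1) are filled in from the defaults.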
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mhammond at skippinet.com.au Mon Mar 20 20:39:31 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon, 20 Mar 2000 11:39:31 -0800 Subject: [Python-Dev] Unicode and Windows Message-ID: I would like to discuss Unicode on the Windows platform, and how it relates to MBCS that Windows uses. My main goal here is to ensure that Unicode on Windows can make a round-trip to and from native Unicode stores. As an example, let's take the registry - a Windows user should be able to read a Unicode value from the registry then write it back. The value written back should be _identical_ to the value read. Ditto for the file system: If the filesystem is Unicode, then I would expect the following code: for fname in os.listdir(): f = open(fname + ".tmp", "w") To create filenames on the filesystem with the exact base name even when the basename contains non-ascii characters. However, the Unicode patches do not appear to make this possible. open() uses PyArg_ParseTuple(args, "s..."); PyArg_ParseTuple() will automatically convert a Unicode object to UTF-8, so we end up passing a UTF-8 encoded string to the C runtime fopen function. The end result of all this is that we end up with UTF-8 encoded names in the registry/on the file system. It does not seem possible to get a true Unicode string onto either the file system or in the registry. Unfortunately, Im not experienced enough to know the full ramifications, but it _appears_ that on Windows the default "unicode to string" translation should be done via the WideCharToMultiByte() API. This will then pass an MBCS encoded ascii string to Windows, and the "right thing" should magically happen. Unfortunately, MBCS encoding is dependant on the current locale (ie, one MBCS sequence will mean completely different things depending on the locale). 
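[Editor's note: the locale dependence Mark describes can be illustrated with a tiny sketch, using modern Python byte/codec spellings that postdate this thread — the same byte sequence means different characters under different 8-bit code pages:]

```python
# One byte, two locales: an identical 8-bit byte decodes to entirely
# different characters depending on the active code page.
data = b"\xe9"
west = data.decode("latin-1")   # Western European reading: 'é'
cyr = data.decode("cp1251")     # Cyrillic reading: 'й'
assert west == "\u00e9"
assert cyr == "\u0439"
```

This is exactly why a single fixed Unicode-to-bytes conversion cannot round-trip correctly on every Windows locale.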
I dont see a portability issue here, as the documentation could state that "Unicode->ASCII conversions use the most appropriate conversion for the platform. If the platform is not Unicode aware, then UTF-8 will be used." This issue is the final one before I release the win32reg module. It seems _critical_ to me that if Python supports Unicode and the platform supports Unicode, then Python unicode values must be capable of being passed to the platform. For the win32reg module I could quite possibly hack around the problem, but the more general problem (categorized by the open() example above) still remains... Any thoughts? Mark. From jeremy at cnri.reston.va.us Mon Mar 20 20:51:28 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Mon, 20 Mar 2000 14:51:28 -0500 (EST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: References: <14550.24508.341533.908941@goon.cnri.reston.va.us> Message-ID: <14550.33088.110785.78631@goon.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: MZ> On Mon, 20 Mar 2000, Jeremy Hylton wrote: >> Yet another possibility, implemented in early versions of JPython >> and later removed, was to treat a dictionary exactly like a list: >> Call __getitem__(0), then 1, ..., until a KeyError was raised. >> In other words, a dictionary could behave like a list provided >> that it had integer keys. MZ> Two remarks: Jeremy meant "consecutive natural keys starting MZ> with 0", (yes, I've managed to learn mind-reading from the MZ> timbot) I suppose I meant that (perhaps you can read my mind as well as I can); I also meant using values of Python's integer datatype :-). and that (the following is considered a misfeature): MZ> import UserDict MZ> a = UserDict.UserDict() MZ> a[0]="hello" MZ> a[1]="world" MZ> for word in a: print word MZ> Will print "hello", "world", and then die with KeyError. I MZ> realize why this is happening, and realize it could only be MZ> fixed in Py3K. 
However, a temporary (though not 100% backwards MZ> compatible) fix is that "for" will catch LookupError, rather MZ> then IndexError. I'm not sure what you mean by "fix." (Please read your mind for me .) I think by fix you mean, "allow the broken code above to execute without raising an exception." Yuck! As far as I can tell, the problem is caused by the special way that a for loop uses the __getitem__ protocol. There are two related issues that lead to confusion. In cases other than for loops, __getitem__ is invoked when the syntactic construct x[i] is used. This means either lookup in a list or in a dict depending on the type of x. If it is a list, the index must be an integer and IndexError can be raised. If it is a dict, the index can be anything (even an unhashable type; TypeError is only raised by insertion for this case) and KeyError can be raised. In a for loop, the same protocol (__getitem__) is used, but with the special convention that the object should be a sequence. Python will detect when you try to use a builtin type that is not a sequence, e.g. a dictionary. If the for loop iterates over an instance type rather than a builtin type, there is no way to check whether the __getitem__ protocol is being implemented by a sequence or a mapping. The right solution, I think, is to allow a means for stating explicitly whether a class with an __getitem__ method is a sequence or a mapping (or both?). Then UserDict can declare itself to be a mapping and using it in a for loop will raise the TypeError, "loop over non-sequence" (which has a standard meaning defined in Skip's catalog <0.8 wink>). I believe this is where types-vs.-classes meets subtyping-vs.-inheritance. I suspect that the right solution, circa Py3K, is that classes must explicitly state what types they are subtypes of or what interfaces they implement. 
Jeremy From moshez at math.huji.ac.il Mon Mar 20 21:13:20 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Mon, 20 Mar 2000 22:13:20 +0200 (IST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: <14550.33088.110785.78631@goon.cnri.reston.va.us> Message-ID: On Mon, 20 Mar 2000, Jeremy Hylton wrote: > I'm not sure what you mean by "fix." I mean any sane behaviour -- either failing on TypeError at the beginning, like "for" does, or executing without raising an exception. Raising an exception in the middle which is imminent is definitely (for the right values of definitely) a suprising behaviour (I know it suprised me!). >I think by fix you mean, "allow the broken code above to > execute without raising an exception." Yuck! I agree it is yucky -- it is all a weird echo of the yuckiness of the type/class dichotomy. What I suggested it a temporary patch... > As far as I can tell, the problem is caused by the special > way that a for loop uses the __getitem__ protocol. Well, my look is that it is caused by the fact __getitem__ is used both for the sequence protocol and the mapping protocol (well, I'm cheating through my teeth here, but you understand what I mean ) Agreed though, that the whole iteration protocol should be revisited -- but that is a subject for another post. > The right solution, I think, is to allow a means for stating > explicitly whether a class with an __getitem__ method is a sequence or > a mapping (or both?). And this is the fix I wanted for Py3K (details to be debated, still). See? You read my mind perfectly. > I suspect that the right solution, circa > Py3K, is that classes must explicitly state what types they are > subtypes of or what interfaces they implement. Exactly. And have subclassable built-in classes in the same fell swoop. getting-all-excited-for-py3k-ly y'rs, Z. -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From ping at lfw.org Mon Mar 20 15:34:12 2000
From: ping at lfw.org (Ka-Ping Yee)
Date: Mon, 20 Mar 2000 08:34:12 -0600 (CST)
Subject: [Python-Dev] Set options
Message-ID:

I think that at this point the possibilities for doing sets come down to four options:

1. use lists
   visible changes: new methods l.include, l.exclude
   invisible changes: faster 'in'
   usage: s = [1, 2], s.include(3), s.exclude(3), if item in s, for item in s

2. use dicts
   visible changes: for/if x in dict means keys;
     accept dicts without values (e.g. {1, 2});
     new special non-printing value ": Present";
     new method d.insert(x) means d[x] = Present
   invisible changes: none
   usage: s = {1, 2}, s.insert(3), del s[3], if item in s, for item in s

3. new type
   visible changes: set() built-in with methods .insert, .remove
   invisible changes: none
   usage: s = set(1, 2), s.insert(3), s.remove(3), if item in s, for item in s

4. do nothing
   visible changes: none
   invisible changes: none
   usage: s = {1: 1, 2: 1}, s[3] = 1, del s[3], if s.has_key(item), for item in s.keys()

Let me say a couple of things about #1 and #2. I'm happy with both. I quite like the idea of using dicts this way (#2), in fact -- i think it was the first idea i remember chatting about. If i remember correctly, Guido's objection to #2 was that "in" on a dictionary would work on the keys, which isn't consistent with the fact that "in" on a list works on the values. However, this doesn't really bother me at all. It's a very simple rule, especially when you think of how people understand dictionaries. If you hand someone a *real* dictionary and ask them

    Is the word "python" in the dictionary?

they'll go look up "python" in the *keys* of the dictionary (the words), not the values (the definitions). So i'm quite all right with saying

    for x in dict:

and having that loop over the keys, or saying

    if x in dict:

and having that check whether x is a valid key.
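[Editor's note: Ka-Ping's reading of "in" and "for" on dictionaries is, as it happens, exactly the behaviour later Python versions adopted. A quick sketch in modern Python:]

```python
# Membership tests and iteration on a dict both operate on the keys.
d = {"python": "a snake", "grail": "a cup"}
assert "python" in d                      # tests the keys, not the values
assert sorted(d) == ["grail", "python"]   # for x in d yields the keys
assert "a snake" not in d                 # values are not consulted
```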
It makes perfect sense to me. My main issue with #2 was that sets would print like {"Alice": 1, "Bob": 1, "Ted": 1} and this would look weird. However, as Greg explained to me, it would be possible to introduce a default value to go with set members that just says "i'm here", such as 'Present' (read as: "Alice" is present in the set) or 'Member' or even 'None', and this value wouldn't print out -- thus s = {"Bob"} s.include("Alice") print s would produce {"Alice", "Bob"} representing a dictionary that actually contained {"Alice": Present, "Bob": Present} You'd construct set constants like this too: {2, 4, 7} Using dicts this way (rather than having a separate set type that just happened to be spelled with {}) avoids the parsing issue: no need for look-ahead; you just toss in "Present" when the text doesn't supply a colon, and move on. I'd be okay with this, though i'm not sure everyone would; and together with Guido's initial objection, that's what motivated me to propose the lists-as-sets thing: fewer changes all around, no ambiguities introduced -- just two new methods, and we're done. Hmm. I know someone who's just learning Python. I will attempt to ask some questions about what she would find natural, and see if that reveals anything interesting. -- ?!ng From bwarsaw at cnri.reston.va.us Mon Mar 20 23:01:00 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Mon, 20 Mar 2000 17:01:00 -0500 (EST) Subject: [Python-Dev] re: Using lists as sets References: <14550.24508.341533.908941@goon.cnri.reston.va.us> <14550.33088.110785.78631@goon.cnri.reston.va.us> Message-ID: <14550.40860.72418.648591@anthem.cnri.reston.va.us> >>>>> "JH" == Jeremy Hylton writes: JH> As far as I can tell, the problem is caused by the special way JH> that a for loop uses the __getitem__ protocol. There are two JH> related issues that lead to confusion. 
>>>>> "MZ" == Moshe Zadka writes: MZ> Well, my look is that it is caused by the fact __getitem__ is MZ> used both for the sequence protocol and the mapping protocol Right. MZ> Agreed though, that the whole iteration protocol should be MZ> revisited -- but that is a subject for another post. Yup. JH> The right solution, I think, is to allow a means for stating JH> explicitly whether a class with an __getitem__ method is a JH> sequence or a mapping (or both?). Or should the two protocol use different method names (code breakage!). JH> I believe this is where types-vs.-classes meets JH> subtyping-vs.-inheritance. meets protocols-vs.-interfaces. From moshez at math.huji.ac.il Tue Mar 21 06:16:00 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 21 Mar 2000 07:16:00 +0200 (IST) Subject: [Python-Dev] re: Using lists as sets In-Reply-To: <14550.40860.72418.648591@anthem.cnri.reston.va.us> Message-ID: On Mon, 20 Mar 2000, Barry A. Warsaw wrote: > MZ> Agreed though, that the whole iteration protocol should be > MZ> revisited -- but that is a subject for another post. > > Yup. (Go Stackless, go!?) > JH> I believe this is where types-vs.-classes meets > JH> subtyping-vs.-inheritance. > > meets protocols-vs.-interfaces. It took me 5 minutes of intensive thinking just to understand what Barry meant. Just wait until we introduce Sather-like "supertypes" (which are pretty Pythonic, IMHO) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Tue Mar 21 06:21:24 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 21 Mar 2000 07:21:24 +0200 (IST) Subject: [Python-Dev] Set options In-Reply-To: Message-ID: On Mon, 20 Mar 2000, Ka-Ping Yee wrote: > I think that at this point the possibilities for doing sets > come down to four options: > > > 1. use lists > 2. use dicts > 3. new type > 4. do nothing 5. 
new Python module with a class "Set" (The issues are similar to #3, but this has the advantage of not changing the interpreter) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal at lemburg.com Tue Mar 21 01:25:09 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 21 Mar 2000 01:25:09 +0100 Subject: [Python-Dev] Unicode and Windows References: Message-ID: <38D6C165.EEF58232@lemburg.com> Mark Hammond wrote: > > I would like to discuss Unicode on the Windows platform, and how it relates > to MBCS that Windows uses. > > My main goal here is to ensure that Unicode on Windows can make a round-trip > to and from native Unicode stores. As an example, let's take the registry - > a Windows user should be able to read a Unicode value from the registry then > write it back. The value written back should be _identical_ to the value > read. Ditto for the file system: If the filesystem is Unicode, then I would > expect the following code: > for fname in os.listdir(): > f = open(fname + ".tmp", "w") > > To create filenames on the filesystem with the exact base name even when the > basename contains non-ascii characters. > > However, the Unicode patches do not appear to make this possible. open() > uses PyArg_ParseTuple(args, "s..."); PyArg_ParseTuple() will automatically > convert a Unicode object to UTF-8, so we end up passing a UTF-8 encoded > string to the C runtime fopen function. Right. The idea with open() was to write a special version (using #ifdefs) for use on Windows platforms which does all the needed magic to convert Unicode to whatever the native format and locale is... Using parser markers for this is obviously *not* the right way to get to the core of the problem. Basically, you will have to write a helper which takes a string, Unicode or some other "t" compatible object as name object and then converts it to the system's view of things. 
I think we had a private discussion about this a few months ago: there was some way to convert Unicode to a platform independent format which then got converted to MBCS -- don't remember the details though. > The end result of all this is that we end up with UTF-8 encoded names in the > registry/on the file system. It does not seem possible to get a true > Unicode string onto either the file system or in the registry. > > Unfortunately, Im not experienced enough to know the full ramifications, but > it _appears_ that on Windows the default "unicode to string" translation > should be done via the WideCharToMultiByte() API. This will then pass an > MBCS encoded ascii string to Windows, and the "right thing" should magically > happen. Unfortunately, MBCS encoding is dependant on the current locale > (ie, one MBCS sequence will mean completely different things depending on > the locale). I dont see a portability issue here, as the documentation > could state that "Unicode->ASCII conversions use the most appropriate > conversion for the platform. If the platform is not Unicode aware, then > UTF-8 will be used." No, no, no... :-) The default should be (and is) UTF-8 on all platforms -- whether the platform supports Unicode or not. If a platform uses a different encoding, an encoder should be used which applies the needed transformation. > This issue is the final one before I release the win32reg module. It seems > _critical_ to me that if Python supports Unicode and the platform supports > Unicode, then Python unicode values must be capable of being passed to the > platform. For the win32reg module I could quite possibly hack around the > problem, but the more general problem (categorized by the open() example > above) still remains... > > Any thoughts? Can't you use the wchar_t interfaces for the task (see the unicodeobject.h file for details) ? Perhaps you can first transfer Unicode to wchar_t and then on to MBCS using a win32 API ?! 
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Mar 21 10:27:56 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 21 Mar 2000 10:27:56 +0100 Subject: [Python-Dev] Set options References: Message-ID: <38D7409C.169B0C42@lemburg.com> Moshe Zadka wrote: > > On Mon, 20 Mar 2000, Ka-Ping Yee wrote: > > > I think that at this point the possibilities for doing sets > > come down to four options: > > > > > > 1. use lists > > 2. use dicts > > 3. new type > > 4. do nothing > > 5. new Python module with a class "Set" > (The issues are similar to #3, but this has the advantage of not changing > the interpreter) Perhaps someone could take Aaron's kjbuckets and write a Python emulation for it (I think he's even already done something like this for gadfly). Then the emulation could go into the core and if people want speed they can install his extension (the emulation would have to detect this and use the real thing then). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack at oratrix.nl Tue Mar 21 12:54:30 2000 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 21 Mar 2000 12:54:30 +0100 Subject: [Python-Dev] Unicode and Windows In-Reply-To: Message by "M.-A. Lemburg" , Tue, 21 Mar 2000 01:25:09 +0100 , <38D6C165.EEF58232@lemburg.com> Message-ID: <20000321115430.88A11370CF2@snelboot.oratrix.nl> I guess we need another format specifier than "s" here. "s" does the conversion to standard-python-utf8 for wide strings, and we'd need another format for conversion to current-local-os-convention-8-bit-encoding-of-unicode- strings. I assume that that would also come in handy for MacOS, where we'll have the same problem (filenames are in Apple's proprietary 8bit encoding). 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From mal at lemburg.com Tue Mar 21 13:14:54 2000
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 21 Mar 2000 13:14:54 +0100
Subject: [Python-Dev] Unicode and Windows
References: <20000321115430.88A11370CF2@snelboot.oratrix.nl>
Message-ID: <38D767BE.C45F8286@lemburg.com>

Jack Jansen wrote:
>
> I guess we need another format specifier than "s" here. "s" does the
> conversion to standard-python-utf8 for wide strings,

Actually, "t" does the UTF-8 conversion... "s" will give you the raw internal UTF-16 representation in platform byte order.

> and we'd need another
> format for conversion to current-local-os-convention-8-bit-encoding-of-unicode-strings.

I'd suggest adding some kind of generic

    PyOS_FilenameFromObject(PyObject *v, void *buffer, int buffer_len)

API for the conversion of strings, Unicode and text buffers to an OS-dependent filename buffer. And/or perhaps specific APIs for each OS... e.g.

    PyOS_MBCSFromObject() (only on WinXX)
    PyOS_AppleFromObject() (only on Mac ;)

> I assume that that would also come in handy for MacOS, where we'll have the
> same problem (filenames are in Apple's proprietary 8bit encoding).

Is that encoding already supported by the encodings package? If not, could you point me to a map file for the encoding?

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

From fdrake at acm.org Tue Mar 21 15:56:47 2000 From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue, 21 Mar 2000 09:56:47 -0500 (EST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: <38D767BE.C45F8286@lemburg.com> References: <20000321115430.88A11370CF2@snelboot.oratrix.nl> <38D767BE.C45F8286@lemburg.com> Message-ID: <14551.36271.33825.841965@weyr.cnri.reston.va.us> M.-A. Lemburg writes: > And/or perhaps sepcific APIs for each OS... e.g. > > PyOS_MBCSFromObject() (only on WinXX) > PyOS_AppleFromObject() (only on Mac ;) Another approach may be to add some format modifiers: te -- text in an encoding specified by a C string (somewhat similar to O&) tE -- text, encoding specified by a Python object (probably a string passed as a parameter or stored from some other call) (I'd prefer the [eE] before the t, but the O modifiers follow, so consistency requires this ugly construct.) This brings up the issue of using a hidden conversion function which may create a new object that needs the same lifetime guarantees as the real parameters; we discussed this issue a month or two ago. Somewhere, there's a call context that includes the actual parameter tuple. PyArg_ParseTuple() could have access to a "scratch" area where it could place objects constructed during parameter parsing. This area could just be a hidden tuple. When the C call returns, the scratch area can be discarded. The difficulty is in giving PyArg_ParseTuple() access to the scratch area, but I don't know how hard that would be off the top of my head. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From jeremy at cnri.reston.va.us Tue Mar 21 18:14:07 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Tue, 21 Mar 2000 12:14:07 -0500 (EST) Subject: [Python-Dev] Set options In-Reply-To: <38D7409C.169B0C42@lemburg.com> References: <38D7409C.169B0C42@lemburg.com> Message-ID: <14551.44511.805860.808811@goon.cnri.reston.va.us> >>>>> "MAL" == M -A Lemburg writes: MAL> Perhaps someone could take Aaron's kjbuckets and write a Python MAL> emulation for it (I think he's even already done something like MAL> this for gadfly). Then the emulation could go into the core and MAL> if people want speed they can install his extension (the MAL> emulation would have to detect this and use the real thing MAL> then). I've been waiting for Tim Peters to say something about sets, but I'll chime in with what I recall him saying last time a discussion like this came up on c.l.py. (I may misremember, in which case I'll at least draw him into the discussion in order to correct me <0.5 wink>.) The problem with a set module is that there are a number of different ways to implement them -- in C using kjbuckets is one example. Each approach is appropriate for some applications, but not for every one. A set is pretty simple to build from a list or a dictionary, so we leave it to application writers to write the one that is appropriate for their application. Jeremy From skip at mojam.com Tue Mar 21 18:25:57 2000 From: skip at mojam.com (Skip Montanaro) Date: Tue, 21 Mar 2000 11:25:57 -0600 (CST) Subject: [Python-Dev] Set options In-Reply-To: <38D7409C.169B0C42@lemburg.com> References: <38D7409C.169B0C42@lemburg.com> Message-ID: <14551.45221.447838.534003@beluga.mojam.com> Marc> Perhaps someone could take Aaron's kjbuckets and write a Python Marc> emulation for it ... Any reason why kjbuckets and friends have never been placed in the core? 
If, as it seems from the discussion, a set type is a good thing to add to the core, it seems to me that Aaron's code would be a good candidate implementation/foundation. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From bwarsaw at cnri.reston.va.us Tue Mar 21 18:47:49 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 21 Mar 2000 12:47:49 -0500 (EST) Subject: [Python-Dev] Set options References: <38D7409C.169B0C42@lemburg.com> <14551.45221.447838.534003@beluga.mojam.com> Message-ID: <14551.46533.918688.13801@anthem.cnri.reston.va.us> >>>>> "SM" == Skip Montanaro writes: SM> Any reason why kjbuckets and friends have never been placed in SM> the core? If, as it seems from the discussion, a set type is SM> a good thing to add to the core, it seems to me that Aaron's SM> code would be a good candidate implementation/foundation. It would seem to me that distutils is a better way to go for kjbuckets. The core already has basic sets (via dictionaries). We're pretty much just quibbling about efficiency, API, and syntax, aren't we? -Barry From mhammond at skippinet.com.au Tue Mar 21 18:48:06 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 21 Mar 2000 09:48:06 -0800 Subject: [Python-Dev] Unicode and Windows In-Reply-To: <38D6C165.EEF58232@lemburg.com> Message-ID: > > Right. The idea with open() was to write a special version (using > #ifdefs) for use on Windows platforms which does all the needed > magic to convert Unicode to whatever the native format and locale > is... That works for open() - but what about other extension modules? This seems to imply that any Python extension on Windows that wants to pass a Unicode string to an external function can not use PyArg_ParseTuple() with anything other than "O", and perform the magic themselves. This just seems a little back-to-front to me. Platforms that have _no_ native Unicode support have useful utilities for working with Unicode. 
Platforms that _do_ have native Unicode support can not make use of these utilities. Is this by design, or simply a sad side-effect of the design? So - it is trivial to use Unicode on platforms that dont support it, but quite difficult on platforms that do. > Using parser markers for this is obviously *not* the right way > to get to the core of the problem. Basically, you will have to > write a helper which takes a string, Unicode or some other > "t" compatible object as name object and then converts it to > the system's view of things. Why "obviously"? What on earth does the existing mechanism buy me on Windows, other than grief that I can not use it? > I think we had a private discussion about this a few months ago: > there was some way to convert Unicode to a platform independent > format which then got converted to MBCS -- don't remember the details > though. There is a Win32 API function for this. However, as you succinctly pointed out, not many people are going to be aware of its name, or how to use the multitude of flags offered by these conversion functions, or know how to deal with the memory management, etc. > Can't you use the wchar_t interfaces for the task (see > the unicodeobject.h file for details) ? Perhaps you can > first transfer Unicode to wchar_t and then on to MBCS > using a win32 API ?! Sure - I can. But can everyone who writes interfaces to Unicode functions? You wrote the Python Unicode support but dont know its name - pity the poor Joe Average trying to write an extension. It seems to me that, on Windows, the Python Unicode support as it stands is really internal. I can not think of a single time that an extension writer on Windows would ever want to use the "t" markers - am I missing something? I dont believe that a single Unicode-aware function in the Windows extensions (of which there are _many_) could be changed to use the "t" markers.
It still seems to me that the Unicode support works well on platforms with no Unicode support, and is fairly useless on platforms with the support. I dont believe that any extension on Windows would want to use the "t" marker - so, as Fred suggested, how about providing something for us that can help us interface to the platform's Unicode? This is getting too hard for me - I will release my windows registry module without Unicode support, and hope that in the future someone cares enough to address it, and to add a large number of LOC that will be needed simply to get Unicode talking to Unicode... Mark. From skip at mojam.com Tue Mar 21 19:04:11 2000 From: skip at mojam.com (Skip Montanaro) Date: Tue, 21 Mar 2000 12:04:11 -0600 (CST) Subject: [Python-Dev] Set options In-Reply-To: <14551.46533.918688.13801@anthem.cnri.reston.va.us> References: <38D7409C.169B0C42@lemburg.com> <14551.45221.447838.534003@beluga.mojam.com> <14551.46533.918688.13801@anthem.cnri.reston.va.us> Message-ID: <14551.47515.648064.969034@beluga.mojam.com> BAW> It would seem to me that distutils is a better way to go for BAW> kjbuckets. The core already has basic sets (via dictionaries). BAW> We're pretty much just quibbling about efficiency, API, and syntax, BAW> aren't we? Yes (though I would quibble with your use of the word "quibbling" ;-). If new syntax is in the offing as some have proposed, why not go for a more efficient implementation at the same time? I believe Aaron has maintained that kjbuckets is generally more efficient than Python's dictionary object. Skip From mal at lemburg.com Tue Mar 21 18:44:11 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 21 Mar 2000 18:44:11 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000321115430.88A11370CF2@snelboot.oratrix.nl> <38D767BE.C45F8286@lemburg.com> <14551.36271.33825.841965@weyr.cnri.reston.va.us> Message-ID: <38D7B4EB.66DAEBF3@lemburg.com> "Fred L. Drake, Jr." wrote: > > M.-A. 
Lemburg writes:

> > And/or perhaps specific APIs for each OS... e.g.
> >
> > PyOS_MBCSFromObject() (only on WinXX)
> > PyOS_AppleFromObject() (only on Mac ;)
>
> Another approach may be to add some format modifiers:
>
> te -- text in an encoding specified by a C string (somewhat
>       similar to O&)
> tE -- text, encoding specified by a Python object (probably a
>       string passed as a parameter or stored from some other
>       call)
>
> (I'd prefer the [eE] before the t, but the O modifiers follow, so
> consistency requires this ugly construct.)
>
> This brings up the issue of using a hidden conversion function which
> may create a new object that needs the same lifetime guarantees as the
> real parameters; we discussed this issue a month or two ago.
> Somewhere, there's a call context that includes the actual parameter
> tuple. PyArg_ParseTuple() could have access to a "scratch" area where
> it could place objects constructed during parameter parsing. This
> area could just be a hidden tuple. When the C call returns, the
> scratch area can be discarded.
>
> The difficulty is in giving PyArg_ParseTuple() access to the scratch
> area, but I don't know how hard that would be off the top of my head.

Some time ago, I considered adding "U+" with builtin auto-conversion to the tuple parser... after some discussion about the error handling issues involved with this I quickly dropped that idea again and used the standard "O" approach plus a call to a helper function which then applied the conversion. (Note the "+" behind "U": this was intended to indicate that the returned object has had the refcount incremented and that the caller must take care of decrementing it again.)

The "O" + helper approach is a little clumsy, but works just fine. Plus it doesn't add any more overhead to the already convoluted PyArg_ParseTuple().

BTW, what other external char formats are we talking about ? E.g. how do you handle MBCS or DBCS under WinXX ?
Are there routines to have wchar_t buffers converted into the two ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gmcm at hypernet.com Tue Mar 21 19:25:43 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Tue, 21 Mar 2000 13:25:43 -0500 Subject: [Python-Dev] Set options In-Reply-To: <14551.44511.805860.808811@goon.cnri.reston.va.us> References: <38D7409C.169B0C42@lemburg.com> Message-ID: <1258459347-36172889@hypernet.com> Jeremy wrote: > The problem with a set module is that there are a number of different > ways to implement them -- in C using kjbuckets is one example. Nah. Sets are pretty unambiguous. They're also easy, and boring. The interesting stuff is graphs and operations like composition, closure and transpositions. That's also where stuff gets ambiguous. E.g., what's the right behavior when you invert {'a':1,'b':1}? Hint: any answer you give will be met by the wrath of God. I would love this stuff, and as a faithful worshipper of Our Lady of Corrugated Ironism, I could probably live with whatever rules are arrived at; but I'm afraid I would have to considerably enlarge my kill file. - Gordon From gstein at lyra.org Tue Mar 21 19:40:20 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 21 Mar 2000 10:40:20 -0800 (PST) Subject: [Python-Dev] Set options In-Reply-To: <14551.44511.805860.808811@goon.cnri.reston.va.us> Message-ID: On Tue, 21 Mar 2000, Jeremy Hylton wrote: > >>>>> "MAL" == M -A Lemburg writes: > MAL> Perhaps someone could take Aaron's kjbuckets and write a Python > MAL> emulation for it (I think he's even already done something like > MAL> this for gadfly). Then the emulation could go into the core and > MAL> if people want speed they can install his extension (the > MAL> emulation would have to detect this and use the real thing > MAL> then). 
> > I've been waiting for Tim Peters to say something about sets, but I'll > chime in with what I recall him saying last time a discussion like > this came up on c.l.py. (I may misremember, in which case I'll at > least draw him into the discussion in order to correct me <0.5 wink>.) > > The problem with a set module is that there are a number of different > ways to implement them -- in C using kjbuckets is one example. Each > approach is appropriate for some applications, but not for every one. > A set is pretty simple to build from a list or a dictionary, so we > leave it to application writers to write the one that is appropriate > for their application. Yah... +1 on what Jeremy said. Leave them out of the distro since we can't do them Right for all people. Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Tue Mar 21 19:34:56 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 21 Mar 2000 20:34:56 +0200 (IST) Subject: [Python-Dev] Set options In-Reply-To: <14551.47515.648064.969034@beluga.mojam.com> Message-ID: On Tue, 21 Mar 2000, Skip Montanaro wrote: > BAW> It would seem to me that distutils is a better way to go for > BAW> kjbuckets. The core already has basic sets (via dictionaries). > BAW> We're pretty much just quibbling about efficiency, API, and syntax, > BAW> aren't we? > > If new syntax is in the offing as some have proposed, FWIW, I'm against new syntax. The core-language has changed quite a lot between 1.5.2 and 1.6 -- * strings have grown methods * there are unicode strings * "in" operator overloadable The second change even includes a syntax change (u"some string") whose variants I'm still not familiar enough to comment on (ru"some\string"? ur"some\string"? Both legal?). 
I feel too many changes destabilize the language (this might seem a bit extreme, considering I pushed towards one of the changes), and we should try to improve on things other than the core -- one of these is a more hierarchical standard library, and a standard distribution mechanism, to rival CPAN -- then anyone could

	import data.sets.kjbuckets

with only a trivial

>>> import dist
>>> dist.install("data.sets.kjbuckets")

> why not go for a more efficient implementation at the same time?

Because Python dicts are "pretty efficient", and it is not a trivial question to check optimality in this area: tests can be rigged to prove almost anything with the right test-cases, and there's no promise we'll choose the "right ones".

-- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From moshez at math.huji.ac.il Tue Mar 21 19:38:02 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 21 Mar 2000 20:38:02 +0200 (IST) Subject: [Python-Dev] Set options In-Reply-To: <1258459347-36172889@hypernet.com> Message-ID: On Tue, 21 Mar 2000, Gordon McMillan wrote:

> E.g., what's the right behavior when you
> invert {'a':1,'b':1}? Hint: any answer you give will be met by the
> wrath of God.

Isn't "wrath of God", translated into Python, "an exception"?

	raise ValueError("dictionary is not 1-1")

seems fine to me.

-- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From skip at mojam.com Tue Mar 21 19:42:55 2000 From: skip at mojam.com (Skip Montanaro) Date: Tue, 21 Mar 2000 12:42:55 -0600 (CST) Subject: [Python-Dev] Set options In-Reply-To: References: <14551.47515.648064.969034@beluga.mojam.com> Message-ID: <14551.49839.377385.99637@beluga.mojam.com> Skip> If new syntax is in the offing as some have proposed, Moshe> FWIW, I'm against new syntax.
The core-language has changed quite Moshe> a lot between 1.5.2 and 1.6 -- I thought we were talking about Py3K, where syntax changes are somewhat more expected. Just to make things clear, the syntax change I was referring to was the value-less dict syntax that someone proposed a few days ago: myset = {"a", "b", "c"} Note that I wasn't necessarily supporting the proposal, only acknowledging that it had been made. In general, I think we need to keep straight where people feel various proposals are going to fit. When a thread goes for more than a few messages it's easy to forget. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From ping at lfw.org Tue Mar 21 14:07:51 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 21 Mar 2000 07:07:51 -0600 (CST) Subject: [Python-Dev] Set options In-Reply-To: <14551.46533.918688.13801@anthem.cnri.reston.va.us> Message-ID: Jeremy Hylton wrote: > The problem with a set module is that there are a number of different > ways to implement them -- in C using kjbuckets is one example. Each > approach is appropriate for some applications, but not for every one. For me, anyway, this is not about trying to engineer a universally perfect solution into Python -- it's about providing some simple, basic, easy-to-understand functionality that takes care of the common case. For example, dictionaries are simple, their workings are easy enough to understand, and they aren't written to efficiently support things like inversion and composition because most of the time no one needs to do these things. The same holds true for sets. All i would want is something i can put things into, and take things out of, and ask about what's inside. Barry Warsaw wrote: > It would seem to me that distutils is a better way to go for > kjbuckets. The core already has basic sets (via dictionaries). We're > pretty much just quibbling about efficiency, API, and syntax, aren't we? 
Efficiency: Hashtables have proven quite adequate for dicts, so i think they're quite adequate for sets.

API and syntax: I believe the goal is obvious, because Python already has very nice notation ("in", "not in") -- it just doesn't work quite the way one would want. It works semantically right on lists, but they're a little slow. It doesn't work on dicts, but we can make it so.

Here is where my "explanation metric" comes into play. How much additional explaining do you have to do in each case to answer the question "what do i do when i need a set"?

1. Use lists. Explain that "include()" means "append if not already present", and "exclude()" means "remove if present". You are done.

2. Use dicts. Explain that "for x in dict" iterates over the keys, and "if x in dict" looks for a key. Explain what happens when you write "{1, 2, 3}", and the special non-printing value constant. Explain how to add elements to a set and remove elements from a set.

3. Create a new type. Explain that there exists another type "set" with methods "insert" and "remove". Explain how to construct sets. Explain how "in" and "not in" work, where this type fits in with the other types, and when to choose this type over other types.

4. Do nothing. Explain that dictionaries can be used as sets if you assign keys a dummy value, use "del" to remove keys, iterate over "dict.keys()", and use "dict.has_key()" to test membership.

This is what motivated my proposal for using lists: it requires by far the least explanation. This is no surprise because a lot of things about lists have been explained already. My preference in terms of elegance is about equal for 1, 2, 3, with 4 distinctly behind; but my subjective ranking of "explanation complexity" (as in "how to get there from here") is 1 < 4 < 3 < 2.
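The dict-as-set idiom of options 3 and 4 is small enough to sketch directly. This is an illustrative sketch only -- the class name `Set` and the method names `insert`, `remove` and `elements` echo option 3 but are invented for this example, not taken from any real or proposed module:

```python
# Illustrative dict-backed set, in the spirit of options 3 and 4 above.
# (Sketch only: "Set", "insert", "remove" and "elements" are invented
# names for this example, not an actual library API.)

class Set:
    def __init__(self, items=()):
        self.data = {}              # keys are the elements, values are dummies
        for item in items:
            self.data[item] = 1

    def insert(self, item):
        # "append if not already present" -- dict keys are unique
        self.data[item] = 1

    def remove(self, item):
        # "remove if present" -- absent elements are ignored
        if item in self.data:
            del self.data[item]

    def __contains__(self, item):
        # makes "x in s" / "x not in s" work
        return item in self.data

    def elements(self):
        return list(self.data.keys())

s = Set(["a", "b"])
s.insert("b")               # no duplicate: "b" is already a key
s.remove("c")               # no-op: "c" is not present
print("a" in s)             # -> True
print("c" not in s)         # -> True
```

The hashtable underneath gives constant-time membership tests on average, which is the efficiency point made above; everything else is just option 4's dict idiom wrapped behind option 3's names.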
-- ?!ng

From tismer at tismer.com Tue Mar 21 21:13:38 2000 From: tismer at tismer.com (Christian Tismer) Date: Tue, 21 Mar 2000 21:13:38 +0100 Subject: [Python-Dev] Unicode Database Compression Message-ID: <38D7D7F2.14A2FBB5@tismer.com>

Hi,

I have spent the last four days on compressing the Unicode database. With little decoding effort, I can bring the data down to 25kb. This would still be very fast, since codes are randomly accessible, although there are some simple shifts and masks.

With a bit more effort, this can be squeezed down to 15kb by some more aggressive techniques like common prefix elimination. Speed would be *slightly* worse, since a small loop (average 8 cycles) is performed to obtain a character from a packed nybble.

This is just all the data which is in Marc's unicodedatabase.c file. I checked efficiency by creating a delimited file like the original database text file with only these columns and ran PkZip over it. The result was 40kb. This says that I found a lot of correlations which automatic compressors cannot see.

Now, before generating the final C code, I'd like to ask some questions:

What is more desirable: Low compression and blinding speed? Or high compression and less speed, since we always want to unpack a whole code page?

Then, what about the other database columns? There are a couple of extra attributes which I find coded as switch statements elsewhere. Should I try to pack these codes into my squeezy database, too?

And last: There are also two quite elaborate columns with textual descriptions of the codes (the uppercase blah version of character x). Do we want these at all? And if so, should I try to compress them as well? Should these perhaps go into a different source file as a dynamic module, since they will not be used so often?

waiting for directives - ly y'rs - chris

-- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home

From moshez at math.huji.ac.il Wed Mar 22 06:44:00 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 22 Mar 2000 07:44:00 +0200 (IST) Subject: [1.x] Re: [Python-Dev] Set options In-Reply-To: <14551.49839.377385.99637@beluga.mojam.com> Message-ID: On Tue, 21 Mar 2000, Skip Montanaro wrote:

> Skip> If new syntax is in the offing as some have proposed,
>
> Moshe> FWIW, I'm against new syntax. The core-language has changed quite
> Moshe> a lot between 1.5.2 and 1.6 --
>
> I thought we were talking about Py3K

My argument was strictly a 1.x argument. I'm hoping to get sets in 1.7 or 1.8.

> In general, I think we need to keep straight where people feel various
> proposals are going to fit.

You're right. I'll start prefixing my posts accordingly.

-- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From mal at lemburg.com Wed Mar 22 11:11:25 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 22 Mar 2000 11:11:25 +0100 Subject: [Python-Dev] Re: Unicode Database Compression References: <38D7D7F2.14A2FBB5@tismer.com> Message-ID: <38D89C4D.370C19D@lemburg.com> Christian Tismer wrote: > > Hi, > > I have spent the last four days on compressing the > Unicode database. Cool :-) > With little decoding effort, I can bring the data down to 25kb. > This would still be very fast, since codes are randomly > accessible, although there are some simple shifts and masks. > > With a bit more effort, this can be squeezed down to 15kb > by some more aggressive techniques like common prefix > elimination. Speed would be *slightly* worse, since a small > loop (average 8 cycles) is performed to obtain a character > from a packed nybble.
> > This is just all the data which is in Marc's unicodedatabase.c > file. I checked efficiency by creating a delimited file like > the original database text file with only these columns and > ran PkZip over it. The result was 40kb. This says that I found > a lot of correlations which automatic compressors cannot see. Not bad ;-) > Now, before generating the final C code, I'd like to ask some > questions: > > What is more desirable: Low compression and blinding speed? > Or high compression and less speed, since we always want to > unpack a whole code page? I'd say high speed and less compression. The reason is that the Asian codecs will need fast access to the database. With their large mapping tables size the few more kB don't hurt, I guess. > Then, what about the other database columns? > There are a couple of extra atrributes which I find coded > as switch statements elsewhere. Should I try to pack these > codes into my squeezy database, too? You basically only need to provide the APIs (and columns) defined in the unicodedata Python API, e.g. the character description column is not needed. > And last: There are also two quite elaborated columns with > textual descriptions of the codes (the uppercase blah version > of character x). Do we want these at all? And if so, should > I try to compress them as well? Should these perhaps go > into a different source file as a dynamic module, since they > will not be used so often? I guess you are talking about the "Unicode 1.0 Name" and the "10646 comment field" -- see above, there's no need to include these descriptions in the database... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Mar 22 12:04:32 2000 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 22 Mar 2000 12:04:32 +0100 Subject: [Python-Dev] Unicode and Windows References: Message-ID: <38D8A8C0.66123F2C@lemburg.com> Mark Hammond wrote: > > > > > Right. The idea with open() was to write a special version (using > > #ifdefs) for use on Windows platforms which does all the needed > > magic to convert Unicode to whatever the native format and locale > > is... > > That works for open() - but what about other extension modules? > > This seems to imply that any Python extension on Windows that wants to pass > a Unicode string to an external function can not use PyArg_ParseTuple() with > anything other than "O", and perform the magic themselves. > > This just seems a little back-to-front to me. Platforms that have _no_ > native Unicode support have useful utilities for working with Unicode. > Platforms that _do_ have native Unicode support can not make use of these > utilities. Is this by design, or simply a sad side-effect of the design? > > So - it is trivial to use Unicode on platforms that dont support it, but > quite difficult on platforms that do. The problem is that Windows seems to use a completely different internal Unicode format than most of the rest of the world. As I've commented on in a different post, the only way to have PyArg_ParseTuple() perform auto-conversion is by allowing it to return objects which are garbage collected by the caller. The problem with this is error handling, since PyArg_ParseTuple() will have to keep track of all objects it created until the call returns successfully. An alternative approach is sketched below. Note that *all* platforms will have to use this approach... not only Windows or other platforms with Unicode support. > > Using parser markers for this is obviously *not* the right way > > to get to the core of the problem. 
Basically, you will have to > write a helper which takes a string, Unicode or some other > "t" compatible object as name object and then converts it to > the system's view of things. > > Why "obviously"? What on earth does the existing mechanism buy me on > Windows, other than grief that I can not use it? Sure, you can :-) Just fetch the object, coerce it to Unicode and then encode it according to your platform needs (PyUnicode_FromObject() takes care of the coercion part for you). > > I think we had a private discussion about this a few months ago: > > there was some way to convert Unicode to a platform independent > > format which then got converted to MBCS -- don't remember the details > > though. > > There is a Win32 API function for this. However, as you succinctly pointed > out, not many people are going to be aware of its name, or how to use the > multitude of flags offered by these conversion functions, or know how to > deal with the memory management, etc. > > > Can't you use the wchar_t interfaces for the task (see > > the unicodeobject.h file for details) ? Perhaps you can > > first transfer Unicode to wchar_t and then on to MBCS > > using a win32 API ?! > > Sure - I can. But can everyone who writes interfaces to Unicode functions? > You wrote the Python Unicode support but dont know its name - pity the poor > Joe Average trying to write an extension. Hey, Mark... I'm not a Windows geek. How can I know which APIs are available and which of them to use ? And that's my point: add conversion APIs and codecs for the different OSes which make the extension writer's life easier. > It seems to me that, on Windows, the Python Unicode support as it stands is > really internal. I can not think of a single time that an extension writer > on Windows would ever want to use the "t" markers - am I missing something?
> I dont believe that a single Unicode-aware function in the Windows > extensions (of which there are _many_) could be changed to use the "t" > markers. "t" is intended to return a text representation of a buffer interface aware type... this happens to be UTF-8 for Unicode objects -- what other encoding would you have expected ? > It still seems to me that the Unicode support works well on platforms with > no Unicode support, and is fairly useless on platforms with the support. I > dont believe that any extension on Windows would want to use the "t" > marker - so, as Fred suggested, how about providing something for us that > can help us interface to the platform's Unicode? That's exactly what I'm talking about all the time... there currently are PyUnicode_AsWideChar() and PyUnicode_FromWideChar() to interface to the compiler's wchar_t type. I have no problem adding more of these APIs for the various OSes -- but they would have to be coded by someone with Unicode skills on each of those platforms, e.g. PyUnicode_AsMBCS() and PyUnicode_FromMBCS() on Windows. > This is getting too hard for me - I will release my windows registry module > without Unicode support, and hope that in the future someone cares enough to > address it, and to add a large number of LOC that will be needed simply to > get Unicode talking to Unicode... I think you're getting this wrong: I'm not arguing against adding better support for Windows. The only way I can think of using parser markers in this context would be by having PyArg_ParseTuple() *copy* data into a given data buffer rather than only passing a reference to it. This would enable PyArg_ParseTuple() to apply whatever conversion is needed while still keeping the temporary objects internal.
Hmm, sketching a little:

	"es#",&encoding,&buffer,&buffer_len
	-- could mean: coerce the object to Unicode, then
	   encode it using the given encoding and then
	   copy at most buffer_len bytes of data into
	   buffer and update buffer_len to the number of bytes
	   copied

This costs some cycles for copying data, but gets rid of the problems involved in cleaning up after errors. The caller will have to ensure that the buffer is large enough and that the encoding fits the application's needs. Error handling will be poor since the caller can't take any action other than to pass on the error generated by PyArg_ParseTuple(). Thoughts ?

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

From mal at lemburg.com Wed Mar 22 14:40:23 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 22 Mar 2000 14:40:23 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000322113129.5E67C370CF2@snelboot.oratrix.nl> Message-ID: <38D8CD47.E573A246@lemburg.com>

Jack Jansen wrote:
>
> > "es#",&encoding,&buffer,&buffer_len
> > -- could mean: coerce the object to Unicode, then
> > encode it using the given encoding and then
> > copy at most buffer_len bytes of data into
> > buffer and update buffer_len to the number of bytes
> > copied
>
> This is a possible solution, but I think I would really prefer to also have
> "eS", &encoding, &buffer_ptr
> -- coerce the object to Unicode, then encode it using the given
> encoding, malloc() a buffer to put the result in and return that.
>
> I don't mind doing something like
>
> {
> 	char *filenamebuffer = NULL;
>
> 	if ( PyArg_ParseTuple(args, "eS", &macencoding, &filenamebuffer)
> 		...
> 	open(filenamebuffer, ....);
> 	PyMem_XDEL(filenamebuffer);
> 	...
> }
>
> I think this would be much less error-prone than having fixed-length buffers
> all over the place.
PyArg_ParseTuple() should probably raise an error in case the data doesn't fit into the buffer.

> And if this is indeed going to be used mainly in open()
> calls and such the cost of the extra malloc()/free() is going to be dwarfed by
> what the underlying OS call is going to use.

Good point. You'll still need the buffer_len output parameter though -- otherwise you wouldn't be able to tell the size of the allocated buffer (the returned data may not be terminated).

How about this:

	"es#", &encoding, &buffer, &buffer_len
	-- both buffer and buffer_len are in/out parameters
	-- if **buffer is non-NULL, copy the data into it (at most
	   buffer_len bytes) and update buffer_len on output;
	   truncation produces an error
	-- if **buffer is NULL, malloc() a buffer of size buffer_len
	   and return it through *buffer; if buffer_len is -1, the
	   allocated buffer should be large enough to hold all data;
	   again, truncation is an error
	-- apply coercion and encoding as described above

(could be that I've got the '*'s wrong, but you get the picture...:)

-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

From jack at oratrix.nl Wed Mar 22 14:46:50 2000 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 22 Mar 2000 14:46:50 +0100 Subject: [Python-Dev] Unicode and Windows In-Reply-To: Message by "M.-A. Lemburg" , Wed, 22 Mar 2000 14:40:23 +0100 , <38D8CD47.E573A246@lemburg.com> Message-ID: <20000322134650.ED1BD370CF2@snelboot.oratrix.nl>

> > [on the user-supplies-buffer interface]
> > I think this would be much less error-prone than having fixed-length buffers
> > all over the place.
>
> PyArg_ParseTuple() should probably raise an error in case the
> data doesn't fit into the buffer.

Ah, that's right, that solves most of that problem.

> > [on the malloced interface]
> Good point.
> You'll still need the buffer_len output parameter
> though -- otherwise you wouldn't be able to tell the size of the
> allocated buffer (the returned data may not be terminated).

Are you sure? I would expect the "eS" format to be used to obtain 8-bit data in some local encoding, and I would expect that all 8-bit encodings of unicode data would still allow for null-termination. Or are there 8-bit encodings out there where a zero byte is a normal occurrence and where it can't be used as a terminator?

-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From mal at lemburg.com Wed Mar 22 17:31:26 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 22 Mar 2000 17:31:26 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000322134650.ED1BD370CF2@snelboot.oratrix.nl> Message-ID: <38D8F55E.6E324281@lemburg.com>

Jack Jansen wrote:
>
> > > [on the user-supplies-buffer interface]
> > > I think this would be much less error-prone than having fixed-length buffers
> > > all over the place.
> >
> > PyArg_ParseTuple() should probably raise an error in case the
> > data doesn't fit into the buffer.
>
> Ah, that's right, that solves most of that problem.
>
> > > [on the malloced interface]
> > Good point. You'll still need the buffer_len output parameter
> > though -- otherwise you wouldn't be able to tell the size of the
> > allocated buffer (the returned data may not be terminated).
>
> Are you sure? I would expect the "eS" format to be used to obtain 8-bit data
> in some local encoding, and I would expect that all 8-bit encodings of unicode
> data would still allow for null-termination. Or are there 8-bit encodings out
> there where a zero byte is a normal occurrence and where it can't be used as a
> terminator?

Not sure whether these exist or not, but they are certainly a possibility to keep in mind.
Perhaps adding "es#" and "es" (with 0-byte check) would be ideal ?! -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From pf at artcom-gmbh.de Wed Mar 22 17:54:42 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 22 Mar 2000 17:54:42 +0100 (MET) Subject: [Python-Dev] Nitpicking on UserList implementation Message-ID: Hi! Please have a look at the following method cited from Lib/UserList.py:

    def __radd__(self, other):
        if isinstance(other, UserList):                    # <-- ?
            return self.__class__(other.data + self.data)  # <-- ?
        elif isinstance(other, type(self.data)):
            return self.__class__(other + self.data)
        else:
            return self.__class__(list(other) + self.data)

The reference manual says about the __r*__ methods: """These functions are only called if the left operand does not support the corresponding operation.""" So if the left operand is a UserList instance, it should always have an __add__ method, which will be called instead of the right operand's __radd__. So I think the condition 'isinstance(other, UserList)' in __radd__ above will always evaluate to False, and so the two lines marked with '# <-- ?' seem to be superfluous. But 'UserList' is so mature: Please tell me what I've overlooked before I make a fool of myself and submit a patch removing these two lines.
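Peter's dead-branch claim can be checked with a small stand-in class (Seq below is a hypothetical minimal sketch, not the real UserList): whenever the left operand of "+" defines __add__ and handles the operation, the right operand's __radd__ is never consulted.

```python
class Seq:
    # Hypothetical stand-in for UserList, just to show the dispatch rule.
    def __init__(self, data):
        self.data = list(data)

    def __add__(self, other):
        # The left operand handles "+" whenever it can...
        return Seq(self.data + list(getattr(other, "data", other)))

    def __radd__(self, other):
        # ...so this only runs when the left operand is NOT a Seq,
        # which is why an isinstance(other, Seq) branch here is dead code.
        assert not isinstance(other, Seq)
        return Seq(list(other) + self.data)

print((Seq([1, 2]) + Seq([3])).data)   # [1, 2, 3] -- left operand's __add__ wins
print(([0] + Seq([3])).data)           # [0, 3]    -- plain list defers to __radd__
```

The assertion inside __radd__ never fires for Seq-plus-Seq additions, which is exactly Peter's point about the two marked lines.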
Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From gvwilson at nevex.com Thu Mar 23 18:10:16 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 23 Mar 2000 12:10:16 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods Message-ID: [The following passed the Ping test, so I'm posting it here] If None becomes a keyword, I would like to ask whether it could be used to signal that a method is a class method, as opposed to an instance method:

    class Ping:
        def __init__(self, arg):
            ...as usual...
        def method(self, arg):
            ...no change...
        def classMethod(None, arg):
            ...equivalent of C++ 'static'...

    p = Ping("thinks this is cool")    # as always
    p.method("who am I to argue?")     # as always
    Ping.classMethod("hey, cool!")     # no 'self'
    p.classMethod("hey, cool!")        # also selfless

I'd also like to ask (separately) that assignment to None be defined as a no-op, so that programmers can write: year, month, None, None, None, None, weekday, None, None = gmtime(time()) instead of having to create throw-away variables to fill in slots in tuples that they don't care about. I think both behaviors are readable; the first provides genuinely new functionality, while I often found the second handy when I was doing logic programming. Greg From jim at digicool.com Thu Mar 23 18:18:29 2000 From: jim at digicool.com (Jim Fulton) Date: Thu, 23 Mar 2000 12:18:29 -0500 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA51E5.B39D3E7B@digicool.com> gvwilson at nevex.com wrote: > > [The following passed the Ping test, so I'm posting it here] > > If None becomes a keyword, I would like to ask whether it could be used to > signal that a method is a class method, as opposed to an instance method: > > class Ping: > > def __init__(self, arg): > ...as usual... > > def method(self, arg): > ...no change...
> > def classMethod(None, arg): > ...equivalent of C++ 'static'... (snip) As a point of jargon, please let's call this thing a "static method" (or an instance function, or something) rather than a "class method". The distinction between "class methods" and "static methods" has been discussed at length in the types sig (over a year ago). If this proposal goes forward and the name "class method" is used, I'll have to argue strenuously, and I really don't want to do that. :] So, if you can live with the term "static method", you could save us a lot of trouble by just saying "static method". Jim -- Jim Fulton mailto:jim at digicool.com Technical Director (888) 344-4332 Python Powered! Digital Creations http://www.digicool.com http://www.python.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From gvwilson at nevex.com Thu Mar 23 18:21:48 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 23 Mar 2000 12:21:48 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <38DA51E5.B39D3E7B@digicool.com> Message-ID: > As a point of jargon, please let's call this thing a "static method" > (or an instance function, or something) rather than a "class method". I'd call it a penguin if that was what it took to get something like this implemented... :-) greg From jim at digicool.com Thu Mar 23 18:28:25 2000 From: jim at digicool.com (Jim Fulton) Date: Thu, 23 Mar 2000 12:28:25 -0500 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA5439.F5FE8FE6@digicool.com> gvwilson at nevex.com wrote: > > > As a point of jargon, please let's call this thing a "static method" > > (or an instance function, or something) rather than a "class method".
> > I'd call it a penguin if that was what it took to get something like this > implemented... :-) That's a great name. Let's go with penguin. :) Jim -- Jim Fulton mailto:jim at digicool.com Technical Director (888) 344-4332 Python Powered! Digital Creations http://www.digicool.com http://www.python.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From mhammond at skippinet.com.au Thu Mar 23 18:29:53 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 23 Mar 2000 09:29:53 -0800 Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: ... > If None becomes a keyword, I would like to ask whether it could be used to > signal that a method is a class method, as opposed to an instance method: > > def classMethod(None, arg): > ...equivalent of C++ 'static'... ... > I'd also like to ask (separately) that assignment to None be defined as a > no-op, so that programmers can write: > > year, month, None, None, None, None, weekday, None, None = > gmtime(time()) In the vernacular of a certain Mr Stein... +2 on both of these :-) [Although I do believe "static method" is a better name than "penguin" :-] Mark. From ping at lfw.org Thu Mar 23 18:47:47 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 23 Mar 2000 09:47:47 -0800 (PST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: On Thu, 23 Mar 2000 gvwilson at nevex.com wrote: > > If None becomes a keyword, I would like to ask whether it could be used to > signal that a method is a class method, as opposed to an instance method: > > class Ping: [...] Ack! I've been reduced to a class with just three methods. Oh well, i never really considered it such a bad thing to be called "simple-minded".
:) > def classMethod(None, arg): > ...equivalent of C++ 'static'... Yeah, i agree with Jim; you might as well call this a "static method" as opposed to a "class method". I like the way "None" is explicitly stated here, so there's no confusion about what the method does. (Without it, there's the question of whether the first argument will get thrown in, or what...) Hmm... i guess this also means one should ask what def function(None, arg): ... does outside a class definition. I suppose that should simply be illegal. > I'd also like to ask (separately) that assignment to None be defined as a > no-op, so that programmers can write: > > year, month, None, None, None, None, weekday, None, None = gmtime(time()) > > instead of having to create throw-away variables to fill in slots in > tuples that they don't care about. For what it's worth, i sometimes use "_" for this purpose (shades of Prolog!) but i can't make much of an argument for its readability... -- ?!ng I never dreamt that i would get to be The creature that i always meant to be But i thought, in spite of dreams, You'd be sitting somewhere here with me. From fdrake at acm.org Thu Mar 23 19:11:39 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 23 Mar 2000 13:11:39 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: References: Message-ID: <14554.24155.948286.451340@weyr.cnri.reston.va.us> gvwilson at nevex.com writes: > p.classMethod("hey, cool!") # also selfless This is the example that I haven't seen before (I'm not on the types-sig, so it may have been presented there), and I think this is what makes it interesting; a method in a module isn't quite sufficient here, since a subclass can override or extend the penguin this way. (Er, if we *do* go with penguin, does this mean it only works on Linux? ;) -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From pf at artcom-gmbh.de Thu Mar 23 19:25:57 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Thu, 23 Mar 2000 19:25:57 +0100 (MET) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: from "gvwilson@nevex.com" at "Mar 23, 2000 12:10:16 pm" Message-ID: Hi! gvwilson at nevex.com: > I'd also like to ask (separately) that assignment to None be defined as a > no-op, so that programmers can write: > > year, month, None, None, None, None, weekday, None, None = gmtime(time()) You can already do this today with 1.5.2, if you use a 'del None' statement:

    Python 1.5.2 (#1, Jul 23 1999, 06:38:16) [GCC egcs-2.91.66 19990314/Linux (egcs- on linux2
    Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
    >>> from time import time, gmtime
    >>> year, month, None, None, None, None, weekday, None, None = gmtime(time())
    >>> print year, month, None, weekday
    2000 3 0 3
    >>> del None
    >>> print year, month, None, weekday
    2000 3 None 3
    >>>

If None becomes a keyword in Py3K, this Python idiom is better written as

    year, month, None, None, None, None, ... = ...
    if sys.version[0] == '1':
        del None

or

    try:
        del None
    except SyntaxError:
        pass  # Wow, running Py3K here!

I wonder how much existing code the None --> keyword change would break. Regards, Peter From paul at prescod.net Thu Mar 23 19:47:55 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 10:47:55 -0800 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA66DB.635E8731@prescod.net> gvwilson at nevex.com wrote: > > [The following passed the Ping test, so I'm posting it here] > > If None becomes a keyword, I would like to ask whether it could be used to > signal that a method is a class method, as opposed to an instance method: +1 Idea is good, but I'm not really happy with any of the proposed terminology...Python doesn't really have static anything.
I would vote at the same time to make self a keyword and signal an error if the first argument is not one of None or self. Even now, one of my most common Python mistakes is forgetting self. I expect it happens to anyone who shifts between other languages and Python. Why does None have an uppercase "N"? Maybe the keyword version should be lower-case... -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From bwarsaw at cnri.reston.va.us Thu Mar 23 19:57:00 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 23 Mar 2000 13:57:00 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <14554.26876.514559.320219@anthem.cnri.reston.va.us> >>>>> "gvwilson" == writes: gvwilson> If None becomes a keyword, I would like to ask whether gvwilson> it could be used to signal that a method is a class gvwilson> method, as opposed to an instance method: It still seems mildly weird that None would be a special kind of keyword, one that has a value and is used in ways that no other keyword is used. Greg gives an example, and here are a few more: def baddaboom(x, y, z=None): ... if z is None: ... try substituting `else' for `None' in these examples. ;) Putting that issue aside, Greg's suggestion for static method definitions is interesting.

    class Ping:
        # would this be a SyntaxError?
        def __init__(None, arg):
            ...
        def staticMethod(None, arg):
            ...

    p = Ping()
    Ping.staticMethod(p, 7)  # TypeError
    Ping.staticMethod(7)     # This is fine
    p.staticMethod(7)        # So's this
    Ping.staticMethod(p)     # and this !!

-Barry From paul at prescod.net Thu Mar 23 19:52:25 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 10:52:25 -0800 Subject: [Python-Dev] dir() Message-ID: <38DA67E9.AA593B7A@prescod.net> Can someone explain why dir(foo) does not return all of foo's methods?
I know it's documented that way, I just don't know why it *is* that way. I'm also not clear on why instances don't have auto-populated __methods__ and __members__ members. If there isn't a good reason (there probably is) then I would advocate that these functions and members should be more comprehensive. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From bwarsaw at cnri.reston.va.us Thu Mar 23 20:00:57 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 23 Mar 2000 14:00:57 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <14554.27113.546575.170565@anthem.cnri.reston.va.us> >>>>> "PF" == Peter Funk writes: | try: | del None | except SyntaxError: | pass # Wow running Py3K here! I know how to break your Py3K code: stick None=None somewhere higher up :) PF> I wonder how much existing code the None --> keyword change PF> would break. Me too. -Barry From gvwilson at nevex.com Thu Mar 23 20:01:06 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 23 Mar 2000 14:01:06 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.26876.514559.320219@anthem.cnri.reston.va.us> Message-ID: > class Ping: > # would this be a SyntaxError? > def __init__(None, arg): > ... Absolutely a syntax error; ditto any of the other special names (e.g. __add__). Greg From akuchlin at mems-exchange.org Thu Mar 23 20:06:33 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Thu, 23 Mar 2000 14:06:33 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.27113.546575.170565@anthem.cnri.reston.va.us> References: <14554.27113.546575.170565@anthem.cnri.reston.va.us> Message-ID: <14554.27449.69043.924322@amarok.cnri.reston.va.us> Barry A.
Warsaw writes: >>>>>> "PF" == Peter Funk writes: > PF> I wonder how much existing code the None --> keyword change > PF> would break. >Me too. I can't conceive of anyone using None as a function name or a variable name, except through a bug or thinking that 'None, useful, None = 1,2,3' works. Even though None isn't a fixed constant, it might as well be. How much C code have you seen lately that starts with int function(void *NULL) ? Being able to do "None = 2" also smacks a bit of those legendary Fortran compilers that let you accidentally change 2 into 4. +1 on this change for Py3K, and I doubt it would cause breakage even if introduced into 1.x. -- A.M. Kuchling http://starship.python.net/crew/amk/ Principally I played pedants, idiots, old fathers, and drunkards. As you see, I had a narrow escape from becoming a professor. -- Robertson Davies, "Shakespeare over the Port" From paul at prescod.net Thu Mar 23 20:02:33 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 11:02:33 -0800 Subject: [Python-Dev] Unicode character names Message-ID: <38DA6A49.A60E405B@prescod.net> Here's a feature I like from Perl's Unicode support: """ Support for interpolating named characters The new \N escape interpolates named characters within strings. For example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a unicode smiley face at the end. """ I get really tired of looking up the Unicode character for "ndash" or "right dagger". Does our Unicode database have enough information to make something like this possible? Obviously using the official (English) name is only really helpful for people who speak English, so we should not remove the numeric option.
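The interpolation Paul describes maps directly onto a name-to-character lookup; a minimal sketch, assuming a name database like the one the unicodedata module in today's standard library provides:

```python
import re
import unicodedata

def expand_named(text):
    # Replace \N{NAME} escapes with the character the Unicode
    # database registers under NAME (KeyError if the name is unknown).
    return re.sub(r"\\N\{([^}]+)\}",
                  lambda m: unicodedata.lookup(m.group(1)),
                  text)

print(expand_named(r"Hi! \N{WHITE SMILING FACE}"))  # "Hi! " followed by U+263A
```

The reverse direction also exists in that module: unicodedata.name("\u263a") gives back "WHITE SMILING FACE", which addresses the lookup-by-number tedium as well.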
-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From tismer at tismer.com Thu Mar 23 20:27:53 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 23 Mar 2000 20:27:53 +0100 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA7039.B7CDC6FF@tismer.com> Mark Hammond wrote: > > ... > > If None becomes a keyword, I would like to ask whether it could be used to > > signal that a method is a class method, as opposed to an instance method: > > > > def classMethod(None, arg): > > ...equivalent of C++ 'static'... > ... > > > I'd also like to ask (separately) that assignment to None be defined as a > > no-op, so that programmers can write: > > > > year, month, None, None, None, None, weekday, None, None = > > gmtime(time()) > > In the vernacular of a certain Mr Stein... > > +2 on both of these :-) me 2, äh 1.5... The assignment no-op seems to be ok. Having None as a place holder for static methods creates the problem that we lose compatibility with ordinary functions. What I would propose instead is: make the parameter name "self" mandatory for methods, and turn everything else into a static method. This does not change function semantics, but just the way the method binding works. > [Although I do believe "static method" is a better name than "penguin" :-] pynguin -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From gvwilson at nevex.com Thu Mar 23 20:33:47 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 23 Mar 2000 14:33:47 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <38DA7039.B7CDC6FF@tismer.com> Message-ID: Hi, Christian; thanks for your mail. > What I would propose instead is: > make the parameter name "self" mandatory for methods, and turn > everything else into a static method. In my experience, significant omissions (i.e. something being important because it is *not* there) often give beginners trouble. For example, in C++, you can't tell whether: int foo::bar(int bah) { return 0; } belongs to instances, or to the class as a whole, without referring back to the header file [1]. To quote the immortal Jeremy Hylton: Pythonic design rules #2: Explicit is better than implicit. Also, people often ask why 'self' is required as a method argument in Python, when it is not in C++ or Java; this proposal would (retroactively) answer that question... Greg [1] I know this isn't a problem in Java or Python; I'm just using it as an illustration. From skip at mojam.com Thu Mar 23 21:09:00 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 23 Mar 2000 14:09:00 -0600 (CST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.27449.69043.924322@amarok.cnri.reston.va.us> References: <14554.27113.546575.170565@anthem.cnri.reston.va.us> <14554.27449.69043.924322@amarok.cnri.reston.va.us> Message-ID: <14554.31196.387213.472302@beluga.mojam.com> AMK> +1 on this change for Py3K, and I doubt it would cause breakage AMK> even if introduced into 1.x. Or if it did, it's probably code that's marginally broken already... 
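The "marginally broken" code Skip means is code that rebinds a builtin name. Since None itself can no longer be rebound in current Python, the builtin len stands in here; a sketch of both the hazard and Peter's shadow-then-del trick from earlier in the thread (module scope matters: inside a function, del would leave the name unbound rather than re-expose the builtin):

```python
len = 5                    # shadow the builtin at module scope (the hazard)
assert len == 5            # every later reference in this module sees the shadow...

try:
    len("abc")             # ...so ordinary calls of len() now fail
    raise AssertionError("the shadow should not be callable")
except TypeError:
    pass                   # 'int' object is not callable

del len                    # Peter's trick: drop the module-level binding
assert len("abc") == 3     # ...and the builtin shows through again
```

Making None a keyword simply rules out the shadowing (and the del) at compile time instead of leaving it as a run-time surprise.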
-- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From tismer at tismer.com Thu Mar 23 21:21:09 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 23 Mar 2000 21:21:09 +0100 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <38DA7CB5.87D62E14@tismer.com> Yo, gvwilson at nevex.com wrote: > > Hi, Christian; thanks for your mail. > > > What I would propose instead is: > > make the parameter name "self" mandatory for methods, and turn > > everything else into a static method. > > In my experience, significant omissions (i.e. something being important > because it is *not* there) often give beginners trouble. For example, > in C++, you can't tell whether: > > int foo::bar(int bah) > { > return 0; > } > > belongs to instances, or to the class as a whole, without referring back > to the header file [1]. To quote the immortal Jeremy Hylton: > > Pythonic design rules #2: > Explicit is better than implicit. Sure. I am explicitly *not* using self if I want no self. :-) > Also, people often ask why 'self' is required as a method argument in > Python, when it is not in C++ or Java; this proposal would (retroactively) > answer that question... You prefer to use the explicit keyword None? How would you then deal with def outside(None, blah): pass # stuff I believe one answer about the explicit "self" is that it should be simple and compatible with ordinary functions. Guido had just to add the semantics that in methods the first parameter automatically binds to the instance. The None gives me a bit of trouble, but not much. What I would like to spell is ordinary functions (as it is now) functions which are instance methods (with the immortal self) functions which are static methods ??? functions which are class methods !!! Static methods can work either with the "1st param==None" rule or with the "1st paramname!=self" rule or whatever. 
But how would you do class methods, which IMHO should have their class passed in as first parameter? Do you see a clean syntax for this? I thought of some weirdness like def meth(self, ... def static(self=None, ... # eek def classm(self=class, ... # ahem but this breaks the rule of default argument order. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From akuchlin at mems-exchange.org Thu Mar 23 21:27:41 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Thu, 23 Mar 2000 15:27:41 -0500 (EST) Subject: [Python-Dev] Unicode character names In-Reply-To: <38DA6A49.A60E405B@prescod.net> References: <38DA6A49.A60E405B@prescod.net> Message-ID: <14554.32317.730574.967165@amarok.cnri.reston.va.us> Paul Prescod writes: >The new \N escape interpolates named characters within strings. For >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a >unicode smiley face at the end. Cute idea, and it certainly means you can avoid looking up Unicode numbers. (You can look up names instead. :) ) Note that this means the Unicode database is no longer optional if this is done; it has to be around at code-parsing time. Python could import it automatically, as exceptions.py is imported. Christian's work on compressing unicodedatabase.c is therefore really important. (Is Perl5.6 actually dragging around the Unicode database in the binary, or is it read out of some external file or data structure?) -- A.M. Kuchling http://starship.python.net/crew/amk/ About ten days later, it being the time of year when the National collected down and outs to walk on and understudy I arrived at the head office of the National Theatre in Aquinas Street in Waterloo. 
-- Tom Baker, in his autobiography From bwarsaw at cnri.reston.va.us Thu Mar 23 21:39:43 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 23 Mar 2000 15:39:43 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods References: <38DA7039.B7CDC6FF@tismer.com> Message-ID: <14554.33039.4390.591036@anthem.cnri.reston.va.us> >>>>> "gvwilson" == writes: gvwilson> belongs to instances, or to the class as a whole, gvwilson> without referring back to the header file [1]. To quote gvwilson> the immortal Jeremy Hylton: Not to take anything away from Jeremy, who has contributed some wonderfully Pythonic quotes of his own, but this one is taken from Tim Peters' Zen of Python http://www.python.org/doc/Humor.html#zen timbot-is-the-only-one-who's-gonna-outlive-his-current-chip-set- around-here-ly y'rs, -Barry From jeremy at cnri.reston.va.us Thu Mar 23 21:55:25 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Thu, 23 Mar 2000 15:55:25 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: References: <38DA7039.B7CDC6FF@tismer.com> Message-ID: <14554.33590.844200.145871@walden> >>>>> "GVW" == gvwilson writes: GVW> To quote the immortal Jeremy Hylton: GVW> Pythonic design rules #2: GVW> Explicit is better than implicit. I wish I could take credit for that :-). Tim Peters posted a list of 20 Pythonic theses to comp.lang.python under the title "The Python Way." I'll collect them all here in hopes of future readers mistaking me for Tim again . Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. Complex is better than complicated. Flat is better than nested. Sparse is better than dense. Readability counts. Special cases aren't special enough to break the rules. Although practicality beats purity. Errors should never pass silently. Unless explicitly silenced. In the face of ambiguity, refuse the temptation to guess. 
There should be one-- and preferably only one --obvious way to do it. Although that way may not be obvious at first unless you're Dutch. Now is better than never. Although never is often better than *right* now. If the implementation is hard to explain, it's a bad idea. If the implementation is easy to explain, it may be a good idea. Namespaces are one honking great idea -- let's do more of those! See http://x27.deja.com/getdoc.xp?AN=485548918&CONTEXT=953844380.1254555688&hitnum=9 for the full post. to-be-immortal-i'd-need-to-be-a-bot-ly y'rs Jeremy From jeremy at alum.mit.edu Thu Mar 23 22:01:01 2000 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Thu, 23 Mar 2000 16:01:01 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: References: Message-ID: <14554.34037.232728.670271@walden> >>>>> "GVW" == gvwilson writes: GVW> I'd also like to ask (separately) that assignment to None be GVW> defined as a no-op, so that programmers can write: GVW> year, month, None, None, None, None, weekday, None, None = GVW> gmtime(time()) GVW> instead of having to create throw-away variables to fill in GVW> slots in tuples that they don't care about. I think both GVW> behaviors are readable; the first provides genuinely new GVW> functionality, while I often found the second handy when I was GVW> doing logic programming. -1 on this proposal Pythonic design rule #8: Special cases aren't special enough to break the rules. I think it's confusing to have assignment mean "discard the value" for the special case that the name is None. If Py3K makes None a keyword, then it would also be the only keyword that can be used in an assignment. Finally, we'd need to explain to the rare newbie who used None as a variable name why they assigned 12 to None but its value was unchanged when it was later referenced. (Think 'print None'.) When I need to ignore some of the return values, I use the name nil.
year, month, nil, nil, nil, nil, weekday, nil, nil = gmtime(time()) I think that's just as clear, only a whisker less efficient, and requires no special cases. Heck, it's even less typing <0.5 wink>. Jeremy From gvwilson at nevex.com Thu Mar 23 21:59:41 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 23 Mar 2000 15:59:41 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.33590.844200.145871@walden> Message-ID: > GVW> To quote the immortal Jeremy Hylton: > GVW> Pythonic design rules #2: > GVW> Explicit is better than implicit. > > I wish I could take credit for that :-). Tim Peters posted a list of > 20 Pythonic theses to comp.lang.python under the title "The Python > Way." Traceback (innermost last): File "", line 1, in ? AttributionError: insight incorrectly ascribed From paul at prescod.net Thu Mar 23 22:26:42 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 13:26:42 -0800 Subject: [Python-Dev] None as a keyword / class methods References: <14554.34037.232728.670271@walden> Message-ID: <38DA8C12.DFFD63D5@prescod.net> Jeremy Hylton wrote: > > ... > year, month, nil, nil, nil, nil, weekday, nil, nil = gmtime(time()) So you're proposing nil as a new keyword? I like it. +2 -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "No, I'm not QUITE that stupid", Paul Prescod From pf at artcom-gmbh.de Thu Mar 23 22:46:49 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Thu, 23 Mar 2000 22:46:49 +0100 (MET) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.27113.546575.170565@anthem.cnri.reston.va.us> from "Barry A. Warsaw" at "Mar 23, 2000 2: 0:57 pm" Message-ID: Hi Barry! > >>>>> "PF" == Peter Funk writes: > > | try: | del None | except SyntaxError: | pass # Wow running Py3K here! Barry A. Warsaw: > I know how to break your Py3K code: stick None=None somewhere higher up :) Hmm.... I must admit that I don't understand your argument.
In Python <= 1.5.2 'del None' works fine, iff it follows any assignment to None in the same scope, regardless of whether there has been a None=None in the surrounding scope or in the same scope before this. Since something like 'del for' or 'del import' raises a SyntaxError exception in Py152, I expect 'del None' to raise the same exception in Py3K, after None has become a keyword. Right? Regards, Peter From andy at reportlab.com Thu Mar 23 22:54:23 2000 From: andy at reportlab.com (Andy Robinson) Date: Thu, 23 Mar 2000 21:54:23 GMT Subject: [Python-Dev] Unicode Character Names In-Reply-To: <20000323202533.ABDB31CEF8@dinsdale.python.org> References: <20000323202533.ABDB31CEF8@dinsdale.python.org> Message-ID: <38da90b4.756297@post.demon.co.uk> >Message: 20 >From: "Andrew M. Kuchling" >Date: Thu, 23 Mar 2000 15:27:41 -0500 (EST) >To: "python-dev at python.org" >Subject: Re: [Python-Dev] Unicode character names > >Paul Prescod writes: >>The new \N escape interpolates named characters within strings. For >>example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a >>unicode smiley face at the end. > >Cute idea, and it certainly means you can avoid looking up Unicode >numbers. (You can look up names instead. :) ) Note that this means the >Unicode database is no longer optional if this is done; it has to be >around at code-parsing time. Python could import it automatically, as >exceptions.py is imported. Christian's work on compressing >unicodedatabase.c is therefore really important. (Is Perl5.6 actually >dragging around the Unicode database in the binary, or is it read out >of some external file or data structure?) I agree - the names are really useful. If you are doing conversion work, often you want to know what a character is, but don't have a complete Unicode font handy. Being able to get the description for a Unicode character is useful, as well as being able to use the description as a constructor for it.
Also, there are some language-specific things that might make it useful to have the full character descriptions in Christian's database. For example, we'll have an (optional, not in the standard library) Japanese module with functions like isHalfWidthKatakana(), isFullWidthKatakana() to help normalize things. Parsing the database and looking for strings in the descriptions is one way to build this - not the only one, but it might be useful. So I'd vote to put names in at first, and give us a few weeks to see how useful they are before a final decision. - Andy Robinson From paul at prescod.net Thu Mar 23 23:09:42 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 23 Mar 2000 14:09:42 -0800 Subject: [Python-Dev] Unicode character names References: <38DA6A49.A60E405B@prescod.net> <14554.32317.730574.967165@amarok.cnri.reston.va.us> Message-ID: <38DA9626.8B62DB77@prescod.net> "Andrew M. Kuchling" wrote: > > Paul Prescod writes: > >The new \N escape interpolates named characters within strings. For > >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > >unicode smiley face at the end. > > Cute idea, and it certainly means you can avoid looking up Unicode > numbers. (You can look up names instead. :) ) More important, though, the code is "self documenting". You never have to go from the number back to the name. > Note that this means the > Unicode database is no longer optional if this is done; it has to be > around at code-parsing time. I don't like the idea enough to exclude support for small machines or anything like that. We should weigh the costs of requiring the Unicode database at compile time. > (Is Perl5.6 actually > dragging around the Unicode database in the binary, or is it read out > of some external file or data structure?) I have no idea.
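Andy's helpers are one place where the character descriptions pay off directly. A sketch of his "look for strings in the descriptions" idea (the function names are his hypothetical Japanese module's; the name-prefix test, via today's unicodedata module, is just one possible implementation):

```python
import unicodedata

def is_halfwidth_katakana(ch):
    # Classify a character by the prefix of its Unicode name;
    # name() returns the default for unnamed characters.
    return unicodedata.name(ch, "").startswith("HALFWIDTH KATAKANA")

def is_fullwidth_katakana(ch):
    # "HALFWIDTH KATAKANA ..." does not start with "KATAKANA",
    # so the two predicates are disjoint.
    return unicodedata.name(ch, "").startswith("KATAKANA")

print(is_halfwidth_katakana("\uff76"))  # U+FF76 HALFWIDTH KATAKANA LETTER KA
print(is_fullwidth_katakana("\u30ab"))  # U+30AB KATAKANA LETTER KA
print(is_fullwidth_katakana("a"))       # LATIN SMALL LETTER A
```

Name-prefix matching is slower than a range check on code points, but it needs no hand-maintained tables, which is exactly the normalization-helper trade-off Andy describes.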
-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From pf at artcom-gmbh.de Thu Mar 23 23:12:25 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Thu, 23 Mar 2000 23:12:25 +0100 (MET) Subject: [Python-Dev] Py3K: True and False builtin or keyword? Message-ID: Regarding the discussion about None becoming a keyword in Py3K: Recently the truth values True and False have been mentioned. Should they become builtin values --like None is now-- or should they become keywords? Nevertheless: for the time being I came up with the following weird idea: If you put this in front of the main module of a Python app: #!/usr/bin/env python if __name__ == "__main__": import sys if sys.version[0] <= '1': __builtins__.True = 1 __builtins__.False = 0 del sys # --- continue with your app from here: --- import foo, bar, ... .... Now you can start to use False and True in any imported module as if they were already builtins. Of course this is no surprise here and Python is really fun, Peter. From mal at lemburg.com Thu Mar 23 22:07:35 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 23 Mar 2000 22:07:35 +0100 Subject: [Python-Dev] Unicode character names References: <38DA6A49.A60E405B@prescod.net> <14554.32317.730574.967165@amarok.cnri.reston.va.us> Message-ID: <38DA8797.F16301E4@lemburg.com> "Andrew M. Kuchling" wrote: > > Paul Prescod writes: > >The new \N escape interpolates named characters within strings. For > >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > >unicode smiley face at the end. > > Cute idea, and it certainly means you can avoid looking up Unicode > numbers. (You can look up names instead. :) ) Note that this means the > Unicode database is no longer optional if this is done; it has to be > around at code-parsing time. Python could import it automatically, as > exceptions.py is imported.
Christian's work on compressing > unicodedatabase.c is therefore really important. (Is Perl5.6 actually > dragging around the Unicode database in the binary, or is it read out > of some external file or data structure?) Sorry to disappoint you guys, but the Unicode name and comments are *not* included in the unicodedatabase.c file Christian is currently working on. The reason is simple: it would add huge amounts of string data to the file. So this is a no-no for the core distribution... Still, the above is easily possible by inventing a new encoding, say unicode-with-smileys, which then reads in a file containing the Unicode names and applies the necessary magic to decode/encode data as Paul described above. Would probably make a cool fun-project for someone who wants to dive into writing codecs. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From bwarsaw at cnri.reston.va.us Fri Mar 24 00:02:06 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Thu, 23 Mar 2000 18:02:06 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods References: <14554.27113.546575.170565@anthem.cnri.reston.va.us> Message-ID: <14554.41582.688247.569547@anthem.cnri.reston.va.us> Hi Peter! >>>>> "PF" == Peter Funk writes: PF> Since something like 'del for' or 'del import' raises a PF> SyntaxError exception in Py152, I expect 'del None' to raise PF> the same exception in Py3K, after None has become a keyword. PF> Right? I misread your example the first time through, but it still doesn't quite parse on my second read. 
-------------------- snip snip -------------------- pyvers = '2k' try: del import except SyntaxError: pyvers = '3k' -------------------- snip snip -------------------- % python /tmp/foo.py File "/tmp/foo.py", line 3 del import ^ SyntaxError: invalid syntax -------------------- snip snip -------------------- See, you can't catch that SyntaxError because it doesn't happen at run-time. Maybe you meant to wrap the try suite in an exec? Here's a code sample that ought to work with 1.5.2 and the mythical Py3K-with-a-None-keyword. -------------------- snip snip -------------------- pyvers = '2k' try: exec "del None" except SyntaxError: pyvers = '3k' except NameError: pass print pyvers -------------------- snip snip -------------------- Cheers, -Barry From klm at digicool.com Fri Mar 24 00:05:08 2000 From: klm at digicool.com (Ken Manheimer) Date: Thu, 23 Mar 2000 18:05:08 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: On Thu, 23 Mar 2000 pf at artcom-gmbh.de wrote: > Hi Barry! > > > >>>>> "PF" == Peter Funk writes: > > > > | try: > > | del None > > | except SyntaxError: > > | pass # Wow running Py3K here! > > Barry A. Warsaw: > > I know how to break your Py3K code: stick None=None some where higher > > up :) Huh. Does anyone really think we're going to catch SyntaxError at runtime, ever? Seems like the code fragment above wouldn't work in the first place. But i suppose, with most of a millennium to emerge, py3k could have more fundamental changes than i could even imagine...-) Ken klm at digicool.com From pf at artcom-gmbh.de Thu Mar 23 23:53:34 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Thu, 23 Mar 2000 23:53:34 +0100 (MET) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.27449.69043.924322@amarok.cnri.reston.va.us> from "Andrew M. Kuchling" at "Mar 23, 2000 2: 6:33 pm" Message-ID: Hi! > Barry A. 
Warsaw writes: > >>>>>> "PF" == Peter Funk writes: > > PF> I wonder, how much existing code the None --> keyword change > > PF> would break. > >Me too. Andrew M. Kuchling: > I can't conceive of anyone using None as a function name or a variable > name, except through a bug or thinking that 'None, useful, None = > 1,2,3' works. Even though None isn't a fixed constant, it might as > well be. How much C code have you seen lately that starts with int > function(void *NULL) ? I agree. urban legend: Once upon a time someone found the following neat snippet of C source hidden in some header file of a very, very large software system, after he had spent some nights trying to figure out why some simple edits he made in order to make the code more readable broke the system: #ifdef TRUE /* eat this: you arrogant Quiche Eaters */ #undef TRUE #undef FALSE #define TRUE (0) #define FALSE (1) #endif Obviously the poor guy would have found this particular small piece of evil code much earlier, if he had simply 'grep'ed for comments... there were not so many in this system. ;-) > Being able to do "None = 2" also smacks a bit of those legendary > Fortran compilers that let you accidentally change 2 into 4. +1 on > this change for Py3K, and I doubt it would cause breakage even if > introduced into 1.x. We'll see: those "Real Programmers" never die. Fortunately they prefer Perl over Python. <0.5 grin> Regards, Peter From klm at digicool.com Fri Mar 24 00:15:42 2000 From: klm at digicool.com (Ken Manheimer) Date: Thu, 23 Mar 2000 18:15:42 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: <14554.41582.688247.569547@anthem.cnri.reston.va.us> Message-ID: On Thu, 23 Mar 2000 bwarsaw at cnri.reston.va.us wrote: > See, you can't catch that SyntaxError because it doesn't happen at > run-time. Maybe you meant to wrap the try suite in an exec? Here's a Huh.
Guess i should have read barry's re-response before i posted mine: Desperately desiring to redeem myself, and contribute something to the discussion, i'll settle the class/static method naming quandary with the obvious alternative: > > p.classMethod("hey, cool!") # also selfless These should be called buddha methods - no self, samadhi, one with everything, etc. There, now i feel better. :-) Ken klm at digicool.com A Zen monk walks up to a hotdog vendor and says "make me one with everything." Ha. But that's not all. He gets the hot dog and pays with a ten. After several moments waiting, he says to the vendor, "i was expecting change", and the vendor says, "you of all people should know, change comes from inside." That's all. From bwarsaw at cnri.reston.va.us Fri Mar 24 00:19:28 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 23 Mar 2000 18:19:28 -0500 (EST) Subject: [Python-Dev] Py3K: True and False builtin or keyword? References: Message-ID: <14554.42624.213027.854942@anthem.cnri.reston.va.us> >>>>> "PF" == Peter Funk writes: PF> Now you can start to use False and True in any imported PF> module as if they were already builtins. Of course this is no PF> surprise here and Python is really fun, Peter. You /can/ do this, but that doesn't mean you /should/ :) Mucking with builtins is fun the way huffing dry erase markers is fun. Things are very pretty at first, but eventually the brain cell lossage will more than outweigh that cheap thrill. I've seen a few legitimate uses for hacking builtins. In Zope, I believe Jim hacks get_transaction() or somesuch into builtins because that way it's easy to get at without passing it through the call tree. And in Zope it makes sense since this is a fancy database application and your current transaction is a central concept. I've occasionally wrapped an existing builtin because I needed to extend its functionality while keeping its semantics and API unchanged.
An example of this was my pre-Python-1.5.2 open_ex() in Mailman's CGI driver script. Before builtin open() would print the failing file name, my open_ex() -- shown below -- would hack that into the exception object. But one of the things about Python that I /really/ like is that YOU KNOW WHERE THINGS COME FROM. If I suddenly start seeing True and False in your code, I'm going to look for function locals and args, then module globals, then from ... import *'s. If I don't see it in any of those, I'm going to put down my dry erase markers, look again, and then utter a loud "huh?" :) -Barry realopen = open def open_ex(filename, mode='r', bufsize=-1, realopen=realopen): from Mailman.Utils import reraise try: return realopen(filename, mode, bufsize) except IOError, e: strerror = e.strerror + ': ' + filename e.strerror = strerror e.filename = filename e.args = (e.args[0], strerror) reraise(e) import __builtin__ __builtin__.__dict__['open'] = open_ex From pf at artcom-gmbh.de Fri Mar 24 00:23:57 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Fri, 24 Mar 2000 00:23:57 +0100 (MET) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: from Ken Manheimer at "Mar 23, 2000 6: 5: 8 pm" Message-ID: Hi! > > > | try: > > > | del None > > > | except SyntaxError: > > > | pass # Wow running Py3K here! > > > > Barry A. Warsaw: > > > I know how to break your Py3K code: stick None=None some where higher > > > up :) > Ken Manheimer: > Huh. Does anyone really think we're going to catch SyntaxError at > runtime, ever? Seems like the code fragment above wouldn't work in the > first place. Ouuppps... Unfortunately I had no chance to test this with Py3K before making a fool of myself by posting this silly example. Now I understand what Barry meant. So if None really becomes a keyword in Py3K we can be sure to catch all those imaginary 'del None' statements very quickly. 
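[The crux of the exec trick Barry showed is that a SyntaxError raised while compiling a *string* happens at run time, inside the call, and is therefore catchable — unlike a SyntaxError in the enclosing source file. A compact probe using compile() instead of exec (the helper name here is mine):]

```python
def none_is_keyword():
    # Compiling a string defers any SyntaxError to run time,
    # so it can be caught -- Barry's exec trick, via compile().
    try:
        compile("del None", "<probe>", "exec")
    except SyntaxError:
        return True    # the parser rejects 'del None': None is a keyword
    return False

print(none_is_keyword())
```

On a 1.5.2-era interpreter this returns False (the thread notes 'del None' compiles fine there); on any interpreter where None really is a keyword it returns True.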
Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From billtut at microsoft.com Fri Mar 24 03:46:06 2000 From: billtut at microsoft.com (Bill Tutt) Date: Thu, 23 Mar 2000 18:46:06 -0800 Subject: [Python-Dev] Re: Unicode character names Message-ID: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> MAL wrote: >Andrew M. Kuchling" wrote: >> >> Paul Prescod writes: >>>The new \N escape interpolates named characters within strings. For >>>example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a >>>unicode smiley face at the end. >> >> Cute idea, and it certainly means you can avoid looking up Unicode >> numbers. (You can look up names instead. :) ) Note that this means the >> Unicode database is no longer optional if this is done; it has to be >> around at code-parsing time. Python could import it automatically, as >> exceptions.py is imported. Christian's work on compressing >> unicodedatabase.c is therefore really important. (Is Perl5.6 actually >> dragging around the Unicode database in the binary, or is it read out >> of some external file or data structure?) > > Sorry to disappoint you guys, but the Unicode name and comments > are *not* included in the unicodedatabase.c file Christian > is currently working on. The reason is simple: it would add > huge amounts of string data to the file. So this is a no-no > for the core distribution... > Ok, now you're just being silly. It's possible to put the character names in a separate structure so that they don't automatically get paged in with the normal unicode character property data. If you never use it, it won't get paged in, it's that simple.... Looking up the Unicode code value from the Unicode character name smells like a good time to use gperf to generate a perfect hash function for the character names. Esp. for the Unicode 3.0 character namespace.
Then you can just store the hashkey -> Unicode character mapping, and hardly ever need to page in the actual full character name string itself. I haven't looked at what the comment field contains, so I have no idea how useful that info is. *waits while gperf crunches through the ~10,550 Unicode characters where this would be useful* Bill From akuchlin at mems-exchange.org Fri Mar 24 03:51:25 2000 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 23 Mar 2000 21:51:25 -0500 (EST) Subject: [Python-Dev] 1.6 job list Message-ID: <200003240251.VAA19921@newcnri.cnri.reston.va.us> I've written up a list of things that need to get done before 1.6 is finished. This is my vision of what needs to be done, and doesn't have an official stamp of approval from GvR or anyone else. So it's very probably wrong. http://starship.python.net/crew/amk/python/1.6-jobs.html Here's the list formatted as text. The major outstanding things at the moment seem to be sre and Distutils; once they go in, you could probably release an alpha, because the other items are relatively minor. Still to do * XXX Revamped import hooks (or is this a post-1.6 thing?) * Update the documentation to match 1.6 changes. * Document more undocumented modules * Unicode: Add Unicode support for open() on Windows * Unicode: Compress the size of unicodedatabase * Unicode: Write \N{SMILEY} codec for Unicode * Unicode: the various XXX items in Misc/unicode.txt * Add module: Distutils * Add module: Jim Ahlstrom's zipfile.py * Add module: PyExpat interface * Add module: mmapfile * Add module: sre * Drop cursesmodule and package it separately. (Any other obsolete modules that should go?) * Delete obsolete subdirectories in Demo/ directory * Refurbish Demo subdirectories to be properly documented, match modern coding style, etc. 
* Support Unicode strings in PyExpat interface * Fix ./ld_so_aix installation problem on AIX * Make test.regrtest.py more usable outside of the Python test suite * Conservative garbage collection of cycles (maybe?) * Write friendly "What's New in 1.6" document/article Done Nothing at the moment. After 1.7 * Rich comparisons * Revised coercions * Parallel for loop (for i in L; j in M: ...), * Extended slicing for all sequences. * GvR: "I've also been thinking about making classes be types (not as huge a change as you think, if you don't allow subclassing built-in types), and adding a built-in array type suitable for use by NumPy." --amk From esr at thyrsus.com Fri Mar 24 04:30:53 2000 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 23 Mar 2000 22:30:53 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003240251.VAA19921@newcnri.cnri.reston.va.us>; from Andrew Kuchling on Thu, Mar 23, 2000 at 09:51:25PM -0500 References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> Message-ID: <20000323223053.J28880@thyrsus.com> Andrew Kuchling : > * Drop cursesmodule and package it separately. (Any other obsolete > modules that should go?) Annoyingly enough, I may need this to stay in, for use by the new Linux-kernel configuration system I'm writing. Why is it on the hit list? -- Eric S. Raymond Still, if you will not fight for the right when you can easily win without bloodshed, if you will not fight when your victory will be sure and not so costly, you may come to the moment when you will have to fight with all the odds against you and only a precarious chance for survival. There may be a worse case. You may have to fight when there is no chance of victory, because it is better to perish than to live as slaves. --Winston Churchill From dan at cgsoftware.com Fri Mar 24 04:52:54 2000 From: dan at cgsoftware.com (Daniel Berlin+list.python-dev) Date: 23 Mar 2000 22:52:54 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: "Eric S. 
Raymond"'s message of "Thu, 23 Mar 2000 22:30:53 -0500" References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> Message-ID: <4s9x6n3d.fsf@dan.resnet.rochester.edu> "Eric S. Raymond" writes: > Andrew Kuchling : > > * Drop cursesmodule and package it separately. (Any other obsolete > > modules that should go?) > > Annoyingly enough, I may need this to stay in, for use by the new Linux-kernel > configuration system I'm writing. Why is it on the hit list? IIRC, it's because nobody really maintains it, and those that care about it, use a different one (either ncurses module, or a newer cursesmodule). So from what i understand, you get complaints, but no real advantage to having it there. I'm just trying to summarize, not fall on either side (some people get touchy about issues like this). --Dan From esr at thyrsus.com Fri Mar 24 05:11:37 2000 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 23 Mar 2000 23:11:37 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <4s9x6n3d.fsf@dan.resnet.rochester.edu>; from Daniel Berlin+list.python-dev on Thu, Mar 23, 2000 at 10:52:54PM -0500 References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> <4s9x6n3d.fsf@dan.resnet.rochester.edu> Message-ID: <20000323231137.U28880@thyrsus.com> Daniel Berlin+list.python-dev : > > Andrew Kuchling : > > > * Drop cursesmodule and package it separately. (Any other obsolete > > > modules that should go?) > > > > Annoyingly enough, I may need this to stay in, for use by the new Linux-kernel > > configuration system I'm writing. Why is it on the hit list? > > IIRC, it's because nobody really maintains it, and those that care > about it, use a different one (either ncurses module, or a newer cursesmodule). > So from what i understand, you get complaints, but no real advantage > to having it there. OK. 
Then what I guess I'd like is for a maintained equivalent of this to join the core -- the ncurses module you referred to, for choice. I'm not being random. I'm trying to replace the mess that currently constitutes the kbuild system -- but I'll need to support an equivalent of menuconfig. -- Eric S. Raymond "The state calls its own violence `law', but that of the individual `crime'" -- Max Stirner From akuchlin at mems-exchange.org Fri Mar 24 05:33:24 2000 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 23 Mar 2000 23:33:24 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <20000323231137.U28880@thyrsus.com> References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> <4s9x6n3d.fsf@dan.resnet.rochester.edu> <20000323231137.U28880@thyrsus.com> Message-ID: <14554.61460.311650.599253@newcnri.cnri.reston.va.us> Eric S. Raymond writes: >OK. Then what I guess I'd like is for a maintained equivalent of this >to join the core -- the ncurses module you referred to, for choice. See the "Whither cursesmodule" thread in the python-dev archives: http://www.python.org/pipermail/python-dev/2000-February/003796.html One possibility was to blow off backward compatibility; are there any systems that only have BSD curses, not SysV curses / ncurses? Given that Pavel Curtis announced he was dropping BSD curses maintainance some years ago, I expect even the *BSDs use ncurses these days. However, Oliver Andrich doesn't seem interested in maintaining his ncurses module, and someone just started a SWIG-generated interface (http://pyncurses.sourceforge.net), so it's not obvious which one you'd use. (I *would* be willing to take over maintaining Andrich's code; maintaining the BSD curses version just seems pointless these days.) 
--amk From dan at cgsoftware.com Fri Mar 24 05:43:51 2000 From: dan at cgsoftware.com (Daniel Berlin+list.python-dev) Date: 23 Mar 2000 23:43:51 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Andrew Kuchling's message of "Thu, 23 Mar 2000 23:33:24 -0500 (EST)" References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> <4s9x6n3d.fsf@dan.resnet.rochester.edu> <20000323231137.U28880@thyrsus.com> <14554.61460.311650.599253@newcnri.cnri.reston.va.us> Message-ID: Andrew Kuchling writes: > Eric S. Raymond writes: > >OK. Then what I guess I'd like is for a maintained equivalent of this > >to join the core -- the ncurses module you referred to, for choice. > > See the "Whither cursesmodule" thread in the python-dev archives: > http://www.python.org/pipermail/python-dev/2000-February/003796.html > > One possibility was to blow off backward compatibility; are there any > systems that only have BSD curses, not SysV curses / ncurses? Given > that Pavel Curtis announced he was dropping BSD curses maintainance > some years ago, I expect even the *BSDs use ncurses these days. Yes, they do. ls /usr/src/lib/libncurses/ Makefile ncurses_cfg.h pathnames.h termcap.c grep 5\.0 /usr/src/contrib/ncurses/* At least, this is FreeBSD. So there is no need for BSD curses anymore, on FreeBSD's account. > --amk > From esr at thyrsus.com Fri Mar 24 05:47:56 2000 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 23 Mar 2000 23:47:56 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <14554.61460.311650.599253@newcnri.cnri.reston.va.us>; from Andrew Kuchling on Thu, Mar 23, 2000 at 11:33:24PM -0500 References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <20000323223053.J28880@thyrsus.com> <4s9x6n3d.fsf@dan.resnet.rochester.edu> <20000323231137.U28880@thyrsus.com> <14554.61460.311650.599253@newcnri.cnri.reston.va.us> Message-ID: <20000323234756.A29775@thyrsus.com> Andrew Kuchling : > Eric S. Raymond writes: > >OK. 
Then what I guess I'd like is for a maintained equivalent of this > >to join the core -- the ncurses module you referred to, for choice. > > See the "Whither cursesmodule" thread in the python-dev archives: > http://www.python.org/pipermail/python-dev/2000-February/003796.html > > One possibility was to blow off backward compatibility; are there any > systems that only have BSD curses, not SysV curses / ncurses? Given > that Pavel Curtis announced he was dropping BSD curses maintainance > some years ago, I expect even the *BSDs use ncurses these days. BSD curses was officially declared dead by its maintainer, Keith Bostic, in early 1995. Keith and I conspired to kill it off in favor of ncurses :-). -- Eric S. Raymond If gun laws in fact worked, the sponsors of this type of legislation should have no difficulty drawing upon long lists of examples of criminal acts reduced by such legislation. That they cannot do so after a century and a half of trying -- that they must sweep under the rug the southern attempts at gun control in the 1870-1910 period, the northeastern attempts in the 1920-1939 period, the attempts at both Federal and State levels in 1965-1976 -- establishes the repeated, complete and inevitable failure of gun laws to control serious crime. -- Senator Orrin Hatch, in a 1982 Senate Report From andy at reportlab.com Fri Mar 24 11:14:44 2000 From: andy at reportlab.com (Andy Robinson) Date: Fri, 24 Mar 2000 10:14:44 GMT Subject: [Python-Dev] Unicode character names In-Reply-To: <20000324024913.B8C3A1CF22@dinsdale.python.org> References: <20000324024913.B8C3A1CF22@dinsdale.python.org> Message-ID: <38db3fc6.7370137@post.demon.co.uk> On Thu, 23 Mar 2000 21:49:13 -0500 (EST), you wrote: >Sorry to disappoint you guys, but the Unicode name and comments >are *not* included in the unicodedatabase.c file Christian >is currently working on. The reason is simple: it would add >huge amounts of string data to the file. So this is a no-no >for the core distribution...
You're right about what is compiled into the core. I have to keep reminding myself to distinguish three places functionality can live: 1. What is compiled into the Python core 2. What is in the standard Python library relating to encodings. 3. Completely separate add-on packages, maintained outside of Python, to provide extra functionality for (e.g.) Asian encodings. It is clear that both the Unicode database, and the mapping tables and other files at unicode.org, are a great resource; but they could be placed in (2) or (3) easily, along with scripts to unpack them. It probably makes sense for the i18n-sig to kick off a separate 'CodecKit' project for now, and we can see what good emerges from it before thinking about what should go into the library. >Still, the above is easily possible by inventing a new >encoding, say unicode-with-smileys, which then reads in >a file containing the Unicode names and applies the necessary >magic to decode/encode data as Paul described above. >Would probably make a cool fun-project for someone who wants >to dive into writing codecs. Yup. Prime candidate for CodecKit. - Andy From mal at lemburg.com Fri Mar 24 09:52:36 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 09:52:36 +0100 Subject: [Python-Dev] Re: Unicode character names References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> Message-ID: <38DB2CD4.CAD9F0E2@lemburg.com> Bill Tutt wrote: > > MAL wrote: > > >Andrew M. Kuchling" wrote: > >> > >> Paul Prescod writes: > >>>The new \N escape interpolates named characters within strings. For > >>>example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > >>>unicode smiley face at the end. > >> > >> Cute idea, and it certainly means you can avoid looking up Unicode > >> numbers. (You can look up names instead. :) ) Note that this means the > >> Unicode database is no longer optional if this is done; it has to be > >> around at code-parsing time. 
Python could import it automatically, as > >> exceptions.py is imported. Christian's work on compressing > >> unicodedatabase.c is therefore really important. (Is Perl5.6 actually > >> dragging around the Unicode database in the binary, or is it read out > >> of some external file or data structure?) > > > > Sorry to disappoint you guys, but the Unicode name and comments > > are *not* included in the unicodedatabase.c file Christian > > is currently working on. The reason is simple: it would add > > huge amounts of string data to the file. So this is a no-no > > for the core distribution... > > > > Ok, now you're just being silly. Its possible to put the character names in > a separate structure so that they don't automatically get paged in with the > normal unicode character property data. If you never use it, it won't get > paged in, its that simple.... Sure, but it would still cause the interpreter binary or DLL to increase in size considerably... that caused some major noise a few days ago due to the fact that the unicodedata module adds some 600kB to the interpreter -- even though it would only get swapped in when needed (the interpreter itself doesn't use it). > Looking up the Unicode code value from the Unicode character name smells > like a good time to use gperf to generate a perfect hash function for the > character names. Esp. for the Unicode 3.0 character namespace. Then you can > just store the hashkey -> Unicode character mapping, and hardly ever need to > page in the actual full character name string itself. Great idea, but why not put this into separate codec module ? > I haven't looked at what the comment field contains, so I have no idea how > useful that info is. Probably not worth looking at... 
> *waits while gperf crunches through the ~10,550 Unicode characters where > this would be useful* -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Fri Mar 24 11:37:53 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 11:37:53 +0100 Subject: [Python-Dev] Unicode and Windows References: <20000322134650.ED1BD370CF2@snelboot.oratrix.nl> <38D8F55E.6E324281@lemburg.com> Message-ID: <38DB4581.EB5315E0@lemburg.com> Ok, I've just added two new parser markers to PyArg_ParseTuple() which will hopefully make life a little easier for extension writers. The new code will be in the next patch set which I will release early next week. Here are the docs: Internal Argument Parsing: -------------------------- These markers are used by the PyArg_ParseTuple() APIs: "U": Check for Unicode object and return a pointer to it "s": For Unicode objects: auto convert them to the <default encoding> and return a pointer to the object's buffer. "s#": Access to the Unicode object via the bf_getreadbuf buffer interface (see Buffer Interface); note that the length relates to the buffer length, not the Unicode string length (this may be different depending on the Internal Format). "t#": Access to the Unicode object via the bf_getcharbuf buffer interface (see Buffer Interface); note that the length relates to the buffer length, not necessarily to the Unicode string length (this may be different depending on the <Internal Format>). "es": Takes two parameters: encoding (const char **) and buffer (char **). The input object is first coerced to Unicode in the usual way and then encoded into a string using the given encoding. On output, a buffer of the needed size is allocated and returned through *buffer as a NULL-terminated string. The encoded string may not contain embedded NULL characters. The caller is responsible for free()ing the allocated *buffer after usage.
"es#": Takes three parameters: encoding (const char **), buffer
(char **) and buffer_len (int *). The input object is first coerced to
Unicode in the usual way and then encoded into a string using the given
encoding. If *buffer is non-NULL, *buffer_len must be set to the size of
the buffer on input; output is then copied to *buffer. If *buffer is
NULL, a buffer of the needed size is allocated and output copied into
it; *buffer is then updated to point to the allocated memory area, and
the caller is responsible for free()ing *buffer after usage. In both
cases *buffer_len is updated to the number of characters written
(excluding the trailing NULL byte). The output buffer is assured to be
NULL-terminated.

Examples:

Using "es#" with auto-allocation:

    static PyObject *
    test_parser(PyObject *self, PyObject *args)
    {
        PyObject *str;
        const char *encoding = "latin-1";
        char *buffer = NULL;
        int buffer_len = 0;

        if (!PyArg_ParseTuple(args, "es#:test_parser",
                              &encoding, &buffer, &buffer_len))
            return NULL;
        if (!buffer) {
            PyErr_SetString(PyExc_SystemError, "buffer is NULL");
            return NULL;
        }
        str = PyString_FromStringAndSize(buffer, buffer_len);
        free(buffer);
        return str;
    }

Using "es" with auto-allocation returning a NULL-terminated string:

    static PyObject *
    test_parser(PyObject *self, PyObject *args)
    {
        PyObject *str;
        const char *encoding = "latin-1";
        char *buffer = NULL;

        if (!PyArg_ParseTuple(args, "es:test_parser",
                              &encoding, &buffer))
            return NULL;
        if (!buffer) {
            PyErr_SetString(PyExc_SystemError, "buffer is NULL");
            return NULL;
        }
        str = PyString_FromString(buffer);
        free(buffer);
        return str;
    }

Using "es#" with a pre-allocated buffer:

    static PyObject *
    test_parser(PyObject *self, PyObject *args)
    {
        PyObject *str;
        const char *encoding = "latin-1";
        char _buffer[10];
        char *buffer = _buffer;
        int buffer_len = sizeof(_buffer);

        if (!PyArg_ParseTuple(args, "es#:test_parser",
                              &encoding, &buffer, &buffer_len))
            return NULL;
        if (!buffer) {
            PyErr_SetString(PyExc_SystemError, "buffer is NULL");
            return NULL;
        }
        str = PyString_FromStringAndSize(buffer, buffer_len);
        return str;
    }

--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/

From gstein at lyra.org Fri Mar 24 11:54:02 2000
From: gstein at lyra.org (Greg Stein)
Date: Fri, 24 Mar 2000 02:54:02 -0800 (PST)
Subject: [Python-Dev] Unicode and Windows
In-Reply-To: <38DB4581.EB5315E0@lemburg.com>
Message-ID: 

On Fri, 24 Mar 2000, M.-A. Lemburg wrote:
>...
> "s": For Unicode objects: auto convert them to the
> and return a pointer to the object's buffer.

Guess that I didn't notice this before, but it seems weird that "s" and
"s#" return different encodings.

Why?

> "es":
> Takes two parameters: encoding (const char **) and
> buffer (char **).
>...
> "es#":
> Takes three parameters: encoding (const char **),
> buffer (char **) and buffer_len (int *).

I see no reason to make the encoding (const char **) rather than
(const char *). We are never returning a value, so this just makes it
harder to pass the encoding into ParseTuple.

There is precedent for passing in single-ref pointers. For example:

    PyArg_ParseTuple(args, "O!", &s, PyString_Type)

I would recommend using just one pointer level for the encoding.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From mal at lemburg.com Fri Mar 24 12:29:12 2000
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 24 Mar 2000 12:29:12 +0100
Subject: [Python-Dev] Unicode and Windows
References: 
Message-ID: <38DB5188.AA580652@lemburg.com>

Greg Stein wrote:
>
> On Fri, 24 Mar 2000, M.-A. Lemburg wrote:
> >...
> > "s": For Unicode objects: auto convert them to the
> > and return a pointer to the object's buffer.
>
> Guess that I didn't notice this before, but it seems weird that "s" and
> "s#" return different encodings.
>
> Why?

This is due to the buffer interface being used for "s#". Since "s#"
refers to the getreadbuf slot, it returns raw data.
In this case this is UTF-16 in platform dependent byte order. "s" relies on NULL-terminated strings and doesn't use the buffer interface at all. Thus "s" returns NULL-terminated UTF-8 (UTF-16 is full of NULLs). "t#" uses the getcharbuf slot and thus should return character data. UTF-8 is the right encoding here. > > "es": > > Takes two parameters: encoding (const char **) and > > buffer (char **). > >... > > "es#": > > Takes three parameters: encoding (const char **), > > buffer (char **) and buffer_len (int *). > > I see no reason to make the encoding (const char **) rather than > (const char *). We are never returning a value, so this just makes it > harder to pass the encoding into ParseTuple. > > There is precedent for passing in single-ref pointers. For example: > > PyArg_ParseTuple(args, "O!", &s, PyString_Type) > > I would recommend using just one pointer level for the encoding. You have a point there... even though it breaks the notion of prepending all parameters with an '&' (ok, except the type check one). OTOH, it would allow passing the encoding right with the PyArg_ParseTuple() call which probably makes more sense in this context. I'll change it... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tismer at tismer.com Fri Mar 24 14:13:02 2000 From: tismer at tismer.com (Christian Tismer) Date: Fri, 24 Mar 2000 14:13:02 +0100 Subject: [Python-Dev] Unicode character names References: <38DA6A49.A60E405B@prescod.net> <14554.32317.730574.967165@amarok.cnri.reston.va.us> <38DA8797.F16301E4@lemburg.com> Message-ID: <38DB69DE.6D04B084@tismer.com> "M.-A. Lemburg" wrote: > > "Andrew M. Kuchling" wrote: > > > > Paul Prescod writes: > > >The new \N escape interpolates named characters within strings. For > > >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > > >unicode smiley face at the end. 
> >
> > Cute idea, and it certainly means you can avoid looking up Unicode
> > numbers. (You can look up names instead. :) ) Note that this means the
> > Unicode database is no longer optional if this is done; it has to be
> > around at code-parsing time. Python could import it automatically, as
> > exceptions.py is imported. Christian's work on compressing
> > unicodedatabase.c is therefore really important. (Is Perl5.6 actually
> > dragging around the Unicode database in the binary, or is it read out
> > of some external file or data structure?)
>
> Sorry to disappoint you guys, but the Unicode name and comments
> are *not* included in the unicodedatabase.c file Christian
> is currently working on. The reason is simple: it would add
> huge amounts of string data to the file. So this is a no-no
> for the core distribution...

This is not settled, still an open question.

What I have for non-textual data:
25 kb with dumb compression
15 kb with enhanced compression

What amounts of data am I talking about?
- The whole unicode database text file has size 632 kb.
- With PkZip this goes down to 96 kb.

Now, I produced another text file with just the currently used data in
it, and this sounds so:
- the stripped unicode text file has size 216 kb.
- PkZip melts this down to 40 kb.

Please compare that to my results above: I can do at least twice as
good. I hope I can compete for the text sections as well (since this is
something where zip is *good* at), but just let me try. Let's target
60 kb for the whole crap, and I'd be very pleased.

Then, there is still the question where to put the data. Having one
file in the dll and another externally would be an option. I could also
imagine to use a binary external file all the time, with maximum
possible compression. By loading this structure, this would be
partially expanded to make it fast. An advantage is that the compressed
Unicode database could become a stand-alone product.
The size is in fact so crazy small, that I'd like to make this available to any other language. > Still, the above is easily possible by inventing a new > encoding, say unicode-with-smileys, which then reads in > a file containing the Unicode names and applies the necessary > magic to decode/encode data as Paul described above. That sounds reasonable. Compression makes sense as well here, since the expanded stuff makes quite an amount of kb, compared to what it is "worth", compared to, say, the Python dll. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From mal at lemburg.com Fri Mar 24 14:41:27 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 14:41:27 +0100 Subject: [Python-Dev] Unicode character names References: <38DA6A49.A60E405B@prescod.net> <14554.32317.730574.967165@amarok.cnri.reston.va.us> <38DA8797.F16301E4@lemburg.com> <38DB69DE.6D04B084@tismer.com> Message-ID: <38DB7087.1B105AC7@lemburg.com> Christian Tismer wrote: > > "M.-A. Lemburg" wrote: > > > > "Andrew M. Kuchling" wrote: > > > > > > Paul Prescod writes: > > > >The new \N escape interpolates named characters within strings. For > > > >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a > > > >unicode smiley face at the end. > > > > > > Cute idea, and it certainly means you can avoid looking up Unicode > > > numbers. (You can look up names instead. :) ) Note that this means the > > > Unicode database is no longer optional if this is done; it has to be > > > around at code-parsing time. Python could import it automatically, as > > > exceptions.py is imported. Christian's work on compressing > > > unicodedatabase.c is therefore really important. 
(Is Perl5.6 actually
> > > dragging around the Unicode database in the binary, or is it read out
> > > of some external file or data structure?)
> >
> > Sorry to disappoint you guys, but the Unicode name and comments
> > are *not* included in the unicodedatabase.c file Christian
> > is currently working on. The reason is simple: it would add
> > huge amounts of string data to the file. So this is a no-no
> > for the core distribution...
>
> This is not settled, still an open question.

Well, ok, depends on how much you can squeeze out of the text
columns ;-) I still think that it's better to leave these gimmicks out
of the core and put them into some add-on, though.

> What I have for non-textual data:
> 25 kb with dumb compression
> 15 kb with enhanced compression

Looks good :-) With these sizes I think we could even integrate the
unicodedatabase.c + API into the core interpreter and only have the
unicodedata module to access the database from within Python.

> What amounts of data am I talking about?
> - The whole unicode database text file has size 632 kb.
> - With PkZip this goes down to 96 kb.
>
> Now, I produced another text file with just the currently
> used data in it, and this sounds so:
> - the stripped unicode text file has size 216 kb.
> - PkZip melts this down to 40 kb.
>
> Please compare that to my results above: I can do at least
> twice as good. I hope I can compete for the text sections
> as well (since this is something where zip is *good* at),
> but just let me try.
> Let's target 60 kb for the whole crap, and I'd be very pleased.
>
> Then, there is still the question where to put the data.
> Having one file in the dll and another externally would
> be an option. I could also imagine to use a binary external
> file all the time, with maximum possible compression.
> By loading this structure, this would be partially expanded
> to make it fast.
> An advantage is that the compressed Unicode database
> could become a stand-alone product.
The size is in fact > so crazy small, that I'd like to make this available > to any other language. You could take the unicodedatabase.c file (+ header file) and use it everywhere... I don't think it needs to contain any Python specific code. The API names would have to follow the Python naming schemes though. > > Still, the above is easily possible by inventing a new > > encoding, say unicode-with-smileys, which then reads in > > a file containing the Unicode names and applies the necessary > > magic to decode/encode data as Paul described above. > > That sounds reasonable. Compression makes sense as well here, > since the expanded stuff makes quite an amount of kb, compared > to what it is "worth", compared to, say, the Python dll. With 25kB for the non-text columns, I'd suggest simply adding the file to the core. Text columns could then go into a separate module. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Fri Mar 24 15:14:51 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 09:14:51 -0500 Subject: [Python-Dev] Hi -- I'm back! Message-ID: <200003241414.JAA11740@eric.cnri.reston.va.us> I'm back from ten days on the road. I'll try to dig through the various mailing list archives over the next few days, but it would be more efficient if you are waiting for me to take action or express an opinion on a particular issue (in *any* Python-related mailing list) to mail me a summary or at least a pointer. --Guido van Rossum (home page: http://www.python.org/~guido/) From jack at oratrix.nl Fri Mar 24 16:01:25 2000 From: jack at oratrix.nl (Jack Jansen) Date: Fri, 24 Mar 2000 16:01:25 +0100 Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message by Ka-Ping Yee , Thu, 23 Mar 2000 09:47:47 -0800 (PST) , Message-ID: <20000324150125.7144A370CF2@snelboot.oratrix.nl> > Hmm... 
i guess this also means one should ask what > > def function(None, arg): > ... > > does outside a class definition. I suppose that should simply > be illegal. No, it forces you to call the function with keyword arguments! (initially meant jokingly, but thinking about it for a couple of seconds there might actually be cases where this is useful) -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From skip at mojam.com Fri Mar 24 16:14:11 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 24 Mar 2000 09:14:11 -0600 (CST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003240251.VAA19921@newcnri.cnri.reston.va.us> References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> Message-ID: <14555.34371.749039.946891@beluga.mojam.com> AMK> I've written up a list of things that need to get done before 1.6 AMK> is finished. This is my vision of what needs to be done, and AMK> doesn't have an official stamp of approval from GvR or anyone else. AMK> So it's very probably wrong. Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules of general usefulness (this is at least generally useful for anyone writing web spiders ;-) shouldn't live in Tools, because it's not always available and users need to do extra work to make them available. I'd be happy to write up some documentation for it and twiddle the module to include doc strings. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From fdrake at acm.org Fri Mar 24 16:20:03 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Fri, 24 Mar 2000 10:20:03 -0500 (EST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: References: <38DB4581.EB5315E0@lemburg.com> Message-ID: <14555.34723.841426.504538@weyr.cnri.reston.va.us> Greg Stein writes: > There is precedent for passing in single-ref pointers. For example: > > PyArg_ParseTuple(args, "O!", &s, PyString_Type) ^^^^^^^^^^^^^^^^^ Feeling ok? I *suspect* these are reversed. :) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Fri Mar 24 16:24:13 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 10:24:13 -0500 (EST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: <38DB5188.AA580652@lemburg.com> References: <38DB5188.AA580652@lemburg.com> Message-ID: <14555.34973.303273.716146@weyr.cnri.reston.va.us> M.-A. Lemburg writes: > You have a point there... even though it breaks the notion > of prepending all parameters with an '&' (ok, except the I've never heard of this notion; I hope I didn't just miss it in the docs! The O& also doesn't require a & in front of the name of the conversion function, you just pass the right value. So there are at least two cases where you *typically* don't use a &. (Other cases in the 1.5.2 API are probably just plain weird if they don't!) Changing it to avoid the extra machinery is the Right Thing; you get to feel good today. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal at lemburg.com Fri Mar 24 17:38:06 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 17:38:06 +0100 Subject: [Python-Dev] Unicode and Windows References: <38DB5188.AA580652@lemburg.com> <14555.34973.303273.716146@weyr.cnri.reston.va.us> Message-ID: <38DB99EE.F5949889@lemburg.com> "Fred L. Drake, Jr." wrote: > > M.-A. Lemburg writes: > > You have a point there... 
even though it breaks the notion > > of prepending all parameters with an '&' (ok, except the > > I've never heard of this notion; I hope I didn't just miss it in the > docs! If you scan the parameters list in getargs.c you'll come to this conclusion and thus my notion: I've been programming like this for years now :-) > The O& also doesn't require a & in front of the name of the > conversion function, you just pass the right value. So there are at > least two cases where you *typically* don't use a &. (Other cases in > the 1.5.2 API are probably just plain weird if they don't!) > Changing it to avoid the extra machinery is the Right Thing; you get > to feel good today. ;) Ok, feeling good now ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Fri Mar 24 21:44:02 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 15:44:02 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 09:14:11 CST." <14555.34371.749039.946891@beluga.mojam.com> References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <14555.34371.749039.946891@beluga.mojam.com> Message-ID: <200003242044.PAA00677@eric.cnri.reston.va.us> > Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules > of general usefulness (this is at least generally useful for anyone writing > web spiders ;-) shouldn't live in Tools, because it's not always available > and users need to do extra work to make them available. > > I'd be happy to write up some documentation for it and twiddle the module to > include doc strings. Deal. Soon as we get the docs we'll move it to Lib. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Fri Mar 24 21:50:43 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 12:50:43 -0800 (PST) Subject: [Python-Dev] Unicode and Windows In-Reply-To: <14555.34723.841426.504538@weyr.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Fred L. Drake, Jr. wrote: > Greg Stein writes: > > There is precedent for passing in single-ref pointers. For example: > > > > PyArg_ParseTuple(args, "O!", &s, PyString_Type) > ^^^^^^^^^^^^^^^^^ > > Feeling ok? I *suspect* these are reversed. :) I just checked the code to ensure that it took a single pointer rather than a double-pointer. I guess that I didn't verify the order :-) Concept is valid, tho... the params do not necessarily require an ampersand. oop! Actually... this does require an ampersand: PyArg_ParseTuple(args, "O!", &PyString_Type, &s) Don't want to pass the whole structure... Well, regardless: I would much prefer to see the encoding passed as a constant string, rather than having to shove the sucker into a variable first, just so that I can insert a useless address-of operator. Cheers, -g -- Greg Stein, http://www.lyra.org/ From akuchlin at mems-exchange.org Fri Mar 24 21:51:56 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Fri, 24 Mar 2000 15:51:56 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242044.PAA00677@eric.cnri.reston.va.us> References: <200003240251.VAA19921@newcnri.cnri.reston.va.us> <14555.34371.749039.946891@beluga.mojam.com> <200003242044.PAA00677@eric.cnri.reston.va.us> Message-ID: <14555.54636.811100.254309@amarok.cnri.reston.va.us> Guido van Rossum writes: >> Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules >Deal. Soon as we get the docs we'll move it to Lib. What about putting it in a package like 'www' or 'web'? Packagizing the existing library is hard because of backward compatibility, but there's no such constraint for new modules. 
-- A.M. Kuchling http://starship.python.net/crew/amk/ One need not be a chamber to be haunted; / One need not be a house; / The brain has corridors surpassing / Material place. -- Emily Dickinson, "Time and Eternity" From gstein at lyra.org Fri Mar 24 22:00:25 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 13:00:25 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.54636.811100.254309@amarok.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Andrew M. Kuchling wrote: > Guido van Rossum writes: > >> Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules > >Deal. Soon as we get the docs we'll move it to Lib. > > What about putting it in a package like 'www' or 'web'? Packagizing > the existing library is hard because of backward compatibility, but > there's no such constraint for new modules. Or in the "network" package that was suggested a month ago? And why *can't* we start on repackaging old module? I think the only reason that somebody came up with to NOT do it was "well, if we don't repackage the whole thing, then we should repackage nothing." Which, IMO, is totally bogus. We'll never get anywhere operating under that principle. Cheers, -g -- Greg Stein, http://www.lyra.org/ From fdrake at acm.org Fri Mar 24 22:00:19 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 16:00:19 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: References: <14555.54636.811100.254309@amarok.cnri.reston.va.us> Message-ID: <14555.55139.484135.602894@weyr.cnri.reston.va.us> Greg Stein writes: > Or in the "network" package that was suggested a month ago? +1 > And why *can't* we start on repackaging old module? I think the only > reason that somebody came up with to NOT do it was "well, if we don't > repackage the whole thing, then we should repackage nothing." Which, IMO, > is totally bogus. We'll never get anywhere operating under that principle. 
That doesn't bother me, but I tend to be a little conservative (though usually not as conservative as Guido on such matters). I *would* like to decided theat 1.7 will be fully packagized, and not wait until 2.0. As long as 1.7 is a "testing the evolutionary path" release, I think that's the right thing to do. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Fri Mar 24 22:03:54 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:03:54 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead Message-ID: <200003242103.QAA03288@eric.cnri.reston.va.us> Someone noticed that socket.connect() and a few related functions (connect_ex() and bind()) take either a single (host, port) tuple or two separate arguments, but that only the tuple is documented. Similar to append(), I'd like to close this gap, and I've made the necessary changes. This will probably break lots of code. Similar to append(), I'd like people to fix their code rather than whine -- two-arg connect() has never been documented, although it's found in much code (even the socket module test code :-( ). Similar to append(), I may revert the change if it is shown to cause too much pain during beta testing... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Mar 24 22:05:57 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:05:57 -0500 Subject: [Python-Dev] Unicode and Windows In-Reply-To: Your message of "Fri, 24 Mar 2000 12:50:43 PST." References: Message-ID: <200003242105.QAA03543@eric.cnri.reston.va.us> > Well, regardless: I would much prefer to see the encoding passed as a > constant string, rather than having to shove the sucker into a variable > first, just so that I can insert a useless address-of operator. Of course. Use & for output args, not as a matter of principle. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Mar 24 22:11:25 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:11:25 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 13:00:25 PST." References: Message-ID: <200003242111.QAA04208@eric.cnri.reston.va.us> [Greg] > And why *can't* we start on repackaging old module? I think the only > reason that somebody came up with to NOT do it was "well, if we don't > repackage the whole thing, then we should repackage nothing." Which, IMO, > is totally bogus. We'll never get anywhere operating under that principle. The reason is backwards compatibility. Assume we create a package "web" and move all web related modules into it: httplib, urllib, htmllib, etc. Now for backwards compatibility, we add the web directory to sys.path, so one can write either "import web.urllib" or "import urllib". But that loads the same code twice! And in this (carefully chosen :-) example, urllib actually has some state which shouldn't be replicated. Plus, it's too much work -- I'd rather focus on getting 1.6 out of the door, and there's a lot of other stuff I need to do besides moving modules around. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Mar 24 22:15:00 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:15:00 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 16:00:19 EST." <14555.55139.484135.602894@weyr.cnri.reston.va.us> References: <14555.54636.811100.254309@amarok.cnri.reston.va.us> <14555.55139.484135.602894@weyr.cnri.reston.va.us> Message-ID: <200003242115.QAA04648@eric.cnri.reston.va.us> > Greg Stein writes: > > Or in the "network" package that was suggested a month ago? [Fred] > +1 Which reminds me of another reason to wait: coming up with the right package hierarchy is hard. (E.g. 
I find network too long; plus, does htmllib belong there?) > That doesn't bother me, but I tend to be a little conservative > (though usually not as conservative as Guido on such matters). I > *would* like to decided theat 1.7 will be fully packagized, and not > wait until 2.0. As long as 1.7 is a "testing the evolutionary path" > release, I think that's the right thing to do. Agreed. At the SD conference I gave a talk about the future of Python, and there was (again) a good suggestion about forwards compatibility. Starting with 1.7 (if not sooner), several Python 3000 features that necessarily have to be incompatible (like 1/2 yielding 0.5 instead of 0) could issue warnings when (or unless?) Python is invoked with a compatibility flag. --Guido van Rossum (home page: http://www.python.org/~guido/) From bwarsaw at cnri.reston.va.us Fri Mar 24 22:21:54 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 24 Mar 2000 16:21:54 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead References: <200003242103.QAA03288@eric.cnri.reston.va.us> Message-ID: <14555.56434.974884.832078@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> Someone noticed that socket.connect() and a few related GvR> functions (connect_ex() and bind()) take either a single GvR> (host, port) tuple or two separate arguments, but that only GvR> the tuple is documented. GvR> Similar to append(), I'd like to close this gap, and I've GvR> made the necessary changes. This will probably break lots of GvR> code. I don't agree that socket.connect() and friends need this fix. Yes, obviously append() needed fixing because of the application of Tim's Twelfth Enlightenment to the semantic ambiguity. But socket.connect() has no such ambiguity; you may spell it differently, but you know exactly what you mean. My suggestion would be to not break any code, but extend connect's interface to allow an optional second argument. 
Thus all of these calls would be legal: sock.connect(addr) sock.connect(addr, port) sock.connect((addr, port)) One nit on the documentation of the socket module. The second entry says: bind (address) Bind the socket to address. The socket must not already be bound. (The format of address depends on the address family -- see above.) Huh? What "above" part should I see? Note that I'm reading this doc off the web! -Barry From gstein at lyra.org Fri Mar 24 22:27:57 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 13:27:57 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242111.QAA04208@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: > [Greg] > > And why *can't* we start on repackaging old module? I think the only > > reason that somebody came up with to NOT do it was "well, if we don't > > repackage the whole thing, then we should repackage nothing." Which, IMO, > > is totally bogus. We'll never get anywhere operating under that principle. > > The reason is backwards compatibility. Assume we create a package > "web" and move all web related modules into it: httplib, urllib, > htmllib, etc. Now for backwards compatibility, we add the web > directory to sys.path, so one can write either "import web.urllib" or > "import urllib". But that loads the same code twice! And in this > (carefully chosen :-) example, urllib actually has some state which > shouldn't be replicated. We don't add it to the path. Instead, we create new modules that look like: ---- httplib.py ---- from web.httplib import * ---- The only backwards-compat issue with this approach is that people who poke values into the module will have problems. I don't believe that any of the modules were designed for that, anyhow, so it would seem an acceptable to (effectively) disallow that behavior. 
> Plus, it's too much work -- I'd rather focus on getting 1.6 out of the > door, and there's a lot of other stuff I need to do besides moving > modules around. Stuff that *you* need to do, sure. But there *are* a lot of us who can help here, and some who desire to spend their time moving modules. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Mar 24 22:32:14 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 13:32:14 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242115.QAA04648@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: > > Greg Stein writes: > > > Or in the "network" package that was suggested a month ago? > > [Fred] > > +1 > > Which reminds me of another reason to wait: coming up with the right > package hierarchy is hard. (E.g. I find network too long; plus, does > htmllib belong there?) htmllib does not go there. Where does it go? Dunno. Leave it unless/until somebody comes up with a place for it. We package up obvious ones. We don't have to design a complete hierarchy. There seemed to be a general "good feeling" around some kind of network (protocol) package. Call it "net" if "network" is too long. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Fri Mar 24 22:27:51 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:27:51 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: Your message of "Fri, 24 Mar 2000 16:21:54 EST." 
<14555.56434.974884.832078@anthem.cnri.reston.va.us> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> Message-ID: <200003242127.QAA06269@eric.cnri.reston.va.us> > >>>>> "GvR" == Guido van Rossum writes: > > GvR> Someone noticed that socket.connect() and a few related > GvR> functions (connect_ex() and bind()) take either a single > GvR> (host, port) tuple or two separate arguments, but that only > GvR> the tuple is documented. > > GvR> Similar to append(), I'd like to close this gap, and I've > GvR> made the necessary changes. This will probably break lots of > GvR> code. > > I don't agree that socket.connect() and friends need this fix. Yes, > obviously append() needed fixing because of the application of Tim's > Twelfth Enlightenment to the semantic ambiguity. But socket.connect() > has no such ambiguity; you may spell it differently, but you know > exactly what you mean. > > My suggestion would be to not break any code, but extend connect's > interface to allow an optional second argument. Thus all of these > calls would be legal: > > sock.connect(addr) > sock.connect(addr, port) > sock.connect((addr, port)) You probably meant: sock.connect(addr) sock.connect(host, port) sock.connect((host, port)) since (host, port) is equivalent to (addr). > One nit on the documentation of the socket module. The second entry > says: > > bind (address) > Bind the socket to address. The socket must not already be > bound. (The format of address depends on the address family -- > see above.) > > Huh? What "above" part should I see? Note that I'm reading this doc > off the web! Fred typically directs latex2html to break all sections apart. 
It's in the previous section: Socket addresses are represented as a single string for the AF_UNIX address family and as a pair (host, port) for the AF_INET address family, where host is a string representing either a hostname in Internet domain notation like 'daring.cwi.nl' or an IP address like '100.50.200.5', and port is an integral port number. Other address families are currently not supported. The address format required by a particular socket object is automatically selected based on the address family specified when the socket object was created. This also explains the reason for requiring a single argument: when using AF_UNIX, the second argument makes no sense! Frankly, I'm not sure what to do here -- it's more correct to require a single address argument always, but it's more convenient to allow two sometimes. Note that sendto(data, addr) only accepts the tuple form: you cannot write sendto(data, host, port). --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Fri Mar 24 22:28:32 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 16:28:32 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: References: <200003242111.QAA04208@eric.cnri.reston.va.us> Message-ID: <14555.56832.336242.378838@weyr.cnri.reston.va.us> Greg Stein writes: > Stuff that *you* need to do, sure. But there *are* a lot of us who can > help here, and some who desire to spend their time moving modules. Would it make sense for one of these people with time on their hands to propose a specific mapping from old->new names? I think that would be a good first step, regardless of the implementation timing. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Fri Mar 24 22:29:44 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:29:44 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 13:27:57 PST." 
References: Message-ID: <200003242129.QAA06510@eric.cnri.reston.va.us> > We don't add it to the path. Instead, we create new modules that look > like: > > ---- httplib.py ---- > from web.httplib import * > ---- > > The only backwards-compat issue with this approach is that people who poke > values into the module will have problems. I don't believe that any of the > modules were designed for that, anyhow, so it would seem acceptable to > (effectively) disallow that behavior. OK, that's reasonable. I'll have to invent a different reason why I don't want this -- because I really don't! > > Plus, it's too much work -- I'd rather focus on getting 1.6 out of the > > door, and there's a lot of other stuff I need to do besides moving > > modules around. > > Stuff that *you* need to do, sure. But there *are* a lot of us who can > help here, and some who desire to spend their time moving modules. Hm. Moving modules requires painful and arcane CVS manipulations that can only be done by the few of us here at CNRI -- and I'm the only one left who's full time on Python. I'm still not convinced that it's a good plan. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Fri Mar 24 22:32:39 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 16:32:39 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <14555.56434.974884.832078@anthem.cnri.reston.va.us> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> Message-ID: <14555.57079.187670.916002@weyr.cnri.reston.va.us> Barry A. Warsaw writes: > I don't agree that socket.connect() and friends need this fix. Yes, > obviously append() needed fixing because of the application of Tim's > Twelfth Enlightenment to the semantic ambiguity. But socket.connect() > has no such ambiguity; you may spell it differently, but you know > exactly what you mean. Crock. 
The address representations have been fairly well defined for quite a while. Be explicit. > sock.connect(addr) This is the only legal signature. (host, port) is simply the form of addr for a particular address family. > One nit on the documentation of the socket module. The second entry > says: > > bind (address) > Bind the socket to address. The socket must not already be > bound. (The format of address depends on the address family -- > see above.) > > Huh? What "above" part should I see? Note that I'm reading this doc > off the web! Definitely written for the paper document! Remind me about this again in a month and I'll fix it, but I don't want to play games with this little stuff until the 1.5.2p2 and 1.6 trees have been merged. Harrumph. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From gstein at lyra.org Fri Mar 24 22:37:41 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 13:37:41 -0800 (PST) Subject: [Python-Dev] delegating (was: 1.6 job list) In-Reply-To: Message-ID: On Fri, 24 Mar 2000, Greg Stein wrote: >... > > Plus, it's too much work -- I'd rather focus on getting 1.6 out of the > > door, and there's a lot of other stuff I need to do besides moving > > modules around. > > Stuff that *you* need to do, sure. But there *are* a lot of us who can > help here, and some who desire to spend their time moving modules. I just want to emphasize this point some more. Python 1.6 has a defined timeline, with a defined set of minimal requirements. However! I don't believe that a corollary of that says we MUST ignore everything else. If those other options fit within the required timeline, then why not? (assuming we have adequate testing and doc to go with the changes) There are ample people who have time and inclination to contribute. If those contributions add positive benefit, then I see no reason to exclude them (other than on pure merit, of course). Note that some of the problems stem from CVS access. 
Much Guido-time could be saved by a commit-then-review model, rather than a review-then-Guido-commits model. Fred does this very well with the Doc/ area. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Mar 24 22:38:48 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 13:38:48 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: >... > > We don't add it to the path. Instead, we create new modules that look > > like: > > > > ---- httplib.py ---- > > from web.httplib import * > > ---- > > > > The only backwards-compat issue with this approach is that people who poke > > values into the module will have problems. I don't believe that any of the > > modules were designed for that, anyhow, so it would seem acceptable to > > (effectively) disallow that behavior. > > OK, that's reasonable. I'll have to invent a different reason why I > don't want this -- because I really don't! Fair enough. > > > Plus, it's too much work -- I'd rather focus on getting 1.6 out of the > > > door, and there's a lot of other stuff I need to do besides moving > > > modules around. > > > > Stuff that *you* need to do, sure. But there *are* a lot of us who can > > help here, and some who desire to spend their time moving modules. > > Hm. Moving modules requires painful and arcane CVS manipulations that > can only be done by the few of us here at CNRI -- and I'm the only one > left who's full time on Python. I'm still not convinced that it's a > good plan. There are a number of ways to do this, and I'm familiar with all of them. It is a continuing point of strife in the Apache CVS repositories :-) But... it is premised on accepting the desire to move them, of course. 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Fri Mar 24 22:38:51 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 16:38:51 -0500 Subject: [Python-Dev] delegating (was: 1.6 job list) In-Reply-To: Your message of "Fri, 24 Mar 2000 13:37:41 PST." References: Message-ID: <200003242138.QAA07621@eric.cnri.reston.va.us> > Note that some of the problems stem from CVS access. Much Guido-time could > be saved by a commit-then-review model, rather than review-then-Guido- > commits model. Fred does this very well with the Doc/ area. Actually, I'm experimenting with this already: Unicode, list.append() and socket.connect() are done in this way! For renames it is really painful though, even if someone else at CNRI can do it. I'd like to see a draft package hierarchy please? Also, if you have some time, please review the bugs in the bugs list. Patches submitted with a corresponding PR# will be treated with priority! --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Fri Mar 24 22:40:48 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 22:40:48 +0100 Subject: [Python-Dev] Unicode Patch Set 2000-03-24 Message-ID: <38DBE0E0.76A298FE@lemburg.com> Attached you find the latest update of the Unicode implementation. The patch is against the current CVS version. It includes the fix I posted yesterday for the core dump problem in codecs.c (was introduced by my previous patch set -- sorry), adds more tests for the codecs and two new parser markers "es" and "es#". 
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ -------------- next part -------------- Only in CVS-Python/Doc/tools: anno-api.py diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/codecs.py Python+Unicode/Lib/codecs.py --- CVS-Python/Lib/codecs.py Thu Mar 23 23:58:41 2000 +++ Python+Unicode/Lib/codecs.py Fri Mar 17 23:51:01 2000 @@ -46,7 +46,7 @@ handling schemes by providing the errors argument. These string values are defined: - 'strict' - raise an error (or a subclass) + 'strict' - raise a ValueError error (or a subclass) 'ignore' - ignore the character and continue with the next 'replace' - replace with a suitable replacement character; Python will use the official U+FFFD REPLACEMENT diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/output/test_unicode Python+Unicode/Lib/test/output/test_unicode --- CVS-Python/Lib/test/output/test_unicode Fri Mar 24 22:21:26 2000 +++ Python+Unicode/Lib/test/output/test_unicode Sat Mar 11 00:23:21 2000 @@ -1,5 +1,4 @@ test_unicode Testing Unicode comparisons... done. -Testing Unicode contains method... done. Testing Unicode formatting strings... done. Testing unicodedata module... done. 
diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_unicode.py Python+Unicode/Lib/test/test_unicode.py --- CVS-Python/Lib/test/test_unicode.py Thu Mar 23 23:58:47 2000 +++ Python+Unicode/Lib/test/test_unicode.py Fri Mar 24 00:29:43 2000 @@ -293,3 +293,33 @@ assert unicodedata.combining(u'\u20e1') == 230 print 'done.' + +# Test builtin codecs +print 'Testing builtin codecs...', + +assert unicode('hello','ascii') == u'hello' +assert unicode('hello','utf-8') == u'hello' +assert unicode('hello','utf8') == u'hello' +assert unicode('hello','latin-1') == u'hello' + +assert u'hello'.encode('ascii') == 'hello' +assert u'hello'.encode('utf-8') == 'hello' +assert u'hello'.encode('utf8') == 'hello' +assert u'hello'.encode('utf-16-le') == 'h\000e\000l\000l\000o\000' +assert u'hello'.encode('utf-16-be') == '\000h\000e\000l\000l\000o' +assert u'hello'.encode('latin-1') == 'hello' + +u = u''.join(map(unichr, range(1024))) +for encoding in ('utf-8', 'utf-16', 'utf-16-le', 'utf-16-be', + 'raw_unicode_escape', 'unicode_escape', 'unicode_internal'): + assert unicode(u.encode(encoding),encoding) == u + +u = u''.join(map(unichr, range(256))) +for encoding in ('latin-1',): + assert unicode(u.encode(encoding),encoding) == u + +u = u''.join(map(unichr, range(128))) +for encoding in ('ascii',): + assert unicode(u.encode(encoding),encoding) == u + +print 'done.' 
diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Misc/unicode.txt Python+Unicode/Misc/unicode.txt --- CVS-Python/Misc/unicode.txt Thu Mar 23 23:58:48 2000 +++ Python+Unicode/Misc/unicode.txt Fri Mar 24 22:29:35 2000 @@ -715,21 +715,126 @@ These markers are used by the PyArg_ParseTuple() APIs: - 'U': Check for Unicode object and return a pointer to it + "U": Check for Unicode object and return a pointer to it - 's': For Unicode objects: auto convert them to the + "s": For Unicode objects: auto convert them to the and return a pointer to the object's buffer. - 's#': Access to the Unicode object via the bf_getreadbuf buffer interface + "s#": Access to the Unicode object via the bf_getreadbuf buffer interface (see Buffer Interface); note that the length relates to the buffer length, not the Unicode string length (this may be different depending on the Internal Format). - 't#': Access to the Unicode object via the bf_getcharbuf buffer interface + "t#": Access to the Unicode object via the bf_getcharbuf buffer interface (see Buffer Interface); note that the length relates to the buffer length, not necessarily to the Unicode string length (this may be different depending on the ). + "es": + Takes two parameters: encoding (const char *) and + buffer (char **). + + The input object is first coerced to Unicode in the usual way + and then encoded into a string using the given encoding. + + On output, a buffer of the needed size is allocated and + returned through *buffer as NULL-terminated string. + The encoded string may not contain embedded NULL characters. + The caller is responsible for free()ing the allocated *buffer + after usage. + + "es#": + Takes three parameters: encoding (const char *), + buffer (char **) and buffer_len (int *). 
+ + The input object is first coerced to Unicode in the usual way + and then encoded into a string using the given encoding. + + If *buffer is non-NULL, *buffer_len must be set to sizeof(buffer) + on input. Output is then copied to *buffer. + + If *buffer is NULL, a buffer of the needed size is + allocated and output copied into it. *buffer is then + updated to point to the allocated memory area. The caller + is responsible for free()ing *buffer after usage. + + In both cases *buffer_len is updated to the number of + characters written (excluding the trailing NULL-byte). + The output buffer is assured to be NULL-terminated. + +Examples: + +Using "es#" with auto-allocation: + + static PyObject * + test_parser(PyObject *self, + PyObject *args) + { + PyObject *str; + const char *encoding = "latin-1"; + char *buffer = NULL; + int buffer_len = 0; + + if (!PyArg_ParseTuple(args, "es#:test_parser", + encoding, &buffer, &buffer_len)) + return NULL; + if (!buffer) { + PyErr_SetString(PyExc_SystemError, + "buffer is NULL"); + return NULL; + } + str = PyString_FromStringAndSize(buffer, buffer_len); + free(buffer); + return str; + } + +Using "es" with auto-allocation returning a NULL-terminated string: + + static PyObject * + test_parser(PyObject *self, + PyObject *args) + { + PyObject *str; + const char *encoding = "latin-1"; + char *buffer = NULL; + + if (!PyArg_ParseTuple(args, "es:test_parser", + encoding, &buffer)) + return NULL; + if (!buffer) { + PyErr_SetString(PyExc_SystemError, + "buffer is NULL"); + return NULL; + } + str = PyString_FromString(buffer); + free(buffer); + return str; + } + +Using "es#" with a pre-allocated buffer: + + static PyObject * + test_parser(PyObject *self, + PyObject *args) + { + PyObject *str; + const char *encoding = "latin-1"; + char _buffer[10]; + char *buffer = _buffer; + int buffer_len = sizeof(_buffer); + + if (!PyArg_ParseTuple(args, "es#:test_parser", + encoding, &buffer, &buffer_len)) + return NULL; + if (!buffer) { + 
PyErr_SetString(PyExc_SystemError, + "buffer is NULL"); + return NULL; + } + str = PyString_FromStringAndSize(buffer, buffer_len); + return str; + } + File/Stream Output: ------------------- @@ -837,6 +942,7 @@ History of this Proposal: ------------------------- +1.3: Added new "es" and "es#" parser markers 1.2: Removed POD about codecs.open() 1.1: Added note about comparisons and hash values. Added note about case mapping algorithms. Changed stream codecs .read() and Only in CVS-Python/Objects: .#stringobject.c.2.59 Only in CVS-Python/Objects: stringobject.c.orig diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Python/getargs.c Python+Unicode/Python/getargs.c --- CVS-Python/Python/getargs.c Sat Mar 11 10:55:21 2000 +++ Python+Unicode/Python/getargs.c Fri Mar 24 20:22:26 2000 @@ -178,6 +178,8 @@ } else if (level != 0) ; /* Pass */ + else if (c == 'e') + ; /* Pass */ else if (isalpha(c)) max++; else if (c == '|') @@ -654,6 +656,122 @@ break; } + case 'e': /* encoded string */ + { + char **buffer; + const char *encoding; + PyObject *u, *s; + int size; + + /* Get 'e' parameter: the encoding name */ + encoding = (const char *)va_arg(*p_va, const char *); + if (encoding == NULL) + return "(encoding is NULL)"; + + /* Get 's' parameter: the output buffer to use */ + if (*format != 's') + return "(unknown parser marker combination)"; + buffer = (char **)va_arg(*p_va, char **); + format++; + if (buffer == NULL) + return "(buffer is NULL)"; + + /* Convert object to Unicode */ + u = PyUnicode_FromObject(arg); + if (u == NULL) + return "string, unicode or text buffer"; + + /* Encode object; use default error handling */ + s = PyUnicode_AsEncodedString(u, + encoding, + NULL); + Py_DECREF(u); + if (s == NULL) + return "(encoding failed)"; + if (!PyString_Check(s)) { + Py_DECREF(s); + return 
"(encoder failed to return a string)"; + } + size = PyString_GET_SIZE(s); + + /* Write output; output is guaranteed to be + 0-terminated */ + if (*format == '#') { + /* Using buffer length parameter '#': + + - if *buffer is NULL, a new buffer + of the needed size is allocated and + the data copied into it; *buffer is + updated to point to the new buffer; + the caller is responsible for + free()ing it after usage + + - if *buffer is not NULL, the data + is copied to *buffer; *buffer_len + has to be set to the size of the + buffer on input; buffer overflow is + signalled with an error; buffer has + to provide enough room for the + encoded string plus the trailing + 0-byte + + - in both cases, *buffer_len is + updated to the size of the buffer + /excluding/ the trailing 0-byte + + */ + int *buffer_len = va_arg(*p_va, int *); + + format++; + if (buffer_len == NULL) + return "(buffer_len is NULL)"; + if (*buffer == NULL) { + *buffer = PyMem_NEW(char, size + 1); + if (*buffer == NULL) { + Py_DECREF(s); + return "(memory error)"; + } + } else { + if (size + 1 > *buffer_len) { + Py_DECREF(s); + return "(buffer overflow)"; + } + } + memcpy(*buffer, + PyString_AS_STRING(s), + size + 1); + *buffer_len = size; + } else { + /* Using a 0-terminated buffer: + + - the encoded string has to be + 0-terminated for this variant to + work; if it is not, an error is raised + + - a new buffer of the needed size + is allocated and the data copied + into it; *buffer is updated to + point to the new buffer; the caller + is responsible for free()ing it + after usage + + */ + if (strlen(PyString_AS_STRING(s)) != size) + return "(encoded string without "\ + "NULL bytes)"; + *buffer = PyMem_NEW(char, size + 1); + if (*buffer == NULL) { + Py_DECREF(s); + return "(memory error)"; + } + memcpy(*buffer, + PyString_AS_STRING(s), + size + 1); + } + Py_DECREF(s); + break; + } + case 'S': /* string object */ { PyObject **p = va_arg(*p_va, PyObject **); From fdrake at acm.org Fri Mar 24 22:40:38 2000 From: 
fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 24 Mar 2000 16:40:38 -0500 (EST) Subject: [Python-Dev] delegating (was: 1.6 job list) In-Reply-To: References: Message-ID: <14555.57558.939236.363358@weyr.cnri.reston.va.us> Greg Stein writes: > Note that some of the problems stem from CVS access. Much Guido-time could > be saved by a commit-then-review model, rather than review-then-Guido- This is a non-problem; I'm willing to do the arcane CVS manipulations if the issue is Guido's time. What I will *not* do is do it piecemeal without a cohesive plan that Guido approves of at least 95%, and I'll be really careful to do that last 5% when he's not in the office. ;) > commits model. Fred does this very well with the Doc/ area. Thanks for the vote of confidence! The model that I use for the Doc/ area is more like "Fred reviews, Fred commits, and Guido can read it on python.org like everyone else." Works for me! ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From bwarsaw at cnri.reston.va.us Fri Mar 24 22:45:38 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 24 Mar 2000 16:45:38 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <200003242115.QAA04648@eric.cnri.reston.va.us> Message-ID: <14555.57858.824301.693390@anthem.cnri.reston.va.us> One thing you can definitely do now which breaks no code: propose a package hierarchy for the standard library. From akuchlin at mems-exchange.org Fri Mar 24 22:46:28 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Fri, 24 Mar 2000 16:46:28 -0500 (EST) Subject: [Python-Dev] Unicode charnames impl. In-Reply-To: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> Message-ID: <14555.57908.151946.182639@amarok.cnri.reston.va.us> Here's a strawman codec for doing the \N{NULL} thing. Questions: 0) Is the code below correct? 1) What the heck would this encoding be called? 2) What does .encode() do? 
(Right now it escapes \N as \N{BACKSLASH}N.) 3) How can we store all those names? The resulting dictionary makes a 361K .py file; Python dumps core trying to parse it. (Another bug...) 4) What do you do with the error \N{...... no closing right bracket. Right now it stops at that point, and never advances any farther. Maybe it should assume it's an error if there's no } within the next 200 chars or some similar limit? 5) Do we need StreamReader/Writer classes, too? I've also added a script that parses the names out of the NameList.txt file at ftp://ftp.unicode.org/Public/UNIDATA/. --amk namecodec.py: ============= import codecs #from _namedict import namedict namedict = {'NULL': 0, 'START OF HEADING' : 1, 'BACKSLASH':ord('\\')} class NameCodec(codecs.Codec): def encode(self,input,errors='strict'): # XXX what should this do? Escape the # sequence \N as '\N{BACKSLASH}N'? return input.replace( '\\N', '\\N{BACKSLASH}N' ) def decode(self,input,errors='strict'): output = unicode("") last = 0 index = input.find( u'\\N{' ) while index != -1: output = output + unicode( input[last:index] ) used = index r_bracket = input.find( '}', index) if r_bracket == -1: # No closing bracket; bail out... break name = input[index + 3 : r_bracket] code = namedict.get( name ) if code is not None: output = output + unichr(code) elif errors == 'strict': raise ValueError, 'Unknown character name %s' % repr(name) elif errors == 'ignore': pass elif errors == 'replace': output = output + unichr( 0xFFFD ) last = r_bracket + 1 index = input.find( '\\N{', last) else: # Finally failed gently, no longer finding a \N{... 
output = output + unicode( input[last:] ) return len(input), output # Otherwise, we hit the break for an unterminated \N{...} return index, output if __name__ == '__main__': c = NameCodec() for s in [ r'b\lah blah \N{NULL} asdf', r'b\l\N{START OF HEADING}\N{NU' ]: used, s2 = c.decode(s) print repr( s2 ) s3 = c.encode(s) _, s4 = c.decode(s3) print repr(s3) assert s4 == s print repr( c.decode(r'blah blah \N{NULLsadf} asdf' , errors='replace' )) print repr( c.decode(r'blah blah \N{NULLsadf} asdf' , errors='ignore' )) makenamelist.py =============== # Hack to extract character names from NamesList.txt # Output the repr() of the resulting dictionary import re, sys, string namedict = {} while 1: L = sys.stdin.readline() if L == "": break m = re.match('([0-9a-fA-F]){4}(?:\t(.*)\s*)', L) if m is not None: last_char = int(m.group(1), 16) if m.group(2) is not None: name = string.upper( m.group(2) ) if name not in ['', '']: namedict[ name ] = last_char # print name, last_char m = re.match('\t=\s*(.*)\s*(;.*)?', L) if m is not None: name = string.upper( m.group(1) ) names = string.split(name, ',') names = map(string.strip, names) for n in names: namedict[ n ] = last_char # print n, last_char # XXX and do what with this dictionary? print namedict From mal at lemburg.com Fri Mar 24 22:50:19 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 22:50:19 +0100 Subject: [Python-Dev] Unicode Patch Set 2000-03-24 References: <38DBE0E0.76A298FE@lemburg.com> Message-ID: <38DBE31B.BCB342CA@lemburg.com> Oops, sorry, the patch file wasn't supposed to go to python-dev. 
Anyway, Greg's wish is included in there and MarkH should be happy now -- at least I hope he is ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Jasbahr at origin.EA.com Fri Mar 24 22:49:35 2000 From: Jasbahr at origin.EA.com (Asbahr, Jason) Date: Fri, 24 Mar 2000 15:49:35 -0600 Subject: [Python-Dev] Memory Management Message-ID: <11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> Greetings! We're working on integrating our own memory manager into our project and the current challenge is figuring out how to make it play nice with Python (and SWIG). The approach we're currently taking is to patch 1.5.2 and augment the PyMem* macros to call external memory allocation functions that we provide. The idea is to easily allow the addition of third party memory management facilities to Python. Assuming 1) we get it working :-), and 2) we sync to the latest Python CVS and patch that, would this be a useful patch to give back to the community? Has anyone run up against this before? Thanks, Jason Asbahr Origin Systems, Inc. jasbahr at origin.ea.com From bwarsaw at cnri.reston.va.us Fri Mar 24 22:53:01 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 24 Mar 2000 16:53:01 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> <200003242127.QAA06269@eric.cnri.reston.va.us> Message-ID: <14555.58301.790774.159381@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> You probably meant: | sock.connect(addr) | sock.connect(host, port) | sock.connect((host, port)) GvR> since (host, port) is equivalent to (addr). Doh, yes. :) GvR> Fred typically directs latex2html to break all sections GvR> apart. 
It's in the previous section: I know, I was being purposefully dense for effect :) Fred, is there some way to make the html contain a link to the previous section for the "see above" text? That would solve the problem I think. GvR> This also explains the reason for requiring a single GvR> argument: when using AF_UNIX, the second argument makes no GvR> sense! GvR> Frankly, I'm not sure what do here -- it's more correct to GvR> require a single address argument always, but it's more GvR> convenient to allow two sometimes. GvR> Note that sendto(data, addr) only accepts the tuple form: you GvR> cannot write sendto(data, host, port). Hmm, that /does/ complicate things -- it makes explaining the API more difficult. Still, in this case I think I'd lean toward liberal acceptance of input parameters. :) -Barry From bwarsaw at cnri.reston.va.us Fri Mar 24 22:57:01 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 24 Mar 2000 16:57:01 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: <14555.58541.207868.496747@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> OK, that's reasonable. I'll have to invent a different GvR> reason why I don't want this -- because I really don't! Tim's Fifth Enlightenment is all the reason you'd need, /if/ you can't be persuaded to change your mind :) -Barry From fdrake at acm.org Fri Mar 24 23:10:41 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Fri, 24 Mar 2000 17:10:41 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <14555.58301.790774.159381@anthem.cnri.reston.va.us> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> <200003242127.QAA06269@eric.cnri.reston.va.us> <14555.58301.790774.159381@anthem.cnri.reston.va.us> Message-ID: <14555.59361.460705.258859@weyr.cnri.reston.va.us> bwarsaw at cnri.reston.va.us writes: > I know, I was being purposefully dense for effect :) Fred, is there > some way to make the html contain a link to the previous section for > the "see above" text? That would solve the problem I think. No. I expect this to no longer be a problem when we push to SGML/XML, so I won't waste any time hacking around it. On the other hand, lots of places in the documentation refer to "above" and "below" in the traditional sense used in paper documents, and that doesn't work well for hypertext, even in the strongly traditional book-derivation way the Python manuals are done. As soon as it's not in the same HTML file, "above" and "below" break for a lot of people. So it still should be adjusted at an appropriate time. > Hmm, that /does/ complicate things -- it makes explaining the API more > difficult. Still, in this case I think I'd lean toward liberal > acceptance of input parameters. :) No -- all the more reason to be strict and keep the descriptions as simple as reasonable. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Fri Mar 24 23:10:32 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 17:10:32 -0500 Subject: [Python-Dev] Memory Management In-Reply-To: Your message of "Fri, 24 Mar 2000 15:49:35 CST." 
<11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> References: <11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> Message-ID: <200003242210.RAA11434@eric.cnri.reston.va.us> > We're working on integrating our own memory manager into our project > and the current challenge is figuring out how to make it play nice > with Python (and SWIG). The approach we're currently taking is to > patch 1.5.2 and augment the PyMem* macros to call external memory > allocation functions that we provide. The idea is to easily allow > the addition of third party memory management facilities to Python. > Assuming 1) we get it working :-), and 2) we sync to the latest Python > CVS and patch that, would this be a useful patch to give back to the > community? Has anyone run up against this before? Check out the archives for patches at python.org looking for posts by Vladimir Marangozov. Vladimir has produced several rounds of patches with a very similar goal in mind. We're still working out some details -- but it shouldn't be too long, and I hope that his patches are also suitable for you. If not, discussion is required! --Guido van Rossum (home page: http://www.python.org/~guido/) From bwarsaw at cnri.reston.va.us Fri Mar 24 23:12:35 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 24 Mar 2000 17:12:35 -0500 (EST) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> <200003242127.QAA06269@eric.cnri.reston.va.us> <14555.58301.790774.159381@anthem.cnri.reston.va.us> <14555.59361.460705.258859@weyr.cnri.reston.va.us> Message-ID: <14555.59475.802130.434345@anthem.cnri.reston.va.us> >>>>> "Fred" == Fred L Drake, Jr writes: Fred> No -- all the more reason to be strict and keep the Fred> descriptions as simple as reasonable. At the expense of (IMO unnecessarily) breaking existing code? 
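[In Python terms, the liberal calling convention being argued for above amounts to a small normalization step in front of the real connect; the following is a hypothetical sketch of that idea, not the actual C socketmodule code:]

```python
# Hypothetical sketch of the liberal connect() signature under
# discussion -- NOT the actual socketmodule implementation.  It accepts
# both the documented tuple form and the two-argument spelling and
# normalizes them to a single address tuple.
def normalize_address(*args):
    if len(args) == 1:
        # connect((host, port)) -- the documented, tuple-only form;
        # for AF_UNIX this would be a single pathname string instead
        return args[0]
    if len(args) == 2:
        # connect(host, port) -- the undocumented two-argument form
        return (args[0], args[1])
    raise TypeError("expected an address tuple or a host and a port")

print(normalize_address(("spam.org", 80)))
print(normalize_address("spam.org", 80))
```

[Note that such a wrapper only makes sense for AF_INET; as Guido points out, sendto(data, addr) accepts only the tuple form, which is what makes the two-argument spelling hard to defend consistently.]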
From mal at lemburg.com Fri Mar 24 23:13:04 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 24 Mar 2000 23:13:04 +0100 Subject: [Python-Dev] Unicode charnames impl. References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> Message-ID: <38DBE870.D88915B5@lemburg.com>

"Andrew M. Kuchling" wrote:
>
> Here's a strawman codec for doing the \N{NULL} thing. Questions:
>
> 0) Is the code below correct?

Some comments below.

> 1) What the heck would this encoding be called?

Ehm, 'unicode-with-smileys' I guess... after all, that's what motivated the thread ;-) Seriously, I'd go with 'unicode-named'. You can then stack it on top of 'unicode-escape' and get the best of both worlds...

> 2) What does .encode() do? (Right now it escapes \N as
> \N{BACKSLASH}N.)

.encode() should translate Unicode to a string. Since the named char thing is probably only useful on input, I'd say: don't do anything, except maybe return input.encode('unicode-escape').

> 3) How can we store all those names? The resulting dictionary makes a
> 361K .py file; Python dumps core trying to parse it. (Another bug...)

I've had the same experience with the large Unicode mapping tables... the trick is to split the dictionary definition into chunks and then use dict.update() to paste them together again.

> 4) What do you do with the error \N{...... no closing right bracket?
> Right now it stops at that point, and never advances any farther.
> Maybe it should assume it's an error if there's no } within the
> next 200 chars or some similar limit?

I'd suggest taking the upper bound of all Unicode name lengths as the limit.

> 5) Do we need StreamReader/Writer classes, too?

If you plan to have it registered with a codec search function, yes.
No big deal though, because you can use the Codec class as basis for them:

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return (Codec().encode,Codec().decode,StreamReader,StreamWriter)

Then just drop the scripts into the encodings package dir and it should be usable via unicode(r'\N{SMILEY}','unicode-named') and u":-)".encode('unicode-named').

> I've also added a script that parses the names out of the NamesList.txt
> file at ftp://ftp.unicode.org/Public/UNIDATA/.
>
> --amk
>
> namecodec.py:
> =============
>
> import codecs
>
> #from _namedict import namedict
> namedict = {'NULL': 0, 'START OF HEADING' : 1,
>             'BACKSLASH': ord('\\')}
>
> class NameCodec(codecs.Codec):
>     def encode(self,input,errors='strict'):
>         # XXX what should this do?  Escape the
>         # sequence \N as '\N{BACKSLASH}N'?
>         return input.replace( '\\N', '\\N{BACKSLASH}N' )

You should return a string on output... input will be a Unicode object and the return value too if you don't add e.g. an .encode('unicode-escape').

>     def decode(self,input,errors='strict'):
>         output = unicode("")
>         last = 0
>         index = input.find( u'\\N{' )
>         while index != -1:
>             output = output + unicode( input[last:index] )
>             used = index
>             r_bracket = input.find( '}', index)
>             if r_bracket == -1:
>                 # No closing bracket; bail out...
>                 break
>
>             name = input[index + 3 : r_bracket]
>             code = namedict.get( name )
>             if code is not None:
>                 output = output + unichr(code)
>             elif errors == 'strict':
>                 raise ValueError, 'Unknown character name %s' % repr(name)

This could also be UnicodeError (it's a subclass of ValueError).

>             elif errors == 'ignore': pass
>             elif errors == 'replace':
>                 output = output + unichr( 0xFFFD )

u'\uFFFD' would save a call.

>             last = r_bracket + 1
>             index = input.find( '\\N{', last)
>         else:
>             # Finally failed gently, no longer finding a \N{...
>             output = output + unicode( input[last:] )
>             return len(input), output
>
>         # Otherwise, we hit the break for an unterminated \N{...}
>         return index, output

Note that .decode() must only return the decoded data. The "bytes read" integer was removed in order to make the Codec APIs compatible with the standard file object APIs.

> if __name__ == '__main__':
>     c = NameCodec()
>     for s in [ r'b\lah blah \N{NULL} asdf',
>                r'b\l\N{START OF HEADING}\N{NU' ]:
>         used, s2 = c.decode(s)
>         print repr( s2 )
>
>         s3 = c.encode(s)
>         _, s4 = c.decode(s3)
>         print repr(s3)
>         assert s4 == s
>
>     print repr( c.decode(r'blah blah \N{NULLsadf} asdf' , errors='replace' ))
>     print repr( c.decode(r'blah blah \N{NULLsadf} asdf' , errors='ignore' ))
>
> makenamelist.py
> ===============
>
> # Hack to extract character names from NamesList.txt
> # Output the repr() of the resulting dictionary
>
> import re, sys, string
>
> namedict = {}
>
> while 1:
>     L = sys.stdin.readline()
>     if L == "": break
>
>     m = re.match('([0-9a-fA-F]){4}(?:\t(.*)\s*)', L)
>     if m is not None:
>         last_char = int(m.group(1), 16)
>         if m.group(2) is not None:
>             name = string.upper( m.group(2) )
>             if name not in ['',
>                             '']:
>                 namedict[ name ] = last_char
>                 # print name, last_char
>
>     m = re.match('\t=\s*(.*)\s*(;.*)?', L)
>     if m is not None:
>         name = string.upper( m.group(1) )
>         names = string.split(name, ',')
>         names = map(string.strip, names)
>         for n in names:
>             namedict[ n ] = last_char
>             # print n, last_char
>
> # XXX and do what with this dictionary?
> print namedict
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://www.python.org/mailman/listinfo/python-dev

--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From fdrake at acm.org Fri Mar 24 23:12:42 2000 From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri, 24 Mar 2000 17:12:42 -0500 (EST) Subject: [Python-Dev] Memory Management In-Reply-To: <11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> References: <11A17AA2B9EAD111BCEA00A0C9B4179305CB517F@molach.origin.ea.com> Message-ID: <14555.59482.61317.992089@weyr.cnri.reston.va.us> Asbahr, Jason writes: > community? Has anyone run up against this before? You should talk to Vladimir Marangozov; he's done a fair bit of work dealing with memory management in Python. You probably want to read the chapter he contributed to the Python/C API document for the release earlier this week. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From skip at mojam.com Fri Mar 24 23:19:50 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 24 Mar 2000 16:19:50 -0600 (CST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242115.QAA04648@eric.cnri.reston.va.us> References: <14555.54636.811100.254309@amarok.cnri.reston.va.us> <14555.55139.484135.602894@weyr.cnri.reston.va.us> <200003242115.QAA04648@eric.cnri.reston.va.us> Message-ID: <14555.59910.631130.241930@beluga.mojam.com> Guido> Which reminds me of another reason to wait: coming up with the Guido> right package hierarchy is hard. (E.g. I find network too long; Guido> plus, does htmllib belong there?) Ah, another topic for python-dev. Even if we can't do the packaging right away, we should be able to hash out the structure. Skip From guido at python.org Fri Mar 24 23:25:01 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 17:25:01 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: Your message of "Fri, 24 Mar 2000 17:10:41 EST." 
<14555.59361.460705.258859@weyr.cnri.reston.va.us> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <14555.56434.974884.832078@anthem.cnri.reston.va.us> <200003242127.QAA06269@eric.cnri.reston.va.us> <14555.58301.790774.159381@anthem.cnri.reston.va.us> <14555.59361.460705.258859@weyr.cnri.reston.va.us> Message-ID: <200003242225.RAA13408@eric.cnri.reston.va.us> > bwarsaw at cnri.reston.va.us writes: > > I know, I was being purposefully dense for effect :) Fred, is there > > some way to make the html contain a link to the previous section for > > the "see above" text? That would solve the problem I think. [Fred] > No. I expect this to no longer be a problem when we push to > SGML/XML, so I won't waste any time hacking around it. > On the other hand, lots of places in the documentation refer to > "above" and "below" in the traditional sense used in paper documents, > and that doesn't work well for hypertext, even in the strongly > traditional book-derivation way the Python manuals are done. As soon > as it's not in the same HTML file, "above" and "below" break for a lot > of people. So it still should be adjusted at an appropriate time. My approach to this: put more stuff on the same page! I personally favor putting an entire chapter on one page; even if you split the top-level subsections this wouldn't have happened. --Guido van Rossum (home page: http://www.python.org/~guido/) From klm at digicool.com Fri Mar 24 23:40:54 2000 From: klm at digicool.com (Ken Manheimer) Date: Fri, 24 Mar 2000 17:40:54 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: Guido wrote: > OK, that's reasonable. I'll have to invent a different reason why I > don't want this -- because I really don't! I'm glad this organize-the-library-in-packages initiative seems to be moving towards concentrating on the organization, rather than just starting to put obvious things in the obvious places. 
Personally, i *crave* sensible, discoverable organization. The only thing i like less than complicated disorganization is complicated misorganization - and i think that just diving in and doing the "obvious" placements would have the terrible effect of making it harder, not easier, to move eventually to the right arrangement. Ken klm at digicool.com From akuchlin at mems-exchange.org Fri Mar 24 23:45:20 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Fri, 24 Mar 2000 17:45:20 -0500 (EST) Subject: [Python-Dev] Unicode charnames impl. In-Reply-To: <38DBE870.D88915B5@lemburg.com> References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> <38DBE870.D88915B5@lemburg.com> Message-ID: <14555.61440.613940.50492@amarok.cnri.reston.va.us> M.-A. Lemburg writes: >.encode() should translate Unicode to a string. Since the >named char thing is probably only useful on input, I'd say: >don't do anything, except maybe return input.encode('unicode-escape'). Wait... then you can't stack it on top of unicode-escape, because it would already be Unicode escaped. >> 4) What do you with the error \N{...... no closing right bracket. >I'd suggest to take the upper bound of all Unicode name >lengths as limit. Seems like a hack. >Note that .decode() must only return the decoded data. >The "bytes read" integer was removed in order to make >the Codec APIs compatible with the standard file object >APIs. Huh? Why does Misc/unicode.txt describe decode() as "Decodes the object input and returns a tuple (output object, length consumed)"? Or are you talking about a different .decode() method? -- A.M. Kuchling http://starship.python.net/crew/amk/ "Ruby's dead?" "Yes." "Ah me. That's the trouble with mortals. They do that. Not to worry, eh?" 
-- Dream and Pharamond, in SANDMAN #46: "Brief Lives:6" From gmcm at hypernet.com Fri Mar 24 23:50:12 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 24 Mar 2000 17:50:12 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <200003242103.QAA03288@eric.cnri.reston.va.us> Message-ID: <1258184279-6957124@hypernet.com> [Guido] > Someone noticed that socket.connect() and a few related functions > (connect_ex() and bind()) take either a single (host, port) tuple or > two separate arguments, but that only the tuple is documented. > > Similar to append(), I'd like to close this gap, and I've made the > necessary changes. This will probably break lots of code. This will indeed cause great wailing and gnashing of teeth. I've been criticized for using the tuple form in the Sockets HOWTO (in fact I foolishly changed it to demonstrate both forms). > Similar to append(), I'd like people to fix their code rather than > whine -- two-arg connect() has never been documented, although it's > found in much code (even the socket module test code :-( ). > > Similar to append(), I may revert the change if it is shown to cause > too much pain during beta testing... I say give 'em something to whine about. put-sand-in-the-vaseline-ly y'rs - Gordon From klm at digicool.com Fri Mar 24 23:55:43 2000 From: klm at digicool.com (Ken Manheimer) Date: Fri, 24 Mar 2000 17:55:43 -0500 (EST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.58541.207868.496747@anthem.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Barry A. Warsaw wrote: > > >>>>> "GvR" == Guido van Rossum writes: > > GvR> OK, that's reasonable. I'll have to invent a different > GvR> reason why I don't want this -- because I really don't! 
> > Tim's Fifth Enlightenment is all the reason you'd need, /if/ you can't > be persuaded to change your mind :) Maybe i'm just a slave to my organization mania, but i'd suggest the following order change of 5 and 6, plus an addition; from: 5 now: Flat is better than nested. 6 now: Sparse is better than dense. to: 5 Sparse is better than dense. 6 Flat is better than nested 6.5 until it gets too dense. or-is-it-me-that-gets-too-dense'ly yrs, ken klm at digicool.com (And couldn't the humor page get hooked up a bit better? That was definitely a fun part of maintaining python.org...) From gstein at lyra.org Sat Mar 25 02:19:18 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 24 Mar 2000 17:19:18 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.57858.824301.693390@anthem.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Barry A. Warsaw wrote: > One thing you can definitely do now which breaks no code: propose a > package hierarchy for the standard library. I already did! http://www.python.org/pipermail/python-dev/2000-February/003761.html *grumble* -g -- Greg Stein, http://www.lyra.org/ From tim_one at email.msn.com Sat Mar 25 05:19:33 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 24 Mar 2000 23:19:33 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: <001001bf9611$52e960a0$752d153f@tim> [GregS proposes a partial packaging of std modules for 1.6, Guido objects on spurious grounds, GregS refutes that, Guido agrees] > I'll have to invent a different reason why I don't want this -- because > I really don't! This one's easy! It's why I left the 20th of the 20 Pythonic Theses for you to fill in . All you have to do now is come up with a pithy way to say "if it's something Guido is so interested in that he wants to be deeply involved in it himself, but it comes at a time when he's buried under prior commitments, then tough tulips, it waits". 
shades-of-the-great-renaming-ly y'rs - tim From tim_one at email.msn.com Sat Mar 25 05:19:36 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 24 Mar 2000 23:19:36 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.58541.207868.496747@anthem.cnri.reston.va.us> Message-ID: <001101bf9611$544239e0$752d153f@tim> [Guido] > OK, that's reasonable. I'll have to invent a different > reason why I don't want this -- because I really don't! [Barry] > Tim's Fifth Enlightenment is all the reason you'd need, /if/ you can't > be persuaded to change your mind :) No no no no no: "namespaces are one honking great idea ..." is the controlling one here: Guido really *does* want this! It's a question of timing, in the sense of "never is often better than *right* now", but to be eventually modified by "now is better than never". These were carefully designed to support any position whatsoever, you know . although-in-any-particular-case-there's-only-one-true-interpretation-ly y'rs - tim From guido at python.org Sat Mar 25 05:19:41 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2000 23:19:41 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Fri, 24 Mar 2000 17:19:18 PST." References: Message-ID: <200003250419.XAA25751@eric.cnri.reston.va.us> > > One thing you can definitely do now which breaks no code: propose a > > package hierarchy for the standard library. > > I already did! > > http://www.python.org/pipermail/python-dev/2000-February/003761.html > > *grumble* You've got to be kidding. That's not a package hierarchy proposal, it's just one package (network). Without a comprehensive proposal I'm against a partial reorganization: without a destination we can't start marching. Naming things is very contentious -- everybody has an opinion. To pick the right names you must see things in perspective. 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From moshez at math.huji.ac.il Sat Mar 25 09:45:28 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 25 Mar 2000 10:45:28 +0200 (IST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID:

On Thu, 23 Mar 2000 gvwilson at nevex.com wrote:

> If None becomes a keyword, I would like to ask whether it could be used to
> signal that a method is a class method, as opposed to an instance method:

I'd like to know what you mean by "class" method. (I do know C++ and Java, so I have some idea...). Specifically, my question is: how does a class method access class variables? They can't be totally unqualified (because that's very unpythonic). If they are qualified by the class's name, I see it as a very mild improvement on the current situation. You could suggest, for example, qualifying class variables by "class" (so you'd do things like: class.x = 1), but I'm not sure I like it. On the whole, I think the much bigger issue is how we denote class methods.

Also, one slight problem with your method of denoting class methods: currently, it is possible to add an instance method to a class at run time by something like

class C:
    pass

def foo(self):
    pass

C.foo = foo

In your suggestion, how do you view the possibility of adding class methods to a class? (Note that "foo", above, is also perfectly usable as a plain function.)

I want to note that Edward suggested denotation by a separate namespace:

C.foo = foo              # foo is an instance method
C.__methods__.foo = foo  # foo is a class method

The biggest problem with that suggestion is that it doesn't address the common case of defining it textually inside the class definition.
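[Editorial aside: Moshe's run-time attachment trick still works in modern Python and is easy to verify. This sketch (names purely illustrative) shows that an instance created before the assignment picks up the new method:]

```python
class C:
    pass

def foo(self):
    # behaves exactly like a method defined textually in the class body
    return "foo called on a %s instance" % type(self).__name__

c = C()          # instance created *before* foo exists on the class
C.foo = foo      # attach the plain function as an instance method

print(c.foo())   # -> 'foo called on a C instance'
```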
> I'd also like to ask (separately) that assignment to None be defined as a > no-op, so that programmers can write: > > year, month, None, None, None, None, weekday, None, None = gmtime(time()) > > instead of having to create throw-away variables to fill in slots in > tuples that they don't care about. Currently, I use "_" for that purpose, after I heard the idea from Fredrik Lundh. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gstein at lyra.org Sat Mar 25 10:26:23 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 01:26:23 -0800 (PST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: <200003250419.XAA25751@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: > > > One thing you can definitely do now which breaks no code: propose a > > > package hierarchy for the standard library. > > > > I already did! > > > > http://www.python.org/pipermail/python-dev/2000-February/003761.html > > > > *grumble* > > You've got to be kidding. That's not a package hierarchy proposal, > it's just one package (network). > > Without a comprehensive proposal I'm against a partial reorganization: > without a destination we can't start marching. Not kidding at all. I said before that I don't think we can do everything all at once. I *do* think this is solvable with a greedy algorithm rather than waiting for some nebulous completion point. > Naming things is very contentious -- everybody has an opinion. To > pick the right names you must see things in perspective. Sure. And those diverse opinions are why I don't believe it is possible to do all at once. The task is simply too large to tackle in one shot. IMO, it must be solved incrementally. I'm not even going to attempt to try to define a hierarchy for all those modules. I count 137 on my local system. Let's say that I *do* try... 
some are going to end up "forced" rather than obeying some obvious grouping. If you do it a chunk at a time, then you get the obvious, intuitive groupings. Try for more, and you just bung it all up. For discussion's sake: can you provide a rationale for doing it all at once? In the current scenario, modules just appear at some point. After a partial reorg, some modules appear at a different point. "No big whoop." Just because module A is in a package doesn't imply that module B must also be in a package. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sat Mar 25 10:35:39 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 01:35:39 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <001001bf9611$52e960a0$752d153f@tim> Message-ID: On Fri, 24 Mar 2000, Tim Peters wrote: > [GregS proposes a partial packaging of std modules for 1.6, Guido objects on > spurious grounds, GregS refutes that, Guido agrees] > > > I'll have to invent a different reason why I don't want this -- because > > I really don't! > > This one's easy! It's why I left the 20th of the 20 Pythonic Theses for you > to fill in . All you have to do now is come up with a pithy way to > say "if it's something Guido is so interested in that he wants to be deeply > involved in it himself, but it comes at a time when he's buried under prior > commitments, then tough tulips, it waits". No need for Pythonic Theses. I don't see anybody disagreeing with the end goal. The issue comes up with *how* to get there. I say "do it incrementally" while others say "do it all at once." Personally, I don't think it is possible to do all at once. As a corollary, if you can't do it all at once, but you *require* that it be done all at once, then you have effectively deferred the problem. To put it another way, Guido has already invented a reason to not do it: he just requires that it be done all at once. Result: it won't be done. [ not saying this was Guido's intent or desire... 
but this is how I read the result :-) ] Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Sat Mar 25 10:55:12 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 25 Mar 2000 11:55:12 +0200 (IST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14555.34371.749039.946891@beluga.mojam.com> Message-ID: On Fri, 24 Mar 2000, Skip Montanaro wrote: > Might I suggest moving robotparser.py from Tools/webchecker to Lib? Modules > of general usefulness (this is at least generally useful for anyone writing > web spiders ;-) shouldn't live in Tools, because it's not always available > and users need to do extra work to make them available. You're right, but I'd like this to be a 1.7 change. It's just that I plan to suggest a great-renaming-fest for 1.7 modules, and then namespace wouldn't be cluttered when you don't need it. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Sat Mar 25 11:16:23 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 25 Mar 2000 12:16:23 +0200 (IST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: On Fri, 24 Mar 2000, Guido van Rossum wrote: > OK, that's reasonable. I'll have to invent a different reason why I > don't want this -- because I really don't! Here's a reason: there shouldn't be changes we'll retract later -- we need to come up with the (more or less) right hierarchy the first time, or we'll do a lot of work for nothing. > Hm. Moving modules requires painful and arcane CVS manipulations that > can only be done by the few of us here at CNRI -- and I'm the only one > left who's full time on Python. Hmmmmm....this is a big problem. Maybe we need to have more people with access to the CVS? -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From mal at lemburg.com Sat Mar 25 11:47:30 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 25 Mar 2000 11:47:30 +0100 Subject: [Python-Dev] Unicode charnames impl. References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> <38DBE870.D88915B5@lemburg.com> <14555.61440.613940.50492@amarok.cnri.reston.va.us> Message-ID: <38DC9942.3C4E4B92@lemburg.com>

"Andrew M. Kuchling" wrote:
>
> M.-A. Lemburg writes:
> >.encode() should translate Unicode to a string. Since the
> >named char thing is probably only useful on input, I'd say:
> >don't do anything, except maybe return input.encode('unicode-escape').
>
> Wait... then you can't stack it on top of unicode-escape, because it
> would already be Unicode escaped.

Sorry for the mixup (I guess yesterday wasn't my day...). I had stream codecs in mind: these are stackable, meaning that you can wrap one codec around another. And it's also their interface API that was changed -- not the basic stateless encoder/decoder ones. Stacking of .encode()/.decode() must be done "by hand" in e.g. the way I described above. Another approach would be subclassing the unicode-escape Codec and then calling the base class method.

> >> 4) What do you do with the error \N{...... no closing right bracket?
> >I'd suggest to take the upper bound of all Unicode name
> >lengths as limit.
>
> Seems like a hack.

It is... but what other way would there be?

> >Note that .decode() must only return the decoded data.
> >The "bytes read" integer was removed in order to make
> >the Codec APIs compatible with the standard file object
> >APIs.
>
> Huh? Why does Misc/unicode.txt describe decode() as "Decodes the
> object input and returns a tuple (output object, length consumed)"?
> Or are you talking about a different .decode() method?

You're right... I was thinking about .read() and .write().
.decode() should return a tuple, just as documented in unicode.txt.

--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mhammond at skippinet.com.au Sat Mar 25 14:20:59 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun, 26 Mar 2000 00:20:59 +1100 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID:

[Greg writes]
> I'm not even going to attempt to try to
> define a hierarchy for all those modules. I count 137 on my local system.
> Let's say that I *do* try... some are going to end up "forced" rather than
> obeying some obvious grouping. If you do it a chunk at a time, then you
> get the obvious, intuitive groupings. Try for more, and you just bung it
> all up.
...
> Just because module A is in a package doesn't imply that module B must
> also be in a package.

I agree with Greg - not every module will fit into a package. But I also agree with Guido - we _should_ attempt to go through the 137 modules and put the ones that fit into logical groupings. Greg is probably correct with his selection for "net", but a general evaluation is still a good thing. A view of the bigger picture will help to quell debates over the structure, and only leave us with the squabbles over the exact spelling :-)

+2 on ... err .... -1 on ... errr - awww - screw that--ly, Mark.

From tismer at tismer.com Sat Mar 25 14:35:50 2000 From: tismer at tismer.com (Christian Tismer) Date: Sat, 25 Mar 2000 14:35:50 +0100 Subject: [Python-Dev] Unicode charnames impl. References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> Message-ID: <38DCC0B6.2A7D0EF1@tismer.com>

"Andrew M. Kuchling" wrote: ...
> 3) How can we store all those names? The resulting dictionary makes a
> 361K .py file; Python dumps core trying to parse it. (Another bug...)
This is simply not the place to use a dictionary. You don't need fast lookup from names to codes, but something that supports incremental search. This would enable PythonWin to show a pop-up list after you typed the first letters.

I'm working on a common substring analysis that turns each entry into 3 to 5 small integers. You then encode these in an order-preserving way. That means the resulting code table is still lexically ordered, and access to the entries is done via bisection. It will take me some more time to get that, but it will not be larger than 60k, or I drop it. Also note that all the names use uppercase letters and space only -- an opportunity to use simple context encoding and use just 4 bits most of the time.

...
> I've also added a script that parses the names out of the NamesList.txt
> file at ftp://ftp.unicode.org/Public/UNIDATA/.

Is there any reason why you didn't use the UnicodeData.txt file? I mean, do I cover everything if I continue to use that?

ciao - chris

--
Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home

From Vladimir.Marangozov at inrialpes.fr Sat Mar 25 15:59:55 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sat, 25 Mar 2000 15:59:55 +0100 (CET) Subject: [Python-Dev] Windows and PyObject_NEW Message-ID: <200003251459.PAA09181@python.inrialpes.fr>

For MarkH, Guido and the Windows experienced:

I've been reading Jeffrey Richter's "Advanced Windows" last night in order to better understand why PyObject_NEW is implemented differently for Windows. Again, I feel uncomfortable with this, especially now, when I'm dealing with the memory aspect of Python's object constructors/destructors.
Some time ago, Guido elaborated on why PyObject_NEW uses malloc() on the user's side, before calling _PyObject_New (on Windows, cf. objimpl.h):

[Guido]
> I can explain the MS_COREDLL business:
>
> This is defined on Windows because the core is in a DLL. Since the
> caller may be in another DLL, and each DLL (potentially) has a
> different default allocator, and (in pre-Vladimir times) the
> type-specific deallocator typically calls free(), we (Mark & I)
> decided that the allocation should be done in the type-specific
> allocator. We changed the PyObject_NEW() macro to call malloc() and
> pass that into _PyObject_New() as a second argument.

While I agree with this, from reading chapters 5-9 of (a French copy of) the book (translated backwards here):

5. Win32 Memory Architecture
6. Exploring Virtual Memory
7. Using Virtual Memory in Your Applications
8. Memory Mapped Files
9. Heaps

I can't find any radical Windows specificities for memory management. On Windows, like on the rest of the OSes, the (virtual & physical) memory allocated for a process is common and seems to be accessible from all DLLs involved in an executable. Things like page sharing, copy-on-write, private process memory, etc. are conceptually all the same on Windows and Unix.

Now, the backwards binary compatibility argument aside (assuming that extensions get recompiled when a new Python version comes out), my concern is that with the introduction of PyObject_NEW *and* PyObject_DEL, there's no point in having separate implementations for Windows and Unix any more (or I'm really missing something and I fail to see what it is). User objects would be allocated *and* freed by the core DLL (at least the object headers). Even if several DLLs use different allocators, this shouldn't be a problem if what's obtained via PyObject_NEW is freed via PyObject_DEL. This Python memory would be allocated from the Python core DLL's regions/pages/heaps.
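[Editorial aside: the symmetry argument above can be restated as an invariant -- every block must be returned to the allocator that produced it. The real code is C macros, but a toy Python model (purely illustrative, not CPython's implementation; all names are made up) shows why pairing PyObject_NEW with PyObject_DEL sidesteps the per-DLL allocator question:]

```python
class Heap:
    """Toy stand-in for one DLL's default allocator."""
    def __init__(self, name):
        self.name = name
        self.live = set()

    def alloc(self):
        block = object()
        self.live.add(block)
        return block

    def free(self, block):
        if block not in self.live:
            # the real-world analogue is heap corruption, not a clean error
            raise RuntimeError("block returned to wrong heap: " + self.name)
        self.live.discard(block)

core_dll = Heap("python-core.dll")  # services PyObject_NEW *and* PyObject_DEL
ext_dll = Heap("extension.pyd")     # the extension's own malloc()/free()

obj = core_dll.alloc()
core_dll.free(obj)                  # fine: symmetric _NEW/_DEL pairing

obj = core_dll.alloc()
try:
    ext_dll.free(obj)               # the pre-PyObject_DEL hazard
except RuntimeError:
    pass                            # mismatched allocators must be avoided
```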
And I believe that the memory allocated by the core DLL is accessible from the other DLLs of the process. (I haven't seen evidence to the contrary, but tell me if this is not true.) I thought that maybe Windows malloc() uses different heaps for the different DLLs, but that's fine too, as long as the _NEW/_DEL symmetry is respected and all heaps are accessible from all DLLs (which seems to be the case...), but: in the beginning of Chapter 9, Heaps, I read the following:

"""
...About Win32 heaps (compared to Win16 heaps)...

* There is only one kind of heap (it doesn't have any particular name,
  like "local" or "global" on Win16, because it's unique)

* Heaps are always local to a process. The contents of a process heap are
  not accessible from the threads of another process. A large number of
  Win16 applications use the global heap as a way of sharing data between
  processes; this change in the Win32 heaps is often a source of problems
  for porting Win16 applications to Win32.

* One process can create several heaps in its addressing space and can
  manipulate them all.

* A DLL does not have its own heap. It uses the heaps as part of the
  addressing space of the process. However, a DLL can create a heap in
  the addressing space of a process and reserve it for its own use.
  Since several 16-bit DLLs share data between processes by using the
  local heap of a DLL, this change is a source of problems when porting
  Win16 apps to Win32...
"""

This last paragraph confuses me. On one hand, it's stated that all heaps can be manipulated by the process; OTOH, a DLL can reserve a heap for personal use within that process (implying the heap is r/w protected from the other DLLs?!?). The rest of this chapter does not explain how this "private reservation" can be done, so some of you would probably want to chime in and explain this to me.
Going back to PyObject_NEW: if it turns out that all heaps are accessible from all DLLs involved in the process, I would probably lobby for unifying the implementation of _PyObject_NEW/_New and _PyObject_DEL/_Del for Windows and Unix. Actually, on Windows, object allocation does not depend on a central Python core memory allocator. Therefore, with the patches I'm working on, changing the core allocator would work (would be changed for real) only for platforms other than Windows. Next, if it's possible to unify the implementation, it would also be possible to expose and officialize in the C API a new function set: PyObject_New() and PyObject_Del() (without leading underscores). For now, due to the implementation difference on Windows, we're forced to use the macro versions PyObject_NEW/DEL. Please tell me clearly what would be wrong on Windows if a) & b) & c):

a) we have PyObject_New(), PyObject_Del()
b) their implementation is platform independent (no MS_COREDLL diffs, we retain the non-Windows variant)
c) they're both used systematically for all object types

-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

From gmcm at hypernet.com Sat Mar 25 16:46:01 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Sat, 25 Mar 2000 10:46:01 -0500 Subject: [Python-Dev] Windows and PyObject_NEW In-Reply-To: <200003251459.PAA09181@python.inrialpes.fr> Message-ID: <1258123323-10623548@hypernet.com> Vladimir Marangozov

> ... And I believe that the memory allocated
> by the core DLL is accessible from the other DLL's of the process.
> (I haven't seen evidence on the opposite, but tell me if this is not true)

This is true. Or, I should say, it all boils down to HeapAlloc(heap, flags, bytes), and malloc is going to use the _crtheap.

> In the beginning of Chapter 9, Heaps, I read the following:
>
> """
> ...About Win32 heaps (compared to Win16 heaps)...
>
> * There is only one kind of heap (it doesn't have any particular name,
>   like "local" or "global" on Win16, because it's unique)
>
> * Heaps are always local to a process. The contents of a process heap is
>   not accessible from the threads of another process. A large number of
>   Win16 applications use the global heap as a way of sharing data between
>   processes; this change in the Win32 heaps is often a source of problems
>   for porting Win16 applications to Win32.
>
> * One process can create several heaps in its addressing space and can
>   manipulate them all.
>
> * A DLL does not have its own heap. It uses the heaps as part of the
>   addressing space of the process. However, a DLL can create a heap in
>   the addressing space of a process and reserve it for its own use.
>   Since several 16-bit DLLs share data between processes by using the
>   local heap of a DLL, this change is a source of problems when porting
>   Win16 apps to Win32...
> """
>
> This last paragraph confuses me. On one hand, it's stated that all heaps
> can be manipulated by the process, and OTOH, a DLL can reserve a heap for
> personal use within that process (implying the heap is r/w protected for
> the other DLLs ?!?).

At any time, you can create a new heap handle: HeapCreate(options, initsize, maxsize). Nothing special about the "dll" context here. On Win9x, only someone who knows about the handle can manipulate the heap. (On NT, you can enumerate the handles in the process.) I doubt very much that you would break anybody's code by removing the Windows-specific behavior. But it seems to me that unless Python always uses the default malloc, those of us who write C++ extensions will have to override operator new? I'm not sure. I've used placement new to allocate objects in a memory-mapped file, but I've never tried to muck with the global memory policy of a C++ program.
- Gordon

From akuchlin at mems-exchange.org Sat Mar 25 18:58:56 2000 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Sat, 25 Mar 2000 12:58:56 -0500 (EST) Subject: [Python-Dev] Unicode charnames impl. In-Reply-To: <38DCC0B6.2A7D0EF1@tismer.com> References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCE7C@RED-MSG-50> <14555.57908.151946.182639@amarok.cnri.reston.va.us> <38DCC0B6.2A7D0EF1@tismer.com> Message-ID: <14556.65120.22727.524616@newcnri.cnri.reston.va.us> Christian Tismer writes:

>This is simply not the place to use a dictionary.
>You don't need fast lookup from names to codes,
>but something that supports incremental search.
>This would enable PythonWin to show a pop-up list after
>you typed the first letters.

Hmm... one could argue that PythonWin or IDLE should provide their own database for incremental searching; I was planning on following Bill Tutt's suggestion of generating a perfect minimal hash for the names. gperf isn't up to the job, but I found an algorithm that should be OK. Just got to implement it now... But if your approach pays off, it'll be superior to a perfect hash.

>Is there any reason why you didn't use the UnicodeData.txt file,
>I mean do I cover everything if I continue to use that?

Oops; I saw the NameList file and just went for it; maybe it should use the full UnicodeData.txt. --amk

From moshez at math.huji.ac.il Sat Mar 25 19:10:44 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 25 Mar 2000 20:10:44 +0200 (IST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Mark Hammond wrote:

> But I also agree with Guido - we _should_ attempt to go through the 137

Where did you come up with that number? I counted many more -- not quite sure how many, but certainly more. Well, here's a tentative suggestion I worked out today. This is just to have something to quibble about.
In the interest of rushing it out of the door, there are a few modules (explicitly mentioned) which I have said nothing about.

net
    httplib
    ftplib
    urllib
    cgi
    gopherlib
    imaplib
    poplib
    nntplib
    smtplib
    urlparse
    telnetlib
    server
        BaseHTTPServer
        CGIHTTPServer
        SimpleHTTPServer
        SocketServer
        asynchat
        asyncore
text
    sgmllib
    htmllib
    htmlentitydefs
    xml
        whatever the xml-sig puts here
    mail
        rfc822
        mime
            MimeWriter
            mimetools
            mimify
            mailcap
            mimetypes
            base64
            quopri
        mailbox
        mhlib
    binhex
parse
    string
    re
    regex
    reconvert
    regex_syntax
    regsub
    shlex
    ConfigParser
    linecache
    multifile
    netrc
bin
    gzip
    zlib
    aifc
    chunk
    image
        imghdr
        colorsys
        imageop
        imgfile
        rgbimg
        yuvconvert
    sound
        sndhdr
        toaiff
        audiodev
        sunau
        sunaudio
        wave
        audioop
        sunaudiodev
db
    anydbm
    whichdb
    bsddb
    dbm
    dbhash
    dumbdbm
    gdbm
math
    bisect
    fpformat
    random
    whrandom
    cmath
    math
    crypt
    fpectl
    fpetest
    array
    md5
    mpz
    rotor
    sha
time
    calendar
    time
    tzparse
    sched
    timing
interpreter
    new
    py_compile
    code
    codeop
    compileall
    keyword
    token
    tokenize
    parser
    dis
    bdb
    pdb
    profile
    pyclbr
    tabnanny
    symbol
    pstats
    traceback
    rlcompleter
security
    Bastion
    rexec
    ihooks
file
    dircache
    path -- a virtual module which would do a from path import *
    dospath
    posixpath
    macpath
    nturl2path
    ntpath
    macurl2path
    filecmp
    fileinput
    StringIO
    cStringIO
    glob
    fnmatch
    posixfile
    stat
    statcache
    statvfs
    tempfile
    shutil
    pipes
    popen2
    commands
    dl
    fcntl
serialize
    pickle
    cPickle
    shelve
    xdrlib
    copy
    copy_reg
threads
    thread
    threading
    Queue
    mutex
ui
    curses
    Tkinter
    cmd
    getpass
internal
    _codecs
    _locale
    _tkinter
    pcre
    strop
    posix
users
    pwd
    grp
    nis
exceptions
os
types
UserDict
UserList
user
site
locale
sgi
    al
    cd
    cl
    fl
    fm
    gl
    misc (what used to be sgimodule.c)
    sv
unicode
    codecs
    unicodedata
    unicodedatabase

========== Modules not handled ============
formatter
getopt
pprint
pty
repr
tty
errno
operator
pure
readline
resource
select
signal
socket
struct
syslog
termios

Well, if you got this far, you certainly deserve...

congratulations-ly y'rs, Z.

-- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From DavidA at ActiveState.com Sat Mar 25 19:28:30 2000 From: DavidA at ActiveState.com (David Ascher) Date: Sat, 25 Mar 2000 10:28:30 -0800 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID:

> db
>     anydbm
>     whichdb
>     bsddb
>     dbm
>     dbhash
>     dumbdbm
>     gdbm

This made me think of one issue which is worth considering -- is there a mechanism for third-party packages to hook into the standard naming hierarchy? It'd be weird not to have the oracle and sybase modules within the db toplevel package, for example. --david ascher

From moshez at math.huji.ac.il Sat Mar 25 19:30:26 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 25 Mar 2000 20:30:26 +0200 (IST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: On Sat, 25 Mar 2000, David Ascher wrote:

> This made me think of one issue which is worth considering -- is there a
> mechanism for third-party packages to hook into the standard naming
> hierarchy? It'd be weird not to have the oracle and sybase modules within
> the db toplevel package, for example.

My position is that any 3rd party module decides for itself where it wants to live -- once we've formalized the framework. Consider PyGTK/PyGnome, PyQT/PyKDE -- they should live in the UI package too...

From DavidA at ActiveState.com Sat Mar 25 19:50:14 2000 From: DavidA at ActiveState.com (David Ascher) Date: Sat, 25 Mar 2000 10:50:14 -0800 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID:

> On Sat, 25 Mar 2000, David Ascher wrote:
>
> > This made me think of one issue which is worth considering -- is there a
> > mechanism for third-party packages to hook into the standard naming
> > hierarchy? It'd be weird not to have the oracle and sybase modules within
> > the db toplevel package, for example.
> > My position is that any 3rd party module decides for itself where it wants
> > to live -- once we formalized the framework. Consider PyGTK/PyGnome,
> > PyQT/PyKDE -- they should live in the UI package too...

That sounds good in theory, but I can see possible problems down the line:

1) The current mapping between package names and directory structure means that installing a third-party package hierarchy in a different place on disk than the standard library requires some work on the import mechanisms (this may have been discussed already) and a significant amount of user education.

2) We either need a 'registration' mechanism whereby people can claim a name in the standard hierarchy, or expect conflicts. As far as I can gather, in the Perl world registration occurs by submission to CPAN. Correct?

One alternative is to go the Java route, which would then mean, I think, that some core modules are placed very high in the hierarchy (the equivalent of the java. subtree), and some others are relegated to a lower subtree (the equivalent of com.sun).

Anyway, I agree with Guido on this one -- naming is a contentious issue fraught with long-term implications. Let's not rush into a decision just yet. --david

From guido at python.org Sat Mar 25 19:56:20 2000 From: guido at python.org (Guido van Rossum) Date: Sat, 25 Mar 2000 13:56:20 -0500 Subject: [Python-Dev] 1.6 job list In-Reply-To: Your message of "Sat, 25 Mar 2000 01:35:39 PST." References: Message-ID: <200003251856.NAA09636@eric.cnri.reston.va.us>

> I say "do it incrementally" while others say "do it all at once."
> Personally, I don't think it is possible to do all at once. As a
> corollary, if you can't do it all at once, but you *require* that it be
> done all at once, then you have effectively deferred the problem. To put
> it another way, Guido has already invented a reason to not do it: he just
> requires that it be done all at once. Result: it won't be done.

Bullshit, Greg.
(I don't normally like to use such strong words, but since you're being confrontational here...) I'm all for doing it incrementally -- but I want the plan for how to do it made up front. That doesn't require all the details to be worked out -- but it requires a general idea about what kind of things we will have in the namespace and what kinds of names they get. An organizing principle, if you like. If we were to decide later that we go for a Java-like deep hierarchy, the network package would have to be moved around again -- what a waste.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From moshez at math.huji.ac.il Sat Mar 25 20:35:37 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 25 Mar 2000 21:35:37 +0200 (IST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: On Sat, 25 Mar 2000, David Ascher wrote:

> > My position is that any 3rd party module decides for itself where it wants
> > to live -- once we formalized the framework. Consider PyGTK/PyGnome,
> > PyQT/PyKDE -- they should live in the UI package too...
>
> That sounds good in theory, but I can see possible problems down the line:
>
> 1) The current mapping between package names and directory structure means
> that installing a third party package hierarchy in a different place on disk
> than the standard library requires some work on the import mechanisms (this
> may have been discussed already) and a significant amount of user education.

Ummmm....

1.a) If the work of the import-sig produces something (which I suspect it will), it's more complicated -- you could have JAR-like files with hierarchies inside.

1.b) Installation is the domain of the distutils-sig. I seem to remember Greg Ward saying something about installing packages.

> 2) We either need a 'registration' mechanism whereby people can claim a name
> in the standard hierarchy or expect conflicts. As far as I can gather, in
> the Perl world registration occurs by submission to CPAN.
> Correct?

Yes. But this is no worse than the current situation, where people pick a toplevel name. I agree a registration mechanism would be helpful.

> One alternative is to go the Java route, which would then mean, I think,
> that some core modules are placed very high in the hierarchy (the equivalent
> of the java. subtree), and some others are deprecated to lower subtree (the
> equivalent of com.sun).

Personally, I *hate* the Java mechanism -- see Stallman's position on why GNU Java packages use gnu.* rather than org.gnu.* for some of the reasons. I really, really like the Perl mechanism, and I think we would do well to consider whether something like that would suit us, with minor modifications. (Remember that lwall copied the Pythonic module mechanism, so Perl and Python modules are quite similar.)

> Anyway, I agree with Guido on this one -- naming is a contentious issue
> wrought with long-term implications. Let's not rush into a decision just
> yet.

I agree. That's why I pushed out the straw-man proposal. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From bwarsaw at cnri.reston.va.us Sat Mar 25 21:07:27 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Sat, 25 Mar 2000 15:07:27 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <14555.57858.824301.693390@anthem.cnri.reston.va.us> Message-ID: <14557.7295.451011.36533@anthem.cnri.reston.va.us> I guess I was making a request for a more comprehensive list. People are asking to packagize the entire directory, so I'd like to know what organization they'd propose for all the modules. -Barry

From bwarsaw at cnri.reston.va.us Sat Mar 25 21:20:09 2000 From: bwarsaw at cnri.reston.va.us (Barry A.
Warsaw) Date: Sat, 25 Mar 2000 15:20:09 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <200003242129.QAA06510@eric.cnri.reston.va.us> Message-ID: <14557.8057.896921.693908@anthem.cnri.reston.va.us>

>>>>> "MZ" == Moshe Zadka writes:

MZ> Hmmmmm....this is a big problem. Maybe we need to have more
MZ> people with access to the CVS?

To make changes like this, you don't just need write access to CVS; you need physical access to the repository filesystem. It's not possible to provide this access to non-CNRI'ers. -Barry

From gstein at lyra.org Sat Mar 25 21:40:59 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 12:40:59 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14557.8057.896921.693908@anthem.cnri.reston.va.us> Message-ID: On Sat, 25 Mar 2000, Barry A. Warsaw wrote:

> >>>>> "MZ" == Moshe Zadka writes:
>
> MZ> Hmmmmm....this is a big problem. Maybe we need to have more
> MZ> people with access to the CVS?
>
> To make changes like this, you don't just need write access to CVS,
> you need physical access to the repository filesystem. It's not
> possible to provide this access to non-CNRI'ers.

Unless the CVS repository was moved to, say, SourceForge. :-) Cheers, -g -- Greg Stein, http://www.lyra.org/

From bwarsaw at cnri.reston.va.us Sat Mar 25 22:00:39 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Sat, 25 Mar 2000 16:00:39 -0500 (EST) Subject: [Python-Dev] module reorg (was: 1.6 job list) References: Message-ID: <14557.10487.736544.336550@anthem.cnri.reston.va.us>

>>>>> "MZ" == Moshe Zadka writes:

MZ> Personally, I *hate* the Java mechanism -- see Stallman's
MZ> position on why GNU Java packages use gnu.* rather then
MZ> org.gnu.* for some of the reasons.

Actually, it's Per Bothner's position: http://www.gnu.org/software/java/why-gnu-packages.txt and I agree with him. I kind of wish that JimH had chosen simply `python' as JPython's top-level package hierarchy, but that's too late to change now.
-Barry

From bwarsaw at cnri.reston.va.us Sat Mar 25 22:03:08 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Sat, 25 Mar 2000 16:03:08 -0500 (EST) Subject: [Python-Dev] 1.6 job list References: <14557.8057.896921.693908@anthem.cnri.reston.va.us> Message-ID: <14557.10636.504088.517078@anthem.cnri.reston.va.us>

>>>>> "GS" == Greg Stein writes:

GS> Unless the CVS repository was moved to, say, SourceForge.

I didn't want to rehash that, but yes, you're absolutely right! -Barry

From gstein at lyra.org Sat Mar 25 22:13:00 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 13:13:00 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <14557.10636.504088.517078@anthem.cnri.reston.va.us> Message-ID: On Sat, 25 Mar 2000 bwarsaw at cnri.reston.va.us wrote:

> >>>>> "GS" == Greg Stein writes:
>
> GS> Unless the CVS repository was moved to, say, SourceForge.
>
> I didn't want to rehash that, but yes, you're absolutely right!

Me neither, ergo the smiley :-) Just felt inclined to mention it, and I think the conversation stopped last time at that point; not sure it ever was "hashed" :-). But it is only a discussion to raise if checkins-via-CNRI-guys becomes a true bottleneck. Which it hasn't and doesn't look to be. Constrained? Yes. Bottleneck? No. Cheers, -g -- Greg Stein, http://www.lyra.org/

From jeremy at cnri.reston.va.us Sat Mar 25 22:22:09 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Sat, 25 Mar 2000 16:22:09 -0500 (EST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: References: Message-ID: <14557.4689.858620.578102@walden>

>>>>> "MH" == Mark Hammond writes:

MH> [Greg writes]
>> I'm not even going to attempt to try to define a hierarchy for
>> all those modules. I count 137 on my local system. Let's say
>> that I *do* try... some are going to end up "forced" rather than
>> obeying some obvious grouping. If you do it a chunk at a time,
>> then you get the obvious, intuitive groupings.
Try for more, and
>> you just bung it all up.

MH> I agree with Greg - every module will not fit into a package.

Sure. No one is arguing with that :-). Where I disagree with Greg is that we shouldn't approach this piecemeal. A greedy algorithm can lead to a locally optimal solution that isn't right for the whole library. A name or grouping might make sense on its own, but isn't sufficiently clear when taking all 137-odd modules into account.

MH> But I also agree with Guido - we _should_ attempt to go through
MH> the 137 modules and put the ones that fit into logical
MH> groupings. Greg is probably correct with his selection for
MH> "net", but a general evaluation is still a good thing. A view
MH> of the bigger picture will help to quell debates over the
MH> structure, and only leave us with the squabbles over the exact
MH> spelling :-)

x1.5 on this. I'm not sure which direction you ended up thinking this was (+ or -), but whichever direction it was, I like it. Jeremy

From gstein at lyra.org Sat Mar 25 22:40:48 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 25 Mar 2000 13:40:48 -0800 (PST) Subject: [Python-Dev] voting numbers Message-ID: Hey... just thought I'd drop off a description of the "formal" mechanism that the ASF uses for voting, since it has been seen here and there on this group :-)

+1  "I'm all for it. Do it!"
+0  "Seems cool and acceptable, but I can also live without it"
-0  "Not sure this is the best thing to do, but I'm not against it."
-1  "Veto. And here is my reasoning."

Strictly speaking, there is no vetoing here, other than by Guido. For changes to Apache (as opposed to bug fixes), it depends on where the development is. In the early stages, it is reasonably open and people work straight against CVS (except for really big design changes). In the late stage, it requires three +1 votes during discussion of a patch before it goes in. Here on python-dev, it would seem that the votes are a good way to quickly let Guido know people's feelings about topic X or Y.
On the patches mailing list, the voting could actually be quite a useful measure for the people with CVS commit access. If a patch gets a -1, then its commit should wait until reason X has been resolved. Note that it can be resolved in two ways: the person lifts their veto (after some amount of persuasion or explanation), or the patch is updated to address the concerns (well, unless the veto is against the concept of the patch entirely :-). If a patch gets a few +1 votes, then it can probably go straight in. Note that the Apache guys sometimes say things like "+1 on concept", meaning they like the idea but haven't reviewed the code. Do we formalize on using these? Not really suggesting that. But if myself (and others) drop these things into mail notes, then we may as well have a description of just what the heck is going on :-) Cheers, -g -- Greg Stein, http://www.lyra.org/

From moshez at math.huji.ac.il Sun Mar 26 00:27:18 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 01:27:18 +0200 (IST) Subject: [Python-Dev] Q: repr.py vs. pprint.py Message-ID: Is there any reason to keep two separate modules with simple-formatting functions? I think pprint is somewhat more sophisticated, but in the worst case, we can just dump them both in the same file (the only thing would be that pprint would export "repr", in addition to "saferepr" (among others)). (Just bumped into this in my reorg suggestion) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From moshez at math.huji.ac.il Sun Mar 26 00:32:38 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 01:32:38 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 Message-ID: Here's a second version of the straw man proposal for the reorganization of modules in packages. Note that I'm treating it as a strictly 1.7 proposal, so I don't care a "lot" about backwards compatibility.
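[An editorial aside: even a sweeping renaming like this one can keep the old flat names importable during a transition with a few lines of aliasing. A hedged sketch, using today's importlib for illustration -- the "oldjson" name is purely hypothetical, and the straw-man packages in this thread never shipped in this form:]

```python
import importlib
import sys

# Hypothetical mapping from old flat names to new dotted homes.
# "oldjson" -> "json" is a stand-in pair chosen only so this sketch
# runs; a real table would map e.g. the old flat names to the new
# package locations.
RENAMED = {
    "oldjson": "json",
}

def install_aliases(renamed=RENAMED):
    # Register each module under its old flat name in sys.modules,
    # so "import oldname" keeps working during a transition period.
    for old, new in renamed.items():
        sys.modules[old] = importlib.import_module(new)

install_aliases()

import oldjson  # the old flat name still resolves
print(oldjson.dumps([1, 2]))  # prints "[1, 2]"
```

Because the import statement consults sys.modules first, the alias costs nothing after the first lookup; deprecation warnings could be layered on top of the same table.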
I'm down to 4 unhandled modules, which means that if no one objects (and I'm sure someone will), this can be a plan of action. So get your objections ready, guys!

net
    httplib
    ftplib
    urllib
    cgi
    gopherlib
    imaplib
    poplib
    nntplib
    smtplib
    urlparse
    telnetlib
    server
        BaseHTTPServer
        CGIHTTPServer
        SimpleHTTPServer
        SocketServer
        asynchat
        asyncore
text
    sgmllib
    htmllib
    htmlentitydefs
    xml
        whatever the xml-sig puts here
    mail
        rfc822
        mime
            MimeWriter
            mimetools
            mimify
            mailcap
            mimetypes
            base64
            quopri
        mailbox
        mhlib
    binhex
parse
    string
    re
    regex
    reconvert
    regex_syntax
    regsub
    shlex
    ConfigParser
    linecache
    multifile
    netrc
bin
    gzip
    zlib
    aifc
    chunk
    image
        imghdr
        colorsys
        imageop
        imgfile
        rgbimg
        yuvconvert
    sound
        sndhdr
        toaiff
        audiodev
        sunau
        sunaudio
        wave
        audioop
        sunaudiodev
db
    anydbm
    whichdb
    bsddb
    dbm
    dbhash
    dumbdbm
    gdbm
math
    bisect
    fpformat
    random
    whrandom
    cmath
    math
    crypt
    fpectl
    fpetest
    array
    md5
    mpz
    rotor
    sha
time
    calendar
    time
    tzparse
    sched
    timing
interpreter
    new
    py_compile
    code
    codeop
    compileall
    keyword
    token
    tokenize
    parser
    dis
    bdb
    pdb
    profile
    pyclbr
    tabnanny
    symbol
    pstats
    traceback
    rlcompleter
security
    Bastion
    rexec
    ihooks
file
    dircache
    path -- a virtual module which would do a from path import *
    dospath
    posixpath
    macpath
    nturl2path
    ntpath
    macurl2path
    filecmp
    fileinput
    StringIO
    cStringIO
    glob
    fnmatch
    posixfile
    stat
    statcache
    statvfs
    tempfile
    shutil
    pipes
    popen2
    commands
    dl
    fcntl
    lowlevel
        socket
        select
    terminal
        termios
        pty
        tty
        readline
    syslog
serialize
    pickle
    cPickle
    shelve
    xdrlib
    copy
    copy_reg
threads
    thread
    threading
    Queue
    mutex
ui
    curses
    Tkinter
    cmd
    getpass
internal
    _codecs
    _locale
    _tkinter
    pcre
    strop
    posix
users
    pwd
    grp
    nis
sgi
    al
    cd
    cl
    fl
    fm
    gl
    misc (what used to be sgimodule.c)
    sv
unicode
    codecs
    unicodedata
    unicodedatabase
exceptions
os
types
UserDict
UserList
user
site
locale
pure
formatter
getopt
signal
pprint

========== Modules not handled ============
errno
resource
operator
struct

-- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From DavidA at ActiveState.com Sun Mar 26 00:39:51 2000 From: DavidA at ActiveState.com (David Ascher) Date: Sat, 25 Mar 2000 15:39:51 -0800 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID:

> I really, really, like the Perl mechanism, and I think we would do well
> to think if something like that wouldn't suit us, with minor
> modifications.

The biggest modification which I think is needed to a Perl-like organization is that IMO there is value in knowing which packages are 'blessed' by Guido. In other words, some sort of Q/A mechanism would be good, if it can be kept simple. [Alternatively, let's not put a Q/A mechanism in place and my employer can make money selling that information, the way they do for Perl! =)]

> (Remember that lwall copied the Pythonic module mechanism,
> so Perl and Python modules are quite similar)

That's stretching things a bit (the part after the 'so' doesn't follow from the part before), as there is a lot more to the nature of module systems, but the point is well taken. --david

From moshez at math.huji.ac.il Sun Mar 26 06:44:02 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 06:44:02 +0200 (IST) Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message-ID: On Sat, 25 Mar 2000, David Ascher wrote:

> The biggest modification which I think is needed to a Perl-like organization
> is that IMO there is value in knowing what packages are 'blessed' by Guido.
> In other words, some sort of Q/A mechanism would be good, if it can be kept
> simple.

You've got a point. Does anyone know how the perl-porters decide what modules to put in source.tar.gz? -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

From ping at lfw.org Sun Mar 26 07:01:58 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 21:01:58 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Moshe Zadka wrote:

> Here's a second version of the straw man proposal for the reorganization
> of modules in packages. Note that I'm treating it as a strictly 1.7
> proposal, so I don't care a "lot" about backwards compatibility.

Hey, this looks pretty good. For the most part i agree with your layout. Here are a few notes...

> net
> [...]
> server
> [...]

Good.

> text
> [...]
> xml
>     whatever the xml-sig puts here
> mail
>     rfc822
>     mime
>         MimeWriter
>         mimetools
>         mimify
>         mailcap
>         mimetypes
>         base64
>         quopri
>     mailbox
>     mhlib
> binhex

I'm not convinced "mime" needs a separate branch here. (This is the deepest part of the tree, and at three levels small alarm bells went off in my head.) For example, why text.binhex but text.mail.mime.base64?

> parse
>     string
>     re
>     regex
>     reconvert
>     regex_syntax
>     regsub
>     shlex
>     ConfigParser
>     linecache
>     multifile
>     netrc

The "re" module, in particular, will get used a lot, and it's not clear why these all belong under "parse". I suggest dropping "parse" and moving these up. What's "multifile" doing here instead of with the rest of the mail/mime stuff?

> bin
> [...]

I like this. Good idea.

>     gzip
>     zlib
>     aifc

Shouldn't "aifc" be under "sound"?

> image
> [...]
> sound
> [...]
> db
> [...]

Yup.

> math
> [...]
> time
> [...]

Looks good.

> interpreter
> [...]

How about just "interp"?

> security
> [...]
> file
> [...]
>     lowlevel
>         socket
>         select

Why the separate "lowlevel" branch? Why doesn't "socket" go under "net"?

>     terminal
>         termios
>         pty
>         tty
>         readline

Why does "terminal" belong under "file"? Maybe it could go under "ui"? Hmm... "pty" doesn't really belong.

>     syslog

Hmm...
> serialize
>     pickle
>     cPickle
>     shelve
>     xdrlib
>     copy
>     copy_reg

"copy" doesn't really fit here under "serialize", and "serialize" is kind of a long name. How about a "data types" package? We could then put "struct", "UserDict", "UserList", "pprint", and "repr" here.

data
    copy
    copy_reg
    pickle
    cPickle
    shelve
    xdrlib
    struct
    UserDict
    UserList
    pprint
    repr

On second thought, maybe "struct" fits better under "bin".

> threads
> [...]
> ui
> [...]

Uh huh.

> internal
>     _codecs
>     _locale
>     _tkinter
>     pcre
>     strop
>     posix

Not sure this is a good idea. It means the Unicode work lives under both "unicode" and "internal._codecs", Tk is split between "ui" and "internal._tkinter", regular expressions are split between "text.re" and "internal.pcre". I can see your motivation for getting "posix" out of the way, but i suspect this is likely to confuse people.

> users
>     pwd
>     grp
>     nis

Hmm. Yes, i suppose so.

> sgi
> [...]
> unicode
> [...]

Indeed.

> os
> UserDict
> UserList
> exceptions
> types
> operator
> user
> site

Yeah, these are all top-level (except maybe UserDict and UserList, see above).

> locale

I think "locale" belongs under "math" with "fpformat" and the others. It's for numeric formatting.

> pure

What the heck is "pure"?

> formatter

This probably goes under "text".

> struct

See above under "data". I can't decide whether "struct" should be part of "data" or "bin". Hmm... probably "bin" -- since, unlike the serializers under "data", "struct" does not actually specify a serialization format, it only provides fairly low-level operations.
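[An editorial aside on the distinction being drawn here: struct packs values into a byte layout the caller dictates, while pickle emits a self-describing stream that records types and structure. A small illustration in modern Python, not from the original mail:]

```python
import pickle
import struct

values = (1, 2, 3)

# struct: the caller dictates the exact byte layout -- here three
# big-endian unsigned 16-bit integers, exactly 6 bytes, no metadata.
packed = struct.pack(">3H", *values)

# pickle: the stream is self-describing, so the reader needs no
# out-of-band layout description -- but the bytes are opaque.
blob = pickle.dumps(values)

print(len(packed))                   # 6
print(struct.unpack(">3H", packed))  # (1, 2, 3)
print(pickle.loads(blob))            # (1, 2, 3)
```

The struct side only works if both ends agree on the ">3H" format string, which is exactly why it reads as a low-level operation rather than a serialization format.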
-- Lenore Snell From moshez at math.huji.ac.il Sun Mar 26 07:58:34 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 07:58:34 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Ka-Ping Yee wrote: > I'm not convinced "mime" needs a separate branch here. > (This is the deepest part of the tree, and at three levels > small alarm bells went off in my head.) I've had my problems with that too, but it seemed too many modules were mime-specific. > For example, why text.binhex but text.mail.mime.base64? Actually, I thought about this (this isn't random at all): base64 encoding is part of the mime standard, together with quoted-printable. Binhex isn't. I don't know if you find it reason enough, and it may be smarter just having a text.encode.{quopri,uu,base64,binhex} > > parse > > string > > re > > regex > > reconvert > > regex_syntax > > regsub > > shlex > > ConfigParser > > linecache > > multifile > > netrc > > The "re" module, in particular, will get used a lot, and from import re Doesn't seem too painful. > and it's not clear why these all belong under "parse". These are all used for parsing data (which does not have some pre-written parser). I had problems with the name too... > What's "multifile" doing here instead of with the rest > of the mail/mime stuff? It's also useful generally. > Shouldn't "aifc" be under "sound"? You're right. > > interpreter > [...] > > How about just "interp"? I've no *strong* feelings, just a vague "don't abbrev." hunch > Why the separate "lowlevel" branch? Because it is -- most Python code will use one of the higher level modules. > Why doesn't "socket" go under "net"? What about UNIX domain sockets? Again, no *strong* opinion, though. > > terminal > > termios > > pty > > tty > > readline > > Why does "terminal" belong under "file"? 
Because it is (a special kind of file) > > serialize > > > pickle > > cPickle > > shelve > > xdrlib > > copy > > copy_reg > > "copy" doesn't really fit here under "serialize", and > "serialize" is kind of a long name. I beg to disagree -- "copy" is frequently close to serialization, both in the model (serializing to a "data structure") and in real life (that's the way people copy stuff in Java, and UNIX too: think tar cvf - | tar xvf -) What's more, copy_reg is used both for copy and for pickle. I do like the idea of "data-types" package, but it needs to be ironed out a bit. > > internal > > _codecs > > _locale > > _tkinter > > pcre > > strop > > posix > > Not sure this is a good idea. It means the Unicode > work lives under both "unicode" and "internal._codecs", > Tk is split between "ui" and "internal._tkinter", > regular expressions are split between "text.re" and > "internal.pcre". I can see your motivation for getting > "posix" out of the way, but i suspect this is likely to > confuse people. You mistook my motivation -- I just want unadvertised modules (AKA internal use modules) to live in a carefully segregated section of the namespace. How would this confuse people? No one imports _tkinter or pcre, so no one would notice the change. > > locale > > I think "locale" belongs under "math" with "fpformat" and > the others. It's for numeric formatting. Only? And anyway, I doubt many people will think like that. > > pure > > What the heck is "pure"? A module that helps work with Purify. > > formatter > > This probably goes under "text". You're right. > Well, this leaves a few system-like modules that didn't > really fit elsewhere for me: > > pty > tty > termios > syslog > select > getopt > signal > errno > resource > > They all seem to be Unix-related. How about putting these > in a "unix" or "system" package? "select", "signal" aren't UNIX specific. "getopt" is used for generic argument processing, so it isn't really UNIX specific. 
And I don't like the name "system" either. But I have no constructive proposals about those either. so-i'll-just-shut-up-now-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From dan at cgsoftware.com Sun Mar 26 08:05:44 2000 From: dan at cgsoftware.com (Daniel Berlin) Date: Sat, 25 Mar 2000 22:05:44 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: > "select", "signal" aren't UNIX specific. Huh? How not? Can you name a non-UNIX that is providing them? (BeOS wouldn't count, select is broken, and nobody uses signals.) and if you can, is it providing them for something other than "UNIX/POSIX compatibility" > "getopt" is used for generic argument processing, so it isn't really UNIX > specific. It's a POSIX.2 function. I consider that UNIX. > And I don't like the name "system" either. But I have no > constructive proposals about those either. > > so-i'll-just-shut-up-now-ly y'rs, Z. > -- just-picking-nits-ly y'rs, Dan From moshez at math.huji.ac.il Sun Mar 26 08:32:33 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 08:32:33 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Daniel Berlin wrote: > > > "select", "signal" aren't UNIX specific. > Huh? > How not? > Can you name a non-UNIX that is providing them? Win32. Both of them. I've even used select there. > and if you can, is it providing them for something other than "UNIX/POSIX > compatibility" I don't know what it provides them for, but I've *used* *select* on *WinNT*. I don't see why Python should make me feel bad when I'm doing that. > > "getopt" is used for generic argument processing, so it isn't really UNIX > > specific. > > It's a POSIX.2 function. > I consider that UNIX. Well, the argument style it processes is not unheard of in other OSes, and it's nice to have command line apps that have a common ui. 
That's it! "getopt" belongs in the ui package! -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping at lfw.org Sun Mar 26 09:23:45 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 23:23:45 -0800 (PST) Subject: [Python-Dev] cPickle and cStringIO Message-ID: Are there any objections to including

    try:
        from cPickle import *
    except:
        pass

in pickle and

    try:
        from cStringIO import *
    except:
        pass

in StringIO? -- ?!ng "I'm not trying not to answer the question; i'm just not answering it." -- Lenore Snell From moshez at math.huji.ac.il Sun Mar 26 09:14:10 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 09:14:10 +0200 (IST) Subject: [Python-Dev] cPickle and cStringIO In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Ka-Ping Yee wrote: > Are there any objections to including
>
>     try:
>         from cPickle import *
>     except:
>         pass
>
> in pickle and
>
>     try:
>         from cStringIO import *
>     except:
>         pass
>
> in StringIO?

Yes, until Python types are subclassable. Currently, one can inherit from pickle.Pickler/Unpickler and StringIO.StringIO. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping at lfw.org Sun Mar 26 09:37:11 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 23:37:11 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: Okay, here's another shot at it. 
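[The conditional-import idiom discussed above can also be used from application code; a minimal sketch of the pattern, with module names as in Python 1.x (on builds without the C accelerator, the pure-Python module is used instead):]

```python
# Prefer the C implementation when it is available, else fall back to
# the pure-Python one.  (The thread proposes putting `from cPickle
# import *` inside pickle.py itself; this user-level variant shows the
# same pattern without touching the library.)
try:
    import cPickle as best_pickle   # fast C implementation (Python 1.x)
except ImportError:
    import pickle as best_pickle    # pure-Python fallback

data = best_pickle.dumps({"spam": 1})
assert best_pickle.loads(data) == {"spam": 1}
```

Note Moshe's objection still applies to the `from cPickle import *` form: code that subclasses pickle.Pickler or StringIO.StringIO would silently get the non-subclassable C types instead.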
Notice a few things:

 - no text.mime package
 - encoders moved to text.encode
 - Unix stuff moved to unix package (no file.lowlevel, file.terminal)
 - aifc moved to bin.sound package
 - struct moved to bin package
 - locale moved to math package
 - linecache moved to interp package
 - data-type stuff moved to data package
 - modules in internal package moved to live with their friends

Modules that are deprecated or not really intended to be imported are listed in parentheses (to give a better idea of the "real" size of each package). cStringIO and cPickle are parenthesized in hopeful anticipation of agreement on my last message...

net
    urlparse
    urllib
    ftplib
    gopherlib
    imaplib
    poplib
    nntplib
    smtplib
    telnetlib
    httplib
    cgi
    server
        BaseHTTPServer
        CGIHTTPServer
        SimpleHTTPServer
        SocketServer
        asynchat
        asyncore

text
    re                  # general-purpose parsing
    sgmllib
    htmllib
    htmlentitydefs
    xml
        whatever the xml-sig puts here
    mail
        rfc822
        mailbox
        mhlib
    encode              # i'm also ok with moving text.encode.* to text.*
        binhex
        uu
        base64
        quopri
        MimeWriter
        mimify
        mimetools
        mimetypes
        multifile
    mailcap             # special-purpose file parsing
    shlex
    ConfigParser
    netrc
    formatter
    (string, strop, pcre, reconvert, regex, regex_syntax, regsub)

bin
    gzip
    zlib
    chunk
    struct
    image
        imghdr
        colorsys        # a bit unsure, but doesn't go anywhere else
        imageop
        imgfile
        rgbimg
        yuvconvert
    sound
        aifc
        sndhdr
        toaiff
        audiodev
        sunau
        sunaudio
        wave
        audioop
        sunaudiodev

db
    anydbm
    whichdb
    bsddb
    dbm
    dbhash
    dumbdbm
    gdbm

math
    math                # library functions
    cmath
    fpectl              # type-related
    fpetest
    array
    mpz
    fpformat            # formatting
    locale
    bisect              # algorithm: also unsure, but doesn't go anywhere else
    random              # randomness
    whrandom
    crypt               # cryptography
    md5
    rotor
    sha

time
    calendar
    time
    tzparse
    sched
    timing

interp
    new
    linecache           # handling .py files
    py_compile
    code                # manipulating internal objects
    codeop
    dis
    traceback
    compileall
    keyword             # interpreter constants
    token
    symbol
    tokenize            # parsing
    parser
    bdb                 # development
    pdb
    profile
    pyclbr
    tabnanny
    pstats
    rlcompleter         # this might go in "ui"...

security
    Bastion
    rexec
    ihooks

file
    dircache
    path -- a virtual module which would do a from path import *
    nturl2path
    macurl2path
    filecmp
    fileinput
    StringIO
    glob
    fnmatch
    stat
    statcache
    statvfs
    tempfile
    shutil
    pipes
    popen2
    commands
    dl
    (dospath, posixpath, macpath, ntpath, cStringIO)

data
    pickle
    shelve
    xdrlib
    copy
    copy_reg
    UserDict
    UserList
    pprint
    repr
    (cPickle)

threads
    thread
    threading
    Queue
    mutex

ui
    _tkinter
    curses
    Tkinter
    cmd
    getpass
    getopt
    readline

users
    pwd
    grp
    nis

sgi
    al
    cd
    cl
    fl
    fm
    gl
    misc (what used to be sgimodule.c)
    sv

unicode
    _codecs
    codecs
    unicodedata
    unicodedatabase

unix
    errno
    resource
    signal
    posix
    posixfile
    socket
    select
    syslog
    fcntl
    termios
    pty
    tty
    _locale

exceptions
sys
os
types
user
site
pure
operator

-- ?!ng "I'm not trying not to answer the question; i'm just not answering it." -- Lenore Snell From ping at lfw.org Sun Mar 26 09:40:27 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 23:40:27 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: Hey, while we're at it... as long as we're renaming modules, what do you all think of getting rid of that "lib" suffix? As in:

> net
>     urlparse
>     url
>     ftp
>     gopher
>     imap
>     pop
>     nntp
>     smtp
>     telnet
>     http
>     cgi
>     server
[...]
> text
>     re              # general-purpose parsing
>     sgml
>     html
>     htmlentitydefs
[...]

"import net.ftp" seems nicer to me than "import ftplib". We could also just stick htmlentitydefs.entitydefs in html and deprecate htmlentitydefs. -- ?!ng "I'm not trying not to answer the question; i'm just not answering it." -- Lenore Snell From ping at lfw.org Sun Mar 26 09:53:06 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sat, 25 Mar 2000 23:53:06 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Moshe Zadka wrote: > > For example, why text.binhex but text.mail.mime.base64? 
> > Actually, I thought about this (this isn't random at all): base64 encoding > is part of the mime standard, together with quoted-printable. Binhex > isn't. I don't know if you find it reason enough, and it may be smarter > just having a text.encode.{quopri,uu,base64,binhex} I think i'd like that better, yes. > > and it's not clear why these all belong under "parse". > > These are all used for parsing data (which does not have some pre-written > parser). I had problems with the name too... And parsing is what the "text" package is about anyway. I say move them up. (See the layout in my other message. Notice most of the regular-expression stuff is deprecated anyway, so it's not like there are really that many.) > > Why doesn't "socket" go under "net"? > > What about UNIX domain sockets? Again, no *strong* opinion, though. Bleck, you're right. Well, i think we just have to pick one or the other here, and i think most people would guess "net" first. (You can think of it as IPC, and file IPC-related things under the "net" category...?) > > Why does "terminal" belong under "file"? > > Because it is (a special kind of file) Only in Unix. It's Unix that likes to think of all things, including terminals, as files. > I do like the idea of "data-types" package, but it needs to be ironed > out a bit. See my other message for a possible suggested hierarchy... > > > internal [...] > You mistook my motivation -- I just want unadvertised modules (AKA > internal use modules) to live in a carefully segregated section of the > namespace. How would this confuse people? No one imports _tkinter or pcre, > so no one would notice the change. I think it makes more sense to classify modules by their topic rather than their exposure. (For example, you wouldn't move deprecated modules to a "deprecated" package.) Keep in mind that (well, at least to me) the main point of any naming hierarchy is to avoid name collisions. "internal" doesn't really help that purpose. 
You also want to be sure (or as sure as you can) that modules will be obvious to find in the hierarchy. An "internal" package creates a distinction orthogonal to the topic-matter distinction we're using for the rest of the packages, which *potentially* introduces the question "well... is this module internal or not?" for every other module. Yes, admittedly this is only "potentially", but i hope you see the abstract point i'm trying to make... > > > locale > > > > I think "locale" belongs under "math" with "fpformat" and > > the others. It's for numeric formatting. > > Only? And anyway, I doubt many people will think like that. Yeah, it is pretty much only for numeric formatting. The more generic locale stuff seems to be in _locale. > > They all seem to be Unix-related. How about putting these > > in a "unix" or "system" package? > > "select", "signal" aren't UNIX specific. Yes, but when they're available on other systems they're an attempt to emulate Unix or Posix functionality, aren't they? > Well, the argument style it processes is not unheard of in other OSes, and > it's nice to have command line apps that have a common ui. That's it! > "getopt" belongs in the ui package! I like ui.getopt. It's a pretty good idea. -- ?!ng "I'm not trying not to answer the question; i'm just not answering it." -- Lenore Snell From moshez at math.huji.ac.il Sun Mar 26 10:05:49 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 10:05:49 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: +1. I've had minor nits, but nothing is perfect, and this is definitely "good enough". Now we'll just have to wait until the BDFL says something... -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Sun Mar 26 10:06:59 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 10:06:59 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Ka-Ping Yee wrote: > Hey, while we're at it... as long as we're renaming modules, > what do you all think of getting rid of that "lib" suffix? +0 -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Sun Mar 26 10:19:34 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 10:19:34 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sat, 25 Mar 2000, Ka-Ping Yee wrote: > > "select", "signal" aren't UNIX specific. > > Yes, but when they're available on other systems they're an > attempt to emulate Unix or Posix functionality, aren't they? I think "signal" is ANSI C, but I'm not sure. no-other-comments-ly y'rs, Z. From gstein at lyra.org Sun Mar 26 13:52:53 2000 From: gstein at lyra.org (Greg Stein) Date: Sun, 26 Mar 2000 03:52:53 -0800 (PST) Subject: [Python-Dev] Windows and PyObject_NEW In-Reply-To: <1258123323-10623548@hypernet.com> Message-ID: On Sat, 25 Mar 2000, Gordon McMillan wrote: >... > I doubt very much that you would break anybody's code by > removing the Windows specific behavior. > > But it seems to me that unless Python always uses the > default malloc, those of us who write C++ extensions will have > to override operator new? I'm not sure. I've used placement > new to allocate objects in a memory mapped file, but I've never > tried to muck with the global memory policy of a C++ program. Actually, the big problem arises when you have debug vs. non-debug DLLs. malloc() uses different heaps based on the debug setting. 
As a result, it is a bad idea to call malloc() from a debug DLL and free() it from a non-debug DLL. If the allocation pattern is fixed, then things may be okay. IF. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sun Mar 26 14:02:40 2000 From: gstein at lyra.org (Greg Stein) Date: Sun, 26 Mar 2000 04:02:40 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Moshe Zadka wrote: >... > [ tree ] This is a great start. I have two comments: 1) keep it *very* shallow. depth just makes it conceptually difficult. 2) you're pushing too hard. modules do not *have* to go into a package. there are some placements that you've made which are very questionable... it appears they are done for movement's sake rather than for being "right" I'm off to sleep, but will look into specific comments tomorrow or so. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sun Mar 26 14:14:32 2000 From: gstein at lyra.org (Greg Stein) Date: Sun, 26 Mar 2000 04:14:32 -0800 (PST) Subject: [Python-Dev] 1.6 job list In-Reply-To: <200003251856.NAA09636@eric.cnri.reston.va.us> Message-ID: On Sat, 25 Mar 2000, Guido van Rossum wrote: > > I say "do it incrementally" while others say "do it all at once." > > Personally, I don't think it is possible to do all at once. As a > > corollary, if you can't do it all at once, but you *require* that it be > > done all at once, then you have effectively deferred the problem. To put > > it another way, Guido has already invented a reason to not do it: he just > > requires that it be done all at once. Result: it won't be done. > > Bullshit, Greg. (I don't normally like to use such strong words, but > since you're being confrontational here...) Fair enough, and point accepted. Sorry. I will say, tho, that you've taken this slightly out of context. The next paragraph explicitly stated that I don't believe you had this intent. 
I just felt that coming up with a complete plan before doing anything would be prone to failure. You asked to invent a new reason :-), so I said you had one already :-) Confrontational? Yes, guilty as charged. I was a bit frustrated. > I'm all for doing it incrementally -- but I want the plan for how to > do it made up front. That doesn't require all the details to be > worked out -- but it requires a general idea about what kind of things > we will have in the namespace and what kinds of names they get. An > organizing principle, if you like. If we were to decide later that we > go for a Java-like deep hierarchy, the network package would have to > be moved around again -- what a waste. All righty. So I think there is probably a single question that I have here: Moshe posted a large breakdown of how things could be packaged. He and Ping traded a number of comments, and more will be coming as soon as people wake up :-) However, if you are only looking for a "general idea", then should python-dev'ers nit pick the individual modules, or just examine the general breakdown and hierarchy? thx, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Sun Mar 26 14:09:02 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 14:09:02 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Greg Stein wrote: > This is a great start. I have two comments: > > 1) keep it *very* shallow. depth just makes it conceptually difficult. I tried, and Ping shallowed it even more. BTW: Anyone who cares to comment, please comment on Ping's last suggestion. I pretty much agree with the changes he made. > 2) you're pushing too hard. modules do not *have* to go into a package. > there are some placements that you've made which are very > questionable... 
> it appears they are done for movement's sake rather
> than for being "right"

Well, I'm certainly sorry I gave that impression -- the reason I wasn't "right" wasn't that, it was more my desire to be "fast" -- I wanted to have some proposal out the door, since it is harder to argue about something concrete. The biggest proof of concept that we all agree is that no one seriously took objections to anything -- there were just some minor nits to pick. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Sun Mar 26 14:11:10 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sun, 26 Mar 2000 14:11:10 +0200 (IST) Subject: [Python-Dev] 1.6 job list In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Greg Stein wrote: > Moshe posted a large breakdown of how things could be packaged. He and > Ping traded a number of comments, and more will be coming as soon as > people wake up :-) Just a general comment -- it's so much fun to live in a different zone than all of you guys. just-wasting-time-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gstein at lyra.org Sun Mar 26 14:23:57 2000 From: gstein at lyra.org (Greg Stein) Date: Sun, 26 Mar 2000 04:23:57 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Moshe Zadka wrote: > On Sun, 26 Mar 2000, Greg Stein wrote: >... > > 2) you're pushing too hard. modules do not *have* to go into a package. > > there are some placements that you've made which are very > > questionable... it appears they are done for movement's sake rather > > than for being "right" > > Well, I'm certainly sorry I gave that impression -- the reason I wasn't > "right" wasn't that, it was more my desire to be "fast" -- I wanted to > have some proposal out the door, since it is harder to argue about > something concrete. 
> The biggest proof of concept that we all agree is that > no one seriously took objections to anything -- there were just some minor > nits to pick. Not something to apologize for! :-) Well, the indicator was the line in your original post about "unhandled modules" and the conversation between you and Ping with statements along the lines of "wasn't sure where to put this." I say just leave it then :-) If a module does not make *obvious* sense to be in a package, then it should not be there. For example: locale. That is not about numbers or about text. It has general utility. If there was an i18n package, then it would go there. Otherwise, don't force it somewhere else. Other packages are similar, so don't single out my comment about locale. Cheers, -g -- Greg Stein, http://www.lyra.org/ From DavidA at ActiveState.com Sun Mar 26 20:09:15 2000 From: DavidA at ActiveState.com (David Ascher) Date: Sun, 26 Mar 2000 10:09:15 -0800 Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: > If a module does not make *obvious* sense to be in a package, then it > should not be there. For example: locale. That is not about numbers or > about text. It has general utility. If there was an i18n package, then it > would go there. Otherwise, don't force it somewhere else. Other packages > are similar, so don't single out my comment about locale. I maintain that a general principle re: what the aim of this reorg is, is needed before the partitioning of the space can make sense. What Moshe and Ping have is a good stab at partitioning of a subspace of the total space of Python modules and packages, i.e., the standard library. If we limit the aim of the reorg to cover just that subspace, then that's fine and Ping's proposal seems grossly fine to me. 
If we want to have a Perl-like packaging, then we _need_ to take into account all known Python modules of general utility, such as the database modules, the various GUI packages, the mx* packages, Aaron's work, PIL, etc., etc. Ignoring those means that the dataset used to decide the partitioning function is highly biased. Given the larger dataset, locale might very well fit in a not-toplevel location. I know that any organizational scheme is going to be optimal at best at its inception, and that as history happens, it will become suboptimal. However, it's important to know what the space being partitioned is supposed to look like. A final comment: there's a history and science to this kind of organization, which is part of library science. I suspect there is quite a bit of knowledge available as to organizing principles to do it right. It would be nice if someone could research it a bit and summarize the basic principles to the rest of us. I agree with Greg that we need high-level input from Guido on this. --david 'academic today' ascher From ping at lfw.org Sun Mar 26 22:34:11 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 26 Mar 2000 12:34:11 -0800 (PST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Greg Stein wrote: > > If a module does not make *obvious* sense to be in a package, then it > should not be there. For example: locale. That is not about numbers or > about text. It has general utility. If there was an i18n package, then it > would go there. Otherwise, don't force it somewhere else. Other packages > are similar, so don't single out my comment about locale. I goofed. I apologize. Moshe and Greg are right: locale isn't just about numbers. I just read the comment at the top of locale.py: "Support for number formatting using the current locale settings" and didn't notice the from _locale import * a couple of lines down. 
"import locale; dir(locale)" didn't work for me because for some reason there's no _locale built-in on my system (Red Hat 6.1, python-1.5.1-10). So i looked for 'def's and they all looked like they had to do with numeric formatting. My mistake. "locale", at least, belongs at the top level. Other candidates for top-level: bisect # algorithm struct # more general than "bin" or "data" colorsys # not really just for image file formats yuvconvert # not really just for image file formats rlcompleter # not really part of the interpreter dl # not really just about files Alternatively, we could have: ui.rlcompleter, unix.dl (It would be nice, by the way, to replace "bisect" with an "algorithm" module containing some nice pedagogical implementations of things like bisect, quicksort, heapsort, Dijkstra's algorithm etc.) The following also could be left at the top-level, since they seem like applications (i.e. they probably won't get imported by code, only interactively). No strong opinion on this. bdb pdb pyclbr tabnanny profile pstats Also... i was avoiding calling the "unix" package "posix" because we already have a "posix" module. But wait... the proposed tree already contains "math" and "time" packages. If there is no conflict (is there a conflict?) then the "unix" package should probably be named "posix". -- ?!ng "In the sciences, we are now uniquely privileged to sit side by side with the giants on whose shoulders we stand." -- Gerald Holton From moshez at math.huji.ac.il Mon Mar 27 07:35:23 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Mon, 27 Mar 2000 07:35:23 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message-ID: On Sun, 26 Mar 2000, Ka-Ping Yee wrote: > The following also could be left at the top-level, since > they seem like applications (i.e. they probably won't > get imported by code, only interactively). No strong > opinion on this. 
> bdb > pdb > pyclbr > tabnanny > profile > pstats Let me just state my feelings about the interpreter package: since Python programs are probably the most suited to reasoning about Python programs (among other things, thanks to the strong introspection capabilities of Python), many Python modules were written to supply a convenient interface to that introspection. These modules are *only* needed by programs dealing with Python programs, and hence should live in a well defined part of the namespace. I regret calling it "interpreter" though: "Python" is a better name (something like the java.lang package) > Also... i was avoiding calling the "unix" package "posix" > because we already have a "posix" module. But wait... the > proposed tree already contains "math" and "time" packages. Yes. That was a hard decision I made, and I'm sort of waiting for Guido to veto it: it would negate the easy backwards compatible path of providing a toplevel module for each module which is moved somewhere else which does "from import *". > If there is no conflict (is there a conflict?) then the > "unix" package should probably be named "posix". I hardly agree. "dl", for example, is a common function on unices, but it is not part of the POSIX standard. I think "posix" module should have POSIX functions, and the "unix" package should deal with functionality available on real-life unices. standards-are-fun-aren't-they-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From pf at artcom-gmbh.de Mon Mar 27 08:52:25 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Mon, 27 Mar 2000 08:52:25 +0200 (MEST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: from Moshe Zadka at "Mar 27, 2000 7:35:23 am" Message-ID: Hi! Moshe Zadka wrote: > Yes. 
> That was a hard decision I made, and I'm sort of waiting for Guido to
> veto it: it would negate the easy backwards compatible path of providing
> a toplevel module for each module which is moved somewhere else which does
> "from import *".

If the result of this renaming initiative will be that I can't use

    import sys, os, time, re, struct, cPickle, parser
    import Tkinter; Tk=Tkinter; del Tkinter

anymore in Python 1.x and instead I have to change this into (for example):

    from posix import time
    from text import re
    from bin import struct
    from Python import parser
    from ui import Tkinter; ...

I would really really *HATE* this change!

[side note: The 'from MODULE import ...' form is evil and I have abandoned its use in favor of the 'import MODULE' form in 1987 or so, as our Modula-2 programs got bigger and bigger. With 20+ software developers working on a ~1,000,000 LOC of Modula-2 software system, this decision proved itself well. The situation with Python is comparable. Avoiding 'from ... import' rewards itself later, when your software has grown bigger and when it comes to maintenance by people not familiar with the used modules.]

Maybe I didn't understand what this new subdivision of the standard library should achieve. The library documentation provides an existing logical subdivision into chapters, which group the library into several kinds of services. IMO this subdivision could be discussed and possibly revised. But at the moment I got the impression that it was simply ignored. Why? What's so bad with it? Why is a subdivision on the documentation level not sufficient? Why should modules be moved into packages? I don't get it. 
Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From moshez at math.huji.ac.il Mon Mar 27 09:09:18 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Mon, 27 Mar 2000 09:09:18 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: Message-ID: On Mon, 27 Mar 2000, Peter Funk wrote:

> If the result of this renaming initiative will be that I can't use
> import sys, os, time, re, struct, cPickle, parser
> import Tkinter; Tk=Tkinter; del Tkinter
> anymore in Python 1.x and instead I have to change this into (for example):
> from posix import time
from time import time
> from text import re
> from bin import struct
> from Python import parser
> from ui import Tkinter; ...

Yes.

> I would really really *HATE* this change!

Well, I'm sorry to hear that -- I've been waiting for this change to happen for a long time.

> [side note: The 'from MODULE import ...' form is evil and I have abandoned its use
> in favor of the 'import MODULE' form in 1987 or so, as our Modula-2
> programs got bigger and bigger. With 20+ software developers working
> on a ~1,000,000 LOC of Modula-2 software system, this decision
> proved itself well.

Well, yes. Though syntactically equivalent,

    from package import module

is the recommended way to use packages, unless there is a specific need.

> Maybe I didn't understand what this new subdivision of the standard
> library should achieve.

Namespace cleanup. Too many toplevel names seem evil to some of us.

> Why is a subdivision on the documentation level not sufficient?
> Why should modules be moved into packages? I don't get it.

To allow a greater number of modules to live without worrying about namespace collision. -- Moshe Zadka . 
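[Moshe's "from package import module" recommendation can be demonstrated today with a package/submodule pair that already exists, os.path -- the proposed text.re layout would read the same way, leaving call sites unchanged:]

```python
# Bind the submodule to a short local name, so code need not be fully
# qualified.  Under the proposed reorganization, `from text import re`
# would work identically (the `text` package is part of the straw-man
# proposal, not something that exists yet).
from os import path

# Call sites then use path.splitext(...), not os.path.splitext(...)
assert path.splitext("spam.py") == ("spam", ".py")
```

This is why the line-length worry about writing text.re.compile(...) everywhere need not arise: one import line restores the short name.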
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping at lfw.org Mon Mar 27 10:08:57 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 27 Mar 2000 00:08:57 -0800 (PST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: Message-ID: Hi, Peter. Your question as to the purpose of module reorganization is well worth asking, and perhaps we should stand back for a while and try to really answer it well first. I think that my answers for your question would be: 1. To alleviate potential namespace collision. 2. To permit talking about packages as a unit. I hereby solicit other reasons from the rest of the group... Reason #1 is not a serious problem yet, but i think i've seen a few cases where it might start to be an issue. Reason #2 has to do with things like assigning people responsibility for taking care of a particular package, or making commitments about which packages will be available with which distributions or platforms. Hence, for example, the idea of the "unix" package. Neither of these reasons necessitates a deep and holy hierarchy, so we certainly want to keep it shallow and simple if we're going to do this at all. > If the result of this renaming initiative will be that I can't use > import sys, os, time, re, struct, cPickle, parser > import Tkinter; Tk=Tkinter; del Tkinter > anymore in Python 1.x and instead I have to change this into (for example): > form posix import time > from text import re > from bin import struct > from Python import parser > from ui import Tkinter; ... Won't import sys, os, time.time, text.re, bin.struct, data.pickle, python.parser also work? ...i hope? > The library documentation provides a existing logical subdivision into > chapters, which group the library into several kinds of services. > IMO this subdivision could be discussed and possibly revised. > But at the moment I got the impression, that it was simply ignored. > Why? What's so bad with it?
I did look at the documentation for some guidance in arranging the modules, though admittedly it didn't direct me much. -- ?!ng "In the sciences, we are now uniquely privileged to sit side by side with the giants on whose shoulders we stand." -- Gerald Holton From pf at artcom-gmbh.de Mon Mar 27 10:35:50 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Mon, 27 Mar 2000 10:35:50 +0200 (MEST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: from Ka-Ping Yee at "Mar 27, 2000 0: 8:57 am" Message-ID: Hi! > > import sys, os, time, re, struct, cPickle, parser [...] Ka-Ping Yee: > Won't > > import sys, os, time.time, text.re, bin.struct, data.pickle, python.parser > > also work? ...i hope? That is even worse. Then it is not only the 'import' sections, which I usually keep at the top of my modules, that have to be changed: for example, 're.compile(...' has to be changed into 'text.re.compile(...' all over the place, possibly breaking the 'Maximum Line Length' styleguide rule. Regards, Peter From pf at artcom-gmbh.de Mon Mar 27 12:16:48 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Mon, 27 Mar 2000 12:16:48 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? Message-ID: String objects have grown methods since 1.5.2. So it makes sense to provide a class 'UserString' similar to 'UserList' and 'UserDict', so that there is a standard base class to inherit from, if someone has the desire to extend the string methods. What do you think? Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From fdrake at acm.org Mon Mar 27 17:12:55 2000 From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Mon, 27 Mar 2000 10:12:55 -0500 (EST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: References: Message-ID: <14559.31351.783771.472320@weyr.cnri.reston.va.us> Moshe Zadka writes: > Well, I'm certainly sorry I gave that impression -- the reason I wans't > "right" wasn't that, it was more my desire to be "fast" -- I wanted to > have some proposal out the door, since it is harder to argue about > something concrete. The biggest prrof of concept that we all agree is that > no one seriously took objections to anything -- there were just some minor > nits to pick. It's *really easy* to argue about something concrete. ;) It's just harder to misunderstand the specifics of the proposal. It's too early to say what people think; not enough people have had time to look at the proposals yet. On the other hand, I think it's great that we have a proposal to discuss. I'll make my comments once I've had time to read through the last version posted. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Mon Mar 27 18:20:43 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 27 Mar 2000 11:20:43 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: Message-ID: <14559.35419.793906.868645@weyr.cnri.reston.va.us> Peter Funk said: > The library documentation provides a existing logical subdivision into > chapters, which group the library into several kinds of services. > IMO this subdivision could be discussed and possibly revised. > But at the moment I got the impression, that it was simply ignored. > Why? What's so bad with it? Ka-Ping Yee writes: > I did look at the documentation for some guidance in arranging > the modules, though admittedly it didn't direct me much. The library reference is pretty well disorganized at this point. I want to improve that for the 1.6 docs.
I received a suggestion a few months back, but haven't had a chance to dig into it, or even respond to the email. ;( -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From jeremy at cnri.reston.va.us Mon Mar 27 19:14:46 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Mon, 27 Mar 2000 12:14:46 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: Message-ID: <14559.38662.835289.499610@goon.cnri.reston.va.us> >>>>> "PF" == Peter Funk writes: PF> That is even worse. So not only the 'import' sections, which I PF> usually keep at the top of my modules, have to be changed: This PF> way for example 're.compile(...' has to be changed into PF> 'text.re.compile(...' all over the place possibly breaking the PF> 'Maximum Line Length' styleguide rule. There is nothing wrong with changing only the import statement: from text import re The only problematic use of from ... import ... is from text.re import * which adds an unspecified set of names to the current namespace. Jeremy From moshez at math.huji.ac.il Mon Mar 27 19:59:34 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Mon, 27 Mar 2000 19:59:34 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14559.35419.793906.868645@weyr.cnri.reston.va.us> Message-ID: Peter Funk said: > The library documentation provides a existing logical subdivision into > chapters, which group the library into several kinds of services. > IMO this subdivision could be discussed and possibly revised. > But at the moment I got the impression, that it was simply ignored. > Why? What's so bad with it? Ka-Ping Yee writes: > I did look at the documentation for some guidance in arranging > the modules, though admittedly it didn't direct me much. Fred L. Drake, Jr. writes: > The library reference is pretty well disorganized at this point. I > want to improve that for the 1.6 docs. 
Let me just mention where my inspiration came from: shame of shames, it came from Perl. It's hard to use Perl's organization as is, because it doesn't view itself as a general-purpose language: so things like CGI.pm are toplevel, and regexes are part of the syntax. However, there are a lot of good hints there. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From klm at digicool.com Mon Mar 27 20:31:01 2000 From: klm at digicool.com (Ken Manheimer) Date: Mon, 27 Mar 2000 13:31:01 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14559.38662.835289.499610@goon.cnri.reston.va.us> Message-ID: On Mon, 27 Mar 2000, Jeremy Hylton wrote: > >>>>> "PF" == Peter Funk writes: > > PF> That is even worse. So not only the 'import' sections, which I > PF> usually keep at the top of my modules, have to be changed: This > PF> way for example 're.compile(...' has to be changed into > PF> 'text.re.compile(...' all over the place possibly breaking the > PF> 'Maximum Line Length' styleguide rule. > > There is nothing wrong with changing only the import statement: > from text import re > > The only problematic use of from ... import ... is > from text.re import * > which adds an unspecified set of names to the current namespace. Actually, i think there's another important gotcha with from .. import which may be contributing to peter's sense of concern, but which i don't think needs to in this case. I also thought we had discussed providing transparency in general, at least for the 1.x series? The other gotcha i mean applies when the thing you're importing is a terminal, i.e. a non-module. Then, changes to the assignments of the names in the original module aren't reflected in the names you've imported - they're decoupled from the namespace of the original module.
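Ken's gotcha is easy to demonstrate; a minimal sketch, using a throwaway module object rather than a real library:

```python
import types

# Build a throwaway module with one attribute standing in for any
# "terminal" (non-module) name you might from-import.
mod = types.ModuleType("mod")
mod.value = 1

# "from mod import value" is, in effect, a one-time binding:
value = mod.value

mod.value = 2       # rebinding the name inside the original module...
assert value == 1   # ...is not seen through the from-imported name

# "import mod" stays coupled, because every use goes through the module:
assert mod.value == 2
```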
When the thing you're importing is, itself, a module, the same kind of thing *can* happen, but you're more generally concerned with tracking revisions to the contents of those modules, which is tracked ok in the thing you "from .. import"ed. I thought the other problem peter was objecting to, having to change the import sections in the first place, was going to be avoided in the 1.x series (if we do this kind of thing) by inherently extending the import path to include all the packages, so people need not change their code? Seems like most of this would be fairly transparent w.r.t. the operation of existing applications. Have i lost track of the discussion? Ken klm at digicool.com From moshez at math.huji.ac.il Mon Mar 27 20:55:35 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Mon, 27 Mar 2000 20:55:35 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: Message-ID: On Mon, 27 Mar 2000, Ken Manheimer wrote: > I also thought we had discussed providing > transparency in general, at least of the 1.x series. ? Yes, but it would be clearly marked as deprecated in 1.7, print out error messages in 1.8 and won't work at all in 3000. (That's my view on the point, but I got the feeling this is where the wind is blowing). So the transparency mechanism is intended only to be "something backwards compatible"...it's not supposed to be a reason why things are ugly (I don't think they are, though). BTW: the transparency mechanism I suggested was not pushing things into the import path, but rather having toplevel modules which "from import *" from the modules that were moved. E.g., re.py would contain # Deprecated: don't import re, it won't work in future releases from text.re import * -- Moshe Zadka .
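Moshe's shim can be fleshed out a little. Here is a self-contained sketch that builds a throwaway "text.re" package on disk; the module names ("oldre" standing in for the compatibility re.py) and the warning text are illustrative, not the actual proposal:

```python
import os, sys, tempfile, textwrap, warnings

# Throwaway package layout: text/re.py plus a top-level shim oldre.py.
tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "text"))
open(os.path.join(tmp, "text", "__init__.py"), "w").close()
with open(os.path.join(tmp, "text", "re.py"), "w") as f:
    f.write("def compile(pattern):\n    return ('compiled', pattern)\n")
with open(os.path.join(tmp, "oldre.py"), "w") as f:
    f.write(textwrap.dedent("""\
        # Deprecated: don't import oldre, it won't work in future releases
        import warnings
        warnings.warn("oldre is deprecated; use text.re", DeprecationWarning)
        from text.re import *
        """))
sys.path.insert(0, tmp)

# Importing the shim warns once, then behaves like the relocated module.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    import oldre

assert oldre.compile("x") == ("compiled", "x")
assert any(issubclass(w.category, DeprecationWarning) for w in caught)
```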
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From skip at mojam.com Mon Mar 27 21:34:39 2000 From: skip at mojam.com (Skip Montanaro) Date: Mon, 27 Mar 2000 13:34:39 -0600 (CST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: Message-ID: <14559.47055.604042.381126@beluga.mojam.com> Peter> The library documentation provides a existing logical subdivision Peter> into chapters, which group the library into several kinds of Peter> services. Perhaps it makes sense to revise the library reference manual's documentation to reflect the proposed package hierarchy once it becomes concrete. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From skip at mojam.com Mon Mar 27 21:52:08 2000 From: skip at mojam.com (Skip Montanaro) Date: Mon, 27 Mar 2000 13:52:08 -0600 (CST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: References: Message-ID: <14559.48104.34263.680278@beluga.mojam.com> Responding to an early item in this thread and trying to adapt to later items... Ping wrote: I'm not convinced "mime" needs a separate branch here. (This is the deepest part of the tree, and at three levels small alarm bells went off in my head.) It's not clear that mime should be beneath text/mail. Moshe moved it up a level, but not the way I would have done it. I think the mime stuff still belongs in a separate mime package. I wouldn't just sprinkle the modules under text. I see two possibilities: text>mime net>mime I prefer net>mime, because MIME and its artifacts are used heavily in networked applications where the content being transferred isn't text. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From fdrake at acm.org Mon Mar 27 22:05:32 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 27 Mar 2000 15:05:32 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? 
In-Reply-To: <14559.47055.604042.381126@beluga.mojam.com> References: <14559.47055.604042.381126@beluga.mojam.com> Message-ID: <14559.48908.354425.313775@weyr.cnri.reston.va.us> Skip Montanaro writes: > Perhaps it makes sense to revise the library reference manual's > documentation to reflect the proposed package hierarchy once it becomes > concrete. I'd go for this. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Mon Mar 27 22:43:06 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 27 Mar 2000 15:43:06 -0500 Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? Message-ID: <200003272043.PAA18445@eric.cnri.reston.va.us> The _tkinter.c source code is littered with #ifdefs that mostly center around distinguishing between Tcl/Tk 8.0 and older versions. The two pre-8.0 versions supported seem to be 7.5/4.1 and 7.6/4.2. Would it be reasonable to assume that everybody is using at least Tcl/Tk version 8.0? This would simplify the code somewhat. Or should I ask this in a larger forum? --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Mon Mar 27 22:59:04 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 27 Mar 2000 15:59:04 -0500 (EST) Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: <200003272043.PAA18445@eric.cnri.reston.va.us> References: <200003272043.PAA18445@eric.cnri.reston.va.us> Message-ID: <14559.52120.633384.651377@weyr.cnri.reston.va.us> Guido van Rossum writes: > The _tkinter.c source code is littered with #ifdefs that mostly center > around distinguishing between Tcl/Tk 8.0 and older versions. The > two pre-8.0 version supported seem to be 7.5/4.1 and 7.6/4.2. > > Would it be reasonable to assume that everybody is using at least > Tcl/Tk version 8.0? This would simplify the code somewhat. Simplify! It's more important that the latest versions are supported than pre-8.0 versions. -Fred -- Fred L.
Drake, Jr. Corporation for National Research Initiatives From gstein at lyra.org Mon Mar 27 23:31:30 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 27 Mar 2000 13:31:30 -0800 (PST) Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: <14559.52120.633384.651377@weyr.cnri.reston.va.us> Message-ID: On Mon, 27 Mar 2000, Fred L. Drake, Jr. wrote: > Guido van Rossum writes: > > The _tkinter.c source code is littered with #ifdefs that mostly center > > around distinguishing between Tcl/Tk 8.0 and older versions. The > > two pre-8.0 version supported seem to be 7.5/4.1 and 7.6/4.2. > > > > Would it be reasonable to assume that everybody is using at least > > Tcl/Tk version 8.0? This would simplify the code somewhat. > > Simplify! It's more important that the latest versions are > supported than pre-8.0 versions. I strongly agree. My motto is, "if the latest Python version doesn't work for you, then don't upgrade!" This is also Open Source -- they can easily get the source to the old _Tkinter if they want new Python + 7.x support. If you ask in a larger forum, then you are certain to get somebody to say, "yes... I need that support." Then you have yourself a quandary :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From effbot at telia.com Mon Mar 27 23:46:50 2000 From: effbot at telia.com (Fredrik Lundh) Date: Mon, 27 Mar 2000 23:46:50 +0200 Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? References: <200003272043.PAA18445@eric.cnri.reston.va.us> Message-ID: <009801bf9835$f85b87e0$34aab5d4@hagrid> Guido van Rossum wrote: > The _tkinter.c source code is littered with #ifdefs that mostly center > around distinguishing between Tcl/Tk 8.0 and older versions. The > two pre-8.0 version supported seem to be 7.5/4.1 and 7.6/4.2. > > Would it be reasonable to assume that everybody is using at least > Tcl/Tk version 8.0? This would simplify the code somewhat. yes. 
if people are using older versions, they can always use the version shipped with 1.5.2. (has anyone actually tested that one with pre-8.0 versions, btw?) > Or should I ask this in a larger forum? maybe. maybe not. From jack at oratrix.nl Mon Mar 27 23:58:56 2000 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 27 Mar 2000 23:58:56 +0200 Subject: [Python-Dev] 1.6 job list In-Reply-To: Message by Moshe Zadka , Sat, 25 Mar 2000 12:16:23 +0200 (IST) , Message-ID: <20000327215901.ABA08F58C1@oratrix.oratrix.nl> Recently, Moshe Zadka said: > Here's a reason: there shouldn't be changes we'll retract later -- we > need to come up with the (more or less) right hierarchy the first time, > or we'll do a lot of work for nothing. I think I disagree here (hmm, it's probably better to say that I agree, but I agree on a tangent:-). I think we can be 100% sure that we're wrong the first time around, and we should plan for that. One of the reasons we're wrong is that the world is moving on. A module that at this point in time will reside at some level in the hierarchy may in a few years (or shorter) be one of a large family and be better off elsewhere in the hierarchy. It would be silly if it would have to stay where it was because of backward compatibility. If we plan for being wrong we can make the mistakes less painful. I think that a simple scheme where a module can say "I'm expecting the Python 1.6 namespace layout" would make transition to a completely different Python 1.7 namespace layout a lot less painful, because some agent could do the mapping. This can either happen at runtime (through a namespace, or through an import hook, or probably through other tricks as well) or optionally by a script that would do the translations. Of course this doesn't mean we should go off and hack in a couple of namespaces (hence my "agreeing on a tangent"), but it does mean that I think Greg's idea of not wanting to change everything at once has merit.
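In today's Python, Jack's runtime "agent" could be sketched as an import hook driven by a rename table (the table entries below are hypothetical, and a 1.x-era implementation would have used ihooks instead):

```python
import builtins

# Hypothetical mapping from old top-level names to their new locations.
RENAMED = {"re_old": "re"}

_real_import = builtins.__import__

def _mapping_import(name, *args, **kwargs):
    # Redirect renamed modules; everything else imports normally.
    return _real_import(RENAMED.get(name, name), *args, **kwargs)

builtins.__import__ = _mapping_import
try:
    import re_old            # transparently loads "re"
    assert re_old.__name__ == "re"
finally:
    builtins.__import__ = _real_import
```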
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From pf at artcom-gmbh.de Tue Mar 28 00:11:39 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Tue, 28 Mar 2000 00:11:39 +0200 (MEST) Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: <200003272043.PAA18445@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 27, 2000 3:43: 6 pm" Message-ID: Guido van Rossum: > Or should I ask this in a larger forum? Don't ask. Simply tell the people on comp.lang.python that support for the ancient Tcl/Tk versions < 8.0 will be dropped in Python 1.6. Period. ;-) Regards, Peter From guido at python.org Tue Mar 28 00:17:33 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 27 Mar 2000 17:17:33 -0500 Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: Your message of "Tue, 28 Mar 2000 00:11:39 +0200." References: Message-ID: <200003272217.RAA28910@eric.cnri.reston.va.us> > Don't ask. Simply tell the people on comp.lang.python that support > for the ancient Tcl/Tk versions < 8.0 will be dropped in Python 1.6. > Period. ;-) OK, I'm convinced. We will drop pre-8.0 support. Could someone submit a set of patches? It would make sense to call #error if a pre-8.0 version is detected at compile-time!
--Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond at skippinet.com.au Tue Mar 28 01:02:21 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 28 Mar 2000 09:02:21 +1000 Subject: [Python-Dev] Windows and PyObject_NEW In-Reply-To: <200003251459.PAA09181@python.inrialpes.fr> Message-ID: Sorry for the delay, but Gordon's reply was accurate so should have kept you going ;-) > I've been reading Jeffrey Richter's "Advanced Windows" last night in order > to try understanding better why PyObject_NEW is implemented > differently for > Windows. So that is where the heaps discussion came from :-) The problem is simply "too many heaps are available". > Again, I feel uncomfortable with this, especially now, when > I'm dealing with the memory aspect of Python's object > constructors/desctrs. It is this exact reason it was added in the first place. I believe this code predates the "_d" convention on Windows. AFAIK, this could be removed today and everything should work (but see below why it probably won't). MSVC allows you to choose from a number of CRT versions. Only in one of these versions is the CRTL completely shared between the .EXE and all the various .DLLs in the application. What was happening is that this macro ended up causing the "malloc" for a new object to occur in Python15.dll, but the Python type system meant that tp_dealloc() (to cleanup the object) was called in the DLL implementing the new type. Unless Python15.dll and our extension DLL shared the same CRTL (and hence the same malloc heap, fileno table etc) things would die. The DLL version of "free()" would complain, as it had never seen the pointer before. This change meant the malloc() and the free() were both implemented in the same DLL/EXE. This was particularly true with Debug builds. MSVC's debug CRTL implementations have some very nice debugging features (guard-blocks, block validity checks with debugger breakpoints when things go wrong, leak tracking, etc).
However, this means they use yet another heap. Mixing debug builds with release builds in Python is a recipe for disaster. Theoretically, the problem has largely gone away now that a) we have separate "_d" versions and b) the "official" position is to use the same CRTL as Python15.dll. However, it is still a minor FAQ on comp.lang.python why PyRun_ExecFile (or whatever) fails with mysterious errors - the reason is exactly the same - they are using a different CRTL, so the CRTL can't map the file pointers correctly, and we get unexplained IO errors. But now that this macro hides the malloc problem, there may be plenty of "home grown" extensions out there that do use a different CRTL and don't see any problems - mainly because they aren't throwing file handles around! Finally getting to the point of all this: We now also have the PyMem_* functions. This problem also doesn't exist if extension modules use these functions instead of malloc()/free(). We only ask them to change the PyObject allocations and deallocations, not the rest of their code, so it is no real burden. IMO, we should adopt these functions for most internal object allocations and the extension samples/docs. Also, we should consider adding relevant PyFile_fopen(), PyFile_fclose() type functions that are simply a thin layer over the fopen/fclose functions. If extension writers used these instead of fopen/fclose we would gain a few fairly intangible things - lose the minor FAQ, platforms that don't have fopen at all (eg, CE) would love you, etc. Mark. From mhammond at skippinet.com.au Tue Mar 28 03:04:11 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 28 Mar 2000 11:04:11 +1000 Subject: [Python-Dev] Windows and PyObject_NEW In-Reply-To: Message-ID: [I wrote] > Also, we should consider adding relevant PyFile_fopen(), PyFile_fclose() Maybe I had something like PyFile_FromString in mind!! That-damn-time-machine-again-ly, Mark.
From moshez at math.huji.ac.il Tue Mar 28 07:36:59 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 28 Mar 2000 07:36:59 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: <14559.48104.34263.680278@beluga.mojam.com> Message-ID: On Mon, 27 Mar 2000, Skip Montanaro wrote: > Responding to an early item in this thread and trying to adapt to later > items... > > Ping wrote: > > I'm not convinced "mime" needs a separate branch here. (This is the > deepest part of the tree, and at three levels small alarm bells went off > in my head.) > > It's not clear that mime should be beneath text/mail. Moshe moved it up a > level, Actually, Ping moved it up a level. I only decided to agree with him retroactively... > I think the mime stuff still > belongs in a separate mime package. I wouldn't just sprinkle the modules > under text. I see two possibilities: > > text>mime > net>mime > > I prefer net>mime, I don't. MIME is not a "wire protocol" like all the other things in net -- it's used inside another wire protocol, like RFC822 or HTTP. If at all, I'd go for having a net/ mail/ mime/ Package, but Ping would yell at me again for nesting 3 levels. I could live with text/mime, because the mime format basically *is* text. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Tue Mar 28 07:47:13 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 28 Mar 2000 07:47:13 +0200 (IST) Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: <200003272043.PAA18445@eric.cnri.reston.va.us> Message-ID: On Mon, 27 Mar 2000, Guido van Rossum wrote: > The _tkinter.c source code is littered with #ifdefs that mostly center > around distinguishing between Tcl/Tk 8.0 and older versions. The > two pre-8.0 version supported seem to be 7.5/4.1 and 7.6/4.2. 
> > Would it be reasonable to assume that everybody is using at least > Tcl/Tk version 8.0? This would simplify the code somewhat. I want to ask a different question: when is Python going to officially support Tcl/Tk v8.2/8.3? I'd really like for this to happen, as I hate having several libraries of Tcl/Tk on my machine. (I assume you know the joke about Jews always answering a question with a question ) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From jack at oratrix.nl Tue Mar 28 10:55:56 2000 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 28 Mar 2000 10:55:56 +0200 Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: Message by Ka-Ping Yee , Sat, 25 Mar 2000 23:37:11 -0800 (PST) , Message-ID: <20000328085556.CFEAC370CF2@snelboot.oratrix.nl> > Okay, here's another shot at it. Notice a few things: > ... > bin > ... > image ... > sound > ... These I don't like, I think image and sound should be either at toplevel, or otherwise in a separate package (mm?). I know images and sounds are customarily stored in binary files, but so are databases and other things. Hmm, the bin group in general seems to be a bit of a catch-all. gzip, zlib and chunk definitely belong together, but struct is a wholly different beast. 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jack at oratrix.nl Tue Mar 28 11:01:51 2000 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 28 Mar 2000 11:01:51 +0200 Subject: [Python-Dev] module reorg (was: 1.6 job list) In-Reply-To: Message by Moshe Zadka , Sat, 25 Mar 2000 20:30:26 +0200 (IST) , Message-ID: <20000328090151.86B59370CF2@snelboot.oratrix.nl> > On Sat, 25 Mar 2000, David Ascher wrote: > > > This made me think of one issue which is worth considering -- is there a > > mechanism for third-party packages to hook into the standard naming > > hierarchy? It'd be weird not to have the oracle and sybase modules within > > the db toplevel package, for example. > > My position is that any 3rd party module decides for itself where it wants > to live -- once we formalized the framework. Consider PyGTK/PyGnome, > PyQT/PyKDE -- they should live in the UI package too... For separate modules, yes. For packages this is different. As a case in point, think of MacPython: it could stuff all mac-specific packages under the toplevel "mac", but it would probably be nicer if it could extend the existing namespace. It is a bit silly if mac users have to do "from mac.text.encoding import macbinary" but "from text.encoding import binhex", just because BinHex support happens to live in the core (purely for historical reasons). But maybe this holds only for the platform distributions; then it shouldn't be as much of a problem, as there aren't that many.
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From moshez at math.huji.ac.il Tue Mar 28 11:24:14 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 28 Mar 2000 11:24:14 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: <20000328085556.CFEAC370CF2@snelboot.oratrix.nl> Message-ID: On Tue, 28 Mar 2000, Jack Jansen wrote: > These I don't like, I think image and sound should be either at toplevel, or > otherwise in a separate package (mm?). I know images and sounds are > customarily stored in binary files, but so are databases and other things. Hmmm...I think of "bin" as "interface to binary files". Agreed that I don't have a good reason for separating gdbm from zlib. > Hmm, the bin group in general seems to be a bit of a catch-all. gzip, zlib and > chunk definitely belong together, but struct is a wholly different beast. I think Ping and I decided to move struct to toplevel. Ping, would you like to take your last proposal and fold into it the consensus changes, or should I? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From effbot at telia.com Tue Mar 28 11:44:14 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 28 Mar 2000 11:44:14 +0200 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead References: <200003242103.QAA03288@eric.cnri.reston.va.us> Message-ID: <02c101bf989a$2ee35860$34aab5d4@hagrid> Guido van Rossum wrote: > Similar to append(), I'd like to close this gap, and I've made the > necessary changes. This will probably break lots of code. > > Similar to append(), I'd like people to fix their code rather than > whine -- two-arg connect() has never been documented, although it's > found in much code (even the socket module test code :-( ).
> > Similar to append(), I may revert the change if it is shown to cause > too much pain during beta testing... proposal: if anyone changes the API for a fundamental module, and fails to update the standard library, the change is automatically "minus one'd" for each major module that no longer works :-) (in this case, that would be -5 or so...) From effbot at telia.com Tue Mar 28 11:55:19 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 28 Mar 2000 11:55:19 +0200 Subject: [Python-Dev] Great Renaming? What is the goal? References: Message-ID: <02c901bf989b$be203d80$34aab5d4@hagrid> Peter Funk wrote: > Why should modules be moved into packages? I don't get it. fwiw, neither do I... I'm not so sure that Python really needs a simple reorganization of the existing set of standard library modules. just moving the modules around won't solve the real problems with the 1.5.2 std library... > IMO this subdivision could be discussed and possibly revised. here's one proposal: http://www.pythonware.com/people/fredrik/librarybook-contents.htm From gstein at lyra.org Tue Mar 28 12:09:44 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 28 Mar 2000 02:09:44 -0800 (PST) Subject: [Python-Dev] 3rd parties in the hierarchy (was: module reorg) In-Reply-To: <20000328090151.86B59370CF2@snelboot.oratrix.nl> Message-ID: On Tue, 28 Mar 2000, Jack Jansen wrote: > > On Sat, 25 Mar 2000, David Ascher wrote: > > > This made me think of one issue which is worth considering -- is there a > > > mechanism for third-party packages to hook into the standard naming > > > hierarchy? It'd be weird not to have the oracle and sybase modules within > > > the db toplevel package, for example. > > > > My position is that any 3rd party module decides for itself where it wants > > to live -- once we formalized the framework. Consider PyGTK/PyGnome, > > PyQT/PyKDE -- they should live in the UI package too... > > For separate modules, yes. For packages this is different. 
As a point in case > think of MacPython: it could stuff all mac-specific packages under the > toplevel "mac", but it would probably be nicer if it could extend the existing > namespace. It is a bit silly if mac users have to do "from mac.text.encoding > import macbinary" but "from text.encoding import binhex", just because BinHex > support happens to live in the core (purely for historical reasons). > > But maybe this holds only for the platform distributions, then it shouldn't be > as much of a problem as there aren't that many. Assuming that you use an archive like those found in my "small" distro or Gordon's distro, then this is no problem. The archive simply recognizes and maps "text.encoding.macbinary" to its own module. Another way to say it: stop thinking in terms of the filesystem as the sole mechanism for determining placement in the package hierarchy. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Tue Mar 28 15:38:12 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 08:38:12 -0500 Subject: [Python-Dev] Do we need to support Tcl/Tk versions before 8.0? In-Reply-To: Your message of "Tue, 28 Mar 2000 07:47:13 +0200." References: Message-ID: <200003281338.IAA29532@eric.cnri.reston.va.us> > I want to ask a different question: when is Python going to officially > support Tcl/Tk v8.2/8.3? I'd really like for this to happen, as I hate > having several libraries of Tcl/Tk on my machine. This is already in the CVS tree, except for the Windows installer. Python 1.6 will not install a separate complete Tcl installation; instead, it will install the needed Tcl/Tk files (Tcl/Tk 8.3 or newer) in the Python tree, so it won't affect existing Tcl/Tk installations. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Mar 28 15:57:02 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 08:57:02 -0500 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: Your message of "Tue, 28 Mar 2000 11:44:14 +0200." <02c101bf989a$2ee35860$34aab5d4@hagrid> References: <200003242103.QAA03288@eric.cnri.reston.va.us> <02c101bf989a$2ee35860$34aab5d4@hagrid> Message-ID: <200003281357.IAA29621@eric.cnri.reston.va.us> > proposal: if anyone changes the API for a fundamental module, and > fails to update the standard library, the change is automatically "minus > one'd" for each major module that no longer works :-) > > (in this case, that would be -5 or so...) Oops. Sigh. While we're pretending that this change goes in, could you point me to those five modules? Also, we need to add test cases to the standard test suite that would have found these! --Guido van Rossum (home page: http://www.python.org/~guido/) From gward at cnri.reston.va.us Tue Mar 28 17:04:47 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Tue, 28 Mar 2000 10:04:47 -0500 Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: ; from ping@lfw.org on Sat, Mar 25, 2000 at 11:37:11PM -0800 References: Message-ID: <20000328100446.A2586@cnri.reston.va.us> On 25 March 2000, Ka-Ping Yee said: > Okay, here's another shot at it. Notice a few things: Damn, I started writing a response to Moshe's original proposal -- and *then* saw this massive thread. Oh well. Turns out I still have a few useful things to say: First, any organization scheme for the standard library (or anything else, for that matter) should have a few simple guidelines. Here are two: * "deep hierarchies considered harmful": ie. avoid sub-packages if at all possible * "everything should have a purpose": every top-level package should be describable with a single, clear sentence of plain language. 
Eg.: net - Internet protocols, data formats, and client/server infrastructure unix - Unix-specific system calls, protocols, and conventions And two somewhat open issues: * "as long as we're renaming...": maybe this would be a good time to standardize naming conventions, eg. "cgi" -> "cgilib" *or* "{http,ftp,url,...}lib" -> "{http,ftp,url}...", "MimeWriter" -> "mimewriter", etc. * "shared namespaces vs system namespaces": the Perl model of "nothing belongs to The System; anyone can add a module in Text:: or Net:: or whatever" works there because Perl doesn't have __init__ files or anything to distinguish module namespaces; they just are. Python's import mechanism would have to change to support this, and the fact that __init__ files may contain arbitrary code makes this feel like a very tricky change to make. Now specific comments... > net > urlparse > urllib > ftplib > gopherlib > imaplib > poplib > nntplib > smtplib > telnetlib > httplib > cgi Rename? Either cgi -> cgilib or foolib -> foo? > server > BaseHTTPServer > CGIHTTPServer > SimpleHTTPServer > SocketServer > asynchat > asyncore This is one good place for a sub-package. It's a also a good place to rename: the convention for Python module names seems to be all-lowercase; and "Server" is redundant when you're in the net.server package. How about: net.server.base_http net.server.cgi_http net.server.simple_http net.server.socket Underscores negotiable. They don't seem to be popular in module names, although sometimes they would be real life-savers. > text I think "text" should mean "plain old unstructured, un-marked-up ASCII text", where "unstructured, un-marked-up" really means "not structured or marked up in a well-known standard way". Or maybe not. I'm just trying to come up with an excuse for moving xml to top-level, which I think is where it belongs. 
Maybe the excuse should just be, "XML is really important and visible, and anyways Paul Prescod will raise a stink if it isn't put at top-level in Python package-space". > re # general-purpose parsing Top-level: this is a fundamental module that should be treated on a par with 'string'. (Well, except for building RE methods into strings... hmmMMmm...maybe... [no, I'm kidding!]) > sgmllib > htmllib > htmlentitydefs Not sure what to do about these. Someone referred somewhere to a "web" top-level package, which seems to have disappeared. If it reappars, it would be a good place for the HTML modules (not to mention a big chunk of "net") -- this would mainly be for "important and visible" (ie. PR) reasons, rather than sound technical reasons. > xml > whatever the xml-sig puts here Should be top-level. > mail > rfc822 > mailbox > mhlib "mail" should either be top-level or under "net". (Yes, I *know* it's not a wire-level protocol: that's what net.smtplib is for. But last time I checked, email is pretty useless without a network. And vice-versa.) Or maybe these all belong in a top-level "data" package: I'm starting to warm to that. > bin > gzip > zlib > chunk > struct > image > imghdr > colorsys # a bit unsure, but doesn't go anywhere else > imageop > imgfile > rgbimg > yuvconvert > sound > aifc > sndhdr > toaiff > audiodev > sunau > sunaudio > wave > audioop > sunaudiodev I agree with Jack: image and sound (audio?) should be top-level. I don't think I like the idea of an intervening "mm" or "multimedia" or "media" or what-have-you package, though. The other stuff in "bin" is kind of a grab-bag: "chunk" and "struct" might belong in the mythical "data" package. > db > anydbm > whichdb > bsddb > dbm > dbhash > dumbdbm > gdbm Yup. 
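[The "db" grouping agreed on above is, with hindsight, roughly what later Python versions shipped as the `dbm` package, where the old anydbm/whichdb front ends became `dbm.open` and `dbm.whichdb`. A minimal sketch using those later names — the module layout is assumed from modern Python, not from anything in this thread:]

```python
import dbm
import os
import tempfile

# dbm.open picks whichever backend is available (gnu, ndbm, or dumb),
# playing the role the separate anydbm module played in 1.5.x.
path = os.path.join(tempfile.mkdtemp(), "example")
db = dbm.open(path, "c")          # "c" = create the database if needed
db[b"greeting"] = b"hello"
value = db[b"greeting"]           # values come back as bytes
db.close()

# dbm.whichdb sniffs the on-disk format, as the old whichdb module did.
backend = dbm.whichdb(path)
```

[The point of the grouping survives in the sketch: calling code names no concrete backend, so the same program runs against gdbm, ndbm, or the pure-Python fallback.]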
> math > math # library functions > cmath > fpectl # type-related > fpetest > array > mpz > fpformat # formatting > locale > bisect # algorithm: also unsure, but doesn't go anywhere else > random # randomness > whrandom > crypt # cryptography > md5 > rotor > sha Hmmm. "locale" has already been dealt with; obviously it should be top-evel. I think "array" should be top-level or under the mythical "data". Six crypto-related modules seems like enough to justify a top-level "crypt" package, though. > time > calendar > time > tzparse > sched > timing Yup. > interp > new > linecache # handling .py files [...] > tabnanny > pstats > rlcompleter # this might go in "ui"... I like "python" for this one. (But I'm not sure if tabnanny and rlcompleter belong there.) > security > Bastion > rexec > ihooks What does ihooks have to do with security? > file > dircache > path -- a virtual module which would do a from path import * > nturl2path > macurl2path > filecmp > fileinput > StringIO Lowercase for consistency? > glob > fnmatch > stat > statcache > statvfs > tempfile > shutil > pipes > popen2 > commands > dl No problem until these last two -- 'commands' is a Unix-specific thing that has very little to do with the filesystem per se, and 'dl' is (as I understand it) deep ju-ju with sharp edges that should probably be hidden away in the 'python' ('sys'?) package. Oh yeah, "dl" should be elsewhere -- "python" maybe? Top-level? Perhaps we need a "deepmagic" package for "dl" and "new"? ;-) > data > pickle > shelve > xdrlib > copy > copy_reg > UserDict > UserList > pprint > repr > (cPickle) Oh hey, it's *not* a mythical package! Guess I didn't read far enough ahead. I like it, but would add more stuff to it (obviously): 'struct', 'chunk', 'array' for starters. Should cPickle be renamed to fastpickle? > threads > thread > threading > Queue Lowercase? > ui > _tkinter > curses > Tkinter > cmd > getpass > getopt > readline > users > pwd > grp > nis These belong in "unix". 
Possibly "nis" belongs in "net" -- do any non-Unix OSes use NIS? > sgi > al > cd > cl > fl > fm > gl > misc (what used to be sgimodule.c) > sv Should this be "sgi" or "irix"? Ditto for "sun" vs "solaris" if there are a significant number of Sun/Solaris modules. Note that the respective trademark holders might get very antsy about who gets to put names in those namespaces -- that's exactly what happened with Sun, Solaris 8, and Perl. I believe the compromise they arrived at was that the "Solaris::" namespace remains open, but Sun gets the "Sun::" namespace. There should probably be a win32 package, for core registry access stuff if nothing else. There might someday be a "linux" package; it's highly unlikely there would be a "pc" or "alpha" package though. All of those argue over "irix" and "solaris" instead of "sgi" and "sun". Greg From gvwilson at nevex.com Tue Mar 28 17:45:10 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Tue, 28 Mar 2000 10:45:10 -0500 (EST) Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Message-ID: > > Greg Wilson > > If None becomes a keyword, I would like to ask whether it could be > > used to signal that a method is a class method, as opposed to an > > instance method: > I'd like to know what you mean by "class" method. (I do know C++ and > Java, so I have some idea...). Specifically, my question is: how does > a class method access class variables? They can't be totally > unqualified (because that's very unpythonic). If they are qualified by > the class's name, I see it as a very mild improvement on the current > situation. You could suggest, for example, to qualify class variables > by "class" (so you'd do things like: > > class.x = 1 > > ), but I'm not sure I like it. On the whole, I think it is a much > bigger issue on how be denote class methods. 
I don't like overloading the word 'class' this way, as it makes it difficult to distinguish a parent's 'foo' member and a child's 'foo' member: class Parent: foo = 3 ...other stuff... class Child(Parent): foo = 9 def test(): print class.foo # obviously 9, but how to get 3? I think that using the class's name instead of 'self' will be easy to explain, will look like it belongs in the language, will be unlikely to lead to errors, and will handle multiple inheritance with ease: class Child(Parent): foo = 9 def test(): print Child.foo # 9 print Parent.foo # 3 > Also, one slight problem with your method of denoting class methods: > currently, it is possible to add instance method at run time to a > class by something like > > class C: > pass > > def foo(self): > pass > > C.foo = foo > > In your suggestion, how do you view the possiblity of adding class > methods to a class? (Note that "foo", above, is also perfectly usable > as a plain function). Hm, I hadn't thought of this... :-( > > I'd also like to ask (separately) that assignment to None be defined as a > > no-op, so that programmers can write: > > > > year, month, None, None, None, None, weekday, None, None = gmtime(time()) > > > > instead of having to create throw-away variables to fill in slots in > > tuples that they don't care about. > > Currently, I use "_" for that purpose, after I heard the idea from > Fredrik Lundh. I do the same thing when I need to; I just thought that making assignment to "None" special would formalize this in a readable way. From jeremy at cnri.reston.va.us Tue Mar 28 19:31:48 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Tue, 28 Mar 2000 12:31:48 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: <14559.38662.835289.499610@goon.cnri.reston.va.us> Message-ID: <14560.60548.74378.613188@goon.cnri.reston.va.us> >>>>> "KLM" == Ken Manheimer writes: >> The only problematic use of from ... import ... 
is >> from text.re import * >> which adds an unspecified set of names to the current >> namespace. KLM> The other gotcha i mean applies when the thing you're importing KLM> is a terminal, ie a non-module. Then, changes to the KLM> assignments of the names in the original module aren't KLM> reflected in the names you've imported - they're decoupled from KLM> the namespace of the original module. This isn't an import issue. Some people simply don't understand that assignment (and import as form of assignment) is name binding. Import binds an imported object to a name in the current namespace. It does not affect bindings in other namespaces, nor should it. KLM> I thought the other problem peter was objecting to, having to KLM> change the import sections in the first place, was going to be KLM> avoided in the 1.x series (if we do this kind of thing) by KLM> inherently extending the import path to include all the KLM> packages, so people need not change their code? Seems like KLM> most of this would be fairly transparent w.r.t. the operation KLM> of existing applications. I'm not sure if there is consensus on backwards compatibility. I'm not in favor of creating a huge sys.path that includes every package's contents. It would be a big performance hit. Jeremy From moshez at math.huji.ac.il Tue Mar 28 19:36:47 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 28 Mar 2000 19:36:47 +0200 (IST) Subject: [Python-Dev] Great Renaming - Straw Man 0.2 In-Reply-To: <20000328100446.A2586@cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Greg Ward wrote: > * "deep hierarchies considered harmful": ie. avoid sub-packages if at > all possible > > * "everything should have a purpose": every top-level package should > be describable with a single, clear sentence of plain language. Good guidelines, but they aren't enough. 
And anyway, rules were meant to be broken <0.9 wink> > * "as long as we're renaming...": maybe this would be a good time to > standardize naming conventions, eg. "cgi" -> "cgilib" *or* > "{http,ftp,url,...}lib" -> "{http,ftp,url}...", "MimeWriter" -> > "mimewriter", etc. +1 > * "shared namespaces vs system namespaces": the Perl model of "nothing > belongs to The System; anyone can add a module in Text:: or Net:: or > whatever" works there because Perl doesn't have __init__ files or > anything to distinguish module namespaces; they just are. Python's > import mechanism would have to change to support this, and the fact > that __init__ files may contain arbitrary code makes this feel > like a very tricky change to make. Indeed. But I still feel that "few things should belong to the system" is quite a useful rule... (That's what I referred to when I said Perl's module system is more suited to CPAN (now there's a surprise)) > Rename? Either cgi -> cgilib or foolib -> foo? Yes. But I wanted the first proposal to be just about placing stuff, because that airs out more disagreements. > This is one good place for a sub-package. It's a also a good place to > rename: the convention for Python module names seems to be > all-lowercase; and "Server" is redundant when you're in the net.server > package. How about: > > net.server.base_http > net.server.cgi_http > net.server.simple_http > net.server.socket Hmmmmm......+0 > Underscores negotiable. They don't seem to be popular in module names, > although sometimes they would be real life-savers. Personally, I prefer underscores to CamelCase. > Or maybe not. I'm just trying to come up with an excuse for moving xml > to top-level, which I think is where it belongs. Maybe the excuse > should just be, "XML is really important and visible, and anyways Paul > Prescod will raise a stink if it isn't put at top-level in Python > package-space". I still think "xml" should be a brother to "html" and "sgml". 
Current political trends notwithstanding. > Not sure what to do about these. Someone referred somewhere to a "web" > top-level package, which seems to have disappeared. If it reappars, it > would be a good place for the HTML modules (not to mention a big chunk > of "net") -- this would mainly be for "important and visible" (ie. PR) > reasons, rather than sound technical reasons. I think the "web" package should be reinstated. But you won't like it: I'd put xml in web. > "mail" should either be top-level or under "net". (Yes, I *know* it's > not a wire-level protocol: that's what net.smtplib is for. But last > time I checked, email is pretty useless without a network. And > vice-versa.) Ummmm.....I'd disagree, but I lack the strength and the moral conviction. Put it under net and we'll call it a deal. > Or maybe these all belong in a top-level "data" package: I'm starting to > warm to that. Ummmm...I don't like the "data" package personally. It seems to disobey your second guideline. > I agree with Jack: image and sound (audio?) should be top-level. I > don't think I like the idea of an intervening "mm" or "multimedia" or > "media" or what-have-you package, though. Definitely multimedia. Okay, I'm bought. > Six crypto-related modules seems like enough to justify a top-level > "crypt" package, though. It seemed obvious to me that "crypt" should be under "math". But maybe that's just the mathematician in me speaking. > I like "python" for this one. (But I'm not sure if tabnanny and > rlcompleter belong there.) I agree, and I'm not sure about rlcompleter, but am sure about tabnanny. > What does ihooks have to do with security? Well, it was more or less written to support rexec. A weak argument, admittedly. > No problem until these last two -- 'commands' is a Unix-specific thing > that has very little to do with the filesystem per se Hmmmmm...it is on the same level with popen. Why not move popen too? 
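[For reference, commands and popen did end up traveling together: the interface of the old commands module survives in later Python versions as `subprocess.getstatusoutput`. A sketch of the equivalent call — the subprocess spelling is from modern Python, not anything available in this thread's era, and it runs the command through a shell just as commands did:]

```python
import subprocess

# Returns (exit_status, captured_output) -- the same shape the old
# commands.getstatusoutput returned; trailing newlines are stripped.
status, output = subprocess.getstatusoutput("echo hello")
```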
>, and 'dl' is (as I > understand it) deep ju-ju with sharp edges that should probably be > hidden away Ummmmmm.....not in the "python" package: it doesn't have anything to do with the interpreter. > Should this be "sgi" or "irix"? Ditto for "sun" vs "solaris" if there > are a significant number of Sun/Solaris modules. Note that the > respective trademark holders might get very antsy about who gets to put > names in those namespaces -- that's exactly what happened with Sun, > Solaris 8, and Perl. I believe the compromise they arrived at was that > the "Solaris::" namespace remains open, but Sun gets the "Sun::" > namespace. Ummmmm.....I don't see how they have any legal standing. I for one refuse to care about what Sun Microsystem thinks about names for Python packages. > There should probably be a win32 package, for core registry access stuff > if nothing else. And for all the other extensions in win32all Yep! (Just goes to show what happens when you decide to package based on a UNIX system) > All of those > argue over "irix" and "solaris" instead of "sgi" and "sun". Fine with me -- just wanted to move them out of my face -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From andy at reportlab.com Tue Mar 28 20:13:02 2000 From: andy at reportlab.com (Andy Robinson) Date: Tue, 28 Mar 2000 18:13:02 GMT Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: <20000327170031.693531CDF6@dinsdale.python.org> References: <20000327170031.693531CDF6@dinsdale.python.org> Message-ID: <38e0f4cf.24247656@post.demon.co.uk> On Mon, 27 Mar 2000 12:00:31 -0500 (EST), Peter Funk wrote: > Do we need a UserString class? This will probably be useful on top of the i18n stuff in due course, so I'd like it. Something Mike Da Silva and I have discussed a lot is implementing a higher-level 'typed string' library on top of the Unicode stuff. 
A 'typed string' is like a string, but knows what encoding it is in - possibly Unicode, possibly a native encoding and embodies some basic type safety and convenience notions, like not being able to add a Shift-JIS and an EUC string together. Iteration would always be per character, not per byte; and a certain amount of magic would say that if the string was (say) Japanese, it would acquire a few extra methods for doing some Japan-specific things like expanding half-width katakana. Of course, we can do this anyway, but I think defining the API clearly in UserString is a great idea. - Andy Robinson From guido at python.org Tue Mar 28 21:22:43 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 14:22:43 -0500 Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Your message of "Tue, 28 Mar 2000 18:13:02 GMT." <38e0f4cf.24247656@post.demon.co.uk> References: <20000327170031.693531CDF6@dinsdale.python.org> <38e0f4cf.24247656@post.demon.co.uk> Message-ID: <200003281922.OAA03113@eric.cnri.reston.va.us> > > Do we need a UserString class? > > This will probably be useful on top of the i18n stuff in due course, > so I'd like it. > > Something Mike Da Silva and I have discussed a lot is implementing a > higher-level 'typed string' library on top of the Unicode stuff. > A 'typed string' is like a string, but knows what encoding it is in - > possibly Unicode, possibly a native encoding and embodies some basic > type safety and convenience notions, like not being able to add a > Shift-JIS and an EUC string together. Iteration would always be per > character, not per byte; and a certain amount of magic would say that > if the string was (say) Japanese, it would acquire a few extra methods > for doing some Japan-specific things like expanding half-width > katakana. > > Of course, we can do this anyway, but I think defining the API clearly > in UserString is a great idea. Agreed. Please somebody send a patch! 
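[A UserString along the lines requested here could look like the following minimal sketch. The method selection and the Shift-JIS subclass are illustrative only (the class that eventually shipped has a much fuller API, and real encoding enforcement is not shown); the point is that every operation routes through `self.__class__`, so a "typed string" subclass keeps its type across operations:]

```python
class UserString:
    """Minimal string wrapper designed for subclassing."""
    def __init__(self, seq):
        self.data = str(seq)
    def __str__(self):
        return self.data
    def __repr__(self):
        return repr(self.data)
    def __len__(self):
        return len(self.data)
    def __eq__(self, other):
        if isinstance(other, UserString):
            other = other.data
        return self.data == other
    def __getitem__(self, index):
        # Slicing/indexing preserves the (sub)class.
        return self.__class__(self.data[index])
    def __add__(self, other):
        if isinstance(other, UserString):
            other = other.data
        return self.__class__(self.data + other)
    def upper(self):
        return self.__class__(self.data.upper())

class ShiftJISString(UserString):
    """Hypothetical 'typed string': a subclass that knows its encoding."""
    encoding = "shift-jis"

s = ShiftJISString("hello")
t = s + " world"          # result is still a ShiftJISString
```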
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Mar 28 21:25:39 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 14:25:39 -0500 Subject: [Python-Dev] First alpha release of Python 1.6 Message-ID: <200003281925.OAA03287@eric.cnri.reston.va.us> I'm hoping to release a first, rough alpha of Python 1.6 by April 1st (no joke!). Not everything needs to be finished by then, but I hope to have the current versions of distutil, expat, and sre in there. Anything else that needs to go into 1.6 and isn't ready yet? (Small stuff doesn't matter, everything currently in the patches queue can probably go in if it isn't rejected by then.) --Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA at ActiveState.com Tue Mar 28 21:40:24 2000 From: DavidA at ActiveState.com (David Ascher) Date: Tue, 28 Mar 2000 11:40:24 -0800 Subject: [Python-Dev] First alpha release of Python 1.6 In-Reply-To: <200003281925.OAA03287@eric.cnri.reston.va.us> Message-ID: > Anything else that needs to go into 1.6 and isn't ready yet? No one seems to have found time to figure out the mmap module support. --david From guido at python.org Tue Mar 28 21:33:29 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 14:33:29 -0500 Subject: [Python-Dev] First alpha release of Python 1.6 In-Reply-To: Your message of "Tue, 28 Mar 2000 11:40:24 PST." References: Message-ID: <200003281933.OAA04896@eric.cnri.reston.va.us> > > Anything else that needs to go into 1.6 and isn't ready yet? > > No one seems to have found time to figure out the mmap module support. I wasn't even aware that that was a priority. If someone submits it, it will go in -- alpha 1 is not a total feature freeze, just a "testing the waters". 
--Guido van Rossum (home page: http://www.python.org/~guido/) From tismer at tismer.com Tue Mar 28 21:49:17 2000 From: tismer at tismer.com (Christian Tismer) Date: Tue, 28 Mar 2000 21:49:17 +0200 Subject: [Python-Dev] First alpha release of Python 1.6 References: <200003281925.OAA03287@eric.cnri.reston.va.us> Message-ID: <38E10CBD.C6B71D50@tismer.com> Guido van Rossum wrote: ... > Anything else that needs to go into 1.6 and isn't ready yet? Stackless Python of course, but it *is* ready yet. Just kidding. I will provide a compressed unicode database in a few days. That will be a non-Python-specific module, and (Marc or I) will provide a Python specific wrapper. This will probably not get ready until April 1. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From akuchlin at mems-exchange.org Tue Mar 28 21:51:29 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Tue, 28 Mar 2000 14:51:29 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: References: <200003281925.OAA03287@eric.cnri.reston.va.us> Message-ID: <14561.3393.761177.776684@amarok.cnri.reston.va.us> David Ascher writes: >> Anything else that needs to go into 1.6 and isn't ready yet? >No one seems to have found time to figure out the mmap module support. The issue there is cross-platform compatibility; the Windows and Unix versions take completely different constructor arguments, so how should we paper over the differences? Unix arguments: (file descriptor, size, flags, protection) Win32 arguments:(filename, tagname, size) We could just say, "OK, the args are completely different between Win32 and Unix, despite it being the same function name". 
Maybe that's best, because there seems no way to reconcile those two different sets of arguments. -- A.M. Kuchling http://starship.python.net/crew/amk/ I'm here for the FBI, not the _Weekly World News_. -- Scully in X-FILES #1 From DavidA at ActiveState.com Tue Mar 28 22:06:09 2000 From: DavidA at ActiveState.com (David Ascher) Date: Tue, 28 Mar 2000 12:06:09 -0800 Subject: [Python-Dev] mmapfile module In-Reply-To: <14561.3393.761177.776684@amarok.cnri.reston.va.us> Message-ID: > The issue there is cross-platform compatibility; the Windows and Unix > versions take completely different constructor arguments, so how > should we paper over the differences? > > Unix arguments: (file descriptor, size, flags, protection) > Win32 arguments:(filename, tagname, size) > > We could just say, "OK, the args are completely different between > Win32 and Unix, despite it being the same function name". Maybe > that's best, because there seems no way to reconcile those two > different sets of arguments. I guess my approach would be to provide two platform-specific modules, and to figure out a high-level Python module which could provide a reasonable platform-independent interface on top of it. One problem with that approach is that I think that there is also great value in having a portable mmap interface in the C layer, where i see lots of possible uses in extension modules (much like the threads API). --david From guido at python.org Tue Mar 28 22:00:57 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 15:00:57 -0500 Subject: [Python-Dev] mmapfile module In-Reply-To: Your message of "Tue, 28 Mar 2000 12:06:09 PST." References: Message-ID: <200003282000.PAA11988@eric.cnri.reston.va.us> > > The issue there is cross-platform compatibility; the Windows and Unix > > versions take completely different constructor arguments, so how > > should we paper over the differences? 
> > > > Unix arguments: (file descriptor, size, flags, protection) > > Win32 arguments:(filename, tagname, size) > > > > We could just say, "OK, the args are completely different between > > Win32 and Unix, despite it being the same function name". Maybe > > that's best, because there seems no way to reconcile those two > > different sets of arguments. > > I guess my approach would be to provide two platform-specific modules, and > to figure out a high-level Python module which could provide a reasonable > platform-independent interface on top of it. One problem with that approach > is that I think that there is also great value in having a portable mmap > interface in the C layer, where i see lots of possible uses in extension > modules (much like the threads API). I don't know enough about this, but it seems that there might be two steps: *creating* a mmap object is necessarily platform-specific; but *using* a mmap object could be platform-neutral. What is the API for mmap objects? --Guido van Rossum (home page: http://www.python.org/~guido/) From klm at digicool.com Tue Mar 28 22:07:25 2000 From: klm at digicool.com (Ken Manheimer) Date: Tue, 28 Mar 2000 15:07:25 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14560.60548.74378.613188@goon.cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Jeremy Hylton wrote: > >>>>> "KLM" == Ken Manheimer writes: > > >> The only problematic use of from ... import ... is > >> from text.re import * > >> which adds an unspecified set of names to the current > >> namespace. > > KLM> The other gotcha i mean applies when the thing you're importing > KLM> is a terminal, ie a non-module. Then, changes to the > KLM> assignments of the names in the original module aren't > KLM> reflected in the names you've imported - they're decoupled from > KLM> the namespace of the original module. > > This isn't an import issue. 
Some people simply don't understand > that assignment (and import as form of assignment) is name binding. > Import binds an imported object to a name in the current namespace. > It does not affect bindings in other namespaces, nor should it. I know that - i was addressing the asserted evilness of from ... import ... and how it applied - and didn't - w.r.t. packages. > KLM> I thought the other problem peter was objecting to, having to > KLM> change the import sections in the first place, was going to be > KLM> avoided in the 1.x series (if we do this kind of thing) by > KLM> inherently extending the import path to include all the > KLM> packages, so people need not change their code? Seems like > KLM> most of this would be fairly transparent w.r.t. the operation > KLM> of existing applications. > > I'm not sure if there is consensus on backwards compatibility. I'm > not in favor of creating a huge sys.path that includes every package's > contents. It would be a big performance hit. Yes, someone reminded me that the other (better, i think) option is stub modules in the current places that do the "from ... import *" for the right values of "...". py3k finishes the migration by eliminating the stubs. Ken klm at digicool.com From gward at cnri.reston.va.us Tue Mar 28 22:29:55 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Tue, 28 Mar 2000 15:29:55 -0500 Subject: [Python-Dev] First alpha release of Python 1.6 In-Reply-To: <200003281925.OAA03287@eric.cnri.reston.va.us>; from guido@python.org on Tue, Mar 28, 2000 at 02:25:39PM -0500 References: <200003281925.OAA03287@eric.cnri.reston.va.us> Message-ID: <20000328152955.A3136@cnri.reston.va.us> On 28 March 2000, Guido van Rossum said: > I'm hoping to release a first, rough alpha of Python 1.6 by April 1st > (no joke!). > > Not everything needs to be finished by then, but I hope to have the > current versions of distutil, expat, and sre in there. 
We just need to do a bit of CVS trickery to put Distutils under the Python tree. I'd *like* for Distutils to have its own CVS existence at least until 1.6 is released, but it's not essential. Two of the big Distutils to-do items that I enumerated at IPC8 have been knocked off: the "dist" command has been completely redone (and renamed "sdist", for "source distribution"), as has the "install" command. The really major to-do items left for Distutils are: * implement the "bdist" command with enough marbles to generate RPMs and some sort of Windows installer (Wise?); Solaris packages, Debian packages, and something for the Mac would be nice too. * documentation (started, but only just) And there are some almost-as-important items: * Mac OS support; this has been started, at least for the unfashionable and clunky sounding MPW compiler; CodeWarrior support (via AppleEvents, I think) would be nice * test suite -- at least the fundamental Distutils marbles should get a good exercise; it would also be nice to put together a bunch of toy module distributions and make sure that "build" and "install" on them do the right things... all automatically, of course! * reduce number of tracebacks: right now, certain errors in the setup script or on the command line can result in a traceback, when they should just result in SystemExit with "error in setup script: ..." or "error on command line: ..." * fold in Finn Bock's JPython compat. patch * fold in Michael Muller's "pkginfo" patch * finish and fold in my Python 1.5.1 compat. patch (only necessary as long as Distutils has a life of its own, outside Python) Well, I'd better get cracking ... Guido, we can do the CVS thing any time; I guess I'll mosey on downstairs. 
Greg -- Greg Ward - software developer gward at cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From effbot at telia.com Tue Mar 28 21:46:17 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 28 Mar 2000 21:46:17 +0200 Subject: [Python-Dev] mmapfile module References: <200003281925.OAA03287@eric.cnri.reston.va.us> <14561.3393.761177.776684@amarok.cnri.reston.va.us> Message-ID: <003501bf98ee$50097a20$34aab5d4@hagrid> Andrew M. Kuchling wrote: > The issue there is cross-platform compatibility; the Windows and Unix > versions take completely different constructor arguments, so how > should we paper over the differences? > > Unix arguments: (file descriptor, size, flags, protection) > Win32 arguments:(filename, tagname, size) > > We could just say, "OK, the args are completely different between > Win32 and Unix, despite it being the same function name". Maybe > that's best, because there seems no way to reconcile those two > different sets of arguments. I don't get this. Why expose low-level implementation details to the user (flags, protection, tagname)? (And how come the Windows implementation doesn't support read-only vs. read/write flags?) Unless the current implementation uses something radically different from mmap/MapViewOfFile, wouldn't an interface like: (filename, mode="rb", size=entire file, offset=0) be sufficient? (where mode can be "wb" or "wb+" or "rb+", optionally without the "b") From donb at init.com Tue Mar 28 22:46:06 2000 From: donb at init.com (Donald Beaudry) Date: Tue, 28 Mar 2000 15:46:06 -0500 Subject: [Python-Dev] None as a keyword / class methods References: Message-ID: <200003282046.PAA18822@zippy.init.com> ...sorry to jump in on the middle of this one, but. A while back I put a lot of thought into how to support class methods and class attributes. 
I feel that I solved the problem in a fairly complete way though the solution does have some warts. Here's an example:

>>> class foo(base):
...     value = 10  # this is an instance attribute called 'value'
...                 # as usual, it is shared between all instances
...                 # until explicitly set on a particular instance
...
...     def set_value(self, x):
...         print "instance method"
...         self.value = x
...
...     #
...     # here comes the weird part
...     #
...     class __class__:
...         value = 5  # this is a class attribute called value
...
...         def set_value(cl, x):
...             print "class method"
...             cl.value = x
...
...         def set_instance_default_value(cl, x):
...             cl._.value = x
...
>>> f = foo()
>>> f.value
10
>>> foo.value = 20
>>> f.value
10
>>> f.__class__.value
20
>>> foo._.value
10
>>> foo._.value = 1
>>> f.value
1
>>> foo.set_value(100)
class method
>>> foo.value
100
>>> f.value
1
>>> f.set_value(40)
instance method
>>> f.value
40
>>> foo._.value
1
>>> ff = foo()
>>> foo.set_instance_default_value(15)
>>> ff.value
15
>>> foo._.set_value(ff, 5)
instance method
>>> ff.value
5
>>>

Is anyone still with me?

The crux of the problem is that in the current python class/instance implementation, classes don't have attributes of their own. All of those things that look like class attributes are really there as defaults for the instances. To support true class attributes a new name space must be invented. Since I wanted class objects to look like any other object, I chose to move the "instance defaults" name space under the underscore attribute. This allows the class's unqualified namespace to refer to its own attributes. Clear as mud, right?

In case you are wondering, yes, the code above is a working example. I released it a while back as the 'objectmodule' and just updated it to work with Python-1.5.2. The update has yet to be released. -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb at init.com Lexington, MA 02421 ...Will hack for sushi... 
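For readers following this thread later: the class-method half of what Donald sketches above did eventually become expressible without an extension module, when Python 2.2 added the `classmethod` builtin. Here is a minimal sketch in that later syntax (the `Counter` and `bump` names are invented for illustration; this covers class methods only, not Donald's separate class-attribute namespace or his `_` accessor):

```python
class Counter:
    _count = 0  # class-level state, shared by all instances

    @classmethod
    def bump(cls):
        # A true class method: it receives the class itself, so it
        # updates Counter._count whether it is called via the class
        # or via an instance.
        cls._count += 1
        return cls._count

print(Counter.bump())    # called on the class -> 1
print(Counter().bump())  # called on an instance -> 2
```

This is exactly the "new namespace" Guido argues against below: the method lives on the class, is shared by all instances, and never sees a particular instance.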
From akuchlin at mems-exchange.org Tue Mar 28 22:50:18 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Tue, 28 Mar 2000 15:50:18 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <003501bf98ee$50097a20$34aab5d4@hagrid> References: <200003281925.OAA03287@eric.cnri.reston.va.us> <14561.3393.761177.776684@amarok.cnri.reston.va.us> <003501bf98ee$50097a20$34aab5d4@hagrid> Message-ID: <14561.6922.415063.279939@amarok.cnri.reston.va.us> Fredrik Lundh writes: >(And how come the Windows implementation doesn't support >read-only vs. read/write flags?) Good point; that should be fixed. > (filename, mode="rb", size=entire file, offset=0) >be sufficient? (where mode can be "wb" or "wb+" or "rb+", >optionally without the "b") Hmm... maybe we can dispose of the PROT_* argument that way on Unix. But how would you specify MAP_SHARED vs. MAP_PRIVATE, or MAP_ANONYMOUS? (MAP_FIXED seems useless to a Python programmer.) Another character in the mode argument, or a flags argument? Worse, as you pointed out in the same thread, MAP_ANONYMOUS on OSF/1 doesn't want to take a file descriptor at all. Also, the tag name on Windows seems important, from Gordon McMillan's explanation of it: http://www.python.org/pipermail/python-dev/1999-November/002808.html -- A.M. Kuchling http://starship.python.net/crew/amk/ You mustn't kill me. You don't love me. You d-don't even know me. -- The Furies kill Abel, in SANDMAN #66: "The Kindly Ones:10" From guido at python.org Tue Mar 28 23:02:04 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 16:02:04 -0500 Subject: [Python-Dev] None as a keyword / class methods In-Reply-To: Your message of "Tue, 28 Mar 2000 15:46:06 EST." <200003282046.PAA18822@zippy.init.com> References: <200003282046.PAA18822@zippy.init.com> Message-ID: <200003282102.QAA13041@eric.cnri.reston.va.us> > A while back I put a lot of thought into how to support class methods > and class attributes. 
I feel that I solved the problem in a fairly > complete way though the solution does have some warts. Here's an > example: [...] > Is anyone still with me? > > The crux of the problem is that in the current python class/instance > implementation, classes dont have attributes of their own. All of > those things that look like class attributes are really there as > defaults for the instances. To support true class attributes a new > name space must be invented. Since I wanted class objects to look > like any other object, I chose to move the "instance defaults" name > space under the underscore attribute. This allows the class's > unqualified namespace to refer to its own attributes. Clear as mud, > right? > > In case you are wondering, yes, the code above is a working example. > I released it a while back as the 'objectmodule' and just updated it > to work with Python-1.5.2. The update has yet to be released. This looks like it would break a lot of code. How do you refer to a superclass method? It seems that ClassName.methodName would refer to the class method, not to the unbound instance method. Also, moving the default instance attributes to a different namespace seems to be a semantic change that could change lots of things. I am still in favor of saying "Python has no class methods -- use module-global functions for that". Between the module, the class and the instance, there are enough namespaces -- we don't need another one. --Guido van Rossum (home page: http://www.python.org/~guido/) From pf at artcom-gmbh.de Tue Mar 28 23:01:29 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Tue, 28 Mar 2000 23:01:29 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: <200003281922.OAA03113@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 28, 2000 2:22:43 pm" Message-ID: I wrote: > > > Do we need a UserString class? 
> > Andy Robinson: > > This will probably be useful on top of the i18n stuff in due course, > > so I'd like it. > > > > Something Mike Da Silva and I have discussed a lot is implementing a > > higher-level 'typed string' library on top of the Unicode stuff. > > A 'typed string' is like a string, but knows what encoding it is in - > > possibly Unicode, possibly a native encoding and embodies some basic > > type safety and convenience notions, like not being able to add a > > Shift-JIS and an EUC string together. Iteration would always be per > > character, not per byte; and a certain amount of magic would say that > > if the string was (say) Japanese, it would acquire a few extra methods > > for doing some Japan-specific things like expanding half-width > > katakana. > > > > Of course, we can do this anyway, but I think defining the API clearly > > in UserString is a great idea. > Guido van Rossum: > Agreed. Please somebody send a patch! I feel unable to do what Andy proposed. What I had in mind was a simple wrapper class around the builtin string type, similar to UserDict and UserList, which can be used to derive other classes from. I use UserList and UserDict quite often and find them very useful. They are simple and powerful and easy to extend. Maybe the things Andy Robinson proposed above belong in a subclass which inherits from a simple UserString class? Do we need an additional UserUnicode class for unicode string objects? Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From guido at python.org Tue Mar 28 23:56:49 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 16:56:49 -0500 Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Your message of "Tue, 28 Mar 2000 23:01:29 +0200." 
References: Message-ID: <200003282156.QAA13361@eric.cnri.reston.va.us> [Peter Funk] > > > > Do we need a UserString class? > > > > Andy Robinson: > > > This will probably be useful on top of the i18n stuff in due course, > > > so I'd like it. > > > > > > Something Mike Da Silva and I have discussed a lot is implementing a > > > higher-level 'typed string' library on top of the Unicode stuff. > > > A 'typed string' is like a string, but knows what encoding it is in - > > > possibly Unicode, possibly a native encoding and embodies some basic > > > type safety and convenience notions, like not being able to add a > > > Shift-JIS and an EUC string together. Iteration would always be per > > > character, not per byte; and a certain amount of magic would say that > > > if the string was (say) Japanese, it would acquire a few extra methods > > > for doing some Japan-specific things like expanding half-width > > > katakana. > > > > > > Of course, we can do this anyway, but I think defining the API clearly > > > in UserString is a great idea. > > > Guido van Rossum: > > Agreed. Please somebody send a patch! [PF] > I feel unable to do, what Andy proposed. What I had in mind was a > simple wrapper class around the builtin string type similar to > UserDict and UserList which can be used to derive other classes from. Yes. I think Andy wanted his class to be a subclass of UserString. > I use UserList and UserDict quite often and find them very useful. > They are simple and powerful and easy to extend. Agreed. > May be the things Andy Robinson proposed above belong into a sub class > which inherits from a simple UserString class? Do we need > an additional UserUnicode class for unicode string objects? It would be great if there was a single UserString class which would work with either Unicode or 8-bit strings. I think that shouldn't be too hard, since it's just a wrapper. So why don't you give the UserString.py a try and leave Andy's wish alone? 
--Guido van Rossum (home page: http://www.python.org/~guido/) From pf at artcom-gmbh.de Tue Mar 28 23:47:59 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Tue, 28 Mar 2000 23:47:59 +0200 (MEST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <02c901bf989b$be203d80$34aab5d4@hagrid> from Fredrik Lundh at "Mar 28, 2000 11:55:19 am" Message-ID: Hi! > Peter Funk wrote: > > Why should modules be moved into packages? I don't get it. > Fredrik Lundh: > fwiw, neither do I... Pheeewww... And I thought I'm the only one! ;-) > I'm not so sure that Python really needs a simple reorganization > of the existing set of standard library modules. just moving the > modules around won't solve the real problems with the 1.5.2 std > library... Right. I propose to leave the namespace flat. I'd like to argue along the lines of Brad J. Cox ---the author of the book "Object Oriented Programming - An Evolutionary Approach" Addison Wesley, 1987--- who proposes the idea of what he calls a "Software-IC": He looks closely at the design process of electronic engineers, who usually deal with large data books of prefabricated components. There are often hundreds of them in such a databook and most of them have terse and not very mnemonic names. But the engineers using them all day *know* after a short while that a 7400 chip is a TTL-chip containing 4 NAND gates. Nearly the same holds true for software engineers using Software-ICs like 're' or 'struct' as their daily building blocks. A software engineer who is already familiar with his/her building blocks has absolutely no advantage from a deeply nested namespace. Now for something completely different: Fredrik Lundh about the library documentation: > here's one proposal: > http://www.pythonware.com/people/fredrik/librarybook-contents.htm Whether 'md5', 'getpass' and 'traceback' fit into a category 'Commonly Used Modules' is ....ummmm.... at least a bit questionable. 
But we should really focus the discussion on the structure of the documentation. Since many standard library modules belong to several logical categories at once, a true tree structured organization is simply not sufficient to describe everything. So it is important to set up pointers between related functionality. For example 'string.replace' is somewhat related to 're.sub', or 'getpass' is related to 'crypt'; however 'crypt' is related to 'md5' and so on. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From pf at artcom-gmbh.de Wed Mar 29 00:13:02 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 00:13:02 +0200 (MEST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules _tkinter.c,1.91,1.92 In-Reply-To: <200003282007.PAA12045@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 28, 2000 3: 7: 9 pm" Message-ID: Hi! Guido van Rossum:

> Modified Files:
> 	_tkinter.c
[...]
> *** 491,501 ****
>
>   v->interp = Tcl_CreateInterp();
> -
> - #if TKMAJORMINOR == 8001
> -   TclpInitLibraryPath(baseName);
> - #endif /* TKMAJORMINOR */
>
> ! #if defined(macintosh) && TKMAJORMINOR >= 8000
> !   /* This seems to be needed since Tk 8.0 */
>     ClearMenuBar();
>     TkMacInitMenus(v->interp);
> --- 475,481 ----
>
>   v->interp = Tcl_CreateInterp();
>
> ! #if defined(macintosh)
> !   /* This seems to be needed */
>     ClearMenuBar();
>     TkMacInitMenus(v->interp);
> ***************

Are you sure that the call to 'TclpInitLibraryPath(baseName);' is not required in Tcl/Tk 8.1, 8.2, 8.3 ? I would propose the following:

+#if TKMAJORMINOR >= 8001
+	TclpInitLibraryPath(baseName);
+#endif /* TKMAJORMINOR */

Here I quote from the Tcl8.3 source distribution:

/*
 *---------------------------------------------------------------------------
 *
 * TclpInitLibraryPath --
 *
 *	Initialize the library path at startup. We have a minor
 *	metacircular problem that we don't know the encoding of the
 *	operating system but we may need to talk to operating system
 *	to find the library directories so that we know how to talk to
 *	the operating system.
 *
 *	We do not know the encoding of the operating system.
 *	We do know that the encoding is some multibyte encoding.
 *	In that multibyte encoding, the characters 0..127 are equivalent
 *	to ascii.
 *
 *	So although we don't know the encoding, it's safe:
 *	    to look for the last slash character in a path in the encoding.
 *	    to append an ascii string to a path.
 *	    to pass those strings back to the operating system.
 *
 *	But any strings that we remembered before we knew the encoding of
 *	the operating system must be translated to UTF-8 once we know the
 *	encoding so that the rest of Tcl can use those strings.
 *
 *	This call sets the library path to strings in the unknown native
 *	encoding. TclpSetInitialEncodings() will translate the library
 *	path from the native encoding to UTF-8 as soon as it determines
 *	what the native encoding actually is.
 *
 *	Called at process initialization time.
 *
 * Results:
 *	None.
 */

Sorry, but I don't know enough about this in connection with the unicode patches and if we should pay attention to this. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From akuchlin at mems-exchange.org Wed Mar 29 00:21:07 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Tue, 28 Mar 2000 17:21:07 -0500 (EST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: References: <02c901bf989b$be203d80$34aab5d4@hagrid> Message-ID: <14561.12371.857178.550236@amarok.cnri.reston.va.us> Peter Funk quoted: >Fredrik Lundh: >> I'm not so sure that Python really needs a simple reorganization >> of the existing set of standard library modules. 
just moving the >> modules around won't solve the real problems with the 1.5.2 std >> library... >Right. I propose to leave the namespace flat. I third that comment. Arguments against reorganizing for 1.6:

1) I doubt that we have time to do a good job of it for 1.6. (1.7, maybe.)

2) Right now there's no way for third-party extensions to add themselves to a package in the standard library. Once Python finds foo/__init__.py, it won't look for site-packages/foo/__init__.py, so if you grab, say, "crypto" as a package name in the standard library, it's forever lost to third-party extensions.

3) Rearranging the modules is a good chance to break backward compatibility in other ways. If you want to rewrite, say, httplib in a non-compatible way to support HTTP/1.1, then the move from httplib.py to net.http.py is a great chance to do that, and leave httplib.py as-is for old programs. If you just copy httplib.py, rewriting net.http.py is now harder, since you have to either maintain compatibility or break things *again* in the next version of Python.

4) We wanted to get 1.6 out fairly quickly, and therefore limited the number of features that would get in. (Vide the "Python 1.6 timing" thread last ... November, was it?) Packagizing is feature creep that'll slow things down.

Maybe we should start a separate list to discuss a package hierarchy for 1.7. But for 1.6, forget it. -- A.M. Kuchling http://starship.python.net/crew/amk/ Posting "Please send e-mail, since I don't read this group": Poster is rendered illiterate by a simple trepanation. -- Kibo, in the Happynet Manifesto From guido at python.org Wed Mar 29 00:24:46 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 17:24:46 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules _tkinter.c,1.91,1.92 In-Reply-To: Your message of "Wed, 29 Mar 2000 00:13:02 +0200." 
References: Message-ID: <200003282224.RAA13573@eric.cnri.reston.va.us> > Are you sure that the call to 'TclpInitLibraryPath(baseName);' > is not required in Tcl/Tk 8.1, 8.2, 8.3 ? > I would propose the following: > > +#if TKMAJORMINOR >= 8001 > + TclpInitLibraryPath(baseName); > +# endif /* TKMAJORMINOR */ It is an internal routine which shouldn't be called at all by the user. I believe it is called internally at the right time. Note that we now call Tcl_FindExecutable(), which *is* intended to be called by the user (and exists in all 8.x versions) -- maybe this causes TclpInitLibraryPath() to be called. I tested it on Solaris, with Tcl/Tk versions 8.0.4, 8.1.1, 8.2.3 and 8.3.0, and it doesn't seem to make any difference, as long as that version of Tcl/Tk has actually been installed. (When it's not installed, TclpInitLibraryPath() doesn't help either.) I still have to check this on Windows -- maybe it'll have to go back in. [...] > Sorry, but I don't know enough about this in connection with the > unicode patches and if we should pay attention to this. It seems to be allright... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Mar 29 00:25:27 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 17:25:27 -0500 Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: Your message of "Tue, 28 Mar 2000 17:21:07 EST." <14561.12371.857178.550236@amarok.cnri.reston.va.us> References: <02c901bf989b$be203d80$34aab5d4@hagrid> <14561.12371.857178.550236@amarok.cnri.reston.va.us> Message-ID: <200003282225.RAA13586@eric.cnri.reston.va.us> > Maybe we should start a separate list to discuss a package hierarchy > for 1.7. But for 1.6, forget it. Yes! Please! 
--Guido van Rossum (home page: http://www.python.org/~guido/) From donb at init.com Wed Mar 29 00:56:03 2000 From: donb at init.com (Donald Beaudry) Date: Tue, 28 Mar 2000 17:56:03 -0500 Subject: [Python-Dev] None as a keyword / class methods References: <200003282046.PAA18822@zippy.init.com> <200003282102.QAA13041@eric.cnri.reston.va.us> Message-ID: <200003282256.RAA21080@zippy.init.com> Guido van Rossum wrote, > This looks like it would break a lot of code. Only if it were to replace the current implementation. Perhaps I inadvertly made that suggestion. It was not my intention. Another way to look at my post is to say that it was intended to point out why we cant have class methods in the current implementation... it's a name space issue. > How do you refer to a superclass method? It seems that > ClassName.methodName would refer to the class method, not to the > unbound instance method. Right. To get at the unbound instance methods you must go through the 'unbound accessor' which is accessed via the underscore. If you wanted to chain to a superclass method it would look like this: class child(parent): def do_it(self, x): z = parent._.do_it(self, x) return z > Also, moving the default instance attributes to a different > namespace seems to be a semantic change that could change lots of > things. I agree... and that's why I wouldnt suggest doing it to the current class/instance implementation. However, for those who insist on having class attributes and methods I think it would be cool to settle on a standard "syntax". > I am still in favor of saying "Python has no class methods -- use > module-global functions for that". Or use a class/instance implementation provided via an extension module rather than the built-in one. The class named 'base' shown in my example is a class designed for that purpose. > Between the module, the class and the instance, there are enough > namespaces -- we don't need another one. 
The topic comes up often enough to make me think some might disagree. -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb at init.com Lexington, MA 02421 ...So much code, so little time... From moshez at math.huji.ac.il Wed Mar 29 01:24:29 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 29 Mar 2000 01:24:29 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14561.12371.857178.550236@amarok.cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Andrew M. Kuchling wrote: > Peter Funk quoted: > >Fredrik Lundh: > >> I'm not so sure that Python really needs a simple reorganization > >> of the existing set of standard library modules. just moving the > >> modules around won't solve the real problems with the 1.5.2 std > >> library... > >Right. I propose to leave the namespace flat. > > I third that comment. Arguments against reorganizing for 1.6: Let me just note that my original great renaming proposal was titled "1.7". I'm certain I don't want it to affect the 1.6 release -- my god, it's almost alpha time and we don't even know how to reorganize. Strictly 1.7. > 4) We wanted to get 1.6 out fairly quickly, and therefore limited > the number of features that would get in. (Vide the "Python 1.6 > timing" thread last ... November, was it?) Packagizing is feature > creep that'll slow things down Oh yes. I'm waiting for that 1.6....I wouldn't want to stall it for the world. But this is a good chance as any to discuss reasons, before strategies. Here's why I believe we should re-organize Python modules: -- modules fall quite naturally into subpackages. Reducing the number of toplevel modules will lessen the clutter -- it would be easier to synchronize documentation and code (think "automatically generated documentation") -- it would enable us to move toward a CPAN-like module repository, together with the dist-sig efforts. -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gmcm at hypernet.com Wed Mar 29 01:44:27 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Tue, 28 Mar 2000 18:44:27 -0500 Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <14561.12371.857178.550236@amarok.cnri.reston.va.us> References: Message-ID: <1257835425-27941123@hypernet.com> Andrew M. Kuchling wrote: [snip] > 2) Right now there's no way for third-party extensions to add > themselves to a package in the standard library. Once Python finds > foo/__init__.py, it won't look for site-packages/foo/__init__.py, so > if you grab, say, "crypto" as a package name in the standard library, > it's forever lost to third-party extensions. That way lies madness. While I'm happy to carp at Java for requiring "com", "net" or whatever as a top level name, their intent is correct: the names grabbed by the Python standard packages belong to no one but the Python standard packages. If you *don't* do that, upgrades are an absolute nightmare. Marc-Andre grabbed "mx". If (as I rather suspect ) he wants to remake the entire standard lib in his image, he's welcome to - *under* mx. What would happen if he (and everyone else) installed themselves *into* my core packages, then I decided I didn't want his stuff? More than likely I'd have to scrub the damn installation and start all over again. - Gordon From DavidA at ActiveState.com Wed Mar 29 02:01:57 2000 From: DavidA at ActiveState.com (David Ascher) Date: Tue, 28 Mar 2000 16:01:57 -0800 Subject: [Python-Dev] yeah! for Jeremy and Greg Message-ID: I'm thrilled to see the extended call syntax patches go in! One less wart in the language! Jeremy ZitBlaster Hylton and Greg Noxzema Ewing! --david From pf at artcom-gmbh.de Wed Mar 29 01:53:50 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 01:53:50 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? 
In-Reply-To: <200003282156.QAA13361@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 28, 2000 4:56:49 pm" Message-ID: Hi! > [Peter Funk] > > > > > Do we need a UserString class? [...] Guido van Rossum: > So why don't you give the UserString.py a try and leave Andy's wish alone? Okay. Here we go. Could someone please keep a close eye on this? I've hacked it up in a hurry.

---- 8< ---- 8< ---- cut here ---- 8< ---- schnipp ---- 8< ---- schnapp ----
#!/usr/bin/env python
"""A user-defined wrapper around string objects

Note: string objects have grown methods in Python 1.6
This module requires Python 1.6 or later.
"""
import sys

# XXX Totally untested and hacked up until 2:00 am with too little sleep ;-)

class UserString:
    def __init__(self, string=""):
        self.data = string
    def __repr__(self):
        return repr(self.data)
    def __cmp__(self, string):
        if isinstance(string, UserString):
            return cmp(self.data, string.data)
        else:
            return cmp(self.data, string)
    def __len__(self):
        return len(self.data)

    # methods defined in alphabetical order
    def capitalize(self):
        return self.__class__(self.data.capitalize())
    def center(self, width):
        return self.__class__(self.data.center(width))
    def count(self, sub, start=0, end=sys.maxint):
        return self.data.count(sub, start, end)
    def encode(self, encoding=None, errors=None):  # XXX improve this?
        if encoding:
            if errors:
                return self.__class__(self.data.encode(encoding, errors))
            else:
                return self.__class__(self.data.encode(encoding))
        else:
            return self.__class__(self.data.encode())
    def endswith(self):
        raise NotImplementedError
    def find(self, sub, start=0, end=sys.maxint):
        return self.data.find(sub, start, end)
    def index(self, sub, start=0, end=sys.maxint):
        return self.data.index(sub, start, end)
    def isdecimal(self):
        return self.data.isdecimal()
    def isdigit(self):
        return self.data.isdigit()
    def islower(self):
        return self.data.islower()
    def isnumeric(self):
        return self.data.isnumeric()
    def isspace(self):
        return self.data.isspace()
    def istitle(self):
        return self.data.istitle()
    def isupper(self):
        return self.data.isupper()
    def join(self, seq):
        return self.data.join(seq)
    def ljust(self, width):
        return self.__class__(self.data.ljust(width))
    def lower(self):
        return self.__class__(self.data.lower())
    def lstrip(self):
        return self.__class__(self.data.lstrip())
    def replace(self, old, new, maxsplit=-1):
        return self.__class__(self.data.replace(old, new, maxsplit))
    def rfind(self, sub, start=0, end=sys.maxint):
        return self.data.rfind(sub, start, end)
    def rindex(self, sub, start=0, end=sys.maxint):
        return self.data.rindex(sub, start, end)
    def rjust(self, width):
        return self.__class__(self.data.rjust(width))
    def rstrip(self):
        return self.__class__(self.data.rstrip())
    def split(self, sep=None, maxsplit=-1):
        return self.data.split(sep, maxsplit)
    def splitlines(self, maxsplit=-1):
        return self.data.splitlines(maxsplit)
    def startswith(self, prefix, start=0, end=sys.maxint):
        return self.data.startswith(prefix, start, end)
    def strip(self):
        return self.__class__(self.data.strip())
    def swapcase(self):
        return self.__class__(self.data.swapcase())
    def title(self):
        return self.__class__(self.data.title())
    def translate(self, table, deletechars=""):
        return self.__class__(self.data.translate(table, deletechars))
    def upper(self):
        return self.__class__(self.data.upper())
    def __add__(self, other):
        if isinstance(other, UserString):
            return self.__class__(self.data + other.data)
        elif isinstance(other, type(self.data)):
            return self.__class__(self.data + other)
        else:
            return self.__class__(self.data + str(other))
    def __radd__(self, other):
        if isinstance(other, type(self.data)):
            return self.__class__(other + self.data)
        else:
            return self.__class__(str(other) + self.data)
    def __mul__(self, n):
        return self.__class__(self.data*n)
    __rmul__ = __mul__

def _test():
    s = UserString("abc")
    u = UserString(u"efg")
    # XXX add some real tests here?
    return [0]

if __name__ == "__main__":
    import sys
    sys.exit(_test()[0])

From effbot at telia.com Wed Mar 29 01:12:55 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 29 Mar 2000 01:12:55 +0200 Subject: [Python-Dev] yeah! for Jeremy and Greg References: Message-ID: <012301bf990b$2a494c80$34aab5d4@hagrid> > I'm thrilled to see the extended call syntax patches go in! One less wart > in the language! but did he compile before checking in? ..\Python\compile.c(1225) : error C2065: 'CALL_FUNCTION_STAR' : undeclared identifier (compile.c and opcode.h both mention this identifier, but nobody defines it... should it be CALL_FUNCTION_VAR, perhaps?) From guido at python.org Wed Mar 29 02:07:34 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 19:07:34 -0500 Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Your message of "Wed, 29 Mar 2000 01:53:50 +0200." References: Message-ID: <200003290007.TAA16081@eric.cnri.reston.va.us> > > [Peter Funk] > > > > > > Do we need a UserString class? > [...] > Guido van Rossum: > > So why don't you give the UserString.py a try and leave Andy's wish alone? [Peter] > Okay. Here we go. Could someone please keep a close eye on this? > I've hacked it up in a hurry. Good job! Go get some sleep, and tomorrow morning when you're fresh, compare it to UserList. From visual inspection, you seem to be missing __getitem__ and __getslice__, and maybe more (of course not __set*__). 
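The two methods Guido flags as missing would look roughly like the following sketch, shown on a cut-down wrapper so the snippet stands alone (`MiniUserString` is an invented name; note that in 1.6-era code the slice case would be a separate `__getslice__` method, a hook that later Pythons folded into `__getitem__`):

```python
class MiniUserString:
    def __init__(self, string=""):
        self.data = string

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        # In modern Python both indexing and slicing arrive here;
        # wrap the result so slices stay MiniUserString instances.
        return self.__class__(self.data[index])

s = MiniUserString("hello")
print(s[1].data)    # 'e'
print(s[1:4].data)  # 'ell'
```

The wrapping in `self.__class__(...)` mirrors the convention already used throughout Peter's class: every method that returns string data returns a new wrapper of the same class, so subclasses keep their type through these operations.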
--Guido van Rossum (home page: http://www.python.org/~guido/) From ping at lfw.org Wed Mar 29 02:13:24 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 28 Mar 2000 18:13:24 -0600 (CST) Subject: [Python-Dev] yeah! for Jeremy and Greg In-Reply-To: <012301bf990b$2a494c80$34aab5d4@hagrid> Message-ID: On Wed, 29 Mar 2000, Fredrik Lundh wrote: > > I'm thrilled to see the extended call syntax patches go in! One less wart > > in the language! > > but did he compile before checking in? You beat me to it. I read David's message and got so excited i just had to try it right away. So i updated my CVS tree, did "make", and got the same error: make[1]: Entering directory `/home/ping/dev/python/dist/src/Python' gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c compile.c -o compile.o compile.c: In function `com_call_function': compile.c:1225: `CALL_FUNCTION_STAR' undeclared (first use in this function) compile.c:1225: (Each undeclared identifier is reported only once compile.c:1225: for each function it appears in.) make[1]: *** [compile.o] Error 1 > (compile.c and opcode.h both mention this identifier, but > nobody defines it... should it be CALL_FUNCTION_VAR, > perhaps?) But CALL_FUNCTION_STAR is mentioned in the comments... #define CALL_FUNCTION 131 /* #args + (#kwargs<<8) */ #define MAKE_FUNCTION 132 /* #defaults */ #define BUILD_SLICE 133 /* Number of items */ /* The next 3 opcodes must be contiguous and satisfy (CALL_FUNCTION_STAR - CALL_FUNCTION) & 3 == 1 */ #define CALL_FUNCTION_VAR 140 /* #args + (#kwargs<<8) */ #define CALL_FUNCTION_KW 141 /* #args + (#kwargs<<8) */ #define CALL_FUNCTION_VAR_KW 142 /* #args + (#kwargs<<8) */ The condition (CALL_FUNCTION_STAR - CALL_FUNCTION) & 3 == 1 doesn't make much sense, though... -- ?!ng From jeremy at cnri.reston.va.us Wed Mar 29 02:18:54 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Tue, 28 Mar 2000 19:18:54 -0500 (EST) Subject: [Python-Dev] yeah! 
for Jeremy and Greg In-Reply-To: <012301bf990b$2a494c80$34aab5d4@hagrid> References: <012301bf990b$2a494c80$34aab5d4@hagrid> Message-ID: <14561.19438.157799.810802@goon.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: >> I'm thrilled to see the extended call syntax patches go in! One >> less wart in the language! FL> but did he compile before checking in? Indeed, but not often enough :-). FL> ..\Python\compile.c(1225) : error C2065: 'CALL_FUNCTION_STAR' : FL> undeclared identifier FL> (compile.c and opcode.h both mention this identifier, but nobody FL> defines it... should it be CALL_FUNCTION_VAR, perhaps?) This was a last minute change of names. I had previously compiled under the old names. The Makefile doesn't describe the dependency between opcode.h and compile.c. And the compile.o file I had worked, because the only change was to the name of a macro. It's too bad the Makefile doesn't have all the dependencies. It seems that it's necessary to do a make clean before checking in a change that affects many files. Jeremy From klm at digicool.com Wed Mar 29 02:30:05 2000 From: klm at digicool.com (Ken Manheimer) Date: Tue, 28 Mar 2000 19:30:05 -0500 (EST) Subject: [Python-Dev] yeah! for Jeremy and Greg In-Reply-To: Message-ID: On Tue, 28 Mar 2000, David Ascher wrote: > I'm thrilled to see the extended call syntax patches go in! One less wart > in the language! Me too! Even the lisps i used to know (albeit ancient, according to eric) couldn't get it as tidy as this. (Silly me, now i'm imagining we're going to see operator assignments just around the bend. "Give them a tasty morsel, they ask for your dinner..."-) Ken klm at digicool.com From ping at lfw.org Wed Mar 29 02:35:54 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 28 Mar 2000 18:35:54 -0600 (CST) Subject: [Python-Dev] yeah! 
for Jeremy and Greg In-Reply-To: <14561.19438.157799.810802@goon.cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Jeremy Hylton wrote: > > It's too bad the Makefile doesn't have all the dependencies. It seems > that it's necessary to do a make clean before checking in a change > that affects many files. I updated again and rebuilt. >>> def sum(*args): ... s = 0 ... for x in args: s = s + x ... return s ... >>> sum(2,3,4) 9 >>> sum(*[2,3,4]) 9 >>> x = (2,3,4) >>> sum(*x) 9 >>> def func(a, b, c): ... print a, b, c ... >>> func(**{'a':2, 'b':1, 'c':6}) 2 1 6 >>> func(**{'c':8, 'a':1, 'b':9}) 1 9 8 >>> *cool*. So does this completely obviate the need for "apply", then? apply(x, y, z) <==> x(*y, **z) -- ?!ng From guido at python.org Wed Mar 29 02:35:17 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 19:35:17 -0500 Subject: [Python-Dev] yeah! for Jeremy and Greg In-Reply-To: Your message of "Tue, 28 Mar 2000 18:35:54 CST." References: Message-ID: <200003290035.TAA16278@eric.cnri.reston.va.us> > *cool*. > > So does this completely obviate the need for "apply", then? > > apply(x, y, z) <==> x(*y, **z) I think so (except for backwards compatibility). The 1.6 docs for apply should point this out! --Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA at ActiveState.com Wed Mar 29 02:42:20 2000 From: DavidA at ActiveState.com (David Ascher) Date: Tue, 28 Mar 2000 16:42:20 -0800 Subject: [Python-Dev] yeah! for Jeremy and Greg In-Reply-To: Message-ID: > I updated again and rebuilt. > > >>> def sum(*args): > ... s = 0 > ... for x in args: s = s + x > ... return s > ... > >>> sum(2,3,4) > 9 > >>> sum(*[2,3,4]) > 9 > >>> x = (2,3,4) > >>> sum(*x) > 9 > >>> def func(a, b, c): > ... print a, b, c > ... > >>> func(**{'a':2, 'b':1, 'c':6}) > 2 1 6 > >>> func(**{'c':8, 'a':1, 'b':9}) > 1 9 8 > >>> > > *cool*. 
But most importantly, IMO:

class SubClass(Class):
    def __init__(self, a, *args, **kw):
        self.a = a
        Class.__init__(self, *args, **kw)

Much neater. From bwarsaw at cnri.reston.va.us Wed Mar 29 02:46:11 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 28 Mar 2000 19:46:11 -0500 (EST) Subject: [Python-Dev] yeah! for Jeremy and Greg References: <14561.19438.157799.810802@goon.cnri.reston.va.us> Message-ID: <14561.21075.637108.322536@anthem.cnri.reston.va.us> Uh oh. Fresh CVS update and make clean, make:

-------------------- snip snip --------------------
Python 1.5.2+ (#20, Mar 28 2000, 19:37:38) [GCC 2.8.1] on sunos5
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> def sum(*args):
...     s = 0
...     for x in args: s = s + x
...     return s
...
>>> class Nums:
...     def __getitem__(self, i):
...         if i >= 10 or i < 0: raise IndexError
...         return i
...
>>> n = Nums()
>>> for i in n: print i
...
0
1
2
3
4
5
6
7
8
9
>>> sum(*n)
Traceback (innermost last):
  File "", line 1, in ?
SystemError: bad argument to internal function
-------------------- snip snip --------------------

-Barry From bwarsaw at cnri.reston.va.us Wed Mar 29 03:02:16 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 28 Mar 2000 20:02:16 -0500 (EST) Subject: [Python-Dev] yeah! for Jeremy and Greg References: <14561.19438.157799.810802@goon.cnri.reston.va.us> <14561.21075.637108.322536@anthem.cnri.reston.va.us> Message-ID: <14561.22040.383370.283163@anthem.cnri.reston.va.us> Changing the definition of class Nums to

class Nums:
    def __getitem__(self, i):
        if 0 <= i < 10: return i
        raise IndexError
    def __len__(self):
        return 10

I.e. adding the __len__() method avoids the SystemError. Either the *arg call should not depend on the sequence being length-able, or it should error check that the length calculation doesn't return -1 or raise an exception. Looking at PySequence_Length() though, it seems that m->sq_length(s) can return -1 without setting a type_error.
So the fix is either to include a check for return -1 in PySequence_Length() when calling sq_length, or instance_length() should set a TypeError when it has no __len__() method and returns -1. I gotta run so I can't follow this through -- I'm sure I'll see the right solution from someone in tomorrow morning's email :) -Barry From ping at lfw.org Wed Mar 29 03:17:27 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 28 Mar 2000 19:17:27 -0600 (CST) Subject: [Python-Dev] yeah! for Jeremy and Greg In-Reply-To: <14561.22040.383370.283163@anthem.cnri.reston.va.us> Message-ID: On Tue, 28 Mar 2000, Barry A. Warsaw wrote: > > Changing the definition of class Nums to > > class Nums: > def __getitem__(self, i): > if 0 <= i < 10: return i > raise IndexError > def __len__(self): > return 10 > > I.e. adding the __len__() method avoids the SystemError. It should be noted that "apply" has the same problem, with a different counterintuitive error message:

>>> n = Nums()
>>> apply(sum, n)
Traceback (innermost last):
  File "", line 1, in ?
AttributeError: __len__

-- ?!ng From jeremy at cnri.reston.va.us Wed Mar 29 04:59:26 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Tue, 28 Mar 2000 21:59:26 -0500 (EST) Subject: [Python-Dev] yeah! for Jeremy and Greg In-Reply-To: References: Message-ID: <14561.29070.940238.542509@bitdiddle.cnri.reston.va.us> >>>>> "DA" == David Ascher writes:

DA> But most importantly, IMO:
DA> class SubClass(Class):
DA>     def __init__(self, a, *args, **kw):
DA>         self.a = a
DA>         Class.__init__(self, *args, **kw)
DA> Much neater.

This version of method overloading was what I liked most about Greg's patch. Note that I also prefer:

class SubClass(Class):
    super_init = Class.__init__

    def __init__(self, a, *args, **kw):
        self.a = a
        self.super_init(*args, **kw)

I've been happy to have all the overridden methods explicitly labelled at the top of a class lately. It is much easier to change the class hierarchy later.
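Jeremy's idiom, fleshed out into a runnable sketch. The Base class and attribute names here are made up for illustration; the point is that binding Base.__init__ to a class attribute names the superclass exactly once, and ordinary attribute access then binds super_init to self like any other method.

```python
class Base:
    def __init__(self, x):
        self.x = x

class SubClass(Base):
    # Name the overridden method once, at the top of the class;
    # descriptor lookup binds it to self on access, so calling
    # self.super_init(...) is calling Base.__init__(self, ...).
    super_init = Base.__init__

    def __init__(self, a, *args, **kw):
        self.a = a
        self.super_init(*args, **kw)
```

Rearranging the hierarchy later then means editing only the class statement and the super_init line at the top.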
Jeremy From gward at cnri.reston.va.us Wed Mar 29 05:15:00 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Tue, 28 Mar 2000 22:15:00 -0500 Subject: [Python-Dev] __debug__ and py_compile Message-ID: <20000328221500.A3290@cnri.reston.va.us> Hi all -- a particularly active member of the Distutils-SIG brought the global '__debug__' flag to my attention, since I (and thus my code) didn't know if calling 'py_compile.compile()' would result in a ".pyc" or a ".pyo" file. It appears that, using __debug__, you can determine what you're going to get. Cool! However, it doesn't look like you can *choose* what you're going to get. Is this correct? Ie. does the presence/absence of -O when the interpreter starts up *completely* decide how code is compiled? Also, can I rely on __debug__ being there in the future? How about in the past? I still occasionally ponder making Distutils compatible with Python 1.5.1. Thanks -- Greg From guido at python.org Wed Mar 29 06:08:12 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Mar 2000 23:08:12 -0500 Subject: [Python-Dev] __debug__ and py_compile In-Reply-To: Your message of "Tue, 28 Mar 2000 22:15:00 EST." <20000328221500.A3290@cnri.reston.va.us> References: <20000328221500.A3290@cnri.reston.va.us> Message-ID: <200003290408.XAA17991@eric.cnri.reston.va.us> > a particularly active member of the Distutils-SIG brought the > global '__debug__' flag to my attention, since I (and thus my code) > didn't know if calling 'py_compile.compile()' would result in a ".pyc" > or a ".pyo" file. It appears that, using __debug__, you can determine > what you're going to get. Cool! > > However, it doesn't look like you can *choose* what you're going to > get. Is this correct? Ie. does the presence/absence of -O when the > interpreter starts up *completely* decide how code is compiled? Correct. You (currently) can't change the opt setting of the compiler. 
(It was part of the compiler restructuring to give more freedom here; this has been pushed back to 1.7.) > Also, can I rely on __debug__ being there in the future? How about in > the past? I still occasionally ponder making Distutils compatible with > Python 1.5.1. __debug__ is as old as the assert statement, going back to at least 1.5.0. --Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at math.huji.ac.il Wed Mar 29 07:35:51 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 29 Mar 2000 07:35:51 +0200 (IST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <1257835425-27941123@hypernet.com> Message-ID: On Tue, 28 Mar 2000, Gordon McMillan wrote: > What would happen if he (and everyone else) installed > themselves *into* my core packages, then I decided I didn't > want his stuff? More than likely I'd have to scrub the damn > installation and start all over again. I think Greg Stein answered that objection, by reminding us that the filesystem isn't the only way to set up a package hierarchy. In particular, even with Python's current module system, there is no need to scrub installations: Python core modules go (under UNIX) in /usr/local/lib/python1.5, and 3rd party modules go in /usr/local/lib/python1.5/site-packages. Need to remove stuff? Remove whatever is in /usr/local/lib/python1.5/site-packages. Need to upgrade? Just backup /usr/local/lib/python1.5/site-packages, remove /usr/local/lib/python1.5/, install, and move 3rd party modules back from backup. This becomes even easier if the standard installation is in a JAR-like file, and 3rd party modules are also in a JAR-like file, but specified to be in their natural place. Wow! That was a long rant! Anyway, I already expressed my preference of the Perl way, over the Java way. For one thing, I don't want to have to register a domain just so I could distribute Python code -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From bwarsaw at cnri.reston.va.us Wed Mar 29 07:42:34 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 29 Mar 2000 00:42:34 -0500 (EST) Subject: [Python-Dev] yeah! for Jeremy and Greg References: <14561.19438.157799.810802@goon.cnri.reston.va.us> <14561.21075.637108.322536@anthem.cnri.reston.va.us> Message-ID: <14561.38858.41246.28460@anthem.cnri.reston.va.us> >>>>> "BAW" == Barry A Warsaw writes: BAW> Uh oh. Fresh CVS update and make clean, make: >>> sum(*n) | Traceback (innermost last): | File "", line 1, in ? | SystemError: bad argument to internal function Here's a proposed patch that will cause a TypeError to be raised instead. -Barry -------------------- snip snip -------------------- Index: abstract.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Objects/abstract.c,v retrieving revision 2.33 diff -c -r2.33 abstract.c *** abstract.c 2000/03/10 22:55:18 2.33 --- abstract.c 2000/03/29 05:36:21 *************** *** 860,866 **** PyObject *s; { PySequenceMethods *m; ! if (s == NULL) { null_error(); return -1; --- 860,867 ---- PyObject *s; { PySequenceMethods *m; ! int size = -1; ! if (s == NULL) { null_error(); return -1; *************** *** 868,877 **** m = s->ob_type->tp_as_sequence; if (m && m->sq_length) ! return m->sq_length(s); ! type_error("len() of unsized object"); ! return -1; } PyObject * --- 869,879 ---- m = s->ob_type->tp_as_sequence; if (m && m->sq_length) ! size = m->sq_length(s); ! if (size < 0) ! type_error("len() of unsized object"); ! 
return size;
}

PyObject *

Index: ceval.c
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Python/ceval.c,v
retrieving revision 2.169
diff -c -r2.169 ceval.c
*** ceval.c 2000/03/28 23:49:16 2.169
--- ceval.c 2000/03/29 05:39:00
***************
*** 1636,1641 ****
--- 1636,1649 ----
  			break;
  		}
  		nstar = PySequence_Length(stararg);
+ 		if (nstar < 0) {
+ 			if (!PyErr_Occurred())
+ 				PyErr_SetString(
+ 					PyExc_TypeError,
+ 					"len() of unsized object");
+ 			x = NULL;
+ 			break;
+ 		}
  	}
  	if (nk > 0) {
  		if (kwdict == NULL) {

From bwarsaw at cnri.reston.va.us Wed Mar 29 07:46:19 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Wed, 29 Mar 2000 00:46:19 -0500 (EST) Subject: [Python-Dev] yeah! for Jeremy and Greg References: <14561.22040.383370.283163@anthem.cnri.reston.va.us> Message-ID: <14561.39083.748093.694726@anthem.cnri.reston.va.us> >>>>> "KY" == Ka-Ping Yee writes: | It should be noted that "apply" has the same problem, with a | different counterintuitive error message: >> n = Nums() apply(sum, n) | Traceback (innermost last): | File "", line 1, in ? | AttributeError: __len__ The patch I just posted fixes this too. The error message ain't great, but at least it's consistent with the direct call. -Barry

-------------------- snip snip --------------------
Traceback (innermost last):
  File "/tmp/doit.py", line 15, in ?
    print apply(sum, n)
TypeError: len() of unsized object

From pf at artcom-gmbh.de Wed Mar 29 08:30:22 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 08:30:22 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: from Moshe Zadka at "Mar 29, 2000 7:44:42 am" Message-ID: Hi! > On Wed, 29 Mar 2000, Peter Funk wrote: > > > class UserString: > > def __init__(self, string=""): > > self.data = string > ^^^^^^^ Moshe Zadka wrote: > Why do you feel there is a need to default?
Strings are immutable I had something like this in my mind: class MutableString(UserString): """Python strings are immutable objects. But of course this can be changed in a derived class implementing the missing methods. >>> s = MutableString() >>> s[0:5] = "HUH?" """ def __setitem__(self, char): .... def __setslice__(self, i, j, substring): .... > What about __int__, __long__, __float__, __str__, __hash__? > And what about __getitem__ and __contains__? > And __complex__? I was obviously too tired and too eager to get this out! Thanks for reviewing and responding so quickly. I will add them. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From moshez at math.huji.ac.il Wed Mar 29 08:51:30 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 29 Mar 2000 08:51:30 +0200 (IST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Message-ID: On Wed, 29 Mar 2000, Peter Funk wrote: > Moshe Zadka wrote: > > Why do you feel there is a need to default? Strings are immutable > > I had something like this in my mind: > > class MutableString(UserString): > """Python strings are immutable objects. But of course this can > be changed in a derived class implementing the missing methods. Then add the default in the constructor for MutableString.... eagerly-waiting-for-UserString.py-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Wed Mar 29 09:03:53 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 29 Mar 2000 09:03:53 +0200 (IST) Subject: [Python-Dev] 1.5.2->1.6 Changes Message-ID: I'm starting to compile a list of changes from 1.5.2 to 1.6. 
Here's what I came up with so far

-- string objects now have methods (though they are still immutable)
-- unicode support: Unicode strings are marked with u"string", and there is support for arbitrary encoders/decoders
-- "in" operator can now be overridden in user-defined classes to mean anything: it calls the magic method __contains__
-- SRE is the new regular expression engine. re.py became an interface to the same engine. The new engine fully supports unicode regular expressions.
-- Some methods which would take multiple arguments and treat them as a tuple were fixed: list.{append, insert, remove, count}, socket.connect
-- Some modules were made obsolete
   -- filecmp.py (supersedes the old cmp.py and dircmp.py modules)
   -- tabnanny.py (make sure the source file doesn't assume a specific tab-width)
   -- win32reg (win32 registry editor)
   -- unicode module, and codecs package
-- New calling syntax: f(*args, **kw) equivalent to apply(f, args, kw)
-- _tkinter now uses the object, rather than string, interface to Tcl.

Please e-mail me personally if you think of any other changes, and I'll try to integrate them into a complete "changes" document. Thanks in advance -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From esr at thyrsus.com Wed Mar 29 09:21:29 2000 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 29 Mar 2000 02:21:29 -0500 Subject: [Python-Dev] 1.5.2->1.6 Changes In-Reply-To: ; from Moshe Zadka on Wed, Mar 29, 2000 at 09:03:53AM +0200 References: Message-ID: <20000329022129.A15539@thyrsus.com> Moshe Zadka : > -- _tkinter now uses the object, rather than string, interface to Tcl. Hm, does this mean that the annoying requirement to do explicit gets and sets to move data between the Python world and the Tcl/Tk world is gone? -- Eric S. Raymond "A system of licensing and registration is the perfect device to deny gun ownership to the bourgeoisie."
-- Vladimir Ilyich Lenin From moshez at math.huji.ac.il Wed Mar 29 09:22:54 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 29 Mar 2000 09:22:54 +0200 (IST) Subject: [Python-Dev] 1.5.2->1.6 Changes In-Reply-To: <20000329022129.A15539@thyrsus.com> Message-ID: On Wed, 29 Mar 2000, Eric S. Raymond wrote: > Moshe Zadka : > > -- _tkinter now uses the object, rather than string, interface to Tcl. > > Hm, does this mean that the annoying requirement to do explicit gets and > sets to move data between the Python world and the Tcl/Tk world is gone? I doubt it. It's just that Python and Tcl have such a different outlook on variables that I don't think it can be glossed over. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From pf at artcom-gmbh.de Wed Mar 29 11:16:17 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 11:16:17 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: from Moshe Zadka at "Mar 29, 2000 8:51:30 am" Message-ID: Hi! Moshe Zadka: > eagerly-waiting-for-UserString.py-ly y'rs, Z. Well, I've added the missing methods. Unfortunately I ran out of time now and a 'test_userstring.py' derived from 'src/Lib/test/test_string.py' is still missing. Regards, Peter

---- 8< ---- 8< ---- cut here ---- 8< ---- schnipp ---- 8< ---- schnapp ----
#!/usr/bin/env python
"""A user-defined wrapper around string objects

Note: string objects have grown methods in Python 1.6
This module requires Python 1.6 or later.
""" from types import StringType, UnicodeType import sys class UserString: def __init__(self, string): self.data = string def __str__(self): return str(self.data) def __repr__(self): return repr(self.data) def __int__(self): return int(self.data) def __long__(self): return long(self.data) def __float__(self): return float(self.data) def __hash__(self): return hash(self.data) def __cmp__(self, string): if isinstance(string, UserString): return cmp(self.data, string.data) else: return cmp(self.data, string) def __contains__(self, char): return char in self.data def __len__(self): return len(self.data) def __getitem__(self, index): return self.__class__(self.data[index]) def __getslice__(self, start, end): start = max(start, 0); end = max(end, 0) return self.__class__(self.data[start:end]) def __add__(self, other): if isinstance(other, UserString): return self.__class__(self.data + other.data) elif isinstance(other, StringType) or isinstance(other, UnicodeType): return self.__class__(self.data + other) else: return self.__class__(self.data + str(other)) def __radd__(self, other): if isinstance(other, StringType) or isinstance(other, UnicodeType): return self.__class__(other + self.data) else: return self.__class__(str(other) + self.data) def __mul__(self, n): return self.__class__(self.data*n) __rmul__ = __mul__ # the following methods are defined in alphabetical order: def capitalize(self): return self.__class__(self.data.capitalize()) def center(self, width): return self.__class__(self.data.center(width)) def count(self, sub, start=0, end=sys.maxint): return self.data.count(sub, start, end) def encode(self, encoding=None, errors=None): # XXX improve this? 
if encoding: if errors: return self.__class__(self.data.encode(encoding, errors)) else: return self.__class__(self.data.encode(encoding)) else: return self.__class__(self.data.encode()) def endswith(self, suffix, start=0, end=sys.maxint): return self.data.endswith(suffix, start, end) def find(self, sub, start=0, end=sys.maxint): return self.data.find(sub, start, end) def index(self, sub, start=0, end=sys.maxint): return self.data.index(sub, start, end) def isdecimal(self): return self.data.isdecimal() def isdigit(self): return self.data.isdigit() def islower(self): return self.data.islower() def isnumeric(self): return self.data.isnumeric() def isspace(self): return self.data.isspace() def istitle(self): return self.data.istitle() def isupper(self): return self.data.isupper() def join(self, seq): return self.data.join(seq) def ljust(self, width): return self.__class__(self.data.ljust(width)) def lower(self): return self.__class__(self.data.lower()) def lstrip(self): return self.__class__(self.data.lstrip()) def replace(self, old, new, maxsplit=-1): return self.__class__(self.data.replace(old, new, maxsplit)) def rfind(self, sub, start=0, end=sys.maxint): return self.data.rfind(sub, start, end) def rindex(self, sub, start=0, end=sys.maxint): return self.data.rindex(sub, start, end) def rjust(self, width): return self.__class__(self.data.rjust(width)) def rstrip(self): return self.__class__(self.data.rstrip()) def split(self, sep=None, maxsplit=-1): return self.data.split(sep, maxsplit) def splitlines(self, maxsplit=-1): return self.data.splitlines(maxsplit) def startswith(self, prefix, start=0, end=sys.maxint): return self.data.startswith(prefix, start, end) def strip(self): return self.__class__(self.data.strip()) def swapcase(self): return self.__class__(self.data.swapcase()) def title(self): return self.__class__(self.data.title()) def translate(self, table, deletechars=""): return self.__class__(self.data.translate(table, deletechars)) def upper(self): return 
self.__class__(self.data.upper()) class MutableString(UserString): """mutable string objects Python strings are immutable objects. This has the advantage, that strings may be used as dictionary keys. If this property isn't needed and you insist on changing string values in place instead, you may cheat and use MutableString. But the purpose of this class is an educational one: to prevent people from inventing their own mutable string class derived from UserString and than forget thereby to remove (override) the __hash__ method inherited from ^UserString. This would lead to errors that would be very hard to track down. A faster and better solution is to rewrite the program using lists.""" def __init__(self, string=""): self.data = string def __hash__(self): raise TypeError, "unhashable type (it is mutable)" def __setitem__(self, index, sub): if index < 0 or index >= len(self.data): raise IndexError self.data = self.data[:index] + sub + self.data[index+1:] def __delitem__(self, index): if index < 0 or index >= len(self.data): raise IndexError self.data = self.data[:index] + self.data[index+1:] def __setslice__(self, start, end, sub): start = max(start, 0); end = max(end, 0) if isinstance(sub, UserString): self.data = self.data[:start]+sub.data+self.data[end:] elif isinstance(sub, StringType) or isinstance(sub, UnicodeType): self.data = self.data[:start]+sub+self.data[end:] else: self.data = self.data[:start]+str(sub)+self.data[end:] def __delslice__(self, start, end): start = max(start, 0); end = max(end, 0) self.data = self.data[:start] + self.data[end:] def immutable(self): return UserString(self.data) def _test(): s = UserString("abc") u = UserString(u"efg") # XXX add some real tests here? return 0 if __name__ == "__main__": sys.exit(_test()) From mal at lemburg.com Wed Mar 29 11:34:21 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 29 Mar 2000 11:34:21 +0200 Subject: [Python-Dev] Great Renaming? What is the goal? 
References: <1257835425-27941123@hypernet.com> Message-ID: <38E1CE1D.7899B1BC@lemburg.com> Gordon McMillan wrote: > > Andrew M. Kuchling wrote: > [snip] > > 2) Right now there's no way for third-party extensions to add > > themselves to a package in the standard library. Once Python finds > > foo/__init__.py, it won't look for site-packages/foo/__init__.py, so > > if you grab, say, "crypto" as a package name in the standard library, > > it's forever lost to third-party extensions. > > That way lies madness. While I'm happy to carp at Java for > requiring "com", "net" or whatever as a top level name, their > intent is correct: the names grabbed by the Python standard > packages belong to no one but the Python standard > packages. If you *don't* do that, upgrades are an absolute > nightmare. > > Marc-Andre grabbed "mx". If (as I rather suspect ) he > wants to remake the entire standard lib in his image, he's > welcome to - *under* mx. Right, that's the way I see it too. BTW, where can I register the "mx" top-level package name ? Should these be registered in the NIST registry ? Will the names registered there be honored ? > What would happen if he (and everyone else) installed > themselves *into* my core packages, then I decided I didn't > want his stuff? More than likely I'd have to scrub the damn > installation and start all over again. That's a no-no, IMHO. Unless explicitly allowed, packages should *not* install themselves as subpackages to other existing top-level packages. If they do, it's their problem if the hierarchy changes... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
In-Reply-To: Message-ID: On Wed, 29 Mar 2000, Peter Funk wrote: > Hi! > > Moshe Zadka: > > eagerly-waiting-for-UserString.py-ly y'rs, Z. > > Well, I've added the missing methods. Unfortunately I ran out of time now and > a 'test_userstring.py' derived from 'src/Lib/test/test_string.py' is still > missing. Great work, Peter! I really like UserString. However, I have two issues with MutableString: 1. It shouldn't share implementation with UserString, otherwise your algorithms are not behaving with correct big-O properties. It should probably use a char-array (from the array module) as the internal representation. 2. It shouldn't share interface with UserString, since it doesn't have a proper implementation of __hash__. All in all, I probably disagree with making MutableString a subclass of UserString. If I have time later today, I'm hoping to be able to make my own MutableString. From pf at artcom-gmbh.de Wed Mar 29 12:35:32 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 12:35:32 +0200 (MEST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: from Moshe Zadka at "Mar 29, 2000 11:59:47 am" Message-ID: Hi! > > Moshe Zadka: > > > eagerly-waiting-for-UserString.py-ly y'rs, Z. > > > On Wed, 29 Mar 2000, Peter Funk wrote: > > Well, I've added the missing methods. Unfortunately I ran out of time now and > > a 'test_userstring.py' derived from 'src/Lib/test/test_string.py' is still > > missing. > Moshe Zadka schrieb: > Great work, Peter! I really like UserString. However, I have two issues > with MutableString: > > 1. It shouldn't share implementation with UserString, otherwise your > algorithms are not behaving with correct big-O properties. It should > probably use a char-array (from the array module) as the internal > representation. Hmm.... I don't understand what you mean with 'big-O properties'. The internal representation of any object should be considered ... umm ... internal. > > 2.
It shouldn't share interface with UserString, since it doesn't have a > proper implementation of __hash__. What's wrong with my implementation of __hash__ raising a TypeError with the message 'unhashable object'? This is the same behaviour you get if you try to use some other mutable object as a dictionary key:

>>> l = []
>>> d = { l : 'foo' }
Traceback (innermost last):
  File "", line 1, in ?
TypeError: unhashable type

> All in all, I probably disagree with making MutableString a subclass of > UserString. If I have time later today, I'm hoping to be able to make my > own MutableString As I tried to point out in the docstring of 'MutableString', I don't want people to actually start using the 'MutableString' class. My intention was to prevent people from trying to invent their own, and then probably wrong, MutableString class derived from UserString. Only Newbies will really ever need mutable strings in Python (see FAQ). Maybe my 'MutableString' idea belongs somewhere in the yet-to-be-written src/Doc/libuserstring.tex. But since Newbies tend to ignore docs ... Sigh. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From gmcm at hypernet.com Wed Mar 29 13:07:20 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Wed, 29 Mar 2000 06:07:20 -0500 Subject: [Python-Dev] Great Renaming? What is the goal?
You mean when Greg said: >Assuming that you use an archive like those found in my "small" distro or > Gordon's distro, then this is no problem. The archive simply recognizes > and maps "text.encoding.macbinary" to its own module. I don't know what this has to do with it. When we get around to the 'macbinary' part, we have already established that 'text.encoding' is the parent which should supply 'macbinary'. > In > particular, even with Python's current module system, there is no need to > scrub installations: Python core modules go (under UNIX) in > /usr/local/lib/python1.5, and 3rd party modules go in > /usr/local/lib/python1.5/site-packages. And if there's a /usr/local/lib/python1.5/text/encoding, there's no way that /usr/local/lib/python1.5/site-packages/text/encoding will get searched. I believe you could hack up an importer that did allow this, and I think you'd be 100% certifiable if you did. Just look at the surprise factor. Hacking stuff into another package is just as evil as math.pi = 42. > Anyway, I already expressed my preference of the Perl way, over the Java > way. For one thing, I don't want to have to register a domain just so I > could distribute Python code. I haven't the foggiest what the "Perl way" is; I wouldn't be surprised if it relied on un-Pythonic sociological factors. I already said the Java mechanics are silly; uniqueness is what matters. When Python packages start selling in the four and five figure range, then a registry mechanism will likely be necessary. - Gordon
It should > > probably use a char-array (from the array module) as the internal > > representation. > > Hmm.... I don't understand what you mean by 'big-O properties'. > The internal representation of any object should be considered ... > umm ... internal. Yes, but s[0] = 'a' should take O(1) time, not O(len(s)). > > 2. It shouldn't share interface with UserString, since it doesn't have a > > proper implementation of __hash__. > > What's wrong with my implementation of __hash__ raising a TypeError with > the message 'unhashable object'? A subtype shouldn't change contracts of its supertypes. hash() was implicitly contracted as "raising no exceptions". -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From guido at python.org Wed Mar 29 14:26:56 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Mar 2000 07:26:56 -0500 Subject: [Python-Dev] 1.5.2->1.6 Changes In-Reply-To: Your message of "Wed, 29 Mar 2000 02:21:29 EST." <20000329022129.A15539@thyrsus.com> References: <20000329022129.A15539@thyrsus.com> Message-ID: <200003291226.HAA18216@eric.cnri.reston.va.us> > Moshe Zadka : > > -- _tkinter now uses the object, rather than string, interface to Tcl. Eric Raymond: > Hm, does this mean that the annoying requirement to do explicit gets and > sets to move data between the Python world and the Tcl/Tk world is gone? Not sure what you are referring to -- this should be completely transparent to Python/Tkinter users. If you are thinking of the way Tcl variables are created and manipulated in Python, no, this doesn't change, alas (Tcl variables aren't objects -- they are manipulated through get and set commands. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Mar 29 14:32:16 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Mar 2000 07:32:16 -0500 Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: Your message of "Wed, 29 Mar 2000 11:34:21 +0200." <38E1CE1D.7899B1BC@lemburg.com> References: <1257835425-27941123@hypernet.com> <38E1CE1D.7899B1BC@lemburg.com> Message-ID: <200003291232.HAA18234@eric.cnri.reston.va.us> > > Marc-Andre grabbed "mx". If (as I rather suspect) he > > wants to remake the entire standard lib in his image, he's > > welcome to - *under* mx. > > Right, that's the way I see it too. BTW, where can I register > the "mx" top-level package name? Should these be registered > in the NIST registry? Will the names registered there be > honored? I think the NIST registry is a failed experiment -- too cumbersome to maintain or consult.
We can do this the same way as common law handles trade marks: if you have used it as your brand name long enough, even if you didn't register, someone else cannot grab it away from you. > > What would happen if he (and everyone else) installed > > themselves *into* my core packages, then I decided I didn't > > want his stuff? More than likely I'd have to scrub the damn > > installation and start all over again. > > That's a no-no, IMHO. Unless explicitly allowed, packages > should *not* install themselves as subpackages to other > existing top-level packages. If they do, it's their problem > if the hierarchy changes... Agreed. Although some people seem to *want* this. Probably because it's okay to do that in Java and (apparently?) in Perl. And C++, probably. It all probably stems back to Lisp. I admit that I didn't see this subtlety when I designed Python's package architecture. It's too late to change (e.g. because of __init__.py). Is it a problem though? Let's be open-minded about this and think about whether we want to allow this or not, and why... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Mar 29 14:35:33 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Mar 2000 07:35:33 -0500 Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: Your message of "Wed, 29 Mar 2000 13:21:09 +0200." References: Message-ID: <200003291235.HAA18249@eric.cnri.reston.va.us> > > What's wrong with my implementation of __hash__ raising a TypeError with > > the message 'unhashable object'? > > A subtype shouldn't change contracts of its supertypes. hash() was > implicitly contracted as "raising no exceptions". Let's not confuse subtypes and subclasses. One of the things implicit in the discussion on types-sig is that not every subclass is a subtype! Yes, this violates something we all learned from C++ -- but it's a great insight.
No time to explain it more, but for me, Peter's subclassing UserString for MutableString to borrow implementation is fine. --Guido van Rossum (home page: http://www.python.org/~guido/) From pf at artcom-gmbh.de Wed Mar 29 15:49:24 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 29 Mar 2000 15:49:24 +0200 (MEST) Subject: [Python-Dev] NIST Registry (was Great Renaming? What is the goal?) In-Reply-To: <200003291232.HAA18234@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 29, 2000 7:32:16 am" Message-ID: Hi! Guido van Rossum: > I think the NIST registry is a failed experiment -- too cumbersome to > maintain or consult. The WEB frontend of the NIST registry is not that bad --- if you are even aware of the fact that such a beast exists! I have used Python since 1994 and discovered the NIST registry incidentally a few weeks ago, when I was really looking for something about the Win32 registry and used the search engine on www.python.org. My first thought was: What a neat clever idea! I think this is an example of how the Python community suffers from poor advertising of good ideas. > We can do this the same way as common law > handles trade marks: if you have used it as your brand name long > enough, even if you didn't register, someone else cannot grab it away > from you. Okay. But a more formal registry wouldn't hurt. Something like the global module index from the current docs supplemented with all the contributed modules which can currently be found at www.vex.net would be a useful resource. Regards, Peter From moshez at math.huji.ac.il Wed Mar 29 16:15:36 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 29 Mar 2000 16:15:36 +0200 (IST) Subject: [Python-Dev] [1.6]: UserList, Dict: Do we need a UserString class? In-Reply-To: <200003291235.HAA18249@eric.cnri.reston.va.us> Message-ID: On Wed, 29 Mar 2000, Guido van Rossum wrote: > Let's not confuse subtypes and subclasses.
One of the things implicit > in the discussion on types-sig is that not every subclass is a > subtype! Yes, this violates something we all learned from C++ -- but > it's a great insight. No time to explain it more, but for me, Peter's > subclassing UserString for MutableString to borrow implementation is > fine. Oh, I agree with this. An earlier argument which got snipped in the discussion is why it's a bad idea to borrow implementation (a totally different argument) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From fdrake at acm.org Wed Mar 29 18:02:13 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 29 Mar 2000 11:02:13 -0500 (EST) Subject: [Python-Dev] 1.5.2->1.6 Changes In-Reply-To: References: Message-ID: <14562.10501.726637.335088@seahag.cnri.reston.va.us> Moshe Zadka writes: > -- filecmp.py (supersedes the old cmp.py and dircmp.py modules), > -- tabnanny.py (make sure the source file doesn't assume a specific tab-width) Weren't these in 1.5.2? I think filecmp is documented in the released docs... ah, no, I'm safe. ;) > Please e-mail me personally if you think of any other changes, and I'll > try to integrate them into a complete "changes" document. The documentation is updated. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From skip at mojam.com Wed Mar 29 18:57:51 2000 From: skip at mojam.com (Skip Montanaro) Date: Wed, 29 Mar 2000 10:57:51 -0600 Subject: [Python-Dev] CVS woes... Message-ID: <200003291657.KAA22177@beluga.mojam.com> Does anyone else besides me have trouble getting their Python tree to sync with the CVS repository? I've tried all manner of flags to "cvs update", most recently "cvs update -d -A ." with no success. There are still some files I know Fred Drake has patched that show up as different and it refuses to pick up Lib/robotparser.py. I'm going to blast my current tree and start anew after saving one or two necessary files. 
Any thoughts you might have would be much appreciated. (Private emails please, unless for some reason you think this should be a python-dev topic. I only post here because I suspect most of the readers use CVS to keep in frequent sync and may have some insight.) Thx, -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From moshez at math.huji.ac.il Wed Mar 29 19:06:59 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 29 Mar 2000 19:06:59 +0200 (IST) Subject: [Python-Dev] 1.5.2->1.6 Changes In-Reply-To: <14562.10501.726637.335088@seahag.cnri.reston.va.us> Message-ID: On Wed, 29 Mar 2000, Fred L. Drake, Jr. wrote: > > Moshe Zadka writes: > > -- filecmp.py (supersedes the old cmp.py and dircmp.py modules), > > -- tabnanny.py (make sure the source file doesn't assume a specific tab-width) > > Weren't these in 1.5.2? I think filecmp is documented in the > released docs... ah, no, I'm safe. ;) Tabnanny wasn't a module, and filecmp wasn't there at all. > The documentation is updated. ;) Yes, but it was released as a late part of 1.5.2. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From effbot at telia.com Wed Mar 29 18:38:00 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 29 Mar 2000 18:38:00 +0200 Subject: [Python-Dev] CVS woes... References: <200003291657.KAA22177@beluga.mojam.com> Message-ID: <01b701bf999d$267b6740$34aab5d4@hagrid> Skip wrote: > Does anyone else besides me have trouble getting their Python tree to sync > with the CVS repository? I've tried all manner of flags to "cvs update", > most recently "cvs update -d -A ." with no success. There are still some > files I know Fred Drake has patched that show up as different and it refuses > to pick up Lib/robotparser.py. note that robotparser doesn't show up on cvs.python.org either. maybe cnri's cvs admins should look into this...
From fdrake at acm.org Wed Mar 29 20:20:14 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 29 Mar 2000 13:20:14 -0500 (EST) Subject: [Python-Dev] CVS woes... In-Reply-To: <200003291657.KAA22177@beluga.mojam.com> References: <200003291657.KAA22177@beluga.mojam.com> Message-ID: <14562.18782.465814.696099@seahag.cnri.reston.va.us> Skip Montanaro writes: > most recently "cvs update -d -A ." with no success. There are still some > files I know Fred Drake has patched that show up as different and it refuses You should be aware that many of the more recent documentation patches have been in the 1.5.2p2 branch (release-1.5.2p1-patches, I think), rather than the development head. I'm hoping to begin the merge in the next week. I also have a few patches that I haven't had time to look at yet, and I'm not inclined to make any changes until I've merged the 1.5.2p2 docs with the 1.6 tree, mostly to keep the merge from being any more painful than I already expect it to be. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From bwarsaw at cnri.reston.va.us Wed Mar 29 20:22:57 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 29 Mar 2000 13:22:57 -0500 (EST) Subject: [Python-Dev] CVS woes... References: <200003291657.KAA22177@beluga.mojam.com> <01b701bf999d$267b6740$34aab5d4@hagrid> Message-ID: <14562.18945.407398.812930@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> note that robotparser doesn't show up on cvs.python.org FL> either. maybe cnri's cvs admins should look into this... I've just resync'd python/dist and am doing a fresh checkout now. Looks like Lib/robotparser.py is there now. -Barry From guido at python.org Wed Mar 29 20:23:38 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Mar 2000 13:23:38 -0500 Subject: [Python-Dev] CVS woes... In-Reply-To: Your message of "Wed, 29 Mar 2000 10:57:51 CST." 
<200003291657.KAA22177@beluga.mojam.com> References: <200003291657.KAA22177@beluga.mojam.com> Message-ID: <200003291823.NAA20134@eric.cnri.reston.va.us> > Does anyone else besides me have trouble getting their Python tree to sync > with the CVS repository? I've tried all manner of flags to "cvs update", > most recently "cvs update -d -A ." with no success. There are still some > files I know Fred Drake has patched that show up as different and it refuses > to pick up Lib/robotparser.py. My bad. When I move or copy a file around in the CVS repository directly instead of using cvs commit, I have to manually call a script that updates the mirror. I've done that now, and robotparser.py should now be in the mirror. --Guido van Rossum (home page: http://www.python.org/~guido/) From gward at cnri.reston.va.us Wed Mar 29 21:06:14 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Wed, 29 Mar 2000 14:06:14 -0500 Subject: [Python-Dev] Distutils now in Python CVS tree Message-ID: <20000329140613.A5850@cnri.reston.va.us> Hi all -- Distutils is now available through the Python CVS tree *in addition to its own CVS tree*. That is, if you keep on top of developments in the Python CVS tree, then you will be tracking the latest Distutils code in Lib/distutils. Or, you can keep following the Distutils through its own CVS tree. (This is all done through one itty-bitty little symlink in the CNRI CVS repository, and It Just Works. Cool.) Note that only the 'distutils' subdirectory of the distutils distribution is tracked by Python: that is, changes to the documentation, test suites, and example setup scripts are *not* reflected in the Python CVS tree. If you follow neither Python nor Distutils CVS updates, this doesn't affect you. If you've been following Distutils CVS updates, you can continue to do so as you've always done (and as is documented on the Distutils "Anonymous CVS" web page). 
If you've been following Python CVS updates, then you are now following most Distutils CVS updates too -- as long as you do "cvs update -d", of course. If you're interested in following updates in the Distutils documentation, tests, examples, etc. then you should follow the Distutils CVS tree directly. If you've been following *both* Python and Distutils CVS updates, and hacking on the Distutils, then you should pick one or the other as your working directory. If you submit patches, it doesn't really matter if they're relative to the top of the Python tree, the top of the Distutils tree, or what -- I'll probably figure it out. However, it's probably best to continue sending Distutils patches to distutils-sig at python.org, *or* direct to me (gward at python.net) for trivial patches. Unless Guido says otherwise, I don't see a compelling reason to send Distutils patches to patches at python.org. In related news, the distutils-checkins list is probably going to go away, and all Distutils checkin messages will go python-checkins instead. Let me know if you avidly follow distutils-checkins, but do *not* want to follow python-checkins -- if lots of people respond (doubtful, as distutils-checkins only had 3 subscribers last I checked!), we'll reconsider. Greg From fdrake at acm.org Wed Mar 29 21:28:19 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 29 Mar 2000 14:28:19 -0500 (EST) Subject: [Python-Dev] Re: [Distutils] Distutils now in Python CVS tree In-Reply-To: <20000329140525.A5842@cnri.reston.va.us> References: <20000329140525.A5842@cnri.reston.va.us> Message-ID: <14562.22867.998809.897214@seahag.cnri.reston.va.us> Greg Ward writes: > Distutils is now available through the Python CVS tree *in addition to > its own CVS tree*. That is, if you keep on top of developments in the > Python CVS tree, then you will be tracking the latest Distutils code in > Lib/distutils. Or, you can keep following the Distutils through its own > CVS tree. 
(This is all done through one itty-bitty little symlink in > the CNRI CVS repository, and It Just Works. Cool.) Greg, You may want to point out the legalese requirements for patches to the Python tree. ;( That means the patches should probably go to patches at python.org or you should ensure an archive of all the legal statements is maintained at CNRI. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From ping at lfw.org Wed Mar 29 23:44:31 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 29 Mar 2000 15:44:31 -0600 (CST) Subject: [Python-Dev] Great Renaming? What is the goal? In-Reply-To: <02c901bf989b$be203d80$34aab5d4@hagrid> Message-ID: On Tue, 28 Mar 2000, Fredrik Lundh wrote: > > > IMO this subdivision could be discussed and possibly revised. > > here's one proposal: > http://www.pythonware.com/people/fredrik/librarybook-contents.htm Wow. I don't think i hardly ever use any of the modules in your "Commonly Used Modules" category. Except traceback, from time to time, but that's really the only one! Hmm. I'd arrange things a little differently, though i do like the category for Data Representation (it should probably go next to Data Storage though). I would prefer a separate group for interpreter-and-development-related things. The "File Formats" group seems weak... to me, its contents would better belong in a "parsing" or "text processing" classification. urlparse definitely goes with urllib. These comments are kind of random, i know... maybe i'll try putting together another grouping if i have any time. -- ?!ng From adustman at comstar.net Thu Mar 30 02:57:06 2000 From: adustman at comstar.net (Andy Dustman) Date: Wed, 29 Mar 2000 19:57:06 -0500 (EST) Subject: [Python-Dev] socketmodule with SSL enabled In-Reply-To: <200003290150.UAA17819@eric.cnri.reston.va.us> Message-ID: I had to make the following one-line change to socketmodule.c so that it would link properly with openssl-0.9.4. 
In studying the openssl include files, I found: #define SSLeay_add_ssl_algorithms() SSL_library_init() SSL_library_init() seems to be the "correct" call nowadays. I don't know why this isn't being picked up. I also don't know how well the module works, other than it imports, but I sure would like to try it with Zope/ZServer/Medusa... -- andy dustman | programmer/analyst | comstar.net, inc. telephone: 770.485.6025 / 706.549.7689 | icq: 32922760 | pgp: 0xc72f3f1d "Therefore, sweet knights, if you may doubt your strength or courage, come no further, for death awaits you all, with nasty, big, pointy teeth!" Index: socketmodule.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Modules/socketmodule.c,v retrieving revision 1.98 diff -c -r1.98 socketmodule.c *** socketmodule.c 2000/03/24 20:56:56 1.98 --- socketmodule.c 2000/03/30 00:49:09 *************** *** 2384,2390 **** return; #ifdef USE_SSL SSL_load_error_strings(); ! SSLeay_add_ssl_algorithms(); SSLErrorObject = PyErr_NewException("socket.sslerror", NULL, NULL); if (SSLErrorObject == NULL) return; --- 2384,2390 ---- return; #ifdef USE_SSL SSL_load_error_strings(); ! SSL_library_init(); SSLErrorObject = PyErr_NewException("socket.sslerror", NULL, NULL); if (SSLErrorObject == NULL) return; From gstein at lyra.org Thu Mar 30 04:54:27 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 29 Mar 2000 18:54:27 -0800 (PST) Subject: [Python-Dev] installation points (was: Great Renaming? What is the goal?) In-Reply-To: <1257794452-30405909@hypernet.com> Message-ID: On Wed, 29 Mar 2000, Gordon McMillan wrote: > Moshe Zadka wrote: > > On Tue, 28 Mar 2000, Gordon McMillan wrote: > > > What would happen if he (and everyone else) installed > > > themselves *into* my core packages, then I decided I didn't > > > want his stuff? More than likely I'd have to scrub the damn > > > installation and start all over again. 
> > > > I think Greg Stein answered that objection, by reminding us that the > > filesystem isn't the only way to set up a package hierarchy. > > You mean when Greg said: > >Assuming that you use an archive like those found in my "small" distro or > > Gordon's distro, then this is no problem. The archive simply recognizes > > and maps "text.encoding.macbinary" to its own module. > > I don't know what this has to do with it. When we get around > to the 'macbinary' part, we have already established that > 'text.encoding' is the parent which should supply 'macbinary'. good point... > > In > > particular, even with Python's current module system, there is no need to > > scrub installations: Python core modules go (under UNIX) in > > /usr/local/lib/python1.5, and 3rd party modules go in > > /usr/local/lib/python1.5/site-packages. > > And if there's a /usr/local/lib/python1.5/text/encoding, there's > no way that /usr/local/lib/python1.5/site-packages/text/encoding will get searched. > > I believe you could hack up an importer that did allow this, and > I think you'd be 100% certifiable if you did. Just look at the > surprise factor. > > Hacking stuff into another package is just as evil as math.pi = > 42. Not if the package was designed for it. For a "package" like "net", it would be perfectly acceptable to allow third-parties to define that as their installation point. And yes, assume there is an importer that looks into the installed archives for modules. In the example, the harder part is determining where the "text.encoding" package is loaded from. And yah: it may be difficult to arrange for the text.encoding importer to allow archive searching.
Cheers, -g -- Greg Stein, http://www.lyra.org/ From thomas.heller at ion-tof.com Thu Mar 30 21:30:25 2000 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Thu, 30 Mar 2000 21:30:25 +0200 Subject: [Python-Dev] Metaclasses, customizing attribute access for classes Message-ID: <021c01bf9a7e$662327c0$4500a8c0@thomasnotebook> Dear Python-developers, Recently I played with metaclasses from within python, also with Jim Fulton's ExtensionClass. I even tried to write my own metaclass in a C-extension, using the famous Don Beaudry hook. It seems that ExtensionClass does not quite do what I want. Metaclasses implemented in python are somewhat slow, and writing them is a lot of work. Writing a metaclass in C is even more work... Well, what do I want? Often, I use the following pattern: class X: def __init__ (self): self.delegate = anObjectImplementedInC(...) def __getattr__ (self, key): return self.delegate.dosomething(key) def __setattr__ (self, key, value): self.delegate.doanotherthing(key, value) def __delattr__ (self, key): self.delegate.doevenmore(key) This is too slow (for me). So what I would like to do is: class X: def __init__ (self): self.__dict__ = aMappingObject(...) and now aMappingObject will automatically receive all the setattr, getattr, and delattr calls. The *only* thing which is required for this is to remove the restriction that the __dict__ attribute must be a dictionary. This is only a small change to classobject.c (which unfortunately I have only implemented for 1.5.2, not for the CVS version). The performance impact for this change is unnoticeable in pystone. What do you think? Should I prepare a patch? Any chance that this can be included in a future python version?
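[Editor's sketch: the delegation pattern described in the message above, fleshed out into a runnable form. The Delegate class here is a hypothetical pure-Python stand-in for the C object (anObjectImplementedInC) and its dosomething/doanotherthing/doevenmore method names; note that the delegate itself must be installed via self.__dict__ to avoid recursing into __setattr__.]

```python
class Delegate:
    """Hypothetical pure-Python stand-in for anObjectImplementedInC(...)."""
    def __init__(self):
        self._data = {}

    def dosomething(self, key):            # attribute lookup
        try:
            return self._data[key]
        except KeyError:
            raise AttributeError(key)

    def doanotherthing(self, key, value):  # attribute store
        self._data[key] = value

    def doevenmore(self, key):             # attribute delete
        del self._data[key]


class X:
    def __init__(self):
        # Write straight into the instance dict so that installing the
        # delegate does not itself go through __setattr__.
        self.__dict__['delegate'] = Delegate()

    def __getattr__(self, key):
        # Only called when normal lookup fails, i.e. for everything
        # except 'delegate' itself.
        return self.__dict__['delegate'].dosomething(key)

    def __setattr__(self, key, value):
        self.__dict__['delegate'].doanotherthing(key, value)

    def __delattr__(self, key):
        self.__dict__['delegate'].doevenmore(key)


x = X()
x.spam = 42
print(x.spam)  # prints 42; every access is routed through the delegate
```

Every attribute read, write, and delete on X takes an extra Python-level call, which is the overhead the proposed __dict__ replacement would eliminate.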
Thomas Heller From petrilli at amber.org Thu Mar 30 21:52:02 2000 From: petrilli at amber.org (Christopher Petrilli) Date: Thu, 30 Mar 2000 14:52:02 -0500 Subject: [Python-Dev] Unicode compile Message-ID: <20000330145202.B9078@trump.amber.org> I don't know how much memory other people have in their machines, but on this machine (128Mb), I get the following trying to compile a CVS checkout of Python: gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c ./unicodedatabase.c ./unicodedatabase.c:53482: virtual memory exhausted I hope that this is a temporary thing, or that we ship the database in some other manner, but I would argue that you should be able to compile Python on a machine with 32Mb of RAM at MOST.... for an idea of how much VM this machine has, I have 256Mb of SWAP on top of it. Chris -- | Christopher Petrilli | petrilli at amber.org From guido at python.org Thu Mar 30 22:12:22 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 15:12:22 -0500 Subject: [Python-Dev] Unicode compile In-Reply-To: Your message of "Thu, 30 Mar 2000 14:52:02 EST." <20000330145202.B9078@trump.amber.org> References: <20000330145202.B9078@trump.amber.org> Message-ID: <200003302012.PAA22062@eric.cnri.reston.va.us> > I don't know how much memory other people have in their machines, but > on this machine (128Mb), I get the following trying to compile a CVS > checkout of Python: > > gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c ./unicodedatabase.c > ./unicodedatabase.c:53482: virtual memory exhausted > > I hope that this is a temporary thing, or that we ship the database in some > other manner, but I would argue that you should be able to compile > Python on a machine with 32Mb of RAM at MOST.... for an idea of how > much VM this machine has, I have 256Mb of SWAP on top of it. I'm not sure how to fix this, short of reading the main database from a file. Marc-Andre?
--Guido van Rossum (home page: http://www.python.org/~guido/) From tismer at tismer.com Thu Mar 30 22:14:55 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 30 Mar 2000 22:14:55 +0200 Subject: [Python-Dev] Unicode compile References: <20000330145202.B9078@trump.amber.org> Message-ID: <38E3B5BF.2D00F930@tismer.com> Christopher Petrilli wrote: > > I don't know how much memory other people have in their machines, but > on this machine (128Mb), I get the following trying to compile a CVS > checkout of Python: > > gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c ./unicodedatabase.c > ./unicodedatabase.c:53482: virtual memory exhausted > > I hope that this is a temporary thing, or that we ship the database in some > other manner, but I would argue that you should be able to compile > Python on a machine with 32Mb of RAM at MOST.... for an idea of how > much VM this machine has, I have 256Mb of SWAP on top of it. I had similar effects, which made me work on a compressed database (see older messages). Due to time limits, I will not be ready before 1.6.a1 is out. And then quite a lot of other changes will be necessary from Marc, since the API changes quite a lot. But it will definitely be a module of less than 20 KB, proven. ciao - chris(2) -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From akuchlin at mems-exchange.org Thu Mar 30 22:14:27 2000 From: akuchlin at mems-exchange.org (Andrew M.
Kuchling) Date: Thu, 30 Mar 2000 15:14:27 -0500 (EST) Subject: [Python-Dev] Unicode compile In-Reply-To: <200003302012.PAA22062@eric.cnri.reston.va.us> References: <20000330145202.B9078@trump.amber.org> <200003302012.PAA22062@eric.cnri.reston.va.us> Message-ID: <14563.46499.555853.413690@amarok.cnri.reston.va.us> Guido van Rossum writes: >I'm not sure how to fix this, short of reading the main database from >a file. Marc-Andre? Turning off optimization may help. (Or it may not -- it might be creating the data structures for a large static table that's the problem.) --amk From akuchlin at mems-exchange.org Thu Mar 30 22:22:02 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Thu, 30 Mar 2000 15:22:02 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <200003282000.PAA11988@eric.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> Message-ID: <14563.46954.70800.706245@amarok.cnri.reston.va.us> Guido van Rossum writes: >I don't know enough about this, but it seems that there might be two >steps: *creating* a mmap object is necessarily platform-specific; but >*using* a mmap object could be platform-neutral. > >What is the API for mmap objects? You create them; Unix wants a file descriptor, and Windows wants a filename. Then they behave like buffer objects, like mutable strings. I like Fredrik's suggestion of an 'open(filename, mode, ...)' type of interface. If someone can suggest a way to handle the extra flags such as MAP_SHARED and the Windows tag argument, I'll happily implement it. Maybe just keyword arguments that differ across platforms? open(filename, mode, [tag = 'foo',] [flags = mmapfile.MAP_SHARED]). We could preserve the ability to mmap() only a file descriptor on Unix through a separate openfd() function. I'm also strongly tempted to rename the module from mmapfile to just 'mmap'. 
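[Editor's sketch: a rough picture of the file-oriented interface being discussed above. The open_mmap name and the mode handling are assumptions, written against the mmap module as it later shipped; only the portable access= argument is exercised, with the platform-specific extras (tag=..., flags=...) left out.]

```python
import mmap
import os
import tempfile

def open_mmap(filename, mode="r"):
    # Hypothetical front end of the proposed kind: open by filename,
    # map the whole file, return the mmap object.
    if mode == "r":
        osflags, access = os.O_RDONLY, mmap.ACCESS_READ
    elif mode in ("r+", "w+"):
        osflags, access = os.O_RDWR, mmap.ACCESS_WRITE
    else:
        raise ValueError("unsupported mode: %r" % (mode,))
    fd = os.open(filename, osflags)
    try:
        size = os.fstat(fd).st_size
        # mmap dup()s the descriptor, so we can close our copy below.
        return mmap.mmap(fd, size, access=access)
    finally:
        os.close(fd)

# Tiny demonstration against a throwaway file.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello mmap")
os.close(fd)
m = open_mmap(path)
first_word = m[:5]   # mmap objects slice like (mutable) strings
m.close()
os.remove(path)
```

A separate openfd() entry point, as mentioned above, would skip the os.open step and take the descriptor directly.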
I'd suggest waiting until the interface is finalized before adding the module to the CVS tree -- which means after 1.6a1 -- but I can add the module as it stands if you like. Guido, let me know if you want me to do that. -- A.M. Kuchling http://starship.python.net/crew/amk/ A Puck is harder by far to hurt than some little lord of malice from the lands of ice and snow. We Pucks are old and hard and wild... -- Robin Goodfellow, in SANDMAN #66: "The Kindly Ones:10" From guido at python.org Thu Mar 30 22:23:42 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 15:23:42 -0500 Subject: [Python-Dev] socketmodule with SSL enabled In-Reply-To: Your message of "Wed, 29 Mar 2000 19:57:06 EST." References: Message-ID: <200003302023.PAA22350@eric.cnri.reston.va.us> > I had to make the following one-line change to socketmodule.c so that it > would link properly with openssl-0.9.4. In studying the openssl include > files, I found: > > #define SSLeay_add_ssl_algorithms() SSL_library_init() > > SSL_library_init() seems to be the "correct" call nowadays. I don't know > why this isn't being picked up. I also don't know how well the module > works, other than it imports, but I sure would like to try it with > Zope/ZServer/Medusa... Strange -- the version of OpenSSL I have also calls itself 0.9.4 ("OpenSSL 0.9.4 09 Aug 1999" to be precise) and doesn't have SSL_library_init(). I wonder what gives... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Mar 30 22:25:58 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 15:25:58 -0500 Subject: [Python-Dev] mmapfile module In-Reply-To: Your message of "Thu, 30 Mar 2000 15:22:02 EST." 
<14563.46954.70800.706245@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> Message-ID: <200003302025.PAA22367@eric.cnri.reston.va.us> > Guido van Rossum writes: > >I don't know enough about this, but it seems that there might be two > >steps: *creating* a mmap object is necessarily platform-specific; but > >*using* a mmap object could be platform-neutral. > > > >What is the API for mmap objects? [AMK] > You create them; Unix wants a file descriptor, and Windows wants a > filename. Then they behave like buffer objects, like mutable strings. > > I like Fredrik's suggestion of an 'open(filename, mode, ...)' type of > interface. If someone can suggest a way to handle the extra flags > such as MAP_SHARED and the Windows tag argument, I'll happily > implement it. Maybe just keyword arguments that differ across > platforms? open(filename, mode, [tag = 'foo',] [flags = > mmapfile.MAP_SHARED]). We could preserve the ability to mmap() only a > file descriptor on Unix through a separate openfd() function. Yes, keyword args seem to be the way to go. To avoid an extra function you could add a fileno=... kwarg, in which case the filename is ignored or required to be "". > I'm > also strongly tempted to rename the module from mmapfile to just > 'mmap'. Sure. > I'd suggest waiting until the interface is finalized before adding the > module to the CVS tree -- which means after 1.6a1 -- but I can add the > module as it stands if you like. Guido, let me know if you want me to > do that. Might as well check it in -- the alpha is going to be rough and I expect another alpha to come out shortly to correct the biggest problems. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Thu Mar 30 22:22:08 2000 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Thu, 30 Mar 2000 22:22:08 +0200 Subject: [Python-Dev] Unicode compile References: <20000330145202.B9078@trump.amber.org> <200003302012.PAA22062@eric.cnri.reston.va.us> Message-ID: <38E3B770.6CD61C37@lemburg.com> Guido van Rossum wrote: > > > I don't know how much memory other people have in their machiens, but > > in this machine (128Mb), I get the following trying to compile a CVS > > checkout of Python: > > > > gcc -g -O2 -I./../Include -I.. -DHAVE_CONFIG_H -c ./unicodedatabase.c > > ./unicodedatabase.c:53482: virtual memory exhausted > > > > I hope that this is a temporary thing, or we ship the database some > > other manner, but I would argue that you should be able to compile > > Python on a machine with 32Mb of RAM at MOST.... for an idea of how > > much VM this machine has, i have 256Mb of SWAP on top of it. > > I'm not sure how to fix this, short of reading the main database from > a file. Marc-Andre? Hmm, the file compiles fine on my 64MB Linux machine with about 100MB of swap. What gcc version do you use ? Anyway, once Christian is ready with his compact replacement I think we no longer have to worry about that chunk of static data :-) Reading in the data from a file is not a very good solution, because it would override the OS optimizations for static data in object files (like e.g. swapping in only those pages which are really needed, etc.). An alternative solution would be breaking the large table into several smaller ones and accessing it via a redirection function. 
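[Editor's sketch, not part of the original message.] The smaller-tables-plus-redirection idea can be illustrated in Python: one big lookup table becomes a set of fixed-size pages plus an index, with identical pages shared so the static data shrinks. The function names here are illustrative only, not taken from unicodedatabase.c:

```python
PAGE_SHIFT = 8  # 256 entries per page; an assumed split size

def build_pages(table):
    # Split one large table into fixed-size pages plus an index table.
    # Identical pages are stored once and shared via the index.
    size = 1 << PAGE_SHIFT
    pages, index, seen = [], [], {}
    for start in range(0, len(table), size):
        page = tuple(table[start:start + size])
        if page not in seen:
            seen[page] = len(pages)
            pages.append(page)
        index.append(seen[page])
    return index, pages

def lookup(index, pages, codepoint):
    # The "redirection function": high bits pick a page via the index,
    # low bits pick the entry within that page.
    page = pages[index[codepoint >> PAGE_SHIFT]]
    return page[codepoint & ((1 << PAGE_SHIFT) - 1)]
```

In C, each page (or group of pages) could then live in its own static array, keeping any single compilation unit small enough for the compiler.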
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From adustman at comstar.net Thu Mar 30 23:12:51 2000 From: adustman at comstar.net (Andy Dustman) Date: Thu, 30 Mar 2000 16:12:51 -0500 (EST) Subject: [Python-Dev] socketmodule with SSL enabled In-Reply-To: <200003302023.PAA22350@eric.cnri.reston.va.us> Message-ID: On Thu, 30 Mar 2000, Guido van Rossum wrote: > > I had to make the following one-line change to socketmodule.c so that it > > would link properly with openssl-0.9.4. In studying the openssl include > > files, I found: > > > > #define SSLeay_add_ssl_algorithms() SSL_library_init() > > > > SSL_library_init() seems to be the "correct" call nowadays. I don't know > > why this isn't being picked up. I also don't know how well the module > > works, other than it imports, but I sure would like to try it with > > Zope/ZServer/Medusa... > > Strange -- the version of OpenSSL I have also calls itself 0.9.4 > ("OpenSSL 0.9.4 09 Aug 1999" to be precise) and doesn't have > SSL_library_init(). > > I wonder what gives... I don't know. Right after I made the patch, I found that 0.9.5 is available, and I was able to successfully compile against that version (with the patch). -- andy dustman | programmer/analyst | comstar.net, inc. telephone: 770.485.6025 / 706.549.7689 | icq: 32922760 | pgp: 0xc72f3f1d "Therefore, sweet knights, if you may doubt your strength or courage, come no further, for death awaits you all, with nasty, big, pointy teeth!" From akuchlin at mems-exchange.org Thu Mar 30 23:19:45 2000 From: akuchlin at mems-exchange.org (Andrew M. 
Kuchling) Date: Thu, 30 Mar 2000 16:19:45 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <200003302025.PAA22367@eric.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> Message-ID: <14563.50417.909045.81868@amarok.cnri.reston.va.us> Guido van Rossum writes: >Might as well check it in -- the alpha is going to be rough and I >expect another alpha to come out shortly to correct the biggest >problems. Done -- just doing my bit to ensure the first alpha is rough! :) My next task is to add the Expat module. My understanding is that it's OK to add Expat itself, too; where should I put all that code? Modules/expat/* ? -- A.M. Kuchling http://starship.python.net/crew/amk/ I'll bring the Kindly Ones down on his blasted head. -- Desire, in SANDMAN #31: "Three Septembers and a January" From fdrake at acm.org Thu Mar 30 23:29:58 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 30 Mar 2000 16:29:58 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <14563.50417.909045.81868@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> Message-ID: <14563.51030.24773.587972@seahag.cnri.reston.va.us> Andrew M. Kuchling writes: > Done -- just doing my bit to ensure the first alpha is rough! :) > > My next task is to add the Expat module. My understanding is that > it's OK to add Expat itself, too; where should I put all that code? > Modules/expat/* ? Do you have documentation for this? -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From akuchlin at mems-exchange.org Thu Mar 30 23:30:35 2000 From: akuchlin at mems-exchange.org (Andrew M. 
Kuchling) Date: Thu, 30 Mar 2000 16:30:35 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <14563.51030.24773.587972@seahag.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <14563.51030.24773.587972@seahag.cnri.reston.va.us> Message-ID: <14563.51067.560938.367690@amarok.cnri.reston.va.us> Fred L. Drake, Jr. writes: > Do you have documentation for this? Somewhere at home, I think, but not here at work. I'll try to get it checked in before 1.6alpha1, but don't hold me to that. --amk From guido at python.org Thu Mar 30 23:31:58 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 16:31:58 -0500 Subject: [Python-Dev] mmapfile module In-Reply-To: Your message of "Thu, 30 Mar 2000 16:19:45 EST." <14563.50417.909045.81868@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> Message-ID: <200003302131.QAA22897@eric.cnri.reston.va.us> > Done -- just doing my bit to ensure the first alpha is rough! :) When the going gets rough, the rough get going :-) > My next task is to add the Expat module. My understanding is that > it's OK to add Expat itself, too; where should I put all that code? > Modules/expat/* ? Whoa... Not sure. This will give issues with Patrice, at least (even if it is pure Open Source -- given the size). I'd prefer to add instructions to Setup.in about where to get it. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Thu Mar 30 23:34:55 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Thu, 30 Mar 2000 16:34:55 -0500 (EST) Subject: [Python-Dev] mmapfile module In-Reply-To: <14563.51067.560938.367690@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <14563.51030.24773.587972@seahag.cnri.reston.va.us> <14563.51067.560938.367690@amarok.cnri.reston.va.us> Message-ID: <14563.51327.190466.477566@seahag.cnri.reston.va.us> Andrew M. Kuchling writes: > Somewhere at home, I think, but not here at work. I'll try to get it > checked in before 1.6alpha1, but don't hold me to that. The date isn't important; I'm not planning to match alpha/beta releases with Doc releases. I just want to be sure it gets in soon so that the debugging process can kick in for that as well. ;) Thanks! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Thu Mar 30 23:34:02 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 16:34:02 -0500 Subject: [Python-Dev] mmapfile module In-Reply-To: Your message of "Thu, 30 Mar 2000 16:31:58 EST." <200003302131.QAA22897@eric.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> Message-ID: <200003302134.QAA22939@eric.cnri.reston.va.us> > Whoa... Not sure. This will give issues with Patrice, at least (even > if it is pure Open Source -- given the size). For those outside CNRI -- Patrice is CNRI's tough IP lawyer. --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Thu Mar 30 23:48:13 2000 From: akuchlin at mems-exchange.org (Andrew M. 
Kuchling) Date: Thu, 30 Mar 2000 16:48:13 -0500 (EST) Subject: [Python-Dev] Expat module In-Reply-To: <200003302131.QAA22897@eric.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> Message-ID: <14563.52125.401817.986919@amarok.cnri.reston.va.us> Guido van Rossum writes: >> My next task is to add the Expat module. My understanding is that >> it's OK to add Expat itself, too; where should I put all that code? >> Modules/expat/* ? > >Whoa... Not sure. This will give issues with Patrice, at least (even >if it is pure Open Source -- given the size). I'd prefer to add >instructions to Setup.in about where to get it. Fair enough; I'll just add the module itself, then, and we can always change it later. Should we consider replacing the makesetup/Setup.in mechanism with a setup.py script that uses the Distutils? You'd have to compile a minipython with just enough critical modules -- strop and posixmodule are probably the most important ones -- in order to run setup.py. It's something I'd like to look at for 1.6, because then you could be much smarter in automatically enabling modules. -- A.M. Kuchling http://starship.python.net/crew/amk/ This is the way of Haskell or Design by Contract of Eiffel. This one is like wearing a XV century armor, you walk very safely but in a very tiring way. -- Manuel Gutierrez Algaba, 26 Jan 2000 From guido at python.org Fri Mar 31 00:41:45 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 17:41:45 -0500 Subject: [Python-Dev] Expat module In-Reply-To: Your message of "Thu, 30 Mar 2000 16:48:13 EST." 
<14563.52125.401817.986919@amarok.cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> <14563.52125.401817.986919@amarok.cnri.reston.va.us> Message-ID: <200003302241.RAA23050@eric.cnri.reston.va.us> > Fair enough; I'll just add the module itself, then, and we can always > change it later. OK. > Should we consider replacing the makesetup/Setup.in mechanism with a > setup.py script that uses the Distutils? You'd have to compile a > minipython with just enough critical modules -- strop and posixmodule > are probably the most important ones -- in order to run setup.py. > It's something I'd like to look at for 1.6, because then you could be > much smarter in automatically enabling modules. If you can come up with something that works well enough, that would be great. (Although I'm not sure where the distutils come in.) We still need to use configure/autoconf though. Hardcoding a small complement of modules is no problem. (Why do you think you need strop though? Remember we have string methods!) --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond at skippinet.com.au Fri Mar 31 01:03:39 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 31 Mar 2000 09:03:39 +1000 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/PC python_nt.rc,1.8,1.9 In-Reply-To: <200003302259.RAA23266@eric.cnri.reston.va.us> Message-ID: This is the version number as displayed by Windows Explorer in the "properties" dialog. Mark. > Modified Files: > python_nt.rc > Log Message: > Seems there was a version string here that still looked > like 1.5.2. 
> > > Index: python_nt.rc > ========================================================== > ========= > RCS file: /projects/cvsroot/python/dist/src/PC/python_nt.rc,v > retrieving revision 1.8 > retrieving revision 1.9 > diff -C2 -r1.8 -r1.9 > *** python_nt.rc 2000/03/29 01:50:50 1.8 > --- python_nt.rc 2000/03/30 22:59:09 1.9 > *************** > *** 29,34 **** > > VS_VERSION_INFO VERSIONINFO > ! FILEVERSION 1,5,2,3 > ! PRODUCTVERSION 1,5,2,3 > FILEFLAGSMASK 0x3fL > #ifdef _DEBUG > --- 29,34 ---- > > VS_VERSION_INFO VERSIONINFO > ! FILEVERSION 1,6,0,0 > ! PRODUCTVERSION 1,6,0,0 > FILEFLAGSMASK 0x3fL > #ifdef _DEBUG > > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://www.python.org/mailman/listinfo/python-checkins > From effbot at telia.com Fri Mar 31 00:40:51 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 31 Mar 2000 00:40:51 +0200 Subject: [Python-Dev] SRE: what to do with undocumented attributes? Message-ID: <00b701bf9a99$022339c0$34aab5d4@hagrid> at this time, SRE uses types instead of classes for compiled patterns and matches. these classes provide a documented interface, and a bunch of internal attributes, for example: RegexObjects: code -- a PCRE code object pattern -- the source pattern groupindex -- maps group names to group indices MatchObjects: regs -- same as match.span()? groupindex -- as above re -- the pattern object used for this match string -- the target string used for this match the problem is that some other modules use these attributes directly. for example, xmllib.py uses the pattern attribute, and other code I've seen uses regs to speed things up. in SRE, I would like to get rid of all these (except possibly for the match.string attribute). opinions? From guido at python.org Fri Mar 31 01:31:43 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Mar 2000 18:31:43 -0500 Subject: [Python-Dev] SRE: what to do with undocumented attributes? 
In-Reply-To: Your message of "Fri, 31 Mar 2000 00:40:51 +0200." <00b701bf9a99$022339c0$34aab5d4@hagrid> References: <00b701bf9a99$022339c0$34aab5d4@hagrid> Message-ID: <200003302331.SAA24895@eric.cnri.reston.va.us> > at this time, SRE uses types instead of classes for compiled > patterns and matches. these classes provide a documented > interface, and a bunch of internal attributes, for example: > > RegexObjects: > > code -- a PCRE code object > pattern -- the source pattern > groupindex -- maps group names to group indices > > MatchObjects: > > regs -- same as match.span()? > groupindex -- as above > re -- the pattern object used for this match > string -- the target string used for this match > > the problem is that some other modules use these attributes > directly. for example, xmllib.py uses the pattern attribute, and > other code I've seen uses regs to speed things up. > > in SRE, I would like to get rid of all these (except possibly for > the match.string attribute). > > opinions? Sounds reasonable. All std lib modules that violate this will need to be fixed once sre.py replaces re.py. (Checkin of sre is next.) --Guido van Rossum (home page: http://www.python.org/~guido/) From akuchlin at mems-exchange.org Fri Mar 31 01:40:16 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Thu, 30 Mar 2000 18:40:16 -0500 (EST) Subject: [Python-Dev] SRE: what to do with undocumented attributes? In-Reply-To: <00b701bf9a99$022339c0$34aab5d4@hagrid> References: <00b701bf9a99$022339c0$34aab5d4@hagrid> Message-ID: <14563.58848.109072.339060@amarok.cnri.reston.va.us> Fredrik Lundh writes: >RegexObjects: > code -- a PCRE code object > pattern -- the source pattern > groupindex -- maps group names to group indices pattern and groupindex are documented in the Library Reference, and they're part of the public interface. .code is not, so you can drop it. >MatchObjects: > regs -- same as match.span()? 
> groupindex -- as above > re -- the pattern object used for this match > string -- the target string used for this match .re and .string are documented. I don't see a reference to MatchObject.groupindex anywhere, and .regs isn't documented, so those two can be ignored; xmllib or whatever external modules use them are being very naughty, so go ahead and break them. -- A.M. Kuchling http://starship.python.net/crew/amk/ Imagine a thousand thousand fireflies of every shape and color; Oh, that was Baghdad at night in those days. -- From SANDMAN #50: "Ramadan" From effbot at telia.com Fri Mar 31 01:05:15 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 31 Mar 2000 01:05:15 +0200 Subject: [Python-Dev] SRE: what to do with undocumented attributes? References: <00b701bf9a99$022339c0$34aab5d4@hagrid> <14563.58848.109072.339060@amarok.cnri.reston.va.us> Message-ID: <00e901bf9a9c$6c036240$34aab5d4@hagrid> Andrew wrote: > >RegexObjects: > > code -- a PCRE code object > > pattern -- the source pattern > > groupindex -- maps group names to group indices > > pattern and groupindex are documented in the Library Reference, and > they're part of the public interface. hmm. I could have sworn... guess I didn't look carefully enough (or someone's used his time machine again :-). oh well, more bloat... btw, "pattern" doesn't make much sense in SRE -- who says the pattern object was created by re.compile? guess I'll just set it to None in other cases (e.g. sregex, sreverb, sgema...) From bwarsaw at cnri.reston.va.us Fri Mar 31 02:35:16 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 30 Mar 2000 19:35:16 -0500 (EST) Subject: [Python-Dev] SRE: what to do with undocumented attributes? References: <00b701bf9a99$022339c0$34aab5d4@hagrid> <14563.58848.109072.339060@amarok.cnri.reston.va.us> <00e901bf9a9c$6c036240$34aab5d4@hagrid> Message-ID: <14563.62148.860971.360871@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> hmm. I could have sworn... 
guess I didn't look carefully FL> enough (or someone's used his time machine again :-). Yep, sorry. If it's documented as in the public interface, it should be kept. Anything else can go (he says without yet grep'ing through his various code bases). -Barry From bwarsaw at cnri.reston.va.us Fri Mar 31 06:34:15 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 30 Mar 2000 23:34:15 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> Message-ID: <14564.10951.90258.729547@anthem.cnri.reston.va.us> >>>>> "Guido" == Guido van Rossum writes: Guido> Modified Files: mmapmodule.c Log Message: Hacked for Win32 Guido> by Mark Hammond. Reformatted for 8-space tabs and fitted Guido> into 80-char lines by GvR. Can we change the 8-space-tab rule for all new C code that goes in? I know that we can't practically change existing code right now, but for new C code, I propose we use no tab characters, and we use a 4-space block indentation. -Barry From DavidA at ActiveState.com Fri Mar 31 07:07:02 2000 From: DavidA at ActiveState.com (David Ascher) Date: Thu, 30 Mar 2000 21:07:02 -0800 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 In-Reply-To: <14564.10951.90258.729547@anthem.cnri.reston.va.us> Message-ID: > Can we change the 8-space-tab rule for all new C code that goes in? I > know that we can't practically change existing code right now, but for > new C code, I propose we use no tab characters, and we use a 4-space > block indentation. Heretic! 
+1, FWIW =) From bwarsaw at cnri.reston.va.us Fri Mar 31 07:16:48 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 31 Mar 2000 00:16:48 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <14564.10951.90258.729547@anthem.cnri.reston.va.us> Message-ID: <14564.13504.310866.835201@anthem.cnri.reston.va.us> >>>>> "DA" == David Ascher writes: DA> Heretic! DA> +1, FWIW =) I hereby offer to so untabify and reformat any C code in the standard distribution that Guido will approve of. -Barry From mhammond at skippinet.com.au Fri Mar 31 07:16:26 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 31 Mar 2000 15:16:26 +1000 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 In-Reply-To: Message-ID: +1 for me too. It also brings all source files under the same guidelines (rather than separate ones for .py and .c) Mark. From bwarsaw at cnri.reston.va.us Fri Mar 31 07:40:16 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 31 Mar 2000 00:40:16 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: Message-ID: <14564.14912.629414.970309@anthem.cnri.reston.va.us> >>>>> "MH" == Mark Hammond writes: MH> +1 for me too. It also brings all source files under the same MH> guidelines (rather than separate ones for .py and .c) BTW, I further propose that if Guido lets me reformat the C code, that we freeze other checkins for the duration and I temporarily turn off the python-checkins email. That is, unless you guys /want/ to be bombarded with boatloads of useless diffs. :) -Barry From pf at artcom-gmbh.de Fri Mar 31 08:45:45 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Fri, 31 Mar 2000 08:45:45 +0200 (MEST) Subject: Indentation of Python interpreter C source (was Re: [Python-Dev] Re: [Python-chec....)
In-Reply-To: <14564.14912.629414.970309@anthem.cnri.reston.va.us> from "bwarsaw@cnri.reston.va.us" at "Mar 31, 2000 0:40:16 am" Message-ID: Hi! sigh :-( > >>>>> "MH" == Mark Hammond writes: > > MH> +1 for me too. It also brings all source files under the same > MH> guidelines (rather than separate ones for .py and .c) bwarsaw at cnri.reston.va.us: > BTW, I further propose that if Guido lets me reformat the C code, that > we freeze other checkins for the duration and I temporarily turn off > the python-checkins email. That is, unless you guys /want/ to be > bombarded with boatloads of useless diffs. :) -1 for C reformatting. The 4-space indentation seems reasonable for Python sources, but I disagree for C code. C is not Python. Let me cite a very prominent member of the open source community (pasted from /usr/src/linux/Documentation/CodingStyle): Chapter 1: Indentation Tabs are 8 characters, and thus indentations are also 8 characters. There are heretic movements that try to make indentations 4 (or even 2!) characters deep, and that is akin to trying to define the value of PI to be 3. Rationale: The whole idea behind indentation is to clearly define where a block of control starts and ends. Especially when you've been looking at your screen for 20 straight hours, you'll find it a lot easier to see how the indentation works if you have large indentations. Now, some people will claim that having 8-character indentations makes the code move too far to the right, and makes it hard to read on a 80-character terminal screen. The answer to that is that if you need more than 3 levels of indentation, you're screwed anyway, and should fix your program. In short, 8-char indents make things easier to read, and have the added benefit of warning you when you're nesting your functions too deep. Heed that warning. Also, the Python interpreter has no strong relationship with the Linux kernel; I agree with Linus on this topic.
Python source code is another thing: Python identifiers are usually longer due to qualifying and Python operands are often lists, tuples or the like, so lines contain more stuff. disliking-yet-another-white-space-discussion-ly y'rs - peter From mhammond at skippinet.com.au Fri Mar 31 09:11:50 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 31 Mar 2000 17:11:50 +1000 Subject: Indentation of Python interpreter C source (was Re: [Python-Dev] Re: [Python-chec....) In-Reply-To: Message-ID: > Rationale: The whole idea behind indentation is to > clearly define where > a block of control starts and ends. Especially when Ironically, this statement is a strong argument for insisting on Python using real tab characters! "Clearly define" is upgraded to "used to define". > 80-character terminal screen. The answer to that is > that if you need > more than 3 levels of indentation, you're screwed > anyway, and should fix > your program. Yeah, right! int foo() { // one level for the privilege of being here. switch (bar) { // uh oh - running out of room... case WTF: // Oh no - if I use an "if" statement, // my code is "screwed"?? } } > disliking-yet-another-white-space-discussion-ly y'rs - peter Like-death-and-taxes-ly y'rs - Mark. From moshez at math.huji.ac.il Fri Mar 31 10:04:32 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 31 Mar 2000 10:04:32 +0200 (IST) Subject: [Python-Dev] mmapfile module In-Reply-To: <200003302134.QAA22939@eric.cnri.reston.va.us> Message-ID: On Thu, 30 Mar 2000, Guido van Rossum wrote: > > Whoa... Not sure. This will give issues with Patrice, at least (even > > if it is pure Open Source -- given the size). > > For those outside CNRI -- Patrice is CNRI's tough IP lawyer. It was understandable from the context... Personally, I'd rather it was folded in by value, and not by reference: one reason is versioning problems, and another is pure laziness on my part.
what-do-you-have-when-you-got-a-lawyer-up-to-his-neck-in-the-sand-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal at lemburg.com Fri Mar 31 09:42:04 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 31 Mar 2000 09:42:04 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> Message-ID: <38E456CC.1A49334A@lemburg.com> "Barry A. Warsaw" wrote: > > >>>>> "Guido" == Guido van Rossum writes: > > Guido> Modified Files: mmapmodule.c Log Message: Hacked for Win32 > Guido> by Mark Hammond. Reformatted for 8-space tabs and fitted > Guido> into 80-char lines by GvR. > > Can we change the 8-space-tab rule for all new C code that goes in? I > know that we can't practically change existing code right now, but for > new C code, I propose we use no tab characters, and we use a 4-space > block indentation. Why not just leave new code formatted as it is (except maybe to bring the used TAB width to the standard 8 spaces used throughout the Python C source code) ? BTW, most of the new unicode stuff uses 4-space indents. Unfortunately, it mixes whitespace and tabs since Emacs c-mode doesn't do the python-mode magic yet (is there a way to turn it on ?). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From effbot at telia.com Fri Mar 31 11:14:49 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 31 Mar 2000 11:14:49 +0200 Subject: Indentation of Python interpreter C source (was Re: [Python-Dev] Re: [Python-chec....) References: Message-ID: <01ae01bf9af1$927b1940$34aab5d4@hagrid> Peter Funk wrote: > Also the Python interpreter has no strong relationship with Linux kernel > a agree with Linus on this topic. 
Python source code is another thing: > Python identifiers are usually longer due to qualifiying and Python > operands are often lists, tuples or the like, so lines contain more stuff. you're just guessing, right? (if you check, you'll find that the actual difference is very small. iirc, that's true for c, c++, java, python, tcl, and probably a few more languages. dunno about perl, though... :-) From effbot at telia.com Fri Mar 31 11:17:42 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 31 Mar 2000 11:17:42 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> <38E456CC.1A49334A@lemburg.com> Message-ID: <01b501bf9af1$f9b44500$34aab5d4@hagrid> M.-A. Lemburg wrote: > Why not just leave new code formatted as it is (except maybe > to bring the used TAB width to the standard 8 spaces used throughout > the Python C source code) ? > > BTW, most of the new unicode stuff uses 4-space indents. > Unfortunately, it mixes whitespace and tabs since Emacs > c-mode doesn't do the python-mode magic yet (is there a > way to turn it on ?). http://www.jwz.org/doc/tabs-vs-spaces.html contains some hints. From moshez at math.huji.ac.il Fri Mar 31 13:24:05 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 31 Mar 2000 13:24:05 +0200 (IST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes Message-ID: Here is a new list of things that will change in the next release. Thanks to all the people who gave me hints and information! If you have anything you think I missed, or mistreated, please e-mail me personally -- I'll post an updated version soon. Obligatory ========== A lot of bug-fixes, some optimizations, many improvements in the documentation Core changes ============ Deleting objects is safe even for deeply nested data structures. Long/int unifications: long integers can be used in seek() calls, as slice indexes. 
str(1L) --> '1', not '1L' (repr() is still the same) Builds on NT Alpha UnboundLocalError is raised when a local variable is undefined long, int take optional "base" parameter string objects now have methods (though they are still immutable) unicode support: Unicode strings are marked with u"string", and there is support for arbitrary encoders/decoders "in" operator can now be overridden in user-defined classes to mean anything: it calls the magic method __contains__ New calling syntax: f(*args, **kw) equivalent to apply(f, args, kw) Some methods which would take multiple arguments and treat them as a tuple were fixed: list.{append, insert, remove, count}, socket.connect New modules =========== winreg - Windows registry interface. Distutils - tools for distributing Python modules robotparser - parse a robots.txt file (for writing web spiders) linuxaudio - audio for Linux mmap - treat a file as a memory buffer sre - regular expressions (fast, supports unicode) filecmp - supersedes the old cmp.py and dircmp.py modules tabnanny - check Python sources for tab-width dependence unicode - support for unicode codecs - support for Unicode encoders/decoders Module changes ============== re - changed to be a frontend to sre readline, ConfigParser, cgi, calendar, posix, xmllib, aifc, chunk, wave, random, shelve, nntplib - minor enhancements socket, httplib, urllib - optional OpenSSL support _tkinter - support for 8.1,8.2,8.3 (no support for versions older than 8.0) Tool changes ============ IDLE -- complete overhaul (Andrew, I'm still waiting for the expat support and integration to add to this list -- other than that, please contact me if you want something less telegraphic ) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping at lfw.org Fri Mar 31 14:01:21 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 31 Mar 2000 04:01:21 -0800 (PST) Subject: [Python-Dev] Roundup et al.
Message-ID: Hi -- there was some talk on this list earlier about nosy lists, managing patches, and such things, so i just wanted to mention, for anybody interested, that i threw together Roundup very quickly for you to try out. http://www.lfw.org/python/ There's a tar file there -- it's very messy code, and i apologize (it was hastily hacked out of the running prototype implementation), but it should be workable enough to play with. There's a test installation to play with at http://www.lfw.org/ping/roundup/roundup.cgi Dummy user:password pairs are test:test, spam:spam, eggs:eggs. A fancier design, still in the last stages of coming together (which will be my submission to the Software Carpentry contest) is up at http://crit.org/http://www.lfw.org/ping/sctrack.html and i welcome your thoughts and comments on that if you have the spare time (ha!) and generous inclination to contribute them. Thank you and apologies for the interruption. -- ?!ng "To be human is to continually change. Your desire to remain as you are is what ultimately limits you." -- The Puppet Master, Ghost in the Shell From guido at python.org Fri Mar 31 14:10:45 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Mar 2000 07:10:45 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 In-Reply-To: Your message of "Thu, 30 Mar 2000 23:34:15 EST." <14564.10951.90258.729547@anthem.cnri.reston.va.us> References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> Message-ID: <200003311210.HAA29010@eric.cnri.reston.va.us> > Can we change the 8-space-tab rule for all new C code that goes in? I > know that we can't practically change existing code right now, but for > new C code, I propose we use no tab characters, and we use a 4-space > block indentation. Actually, this one was formatted for 8-space indents but using 4-space tabs, so in my editor it looked like 16-space indents! 
Given that we don't want to change existing code, I'd prefer to stick with 1-tab 8-space indents. --Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at math.huji.ac.il Fri Mar 31 15:10:06 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 31 Mar 2000 15:10:06 +0200 (IST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc ACKS,1.51,1.52 In-Reply-To: <200003311301.IAA29221@eric.cnri.reston.va.us> Message-ID: On Fri, 31 Mar 2000, Guido van Rossum wrote: > + Christian Tismer > + Christian Tismer Ummmmm....I smell something fishy here. Are there two Christian Tismers? That would explain how Christian has so much time to work on Stackless. Well, between the both of them, Guido will have no chance but to put Stackless in the standard distribution. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From fredrik at pythonware.com Fri Mar 31 15:16:16 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 31 Mar 2000 15:16:16 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc ACKS,1.51,1.52 References: <200003311301.IAA29221@eric.cnri.reston.va.us> Message-ID: <000d01bf9b13$4be1db00$0500a8c0@secret.pythonware.com> > Tracy Tims > + Christian Tismer > + Christian Tismer > R Lindsay Todd two christians? From bwarsaw at cnri.reston.va.us Fri Mar 31 15:55:13 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 31 Mar 2000 08:55:13 -0500 (EST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> <38E456CC.1A49334A@lemburg.com> Message-ID: <14564.44609.221250.471147@anthem.cnri.reston.va.us> >>>>> "M" == M writes: M> BTW, most of the new unicode stuff uses 4-space indents. 
M> Unfortunately, it mixes whitespace and tabs since Emacs M> c-mode doesn't do the python-mode magic yet (is there a M> way to turn it on ?). (setq indent-tabs-mode nil) I could add that to the "python" style. And to zap all your existing tab characters: C-M-h M-x untabify RET -Barry From skip at mojam.com Fri Mar 31 16:04:46 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 31 Mar 2000 08:04:46 -0600 (CST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: References: Message-ID: <14564.45182.460160.589244@beluga.mojam.com> Moshe, I would highlight those bits that are likely to warrant a little closer scrutiny. The list.{append,insert,...} and socket.connect change certainly qualify. Perhaps split the Core Changes section into two subsections, one set of changes likely to require some adaptation and one set that should be backwards-compatible. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From guido at python.org Fri Mar 31 16:47:31 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Mar 2000 09:47:31 -0500 Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: Your message of "Fri, 31 Mar 2000 08:04:46 CST." <14564.45182.460160.589244@beluga.mojam.com> References: <14564.45182.460160.589244@beluga.mojam.com> Message-ID: <200003311447.JAA29633@eric.cnri.reston.va.us> See what I've done to Moshe's list: http://www.python.org/1.6/ --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Fri Mar 31 17:28:56 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 31 Mar 2000 09:28:56 -0600 (CST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: <200003311447.JAA29633@eric.cnri.reston.va.us> References: <14564.45182.460160.589244@beluga.mojam.com> <200003311447.JAA29633@eric.cnri.reston.va.us> Message-ID: <14564.50232.734778.152933@beluga.mojam.com> Guido> See what I've done to Moshe's list: http://www.python.org/1.6/ Looks good. Attached are a couple nitpicky diffs. 
Skip -------------- next part -------------- A non-text attachment was scrubbed... Name: 1.6.diff Type: application/octet-stream Size: 1263 bytes Desc: diffs to 1.6 Release Notes URL: From guido at python.org Fri Mar 31 17:47:56 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Mar 2000 10:47:56 -0500 Subject: [Python-Dev] Windows installer pre-prelease Message-ID: <200003311547.KAA15538@eric.cnri.reston.va.us> The Windows installer is always hard to get just right. If you have a moment, go to http://www.python.org/1.6/ and download the Windows Installer prerelease. Let me know what works, what doesn't! I've successfully installed it on Windows NT 4.0 and on Windows 98, both with default install target and with a modified install target. I'd love to hear that it also installs cleanly on Windows 95. Please test IDLE from the start menu! --Guido van Rossum (home page: http://www.python.org/~guido/) From gward at cnri.reston.va.us Fri Mar 31 18:18:43 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Fri, 31 Mar 2000 11:18:43 -0500 Subject: [Python-Dev] Distutils for the std. library (was: Expat module) In-Reply-To: <14563.52125.401817.986919@amarok.cnri.reston.va.us>; from akuchlin@mems-exchange.org on Thu, Mar 30, 2000 at 04:48:13PM -0500 References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> <14563.52125.401817.986919@amarok.cnri.reston.va.us> Message-ID: <20000331111842.A8060@cnri.reston.va.us> On 30 March 2000, Andrew M. Kuchling said: > Should we consider replacing the makesetup/Setup.in mechanism with a > setup.py script that uses the Distutils? You'd have to compile a > minipython with just enough critical modules -- strop and posixmodule > are probably the most important ones -- in order to run setup.py. 
> It's something I'd like to look at for 1.6, because then you could be > much smarter in automatically enabling modules. Gee, I didn't think anyone was gonna open *that* can of worms for 1.6. Obviously, I'd love to see the Distutils used to build parts of the Python library. Some possible problems:

* Distutils relies heavily on the sys, os, string, and re modules, so those would have to be built and included in the mythical mini-python (as would everything they rely on -- strop, pcre, ... ?)

* Distutils currently assumes that it's working with an installed Python -- it doesn't know anything about working in the Python source tree. I think this could be fixed just by tweaking the distutils.sysconfig module, but there might be subtle assumptions elsewhere in the code.

* I haven't written the mythical Autoconf-in-Python yet, so we'd still have to rely on either the configure script or user intervention to find out whether library X is installed, and where its header and library files live (for X in zlib, tcl, tk, ...).

Of course, the configure script would still be needed to build the mini-python, so it's not going away any time soon. Greg From skip at mojam.com Fri Mar 31 18:26:55 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 31 Mar 2000 10:26:55 -0600 (CST) Subject: [Python-Dev] Distutils for the std.
library (was: Expat module) In-Reply-To: <20000331111842.A8060@cnri.reston.va.us> References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> <14563.52125.401817.986919@amarok.cnri.reston.va.us> <20000331111842.A8060@cnri.reston.va.us> Message-ID: <14564.53711.803509.962248@beluga.mojam.com> Greg> * Distutils relies heavily on the sys, os, string, and re Greg> modules, so those would have to be built and included in the Greg> mythical mini-python (as would everything they rely on -- Greg> strop, pcre, ... ?) With string methods in 1.6, reliance on the string and strop modules should be lessened or eliminated, right? re and os may need a tweak or two to use string methods themselves. The sys module is always available. Perhaps it would make sense to put sre(module)?.c into the Python directory where sysmodule.c lives. That way, a Distutils-capable mini-python could be built without messing around in the Modules directory at all... -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From moshez at math.huji.ac.il Fri Mar 31 18:25:11 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 31 Mar 2000 18:25:11 +0200 (IST) Subject: [Python-Dev] Distutils for the std. library (was: Expat module) In-Reply-To: <20000331111842.A8060@cnri.reston.va.us> Message-ID: On Fri, 31 Mar 2000, Greg Ward wrote: > Gee, I didn't think anyone was gonna open *that* can of worms for 1.6. Well, it's not like it's not a lot of work, but it could be done, with liberal interpretation of "mini": include in "mini" Python *all* modules which do not rely on libraries not distributed with the Python core -- zlib, expat and Tkinter go right out the window, but most everything else can stay. That way, Distutils can use all modules it currently uses . 
The other problem, file-location, is a problem I have talked about earlier: it *cannot* be assumed that the default place for putting new libraries is the same place the Python interpreter resides, for many reasons. Why not ask the user explicitly? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gward at cnri.reston.va.us Fri Mar 31 18:29:33 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Fri, 31 Mar 2000 11:29:33 -0500 Subject: [Python-Dev] Distutils for the std. library (was: Expat module) In-Reply-To: <14564.53711.803509.962248@beluga.mojam.com>; from skip@mojam.com on Fri, Mar 31, 2000 at 10:26:55AM -0600 References: <200003282000.PAA11988@eric.cnri.reston.va.us> <14563.46954.70800.706245@amarok.cnri.reston.va.us> <200003302025.PAA22367@eric.cnri.reston.va.us> <14563.50417.909045.81868@amarok.cnri.reston.va.us> <200003302131.QAA22897@eric.cnri.reston.va.us> <14563.52125.401817.986919@amarok.cnri.reston.va.us> <20000331111842.A8060@cnri.reston.va.us> <14564.53711.803509.962248@beluga.mojam.com> Message-ID: <20000331112933.B8060@cnri.reston.va.us> On 31 March 2000, Skip Montanaro said: > With string methods in 1.6, reliance on the string and strop modules should > be lessened or eliminated, right? re and os may need a tweak or two to use > string methods themselves. The sys module is always available. Perhaps it > would make sense to put sre(module)?.c into the Python directory where > sysmodule.c lives. That way, a Distutils-capable mini-python could be built > without messing around in the Modules directory at all... But I'm striving to maintain compatibility with (at least) Python 1.5.2 in Distutils. That need will fade with time, but it's not going to disappear the moment Python 1.6 is released. (Guess I'll have to find somewhere else to play with string methods and extended call syntax).
Greg From thomas.heller at ion-tof.com Fri Mar 31 19:09:41 2000 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Fri, 31 Mar 2000 19:09:41 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: distutils/distutils msvccompiler.py References: <200003311653.LAA08175@thrak.cnri.reston.va.us> Message-ID: <038701bf9b33$e7c49240$4500a8c0@thomasnotebook> > Simplified Thomas Heller's registry patch: just assign all those > HKEY_* and Reg* names once, rather than having near-duplicate code > in the two import attempts. Your change won't work, the function names in win32api and winreg are not the same: Example: win32api.RegEnumValue <-> winreg.EnumValue > > Also dropped the leading underscore on all the imported symbols, > as it's not appropriate (they're not local to this module). Are they used anywhere else? Or do you think they *could* be used somewhere else? Thomas Heller From mal at lemburg.com Fri Mar 31 12:19:58 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 31 Mar 2000 12:19:58 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.1,2.2 References: <200003310117.UAA26774@eric.cnri.reston.va.us> <14564.10951.90258.729547@anthem.cnri.reston.va.us> <38E456CC.1A49334A@lemburg.com> <01b501bf9af1$f9b44500$34aab5d4@hagrid> Message-ID: <38E47BCE.94E4E012@lemburg.com> Fredrik Lundh wrote: > > M.-A. Lemburg wrote: > > Why not just leave new code formatted as it is (except maybe > > to bring the used TAB width to the standard 8 spaces used throughout > > the Python C source code) ? > > > > BTW, most of the new unicode stuff uses 4-space indents. > > Unfortunately, it mixes whitespace and tabs since Emacs > > c-mode doesn't do the python-mode magic yet (is there a > > way to turn it on ?). > > http://www.jwz.org/doc/tabs-vs-spaces.html > contains some hints. Ah, cool. 
Thanks :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From pf at artcom-gmbh.de Fri Mar 31 20:56:40 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Fri, 31 Mar 2000 20:56:40 +0200 (MEST) Subject: [Python-Dev] 'make install' should create lib/site-packages IMO In-Reply-To: <200003311513.KAA00790@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 31, 2000 10:13:20 am" Message-ID: Hi! Guido van Rossum: [...] > Modified Files: > Makefile.in > Log Message: > Added distutils and distutils/command to LIBSUBDIRS. Noted by Andrew > Kuchling. [...] > ! LIBSUBDIRS= lib-old lib-tk test test/output encodings \ > ! distutils distutils/command $(MACHDEPS) [...] What about 'site-packages'? SuSE added this to their Python packaging and I think it is a good idea to have an empty 'site-packages' directory installed by default. Regards, Peter From akuchlin at mems-exchange.org Fri Mar 31 22:16:53 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Fri, 31 Mar 2000 15:16:53 -0500 (EST) Subject: [Python-Dev] SRE: what to do with undocumented attributes? In-Reply-To: <00e901bf9a9c$6c036240$34aab5d4@hagrid> References: <00b701bf9a99$022339c0$34aab5d4@hagrid> <14563.58848.109072.339060@amarok.cnri.reston.va.us> <00e901bf9a9c$6c036240$34aab5d4@hagrid> Message-ID: <14565.1973.361549.291817@amarok.cnri.reston.va.us> Fredrik Lundh writes: >btw, "pattern" doesn't make much sense in SRE -- who says >the pattern object was created by re.compile? guess I'll just >set it to None in other cases (e.g. sregex, sreverb, sgema...) Good point; I can imagine fabulously complex patterns assembled programmatically, for which no summary could be made. I guess there could be another attribute that also gives the class (module? function?) used to compile the pattern, but more likely, the pattern attribute should be deprecated and eventually dropped. -- A.M. 
Kuchling http://starship.python.net/crew/amk/ You know how she is when she gets an idea into her head. I mean, when one finally penetrates. -- Desire describes Delirium, in SANDMAN #41: "Brief Lives:1" From pf at artcom-gmbh.de Fri Mar 31 22:14:41 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Fri, 31 Mar 2000 22:14:41 +0200 (MEST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: <200003311447.JAA29633@eric.cnri.reston.va.us> from Guido van Rossum at "Mar 31, 2000 9:47:31 am" Message-ID: Hi! Guido van Rossum : > See what I've done to Moshe's list: http://www.python.org/1.6/ Very fine, but I have a few small annotations:

1. 'linuxaudio' has been renamed to 'linuxaudiodev'.

2. The following text: "_tkinter - support for 8.1,8.2,8.3 (no support for versions older than 8.0)." looks a bit misleading, since it is not explicit about version 8.0.x. I suggest the following wording: "_tkinter - supports Tcl/Tk from version 8.0 up to the current 8.3. Support for versions older than 8.0 has been dropped."

3. 'src/Tools/i18n/pygettext.py' by Barry should be mentioned. This is a very useful utility. I suggest to append the following text: "New utility pygettext.py -- Python equivalent of xgettext(1). A message text extraction tool used for internationalizing applications written in Python."

Regards, Peter From fdrake at acm.org Fri Mar 31 22:30:00 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 31 Mar 2000 15:30:00 -0500 (EST) Subject: [Python-Dev] 1.5.2 -> 1.6 Changes In-Reply-To: References: <200003311447.JAA29633@eric.cnri.reston.va.us> Message-ID: <14565.2760.665022.206361@seahag.cnri.reston.va.us> Peter Funk writes: > I suggest the following wording: ... > a very useful utility. I suggest to append the following text: Peter, I'm beginning to figure this out -- you really just want to get published! ;) You forgot the legalese. ;( -Fred -- Fred L. Drake, Jr.
Corporation for National Research Initiatives From guido at python.org Fri Mar 31 23:30:42 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Mar 2000 16:30:42 -0500 Subject: [Python-Dev] Python 1.6 alpha 1 released Message-ID: <200003312130.QAA04361@eric.cnri.reston.va.us> I've just released a source tarball and a Windows installer for Python 1.6 alpha 1 to the Python website: http://www.python.org/1.6/ Probably the biggest news (if you hadn't heard the rumors) is Unicode support. More news on the above webpage. Note: this is an alpha release. Some of the code is very rough! Please give it a try with your favorite Python application, but don't trust it for production use yet. I plan to release several more alpha and beta releases over the next two months, culminating in a 1.6 final release around June first. We need your help to make the final 1.6 release as robust as possible -- please test this alpha release!!! --Guido van Rossum (home page: http://www.python.org/~guido/) From gandalf at starship.python.net Fri Mar 31 23:56:16 2000 From: gandalf at starship.python.net (Vladimir Ulogov) Date: Fri, 31 Mar 2000 16:56:16 -0500 (EST) Subject: [Python-Dev] Re: Python 1.6 alpha 1 released In-Reply-To: <200003312130.QAA04361@eric.cnri.reston.va.us> Message-ID: Guido, """where you used to write sock.connect(host, port) you must now write sock.connect((host, port))""" Is it possible to keep the old notation? I understand (according to your past mail about the parameters of connect) this may not be what you had in mind, but we do use this notation a lot, and for us it will mean creating a workaround for the socket.connect function. It's inconvenient. In general, I think socket.connect(Host, Port) looks prettier :)) than socket.connect((Host, Port)) Vladimir
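[The change Vladimir quotes applies to every method that used to pack multiple positional arguments into a tuple implicitly -- list.{append, insert, remove, count} and socket.connect, per Moshe's changes list. A minimal sketch of the old versus new spelling, written here in modern Python syntax, where the old form simply raises TypeError; the host and port values are placeholders, and no connection is actually attempted since the old spelling fails before any network I/O:]

```python
import socket

# list.append shows the same change as socket.connect: one argument,
# an explicit tuple, instead of several arguments packed implicitly.
items = []
items.append((1, 2))          # new spelling: a single tuple argument
try:
    items.append(3, 4)        # old 1.5.2 spelling
except TypeError:
    print("append(3, 4) rejected")

# socket.connect follows the same rule: pass one (host, port) tuple.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    sock.connect("localhost", 4242)   # old spelling, no longer accepted
except TypeError:
    print("connect(host, port) rejected")
finally:
    sock.close()

print(items)                  # [(1, 2)]
```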