From greg@cosc.canterbury.ac.nz Thu Aug 1 00:48:21 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 01 Aug 2002 11:48:21 +1200 (NZST) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <20020731152832.99003.qmail@web40106.mail.yahoo.com> Message-ID: <200207312348.g6VNmLiQ019823@kuku.cosc.canterbury.ac.nz> Scott Gilbert : > getreadbufferproc bf_getreadbuffer; > getwritebufferproc bf_getwritebuffer; > > acquirereadbufferproc bf_acquirereadbuffer; > acquirewritebufferproc bf_acquirewritebuffer; Is there really a need for both "get" and "acquire" methods? Surely if an object requires locking, it always requires locking, so why can't the "get" functions simply include the locking operation if they need it? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Thu Aug 1 00:52:53 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 01 Aug 2002 11:52:53 +1200 (NZST) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <0d2b01c238ab$0e892ff0$e000a8c0@thomasnotebook> Message-ID: <200207312352.g6VNqrWF019829@kuku.cosc.canterbury.ac.nz> Thomas Heller : > The consequence: mmap objects need a 'buffer lock counter', > and cannot be closed while the count is >0. Which exception > is raised then? Maybe instead of raising an exception at all, the closing could simply be deferred until the lock count reaches 0? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From mhammond@skippinet.com.au Thu Aug 1 00:51:36 2002 From: mhammond@skippinet.com.au (Mark Hammond) Date: Thu, 1 Aug 2002 09:51:36 +1000 Subject: [Python-Dev] seeing off SET_LINENO In-Reply-To: <2mu1mgfgsh.fsf@starship.python.net> Message-ID: > "Mark Hammond" writes: > > IMO, the Python debugger "interface" should include function entry. > > There goes the time machine: it does. I just think everyone ignores > 'call' messages because they're a bit redundant today (because of the > matter under discussion). Yes, I should have said "continue to include function entry". I understood that a patch under discussion may have *removed* this facility from the debugger. While I agree it is redundant and most debuggers will choose to ignore it, I believe removing it from the low level debugger hooks would be a mistake. Mark. From barry@python.org Thu Aug 1 01:14:19 2002 From: barry@python.org (Barry A. Warsaw) Date: Wed, 31 Jul 2002 20:14:19 -0400 Subject: [Python-Dev] PEP 298 - the Fixed Buffer Interface References: <04da01c237ef$c103ac30$e000a8c0@thomasnotebook> <200207301946.g6UJkf520799@odiug.zope.com> <0fe601c238c3$8bab1b20$e000a8c0@thomasnotebook> Message-ID: <15688.32091.532041.67207@anthem.wooz.org> >>>>> "TH" == Thomas Heller writes: TH> I've changed PEP 298 to incorporate the latest changes. TH> Barry has not yet run pep2html (and I don't want to bother TH> him too much with this) Not a bother. I had to wait until I got home, but I just pushed it out. 
-Barry From xscottg@yahoo.com Thu Aug 1 01:17:49 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Wed, 31 Jul 2002 17:17:49 -0700 (PDT) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <200207312348.g6VNmLiQ019823@kuku.cosc.canterbury.ac.nz> Message-ID: <20020801001749.78707.qmail@web40106.mail.yahoo.com> --- Greg Ewing wrote: > Scott Gilbert : > > > getreadbufferproc bf_getreadbuffer; > > getwritebufferproc bf_getwritebuffer; > > > > acquirereadbufferproc bf_acquirereadbuffer; > > acquirewritebufferproc bf_acquirewritebuffer; > > Is there really a need for both "get" and "acquire" > methods? Surely if an object requires locking, it > always requires locking, so why can't the "get" > functions simply include the locking operation > if they need it? > That is the proposal. The get methods are the legacy (non-fixed) interface. Cheers, -Scott __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From goodger@users.sourceforge.net Thu Aug 1 02:17:22 2002 From: goodger@users.sourceforge.net (David Goodger) Date: Wed, 31 Jul 2002 21:17:22 -0400 Subject: [Python-Dev] Docutils/reStructuredText is ready to process PEPs Message-ID: Python-developers, Pursuant to PEP 287, one of the deliverables of the just-released Docutils 0.2 (http://docutils.sf.net/) is a processing system for reStructuredText-format PEPs as an alternative to the current PEP processing. Here are examples of new-style PEPs (processed to HTML, with links to the source text as usual): - http://docutils.sf.net/spec/pep-0287.html (latest) - http://docutils.sf.net/spec/pep-0000.html (as a proof of concept because of its special processing) Compare to the old-style PEPs: - http://www.python.org/peps/pep-0287.html (update pending) - http://www.python.org/peps/pep-0000.html Existing old-style PEPs can coexist with reStructuredText PEPs indefinitely. 
What to do with new PEPs is a policy decision that doesn't have to be made immediately. PEP 287 puts forward a detailed rationale for reStructuredText PEPs; especially see the "Questions & Answers" section, items 4 through 7. In earlier correspondence Guido critiqued some style issues (since corrected) and said "I'm sure you can fix all these things with a simple style sheet change, and then I'm all for allowing Docutils for PEPs." I'd appreciate more critiques/suggestions on PEP formatting issues, no matter how small. Especially, please point out any HTML/stylesheet issues with the various browsers. I hereby formally request permission to deploy Docutils for PEPs on Python.org. Here's a deployment plan for your consideration: - Install the Docutils-modified version of Fredrik Lundh's nondist/peps/pep2html.py script into CVS, along with ancillary files. The modified pep2html.py auto-detects old-style and new-style PEPs and processes accordingly. (http://docutils.sf.net/tools/pep2html.py) - Install Docutils 0.2 on the server that does the PEP processing. I don't think it's necessary to put Docutils into Python's CVS. - Make up a README for the "peps" directory with instructions for installing Docutils and running the modified pep2html.py. - Modify PEP 1 (PEP Purpose and Guidelines) and PEP 9 (Sample PEP Template) with the new formatting instructions. - Make an announcement to the Python community. - I will maintain the software, convert current meta-PEPs to the new format as desired, handle PEP conversion updates, and assist other PEP authors to convert their PEPs if they wish. If this is acceptable, to begin I will need write access to CVS and shell access to the Python.org server (however that works; please let me know what I need to do). Once I have the necessary access, I will try to ensure a near-zero impact on the PythonLabs crew. Feedback is most welcome. 
-- David Goodger Open-source projects: - Python Docutils: http://docutils.sourceforge.net/ (includes reStructuredText: http://docutils.sf.net/rst.html) - The Go Tools Project: http://gotools.sourceforge.net/ From barry@python.org Thu Aug 1 03:28:53 2002 From: barry@python.org (Barry A. Warsaw) Date: Wed, 31 Jul 2002 22:28:53 -0400 Subject: [Python-Dev] split('') revisited References: <200207312235.g6VMZL218546@europa.research.att.com> Message-ID: <15688.40165.658750.506208@anthem.wooz.org> >>>>> "AK" == Andrew Koenig writes: AK> Back in February, there was a thread in comp.lang.python (and, AK> I think, also on Python-Dev) that asked whether the following AK> behavior: >> 'abcde'.split('') | Traceback (most recent call last): | File "", line 1, in ? | ValueError: empty separator AK> was a bug or a feature. The prevailing opinion at the time AK> seemed to be that there was not a sensible, unique way of AK> defining this operation, so rejecting it was a feature. AK> That answer didn't bother me particularly at the time, but AK> since then I have learned a new fact (or perhaps an old fact AK> that I didn't notice at the time) that has changed my mind: AK> Section 4.2.4 of the library reference says that the 'split' AK> method of a regular expression object is defined as AK> Identical to the split() function, using the compiled AK> pattern. AK> This claim does not appear to be correct: Actually, I believe what it's saying is that re.compile('').split('abcde') is the same as re.split('', 'abcde') not that re...split() has anything to do with the split() string method. -Barry From python-dev@zesty.ca Thu Aug 1 05:16:19 2002 From: python-dev@zesty.ca (Ka-Ping Yee) Date: Wed, 31 Jul 2002 21:16:19 -0700 (PDT) Subject: [Python-Dev] Re: Docutils/reStructuredText is ready to process PEPs In-Reply-To: Message-ID: On Wed, 31 Jul 2002, David Goodger wrote: > I hereby formally request permission to deploy Docutils for PEPs on > Python.org. 
Here's a deployment plan for your consideration: I have just read the specification: http://docutils.sourceforge.net/spec/rst/reStructuredText.html It took a long time. Perhaps it seems not so big to others, but my personal opinion would be to recommend against this proposal until the specification fits in, say, 1000 lines and can be absorbed in ten minutes. For me, it violates the fits-in-my-brain principle: the spec is 2500 lines long, and supports six different kinds of references and five different kinds of lists (even lists with roman numerals!). It also violates the one-way-to-do-it principle: for example, there are a huge variety of ways to do headings, and two different syntaxes for drawing a table. I am not against structured text processing systems in general. I think that something of this flavour would be a great solution for PEPs and docstrings, and that David has done an impressive job on RST. It's just that RST is much too big (for me). -- ?!ng "This code is better than any code that doesn't work has any right to be." -- Roger Gregory, on Xanadu From esr@thyrsus.com Thu Aug 1 05:31:07 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Thu, 1 Aug 2002 00:31:07 -0400 Subject: [Python-Dev] Re: Docutils/reStructuredText is ready to process PEPs In-Reply-To: References: Message-ID: <20020801043107.GA32402@thyrsus.com> Ka-Ping Yee : > I am not against structured text processing systems in general. > I think that something of this flavour would be a great solution > for PEPs and docstrings, and that David has done an impressive > job on RST. It's just that RST is much too big (for me). And if we're going to pay the transition costs to move to a heavyweight markup, it ought to be DocBook, same direction GNOME and KDE and the Linux kernel and FreeBSD and PHP are going. -- Eric S. 
Raymond From aahz@pythoncraft.com Thu Aug 1 05:42:48 2002 From: aahz@pythoncraft.com (Aahz) Date: Thu, 1 Aug 2002 00:42:48 -0400 Subject: [Python-Dev] Re: Docutils/reStructuredText is ready to process PEPs In-Reply-To: <20020801043107.GA32402@thyrsus.com> References: <20020801043107.GA32402@thyrsus.com> Message-ID: <20020801044248.GA4424@panix.com> On Thu, Aug 01, 2002, Eric S. Raymond wrote: > Ka-Ping Yee : >> >> I am not against structured text processing systems in general. >> I think that something of this flavour would be a great solution >> for PEPs and docstrings, and that David has done an impressive >> job on RST. It's just that RST is much too big (for me). > > And if we're going to pay the transition costs to move to a > heavyweight markup, it ought to be DocBook, same direction GNOME and > KDE and the Linux kernel and FreeBSD and PHP are going. Well, reST can generate DocBook easily enough. The problem I see with DocBook is the creation/editing side: XML is painful. Having written one presentation in pure XML/PythonPoint and another presentation in my home-grown structured text system that then got converted to XML for processing by PythonPoint, I'm a big believer in the *concept* of reST. What remains to be seen is whether reST works well enough in the Real World [tm]. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From martin@v.loewis.de Thu Aug 1 07:21:15 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 01 Aug 2002 08:21:15 +0200 Subject: [Python-Dev] split('') revisited In-Reply-To: <200207312235.g6VMZL218546@europa.research.att.com> References: <200207312235.g6VMZL218546@europa.research.att.com> Message-ID: Andrew Koenig writes: > It seems to me that there are four reasonable courses of action: > > 1) Do nothing -- the problem is too trivial to worry about. > > 2) Change string split (and its documentation) to match regexp split. 
> > 3) Change regexp split (and its documentation) to match string split.
> >
> > 4) Change both string split and regexp split to do something else :-)

There is another option:

5) Change the documentation of re.split to match the implemented behaviour.

Not that I could say what the implemented behaviour is, though :-(

Regards, Martin

From eric@enthought.com Thu Aug 1 07:32:27 2002 From: eric@enthought.com (eric jones) Date: Thu, 1 Aug 2002 01:32:27 -0500 Subject: [Python-Dev] Docutils/reStructuredText is ready to process PEPs In-Reply-To: Message-ID: <000001c23925$37357a60$777ba8c0@ericlaptop>

I would very much like to see reStructuredText, or some minor variation on it, move forward as a "standard" for doc-strings very soon. I have long lamented not having a prescribed format *and* an associated processing tool suite included in the standard library. Even if the format isn't perfect (I think it looks very good), it is time to pick a reasonable candidate and go.

SciPy does not yet have a standard doc-string format. The .3 release of SciPy (we're at .2alpha) will primarily be a documentation/testing effort. I'd like to use the chosen standard so that we can auto-generate the reference manual without setting up some complex third-party tools. The user documentation for SciPy may still end up in TeX (which is very hard for me to swallow) or Word (I know, I know) because of their power, but doc-strings need something simpler.

If XML or something like that is chosen, we'd probably use it, but I'd be less excited because it doesn't read as well in plain text form. Also, it will be much harder to get the scientists that contribute modules to conform to this.

I watched the doc-sig for many months when SciPy was started, and there was a lot of discussion on the multitude of different choices. It seemed like 50% wanted a dirt-simple markup like Ka-Ping suggests, and 50% wanted TeX or XML for maximum power as Eric R. suggests.
You can't satisfy both camps, but David seems to have balanced most of the issues very well. Looking at David's marked-up PEPs, they read very nicely as plain text. I'm fairly confident that, with these as an example and without reading the specification, I can write my own marked-up document or doc-string with little effort. Diving into the longer spec is only needed if you want fancy stuff.

There are no doubt millions of choices for marking up doc-strings. David's is quite reasonable, solves many problems with StructuredText, *has a champion*, and looks to have a fairly good start on a tool suite. If another choice is as well balanced, *has a champion*, and has a prayer of having tools ready in the standard library soon, then let's consider it. Otherwise, the argument over the perfect markup choice has been kicked around enough over the last several years. Let's just tweak this one. I wish for less vertical white space and a simpler heading markup too, but not so much that I'm willing to think through all this as thoroughly as David has. I no longer wish for a "perfect" markup, just a standard one -- and soon.

On a related note, distutils is (far) less than perfect (sorry Greg), and I have cursed it on many occasions. However, it works, solves a huge problem, and (with modifications) made building the 130,000 or so lines of Python/C/Fortran code that is SciPy tractable in a platform-independent way. Standardizing on reStructuredText will have similar impact.
my 0.02, eric > -----Original Message----- > From: python-dev-admin@python.org [mailto:python-dev-admin@python.org] On > Behalf Of David Goodger > Sent: Wednesday, July 31, 2002 8:17 PM > To: python-dev@python.org > Subject: [Python-Dev] Docutils/reStructuredText is ready to process PEPs > > Python-developers, > > Pursuant to PEP 287, one of the deliverables of the just-released > Docutils 0.2 (http://docutils.sf.net/) is a processing system for > reStructuredText-format PEPs as an alternative to the current PEP > processing. Here are examples of new-style PEPs (processed to HTML, > with links to the source text as usual): > > - http://docutils.sf.net/spec/pep-0287.html (latest) > - http://docutils.sf.net/spec/pep-0000.html (as a proof of concept > because of its special processing) > > Compare to the old-style PEPs: > > - http://www.python.org/peps/pep-0287.html (update pending) > - http://www.python.org/peps/pep-0000.html > > Existing old-style PEPs can coexist with reStructuredText PEPs > indefinitely. What to do with new PEPs is a policy decision that > doesn't have to be made immediately. PEP 287 puts forward a detailed > rationale for reStructuredText PEPs; especially see the "Questions & > Answers" section, items 4 through 7. > > In earlier correspondence Guido critiqued some style issues (since > corrected) and said "I'm sure you can fix all these things with a > simple style sheet change, and then I'm all for allowing Docutils for > PEPs." I'd appreciate more critiques/suggestions on PEP formatting > issues, no matter how small. Especially, please point out any > HTML/stylesheet issues with the various browsers. > > I hereby formally request permission to deploy Docutils for PEPs on > Python.org. Here's a deployment plan for your consideration: > > - Install the Docutils-modified version of Fredrik Lundh's > nondist/peps/pep2html.py script into CVS, along with ancillary > files. 
The modified pep2html.py auto-detects old-style and > new-style PEPs and processes accordingly. > (http://docutils.sf.net/tools/pep2html.py) > > - Install Docutils 0.2 on the server that does the PEP processing. I > don't think it's necessary to put Docutils into Python's CVS. > > - Make up a README for the "peps" directory with instructions for > installing Docutils and running the modified pep2html.py. > > - Modify PEP 1 (PEP Purpose and Guidelines) and PEP 9 (Sample PEP > Template) with the new formatting instructions. > > - Make an announcement to the Python community. > > - I will maintain the software, convert current meta-PEPs to the new > format as desired, handle PEP conversion updates, and assist other > PEP authors to convert their PEPs if they wish. > > If this is acceptable, to begin I will need write access to CVS and > shell access to the Python.org server (however that works; please let > me know what I need to do). Once I have the necessary access, I will > try to ensure a near-zero impact on the PythonLabs crew. > > Feedback is most welcome. > > -- > David Goodger Open-source projects: > - Python Docutils: http://docutils.sourceforge.net/ > (includes reStructuredText: http://docutils.sf.net/rst.html) > - The Go Tools Project: http://gotools.sourceforge.net/ > > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev From tim.one@comcast.net Thu Aug 1 07:39:24 2002 From: tim.one@comcast.net (Tim Peters) Date: Thu, 01 Aug 2002 02:39:24 -0400 Subject: [Python-Dev] split('') revisited In-Reply-To: <200207312235.g6VMZL218546@europa.research.att.com> Message-ID: [Andrew Koenig] > ... > Section 4.2.4 of the library reference says that the 'split' method of a > regular expression object is defined as > > Identical to the split() function, using the compiled pattern. 
Supplying words intended to be clear from context, it's saying that the split method of a regexp object is identical to the re.split() function, which is true. In much the same way, list.pop() isn't the same thing as eyeball.pop().

> This claim does not appear to be correct:
>
> >>> import re
> >>> re.compile('').split('abcde')
> ['abcde']
>
> This result differs from the result of using the string split method.

True, but it's the same as

    >>> import re
    >>> re.split('', 'abcde')
    ['abcde']
    >>>

which is all the docs are trying to say.

> ...
> My first impulse was to argue that (4) is right, and that the behavior
> should be as follows
>
> >>> 'abcde'.split('')
> ['a', 'b', 'c', 'd', 'e']

If that's what you want, list('abcde') is a direct way to get it.

> ...
> I made the counterargument that one could disambiguate by adding the
> rule that no element of the result could be equal to the delimiter.
> Therefore, if s is a string, s.split('') cannot contain any empty
> strings.

Sure, that's one arbitrary rule. It doesn't seem to extend to regexps in a reasonable way, though:

    >>> re.split('.*', 'abcde')
    ['', '']
    >>>

Both split pieces there match the pattern.

> However, looking at the behavior of regular expression splitting more
> closely, I become more confused. Can someone explain the following
> behavior to me?
>
> >>> re.compile('a|(x?)').split('abracadabra')
> ['', None, 'br', None, 'c', None, 'd', None, 'br', None, '']

From the docs:

    If capturing parentheses are used in pattern, then the text of all
    groups in the pattern are also returned as part of the resulting
    list.

It should also say that splits never occur at points where the only match is against an empty string (indeed, that's exactly why re.split('', 'abcde') doesn't split anywhere).
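[That empty-matches-never-split rule is easy to model. The helper below is an editor's illustration, not part of the re module; note that Python 3.7 later changed re.split to split on empty matches, so the historical results quoted in this thread are reproduced here by the helper rather than by calling re.split directly.]

```python
import re

def split_nonempty(pattern, text):
    """Split text at non-empty matches of pattern only, the way
    re.split behaved in the Python 2.x era: an empty match never
    causes a split, it just moves the scan forward."""
    pat = re.compile(pattern)
    pieces, last, pos = [], 0, 0
    while pos <= len(text):
        m = pat.search(text, pos)
        if m is None:
            break
        if m.group(0) == '':
            pos = m.start() + 1          # empty match: step past it, no split
            continue
        pieces.append(text[last:m.start()])  # slice since end of last match
        pieces.extend(m.groups())            # capturing groups, even if None
        last = pos = m.end()
    pieces.append(text[last:])               # trailing slice
    return pieces

# Barry's point: a compiled pattern's split and re.split always agree.
assert re.compile('a|(x?)').split('abracadabra') == re.split('a|(x?)', 'abracadabra')

print(split_nonempty('', 'abcde'))
# ['abcde']
print(split_nonempty('.*', 'abcde'))
# ['', '']
print(split_nonempty('a|(x?)', 'abracadabra'))
# ['', None, 'br', None, 'c', None, 'd', None, 'br', None, '']
```

The three results match the interpreter transcripts quoted above.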
The logic is like:

    while True:
        find next non-empty match, else break
        emit the slice between this and the end of the last match
        emit all capturing groups
        advance position by length of match
    emit the slice from the end of the last match to the end of the string

It's the last line in the loop body that makes empty matches a wart if allowed: they wouldn't advance the position at all, and an infinite loop would result. In order to make them do what you think you want, we'd have to add, at the end of the loop body

    ah, and if the match was empty, advance the position again, by, oh, i don't know, how about 1?

That's close to 0. So the pattern matches at the first 'a', and adds '' to the list (the slice to the left of the first match) and None to the list (the capturing group didn't participate in the match, but that doesn't excuse it from adding something to the list). There are no other non-empty matches until getting to the second 'a', and then that adds 'br' to the list (the slice between the current match and the last match), and None again for the non-participating capturing group. Etc. The trailing empty string is the slice from the end of the last match to the end of the string (which happens to be empty in this case). It's unclear to me what you expected instead. Perhaps this?

    >>> re.split('a|(?:x?)', 'abracadabra')
    ['', 'br', 'c', 'd', 'br', '']
    >>>

From loewis@informatik.hu-berlin.de Thu Aug 1 09:27:10 2002 From: loewis@informatik.hu-berlin.de (Martin v. Löwis) Date: Thu, 1 Aug 2002 10:27:10 +0200 (CEST) Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Misc python.man,1.24,1.25 Message-ID: <200208010827.g718RADp019371@paros.informatik.hu-berlin.de>

Jack Jansen writes:

> > ! Force stdin, stdout and stderr to be totally unbuffered. Note that
> > ! there is internal buffering in xreadlines(), readlines() and file-object
> > ! iterators ("for line in sys.stdin") which is not influenced by this
> > ! option.
To work around this, you will want to use "sys.stdin.readline()" > > ! inside a "while 1:" loop. > > For readlines() I think this is the right thing to do, but > xreadlines() and file iterators could actually "do the right thing" > and revert to a slower scheme if the underlying stream is unbuffered? > Or is this overkill? I'm not sure. The patch describes the current state; if anybody improves that, they should change the man page, too. Regards, Martin From thomas.heller@ion-tof.com Thu Aug 1 09:31:56 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Thu, 1 Aug 2002 10:31:56 +0200 Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface References: <200207312348.g6VNmLiQ019823@kuku.cosc.canterbury.ac.nz> Message-ID: <002701c23935$e5831c20$e000a8c0@thomasnotebook> From: "Greg Ewing" > Scott Gilbert : > > > getreadbufferproc bf_getreadbuffer; > > getwritebufferproc bf_getwritebuffer; > > > > acquirereadbufferproc bf_acquirereadbuffer; > > acquirewritebufferproc bf_acquirewritebuffer; > > Is there really a need for both "get" and "acquire" > methods? Surely if an object requires locking, it > always requires locking, so why can't the "get" > functions simply include the locking operation > if they need it? Backward compatibility. If we change the array object to enter a locked state when getreadbuffer() is called, it would be surprising. Thomas From thomas@xs4all.net Thu Aug 1 09:29:48 2002 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 1 Aug 2002 10:29:48 +0200 Subject: [Python-Dev] Re: What to do about the Wiki? In-Reply-To: <3D4824E1.1090304@lemburg.com> References: <200207311547.g6VFlk601129@odiug.zope.com> <15688.2985.118330.48738@localhost.localdomain> <200207311616.g6VGGuF01886@odiug.zope.com> <3D48183B.7070306@lemburg.com> <200207311724.g6VHOCZ02434@odiug.zope.com> <3D4824E1.1090304@lemburg.com> Message-ID: <20020801082948.GX19784@xs4all.nl> On Wed, Jul 31, 2002 at 07:56:49PM +0200, M.-A. Lemburg wrote: > >A process running out of memory, AFAIK. 
> In that case, wouldn't it be better to impose a memoryuse limit
> on the user which Apache uses for dealing with CGI
> scripts ? That wouldn't solve any specific Wiki related
> problem, but prevents the server from going offline because
> of memory problems.

There is a memory limit, and the problem is not that a single process freezes the server. Instead, if a single process's memory limit is 1/4th of the physical limit, 4 bloated wikis freeze the server. If it's 1/10th, it's 10, and so on.

-- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From mwh@python.net Thu Aug 1 10:00:21 2002 From: mwh@python.net (Michael Hudson) Date: 01 Aug 2002 10:00:21 +0100 Subject: [Python-Dev] seeing off SET_LINENO In-Reply-To: "Mark Hammond"'s message of "Thu, 1 Aug 2002 09:51:36 +1000" References: Message-ID: <2meldjvtqy.fsf@starship.python.net>

"Mark Hammond" writes:

> > "Mark Hammond" writes:
> > > IMO, the Python debugger "interface" should include function entry.
> >
> > There goes the time machine: it does. I just think everyone ignores
> > 'call' messages because they're a bit redundant today (because of the
> > matter under discussion).
>
> Yes, I should have said "continue to include function entry".
>
> I understood that a patch under discussion may have *removed* this facility
> from the debugger.

Nononononononono. No. No.

Currently a trace function can be called for four reasons: 'call', 'line', 'return' and 'raise'. 'call' is called high up in eval_frame, on entry to the code object (I suspect it is also called on resumption of generators, but also suspect that this is accidental). 'return' is called when the main loop finished with why == WHY_RETURN or WHY_YIELD, 'raise' ditto but why == WHY_EXCEPTION. None of these are affected by my patch.

At the moment 'line' is called by the SET_LINENO opcode. My patch changes it to be called when the co_lnotab indicates execution has moved onto a different line.
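[The four trace event kinds Michael lists can be watched directly with sys.settrace; this editor's sketch, which behaves the same in current CPython, shows that it is the 'call' event, not a 'line' event, that reports the def line of the function being entered.]

```python
import sys

events = []

def tracer(frame, event, arg):
    # record each event kind and the line number it reports
    events.append((event, frame.f_lineno))
    return tracer            # returning the tracer keeps local tracing on

def add(a, b):
    total = a + b
    return total

sys.settrace(tracer)
add(1, 2)
sys.settrace(None)

for event, lineno in events:
    print(event, lineno)
# One 'call' event fires on entry (reporting the "def add" line),
# then 'line' events for each body line, then a 'return' event.
```

A debugger that ignores 'call' events therefore never hears about the def line at all, which is exactly the behaviour difference under discussion.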
The reason this changes behaviour is that currently a SET_LINENO opcode is emitted for the def line of every function (I guess this is to cope with def functions_like_this(): return 1 ). After my patch there are no SET_LINENO opcodes, so execution is never on the def line[*], so no 'line' trace event is generated for the def line, so a debugger that only listens to the 'line' events and ignores the 'call' events will not stop on that line.

If my patch goes in, I'll probably change pdb to catch 'call' events, and nag authors of other debuggers that they should do the same. It is possible to generate an extra 'line' trace event to mimic the old behaviour, but it's gross.

> While I agree it is redundant and most debuggers will choose to
> ignore it, I believe removing it from the low-level debugger hooks
> would be a mistake.

Now I've spent some minutes explaining myself, you can explain to me where you got the idea that I was even considering doing so from!

Cheers, M.

[*] For a typical function which has no code on the def line.

-- 34. The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information. -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html

From thomas.heller@ion-tof.com Thu Aug 1 10:07:42 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Thu, 1 Aug 2002 11:07:42 +0200 Subject: [Python-Dev] PEP 298, final (?) version Message-ID: <00b101c2393a$e4a01ce0$e000a8c0@thomasnotebook>

Here is PEP 298 in its near-final version (not yet checked in). It seems to me we have to end the discussion and I'm quite happy with it. If accepted in this form, I can start the implementation right after the end of my vacation, second half of August.

The only thing I consider worth changing is to rename the whole thing from 'fixed buffer interface' to 'locked buffer interface', which makes more sense at the current state.
Thomas

-----

PEP: 298
Title: The Fixed Buffer Interface
Version: $Revision: 1.4 $
Last-Modified: $Date: 2002/07/31 18:48:36 $
Author: Thomas Heller
Status: Draft
Type: Standards Track
Created: 26-Jul-2002
Python-Version: 2.3
Post-History: 30-Jul-2002, 1-Aug-2002


Abstract

    This PEP proposes an extension to the buffer interface called the
    'fixed buffer interface'. The fixed buffer interface fixes the
    flaws of the 'old' buffer interface [1] as defined in Python
    versions up to and including 2.2, and has the following semantics:

    The lifetime of the retrieved pointer is clearly defined and
    controlled by the client.

    The buffer size is returned as a 'size_t' data type, which allows
    access to large buffers on platforms where
    sizeof(int) != sizeof(void *).

    (Guido comments: This second sounds like a change we could also
    make to the "old" buffer interface, if we introduce another flag
    bit that's *not* part of the default flags.)


Specification

    The fixed buffer interface exposes new functions which return the
    size and the pointer to the internal memory block of any Python
    object which chooses to implement this interface.

    Retrieving a buffer from an object puts this object in a locked
    state during which the buffer may not be freed, resized, or
    reallocated.

    The object must be unlocked again by releasing the buffer if it's
    no longer used by calling another function in the fixed buffer
    interface. If the object never resizes or reallocates the buffer
    during its lifetime, this function may be NULL. Failure to call
    this function (if it is != NULL) is a programming error and may
    have unexpected results.

    The fixed buffer interface omits the memory segment model which is
    present in the old buffer interface - only a single memory block
    can be exposed.
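[The locked-state semantics specified above can be modelled in a few lines of Python. This is an editor's toy sketch by way of analogy only (the PEP itself specifies a C-level API); it also illustrates Greg Ewing's earlier suggestion of deferring close() until the lock count reaches zero, instead of raising an exception.]

```python
class LockedBuffer:
    """Toy model of an object exposing a lockable memory block."""

    def __init__(self, data):
        self._data = bytearray(data)
        self._locks = 0              # acquire/release lock count
        self._close_pending = False
        self.closed = False

    def acquire_read(self):
        if self.closed:
            raise ValueError('buffer is closed')
        self._locks += 1             # enter the locked state
        return bytes(self._data)

    def release(self):
        assert self._locks > 0, 'release without matching acquire'
        self._locks -= 1
        if self._locks == 0 and self._close_pending:
            self._really_close()     # deferred close happens now

    def resize(self, size):
        if self._locks:
            raise ValueError('cannot resize while buffer is locked')
        self._data = self._data[:size] + bytearray(max(0, size - len(self._data)))

    def close(self):
        if self._locks:              # defer instead of raising
            self._close_pending = True
        else:
            self._really_close()

    def _really_close(self):
        self.closed = True
        self._data = None

buf = LockedBuffer(b'abc')
view = buf.acquire_read()
buf.close()                  # still locked, so the close is deferred
print(buf.closed)            # False
buf.release()                # last release triggers the pending close
print(buf.closed)            # True
```

In the C API, acquire_read/release correspond to the acquire and release slots described below; whether a real mmap object should behave this way or raise on close is left open by the PEP.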
Implementation

    Define a new flag in Include/object.h:

        /* PyBufferProcs contains bf_acquirefixedreadbuffer,
           bf_acquirefixedwritebuffer, and bf_releasefixedbuffer */
        #define Py_TPFLAGS_HAVE_FIXEDBUFFER (1L<<15)

    This flag would be included in Py_TPFLAGS_DEFAULT:

        #define Py_TPFLAGS_DEFAULT ( \
                            ....
                            Py_TPFLAGS_HAVE_FIXEDBUFFER | \
                            ....
                            0)

    Extend the PyBufferProcs structure by new fields in
    Include/object.h:

        typedef size_t (*acquirefixedreadbufferproc)(PyObject *,
                                                     const void **);
        typedef size_t (*acquirefixedwritebufferproc)(PyObject *,
                                                      void **);
        typedef void (*releasefixedbufferproc)(PyObject *);

        typedef struct {
            getreadbufferproc bf_getreadbuffer;
            getwritebufferproc bf_getwritebuffer;
            getsegcountproc bf_getsegcount;
            getcharbufferproc bf_getcharbuffer;
            /* fixed buffer interface functions */
            acquirefixedreadbufferproc bf_acquirefixedreadbuffer;
            acquirefixedwritebufferproc bf_acquirefixedwritebuffer;
            releasefixedbufferproc bf_releasefixedbuffer;
        } PyBufferProcs;

    The new fields are present if the Py_TPFLAGS_HAVE_FIXEDBUFFER flag
    is set in the object's type. The Py_TPFLAGS_HAVE_FIXEDBUFFER flag
    implies the Py_TPFLAGS_HAVE_GETCHARBUFFER flag.

    The acquirefixedreadbufferproc and acquirefixedwritebufferproc
    functions return the size in bytes of the memory block on success,
    and fill in the passed void * pointer on success. If these
    functions fail - either because an error occurs or no memory block
    is exposed - they must set the void * pointer to NULL and raise an
    exception. The return value is undefined in these cases and should
    not be used.

    If calls to these functions succeed, eventually the buffer must be
    released by a call to the releasefixedbufferproc, supplying the
    original object as argument. The releasefixedbufferproc cannot
    fail.
Usually these functions aren't called directly; they are called through convenience functions declared in Include/abstract.h: int PyObject_AcquireFixedReadBuffer(PyObject *obj, const void **buffer, size_t *buffer_len); int PyObject_AcquireFixedWriteBuffer(PyObject *obj, void **buffer, size_t *buffer_len); void PyObject_ReleaseFixedBuffer(PyObject *obj); The former two functions return 0 on success, setting buffer to the memory location and buffer_len to the length of the memory block in bytes. On failure, or if the fixed buffer interface is not implemented by obj, they return -1 and set an exception. The latter function doesn't return anything, and cannot fail. Backward Compatibility The size of the PyBufferProcs structure changes if this proposal is implemented, but the type's tp_flags slot can be used to determine if the additional fields are present. Reference Implementation Will be uploaded to the SourceForge patch manager by the author. Additional Notes/Comments Python strings, unicode strings, mmap objects, and array objects would expose the fixed buffer interface. mmap and array objects would actually enter a locked state while the buffer is active; this is not needed for strings and unicode objects. Resizing locked array objects is not allowed and will raise an exception. Whether closing a locked mmap object is an error or will only be deferred until the lock count reaches zero is an implementation detail. Community Feedback Greg Ewing doubts the fixed buffer interface is needed at all; he thinks the normal buffer interface could be used if the pointer is (re)fetched each time it's used. This seems to be dangerous, because even innocent-looking calls to the Python API like Py_DECREF() may trigger execution of arbitrary Python code.
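That danger is easy to demonstrate at the Python level (a toy illustration, not from the PEP; in C, the arbitrary code run by __del__ could free or resize the very memory a previously fetched pointer refers to):

```python
# Even dropping a reference can run arbitrary Python code via __del__.
# (Relies on CPython's reference counting running __del__ immediately.)
log = []

class Noisy:
    def __del__(self):
        # Arbitrary code runs here -- the C analogue could invalidate
        # a raw buffer pointer that was fetched earlier.
        log.append("__del__ ran")

obj = Noisy()
del obj      # the Python-level analogue of a final Py_DECREF
assert log == ["__del__ ran"]
```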
The first version of this proposal didn't have the release function, but it turned out that this would have been too restrictive: mmap and array objects wouldn't have been able to implement it, because mmap objects can be closed anytime if not locked, and array objects could resize or reallocate the buffer. Credits Scott Gilbert came up with the name 'fixed buffer interface'. References [1] The buffer interface http://mail.python.org/pipermail/python-dev/2000-October/009974.html [2] The Buffer Problem http://www.python.org/peps/pep-0296.html Copyright This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 End: From python-dev@zesty.ca Thu Aug 1 12:22:19 2002 From: python-dev@zesty.ca (Ka-Ping Yee) Date: Thu, 1 Aug 2002 04:22:19 -0700 (PDT) Subject: [Python-Dev] Re: Docutils/reStructuredText is ready to process PEPs In-Reply-To: <20020801043107.GA32402@thyrsus.com> Message-ID: On Thu, 1 Aug 2002, Eric S. Raymond wrote: > Ka-Ping Yee : > > I am not against structured text processing systems in general. > > I think that something of this flavour would be a great solution > > for PEPs and docstrings, and that David has done an impressive > > job on RST. It's just that RST is much too big (for me). > > And if we're going to pay the transition costs to move to a > heavyweight markup, it ought to be DocBook, same direction GNOME and > KDE and the Linux kernel and FreeBSD and PHP are going. I would be very unhappy about having to enter and edit inline documentation in an XML-based markup language. RST is not what i would call heavyweight *markup*. It's just a heavy specification. There are too many cases to know. If you simplified RST in the following ways, we might have something i would consider reasonably-sized: - Choose one way to do headings. - Choose one way to do numbered and non-numbered lists. - Choose one way to do tables. - Drop bibliographic fields. 
- Drop RCS keyword processing. - Get rid of option lists (we already have definition lists). - Drop some fancy reference features (e.g. auto-numbered and auto-symbol footnotes, indirect references, substitutions). - Drop inline hyperlink references (we already have inline URLs). - Drop inline internal targets (we already have explicit targets). - Drop interpreted text (we already have inline literals). - Drop citations (we already have footnotes). - (Or, in summary -- instead of ten kinds of inline markup, we only need four: emphasis, literals, footnotes, and URLs.) - Simplify inline markup rules (way too many characters to know). Instead of 100 lines describing markup rules, two lines are sufficient: emphasis starts from " *" and stops at "*", literals go from " `" to "`", and footnotes go from " [" to "]". -- ?!ng "This code is better than any code that doesn't work has any right to be." -- Roger Gregory, on Xanadu From goodger@users.sourceforge.net Thu Aug 1 13:52:48 2002 From: goodger@users.sourceforge.net (David Goodger) Date: Thu, 01 Aug 2002 08:52:48 -0400 Subject: [Python-Dev] Re: Docutils/reStructuredText is ready to process PEPs In-Reply-To: Message-ID: Ka-Ping Yee wrote: > I have just read the specification: > > http://docutils.sourceforge.net/spec/rst/reStructuredText.html > > It took a long time. Perhaps it seems not so big to others, but > my personal opinion would be to recommend against this proposal > until the specification fits in, say, 1000 lines and can be absorbed > in ten minutes. The specification is, as its title says, a *specification*. It's a detailed description of the markup, intended to guide the *developer* who is writing a parser or other tool. It's not user documentation. For that, see the quick reference at http://docutils.sf.net/docs/rst/quickref.html. It's only 1153 lines of HTML (with lots of blank lines and linebreaks, hand-written before the reStructuredText parser could handle everything).
Perhaps you started at the wrong end. The best place to start is with "A ReStructuredText Primer" by Richard Jones, at http://docutils.sf.net/docs/rst/quickstart.html (which *is* generated from text). It's only 335 lines long :-). It leads to the quick reference, which leads to the spec itself. And there was this item of the "deployment plan": - Modify PEP 1 (PEP Purpose and Guidelines) and PEP 9 (Sample PEP Template) with the new formatting instructions. PEP 9 could contain or point to a short & to-the-point overview of the markup. I see no problem coming up with a user document that's more complete than the "Primer" above but still weighs in at under 1000 lines. But with the docs mentioned above, is it necessary? You could also begin by perusing an example. Look at the markup in http://docutils.sf.net/spec/pep-0287.txt, marked up in reStructuredText in the intended way. With the exception of embedded references and targets (for which there is no plaintext equivalent), none of the markup there looks like markup, and it should be very easy to follow. Now look at the processed result (http://docutils.sf.net/spec/pep-0287.html); I think the return is worth the investment. > For me, it violates the fits-in-my-brain principle: > the spec is 2500 lines long, and supports six different kinds of > references and five different kinds of lists (even lists with roman > numerals!). It also violates the one-way-to-do-it principle: > for example, there are a huge variety of ways to do headings, > and two different syntaxes for drawing a table. How many times have we heard this? "All we need are paragraphs and bullet lists." That line of argument has been going on for at least six years, and has hampered progress all along. IMHO, variety in markup is good and necessary. Artificially limiting the markup makes for limited usefulness. OTOH, I have no problem with mandating standard uses, like a standard set of section title adornments.
> I am not against structured text processing systems in general. ... > It's just that RST is much too big (for me). Somehow I think that "size of the spec" is a specious argument. Take the Python spec, for example: many times the size of the reStructuredText spec and yet it arguably fits in many different sized brains. I know they're different things and I'm not implying they're the same; the markup is a much smaller thing. reStructuredText is very practical in its scope. Constructs are there because they're useful and *used*. If we removed all the items you list, we'd end up with a crippled markup of little use to anyone. > I think that something of this flavour would be a great solution > for PEPs and docstrings, and that David has done an impressive > job on RST. Thank you. Anyhow, off to work. I'll follow up on further posts this evening. -- David Goodger Open-source projects: - Python Docutils: http://docutils.sourceforge.net/ (includes reStructuredText: http://docutils.sf.net/rst.html) - The Go Tools Project: http://gotools.sourceforge.net/ From ark@research.att.com Thu Aug 1 14:14:04 2002 From: ark@research.att.com (Andrew Koenig) Date: Thu, 1 Aug 2002 09:14:04 -0400 (EDT) Subject: [Python-Dev] split('') revisited In-Reply-To: (message from Tim Peters on Thu, 01 Aug 2002 02:39:24 -0400) References: Message-ID: <200208011314.g71DE4w07989@europa.research.att.com> >> Section 4.2.4 of the library reference says that the 'split' method of a >> regular expression object is defined as >> >> Identical to the split() function, using the compiled pattern. Tim> Supplying words intended to be clear from context, it's saying that the Tim> split method of a regexp object is identical to the re.split() function, Tim> which is true. In much the same way, list.pop() isn't the same thing as Tim> eyeball.pop() . Right. I missed the fact that there's another split. Sorry about that. 
>> My first impulse was to argue that (4) is right, and that the behavior >> should be as follows >> >> >>> 'abcde'.split('') >> ['a', 'b', 'c', 'd', 'e'] Tim> If that's what you want, list('abcde') is a direct way to get it. True, but that doesn't explain why it is useful to have 'abcde'.split('') and re.split('', 'abcde') behave differently. >> I made the counterargument that one could disambiguate by adding the >> rule that no element of the result could be equal to the delimiter. >> Therefore, if s is a string, s.split('') cannot contain any empty >> strings. Tim> Sure, that's one arbitrary rule . It doesn't seem to extend to Tim> regexps in a reasonable way, though: >>>> re.split('.*', 'abcde') Tim> ['', ''] Tim> Both split pieces there match the pattern. Yes, that's part of the source of my confusion. >> However, looking at the behavior of regular expression splitting more >> closely, I become more confused. Can someone explain the following >> behavior to me? >> >>> re.compile('a|(x?)').split('abracadabra') >> ['', None, 'br', None, 'c', None, 'd', None, 'br', None, ''] >> From the docs: Tim> If capturing parentheses are used in pattern, then the text of all Tim> groups in the pattern are also returned as part of the resulting list. OK -- as I said, I had assumed that split() was referring to the other split function, probably because both of them were offscreen at the time. Tim> It should also say that splits never occur at points where the only match is Tim> against an empty string (indeed, that's exactly why re.split('', 'abcde') Tim> doesn't split anywhere).
The logic is like: Tim> while True: Tim> find next non-empty match, else break Tim> emit the slice between this and the end of the last match Tim> emit all capturing groups Tim> advance position by length of match Tim> emit the slice from the end of the last match to the end of the string Tim> It's the last line in the loop body that makes empty matches a wart if Tim> allowed: they wouldn't advance the position at all, and an infinite loop Tim> would result. In order to make them do what you think you want, we'd have Tim> to add, at the end of the loop body Tim> ah, and if the match was emtpy, advance the position again, by, Tim> oh, i don't know, how about 1? That's close to 0 . Indeed, that's an arbitrary rule -- just about as arbitrary as the one that you abbreviated above, which should really be find the next match, but if the match is empty, disregard it; instead, find the next match with a length of at least, oh, I don't know, how about 1? That's close to 0 . What I'm trying to do is come up with a useful example to convince myself that one is better than the other. From mhammond@skippinet.com.au Thu Aug 1 14:22:55 2002 From: mhammond@skippinet.com.au (Mark Hammond) Date: Thu, 1 Aug 2002 23:22:55 +1000 Subject: [Python-Dev] PEP 298, final (?) version In-Reply-To: <00b101c2393a$e4a01ce0$e000a8c0@thomasnotebook> Message-ID: > The only thing I consider worth changing is to rename the > whole stuff from 'fixed buffer interface' to 'locked buffer > interface', which makes more sense at the current state. Agreed. Sorry if I missed this before, but: > If the object never resizes or reallocates the buffer > during it's lifetime, this function may be NULL. Failure to call > this function (if it is != NULL) is a programming error and may > have unexpected results. Not sure I like this. 
I would prefer to put the burden of "you must provide a (possibly empty) release function" on the few buffer interface implementers rather than on the many (ie, potentially any extension author) buffer interface consumers. I believe there is a good chance of extension authors testing against, and therefore assuming, non-NULL implementations of this function. OTOH, if every fixed buffer consumer assumes a non-NULL implementation, people implementing this interface will quickly see their error well before it gets into the wild. No biggie, but worth considering... Mark. From mhammond@skippinet.com.au Thu Aug 1 14:28:59 2002 From: mhammond@skippinet.com.au (Mark Hammond) Date: Thu, 1 Aug 2002 23:28:59 +1000 Subject: [Python-Dev] seeing off SET_LINENO In-Reply-To: <2meldjvtqy.fsf@starship.python.net> Message-ID: [Michael] > At the moment 'line' is called by the SET_LINENO opcode. My patch > changes it to be called when the co_lnotab indicates execution has > moved onto a different line. > > The reason this changes behaviour is that currently a SET_LINENO > opcode is emitted for the def line of every function (I guess this is > to cope with > > def functions_like_this(): return 1 Right - sorry - my misunderstanding. > If my patch goes in, I'll probably change pdb to catch 'call' events, > and nag authors of other debuggers that they should do the same. Yes, I agree this should not be necessary. You may even find debugger implementers already hack around this :) And yes, I agree that if debugger implementers really want to hook something on function entry, they should use the facility explicitly designed for that purpose ;) > It is possible to generate an extra 'line' trace event to mimic the > old behaviour, but it's gross. Agreed. > Now I've spent some minutes explaining myself, you can explain to me > where you got the idea that I was even considering doing so from! Sorry, I just didn't re-read the thread well enough.
Jumping to conclusions seems to be one of my strong points ;) Mark. From mal@lemburg.com Thu Aug 1 14:50:11 2002 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 01 Aug 2002 15:50:11 +0200 Subject: [Python-Dev] Enabling Python cross-compilation Message-ID: <3D493C93.5020704@lemburg.com> Someone just posted this link to the German Python mailing list: http://www.ailis.de/~k/knowledge/crosscompiling/python.php The page contains instructions to cross-compile Python for the ARM processor and includes a patch which enables cross-compiling Python in a very generic way. Wouldn't it make sense to add this kind of support to the standard dist? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH _______________________________________________________________________ eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/ From thomas.heller@ion-tof.com Thu Aug 1 15:19:37 2002 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Thu, 1 Aug 2002 16:19:37 +0200 Subject: [Python-Dev] PEP 298, final (?) version References: Message-ID: <024501c23966$77c63100$e000a8c0@thomasnotebook> > > If the object never resizes or reallocates the buffer > > during it's lifetime, this function may be NULL. Failure to call > > this function (if it is != NULL) is a programming error and may > > have unexpected results. > [Mark] > Not sure I like this. I would prefer to put the burden of "you must provide > a (possibly empty) release function" on the few buffer interface > implementers than the many (ie, potentially any extension author) buffer > interface consumers. > > I believe there is a good chance of extension authors testing against, and > therefore assuming, non-NULL implementations of this function. OTOH, if > every fixed buffer consumer assumes a non-NULL implementation, people > implementing this interface will quickly see their error well before it gets > into the wild.
> > No biggie, but worth considering... I thought nobody would call these functions directly, but only through the PyObject_AcquireBuffer/PyObject_ReleaseBuffer functions; but maybe you're right. So probably it should be required that the release function must be implemented if any of the acquire functions is implemented. We could even implement the lockcount in every fixed buffer object even if it does no actual locking, and issue a warning or raise an exception in the destructor if it is not zero. (Or can we somehow prevent clients from calling these functions without going through the PyObject_ funcs?) Thomas From xscottg@yahoo.com Thu Aug 1 15:38:40 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Thu, 1 Aug 2002 07:38:40 -0700 (PDT) Subject: [Python-Dev] PEP 298, final (?) version In-Reply-To: <00b101c2393a$e4a01ce0$e000a8c0@thomasnotebook> Message-ID: <20020801143840.51753.qmail@web40105.mail.yahoo.com> --- Thomas Heller wrote: > void **); > typedef void (*releasefixedbufferproc)(PyObject *); > [...] > > If calls to these functions succeed, eventually the buffer must be > released by a call to the releasefixedbufferproc, supplying the > original object as argument. The releasefixedbufferproc cannot > fail. > [...] > > void PyObject_ReleaseFixedBuffer(PyObject *obj); > Would it be useful to allow bf_releasefixedbuffer to return an int indicating an exception? For instance, it could raise an exception if the extension errantly releases more times than it has acquired (a negative lock count). Just a thought. > > Python strings, unicode strings, mmap objects, and array objects > would expose the fixed buffer interface. > > mmap and array objects would actually enter a locked state while > the buffer is active, this is not needed for strings and unicode > objects. Resizing locked array objects is not allowed and will > raise an exception.
Whether closing a locked mmap object is an > error or will only be deferred until the lock count reaches zero > is an implementation detail. > The mmap object is a good candidate for this, but I'm a little worried about adding it to array. I'm not saying it shouldn't be done, but I can imagine a surprised user who: - has an existing application using the array module - starts making use of a new extension that uses the fixed/locked buffer interface - gets an exception in code that never raised that exception before With the "deferred close" strategy for the mmap object, this can't be a problem there. Just something to think about (or ignore :-). Cheers, -Scott __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From Jack.Jansen@oratrix.com Thu Aug 1 15:22:35 2002 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Thu, 1 Aug 2002 16:22:35 +0200 Subject: [Python-Dev] Enabling Python cross-compilation In-Reply-To: <3D493C93.5020704@lemburg.com> Message-ID: <2038E0BC-A55A-11D6-B123-003065517236@oratrix.com> On donderdag, augustus 1, 2002, at 03:50 , M.-A. Lemburg wrote: > Someone just posted this link to the German Python mailing > list: > > http://www.ailis.de/~k/knowledge/crosscompiling/python.php > > The page contains instruction to cross compile Python for > the ARM processor and includes a patch which enables cross > compiling Python in a very generic way. I like the idea, but I think it could be implemented slightly cleaner (without need for the make clean and all the environment variables). I was thinking something along the lines of having two build subdirectories (as is already supported currently), let's say build-host and build-crosscompile. Then you would first configure and build normally in build-host, and then in build-crosscompile do something like "CC=xxxx ETC ETC ../configure --hostbuilddir=../build-host".
hostbuilddir would be used for finding python and pgen, and would default to ".". And I think all the funnies like EXEEXT would work correctly too. (Please note that I'm not volunteering to write the code, crosscompiling is not on my current wishlist) -- - Jack Jansen http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From guido@python.org Thu Aug 1 16:18:22 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 01 Aug 2002 11:18:22 -0400 Subject: [Python-Dev] seeing off SET_LINENO In-Reply-To: Your message of "Thu, 01 Aug 2002 10:00:21 BST." <2meldjvtqy.fsf@starship.python.net> References: <2meldjvtqy.fsf@starship.python.net> Message-ID: <200208011518.g71FIMb13597@odiug.zope.com> > After my patch there are no SET_LINENO opcodes, so execution is > never on the def line[*], so no 'line' trace event is generated for > the def line, so a debugger that only listens to the 'line' events and > ignores the 'call' events will not stop on that line. If the argument list contains embedded tuples, there's code executed to unpack those before the first line of the function. Example: >>> def f(a, (b, c), d): ... print a, b, c, d ... >>> f(1, (2, 3), 4) 1 2 3 4 >>> f(1, 2, 3) Traceback (most recent call last): File "<stdin>", line 1, in ? File "<stdin>", line 1, in f TypeError: unpack non-sequence >>> I hope the debugger will stop *before* this unpacking happens! It does now: >>> import pdb >>> pdb.run("f(1, 2, 3)") > <string>(0)?() (Pdb) s > <string>(1)?() (Pdb) > <string>(1)f() (Pdb) TypeError: 'unpack non-sequence' > <string>(1)f() (Pdb) q >>> --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Aug 1 16:34:04 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 01 Aug 2002 11:34:04 -0400 Subject: [Python-Dev] PEP 298, final (?) version In-Reply-To: Your message of "Thu, 01 Aug 2002 23:22:55 +1000."
References: Message-ID: <200208011534.g71FY4b13710@odiug.zope.com> > > The only thing I consider worth changing is to rename the > > whole stuff from 'fixed buffer interface' to 'locked buffer > > interface', which makes more sense at the current state. > > Agreed. Ditto. Ready for implementation now. > Sorry if I missed this before, but: > > If the object never resizes or reallocates the buffer > > during it's lifetime, this function may be NULL. Failure to call > > this function (if it is != NULL) is a programming error and may > > have unexpected results. > > Not sure I like this. I would prefer to put the burden of "you must provide > a (possibly empty) release function" on the few buffer interface > implementers than the many (ie, potentially any extension author) buffer > interface consumers. > > I believe there is a good chance of extension authors testing against, and > therefore assuming, non-NULL implementations of this function. OTOH, if > every fixed buffer consumer assumes a non-NULL implementation, people > implementing this interface will quickly see their error well before it gets > into the wild. > > No biggie, but worth considering... Hm, *users* of the interface would always go through this API: int PyObject_AcquireFixedReadBuffer(PyObject *obj, const void **buffer, size_t *buffer_len); int PyObject_AcquireFixedWriteBuffer(PyObject *obj, void **buffer, size_t *buffer_len); void PyObject_ReleaseFixedBuffer(PyObject *obj); But I'm still very concerned that if most built-in types (e.g. strings, bytes) don't implement the release functionality, it's too easy for an extension to seem to work while forgetting to release the buffer. I recommend that at least some built-in types implement the acquire/release functionality with a counter, and assert that the counter is zero when the object is deleted -- if the assert fails, someone DECREF'ed their reference to the object without releasing it.
(The rule should be that you must own a reference to the object while you've acquired the object.) For strings that might be impractical because the string object would have to grow 4 bytes to hold the counter; but the new bytes object (PEP 296) could easily implement the counter, and the array object too -- that way there will be plenty of opportunity to test proper use of the protocol. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Aug 1 16:22:06 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 01 Aug 2002 11:22:06 -0400 Subject: [Python-Dev] Re: Docutils/reStructuredText is ready to process PEPs In-Reply-To: Your message of "Thu, 01 Aug 2002 04:22:19 PDT." References: Message-ID: <200208011522.g71FM6I13637@odiug.zope.com> > On Thu, 1 Aug 2002, Eric S. Raymond wrote: > > Ka-Ping Yee : > > > I am not against structured text processing systems in general. > > > I think that something of this flavour would be a great solution > > > for PEPs and docstrings, and that David has done an impressive > > > job on RST. It's just that RST is much too big (for me). > > > > And if we're going to pay the transition costs to move to a > > heavyweight markup, it ought to be DocBook, same direction GNOME and > > KDE and the Linux kernel and FreeBSD and PHP are going. > > I would be very unhappy about having to enter and edit inline > documentation in an XML-based markup language. Agreed 110%. Perhaps Eric thought we were talking about the core Python docs? David was only talking about PEPs right now. > RST is not what i would call heavyweight *markup*. It's just a > heavy specification. There are too many cases to know. If you > simplified RST in the following ways, we might have something > i would consider reasonably-sized: > > - Choose one way to do headings. > - Choose one way to do numbered and non-numbered lists. > - Choose one way to do tables. > - Drop bibliographic fields. > - Drop RCS keyword processing.
> - Get rid of option lists (we already have definition lists). > - Drop some fancy reference features (e.g. auto-numbered and > auto-symbol footnotes, indirect references, substitutions). > - Drop inline hyperlink references (we already have inline URLs). > - Drop inline internal targets (we already have explicit targets). > - Drop interpreted text (we already have inline literals). > - Drop citations (we already have footnotes). > - (Or, in summary -- instead of ten kinds of inline markup, we > only need four: emphasis, literals, footnotes, and URLs.) > - Simplify inline markup rules (way too many characters to know). > Instead of 100 lines describing markup rules, two lines are > sufficient: emphasis starts from " *" and stops at "*", literals > go from " `" to "`", and footnotes go from " [" to "[". Perhaps this could be a preferred subset? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Aug 1 16:47:25 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 01 Aug 2002 11:47:25 -0400 Subject: [Python-Dev] PEP 298, final (?) version In-Reply-To: Your message of "Thu, 01 Aug 2002 07:38:40 PDT." <20020801143840.51753.qmail@web40105.mail.yahoo.com> References: <20020801143840.51753.qmail@web40105.mail.yahoo.com> Message-ID: <200208011547.g71FlPf13861@odiug.zope.com> > > void PyObject_ReleaseFixedBuffer(PyObject *obj); > > > > Would it be useful to allow bf_releasefixedbuffer to return an int > indicating an exception? For instance, it could raise an exception if the > extension errantly releases more times than it has acquired (a negative > lock count). Just a thought. OTOH, it means that the caller would have to check for errors. It may make more sense to make this a fatal error, since it's purely the (or at least *a*) caller's fault. > > Python strings, unicode strings, mmap objects, and array objects > > would expose the fixed buffer interface. 
> > > > mmap and array objects would actually enter a locked state while > the buffer is active, this is not needed for strings and unicode > objects. Resizing locked array objects is not allowed and will > raise an exception. Whether closing a locked mmap object is an > error or will only be deferred until the lock count reaches zero > is an implementation detail. > > The mmap object is a good candidate for this, but I'm a little worried > about adding it to array. I'm not saying it shouldn't be done, but I can > imagine a surprized user who: > > - has an existing application using the array module > - starts making use of a new extension that uses the fixed/locked > buffer interface > - gets an exception in code that never raised that exception before Hm. As long as it's not too hard to point out the cause (using the new extension) I don't think this would be a problem. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Aug 1 16:36:59 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 01 Aug 2002 11:36:59 -0400 Subject: [Python-Dev] PEP 298, final (?) version In-Reply-To: Your message of "Thu, 01 Aug 2002 16:19:37 +0200." <024501c23966$77c63100$e000a8c0@thomasnotebook> References: <024501c23966$77c63100$e000a8c0@thomasnotebook> Message-ID: <200208011537.g71FaxE13781@odiug.zope.com> > (Or can we somehow prevent clients from calling these functions > without going through the PyObject_ funcs?) No, don't worry. Once the PyObject_ API exists, nobody will bother calling the slot functions directly.
--Guido van Rossum (home page: http://www.python.org/~guido/) From mwh@python.net Thu Aug 1 16:56:49 2002 From: mwh@python.net (Michael Hudson) Date: 01 Aug 2002 16:56:49 +0100 Subject: [Python-Dev] seeing off SET_LINENO In-Reply-To: Guido van Rossum's message of "Thu, 01 Aug 2002 11:18:22 -0400" References: <2meldjvtqy.fsf@starship.python.net> <200208011518.g71FIMb13597@odiug.zope.com> Message-ID: <2mofcmpo72.fsf@starship.python.net> Guido van Rossum writes: > > After my patch there are no SET_LINENO opcodes, so execution is > > never on the def line[*], so no 'line' trace event is generated for > > the def line, so a debugger that only listens to the 'line' events and > > ignores the 'call' events will not stop on that line. > > If the argument list contains embedded tuples, there's code executed > to unpack those before the first line of the function. Well, if there's code there, then the debugger stops. I know it's confusing to have intuitive behaviour in this area... > Example: > > >>> def f(a, (b, c), d): > ... print a, b, c, d > ... > >>> f(1, (2, 3), 4) > 1 2 3 4 > >>> f(1, 2, 3) > Traceback (most recent call last): > File "<stdin>", line 1, in ? > File "<stdin>", line 1, in f > TypeError: unpack non-sequence > >>> > > I hope the debugger will stop *before* this unpacking happens! It > does now: > > >>> import pdb > >>> pdb.run("f(1, 2, 3)") > > <string>(0)?() > (Pdb) s > > <string>(1)?() > (Pdb) > > <string>(1)f() > (Pdb) > TypeError: 'unpack non-sequence' > > <string>(1)f() > (Pdb) q > >>> Still does: $ cat t.py def f(a, (b, c), d): print a, b, c, d $ ./python Python 2.3a0 (#14, Aug 1 2002, 16:48:20) [GCC 2.96 20000731 (Red Hat Linux 7.2 2.96-108.7.2)] on linux2 Type "help", "copyright", "credits" or "license" for more information.
>>> import pdb, t >>> pdb.run("t.f(1, 2, 3)") > <string>(1)?() (Pdb) s > /home/mwh/src/sf/python/dist/src/build/t.py(1)f() -> def f(a, (b, c), d): (Pdb) TypeError: 'unpack non-sequence' > /home/mwh/src/sf/python/dist/src/build/t.py(1)f() -> def f(a, (b, c), d): (Pdb) q Anyway, I think I'm done now (as you may be able to tell from the pile of patch notification emails that just landed in your inbox :). These issues from my original mail in this thread still haven't been addressed: 4) The patch installs a descriptor for f_lineno so that there is no incompatibility for Python code. The question is what to do with the f_lineno field in the C struct? Remove it? That would (probably) mean bumping PY_API_VERSION. Leave it in? Then its contents would usually be meaningless (keeping it up to date would rather defeat the point of this patch). I think leaving f_lineno there but useless is the way to go. If we actually make incompatible changes for other reasons, then it can disappear. 8) I haven't measured the performance impact of the changes to code that is tracing or code that isn't. There's a possible optimization mentioned in the patch for traced code. For not traced code it MAY be worthwhile putting the tracing support code in a static function somewhere so there's less code to jump over in the main loop (for i-caches and such). Still haven't done this. 9) This patch stops LLTRACE telling you when execution moves onto a different line. This could be restored, but a) I expect I'm the only person to have used LLTRACE recently (debugging this patch). b) This will cause obfuscation, so I'd prefer to do it last. No change here either. Cheers, M. -- The gripping hand is really that there are morons everywhere, it's just that the Americon morons are funnier than average.
-- Pim van Riezen, alt.sysadmin.recovery From guido@python.org Thu Aug 1 16:35:47 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 01 Aug 2002 11:35:47 -0400 Subject: [Python-Dev] Enabling Python cross-compilation In-Reply-To: Your message of "Thu, 01 Aug 2002 15:50:11 +0200." <3D493C93.5020704@lemburg.com> References: <3D493C93.5020704@lemburg.com> Message-ID: <200208011535.g71FZlu13746@odiug.zope.com> > Someone just posted this link to the German Python mailing > list: > > http://www.ailis.de/~k/knowledge/crosscompiling/python.php > > The page contains instruction to cross compile Python for > the ARM processor and includes a patch which enables cross > compiling Python in a very generic way. > > Wouldn't it make sense to add this kind of support to the > standard dist ? Sure! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Aug 1 17:14:25 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 01 Aug 2002 12:14:25 -0400 Subject: [Python-Dev] Speed of test_sort.py In-Reply-To: Your message of "Wed, 31 Jul 2002 19:23:08 PDT." References: Message-ID: <200208011614.g71GEPZ14933@odiug.zope.com> [Tim, in python-checkins] > Bizarre: this takes 11x longer to run if and only if test_longexp is > run before it, on my box. The bigger REPS is in test_longexp, the > slower this gets. What happens on your box? It's not gc on my box > (which is good, because gc isn't a plausible candidate here). > > The slowdown is massive in the parts of test_sort that implicitly > invoke a new-style class's __lt__ or __cmp__ methods. If I boost > REPS large enough in test_longexp, even the test_sort tests on an array > of size 64 visibly c-r-a-w-l. The relative slowdown is even worse in > a debug build. And if I reduce REPS in test_longexp, the slowdown in > test_sort goes away. 
> > test_longexp does do horrid things to Win98's management of user > address space, but I thought I had made that a whole lot better a month > or so ago (by overallocating aggressively in the parser). It's about the same on my Linux box (system time is CPU time spent in the kernel): test_longexp alone takes 1.92 user + 0.22 system seconds. test_sort alone takes 1.71 user + 0.01 system seconds. test_sort + test_longexp takes 3.62 user + 0.18 system seconds. test_longexp + test_sort takes 38.05 user and 0.34 system seconds!!! I'll see if I can get this to run under a profiler. --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@v.loewis.de Thu Aug 1 17:54:47 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 01 Aug 2002 18:54:47 +0200 Subject: [Python-Dev] Enabling Python cross-compilation In-Reply-To: <2038E0BC-A55A-11D6-B123-003065517236@oratrix.com> References: <2038E0BC-A55A-11D6-B123-003065517236@oratrix.com> Message-ID: Jack Jansen writes: > I like the idea, but I think it could be implemented slightly cleaner > (without need for the make clean and all the environment variables). I > was thinking something along the lines of having two build > subdirectories (as is already supported currently), let's say > build-host and build-crosscompile. I think requiring the host compilation is wrong in the first place. Instead, when cross-compiling, Python should require that the host python already exists - whether from a previous configure;make; make install, or because a host Python had been there all along (it doesn't even have to be the same Python version). Likewise, building the host pgen in a cross-compilation should not be necessary, since the pgen output is shipped with the source release. configure already supports cross-compilation, so setting CC should not be necessary (since it will automatically find arm-linux-gcc if you have a GNU cross-compilation environment). 
I don't volunteer to write patches, either, but I do volunteer to review patches. Regards, Martin From zack@codesourcery.com Thu Aug 1 17:19:46 2002 From: zack@codesourcery.com (Zack Weinberg) Date: Thu, 1 Aug 2002 09:19:46 -0700 Subject: [Python-Dev] Weird error handling in os._execvpe Message-ID: <20020801161946.GA32076@codesourcery.com> While testing my tempfile.py rewrite I ran into this mess in os.py: def _execvpe(file, args, env=None): # ... if not _notfound: if sys.platform[:4] == 'beos': # Process handling (fork, wait) under BeOS (up to 5.0) # doesn't interoperate reliably with the thread interlocking # that happens during an import. The actual error we need # is the same on BeOS for posix.open() et al., ENOENT. try: unlink('/_#.# ## #.#') except error, _notfound: pass else: import tempfile t = tempfile.mktemp() # Exec a file that is guaranteed not to exist try: execv(t, ('blah',)) except error, _notfound: pass exc, arg = error, _notfound for dir in PATH: fullname = path.join(dir, file) try: apply(func, (fullname,) + argrest) except error, (errno, msg): if errno != arg[0]: exc, arg = error, (errno, msg) raise exc, arg This appears to be an overcomplicated, unreliable way of writing import errno def _execvpe(file, args, env=None): # ... for dir in PATH: fullname = path.join(dir, file) try: apply(func, (fullname,) + argrest) except error, (err, msg): if err != errno.ENOENT: # and err != errno.ENOTDIR, maybe raise raise error, (err, msg) Can anyone explain why it is done this way? zw From guido@python.org Thu Aug 1 17:26:09 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 01 Aug 2002 12:26:09 -0400 Subject: [Python-Dev] seeing off SET_LINENO In-Reply-To: Your message of "Thu, 01 Aug 2002 16:56:49 BST." 
<2mofcmpo72.fsf@starship.python.net> References: <2meldjvtqy.fsf@starship.python.net> <200208011518.g71FIMb13597@odiug.zope.com> <2mofcmpo72.fsf@starship.python.net> Message-ID: <200208011626.g71GQAK24757@odiug.zope.com> > Well, if there's code there, then the debugger stops. I know it's > confusing to have intuitive behaviour in this area... :-) > Anyway, I think I'm done now (as you maybe able to tell from the pile > of patch notification emails than just landed in your inbox :). > > These issues from my original mail in this thread still haven't be > addressed: > > 4) The patch installs a descriptor for f_lineno so that there is no > incompatibility for Python code. The question is what to do with > the f_lineno field in the C struct? Remove it? That would > (probably) mean bumping PY_API_VERSION. Leave it in? Then its > contents would usually be meaningless (keeping it up to date would > rather defeat the point of this patch). > > I think leaving f_lineno there but useless is the way to go. If we > actually make incompatible changes for other reasons, then it can > disappear. Agreed. > 8) I haven't measured the performance impact of the changes to code > that is tracing or code that isn't. There's a possible > optimization mentioned in the patch for traced code. For not > traced code it MAY be worthwhile putting the tracing support code > in a static function somewhere so there's less code to jump over in > the main loop (for i-caches and such). > > Still haven't done this. I don't care if it slows down tracing, but I'd like it not to slow down regular operation. Of course, since SET_LINENO is gone, it should speed things up dramatically; but how does it do compared to previous -O mode? (I guess the only difference that -O makes now is that asserts aren't compiled. :-) > 9) This patch stops LLTRACE telling you when execution moves onto a > different line. 
This could be restored, but > > a) I expect I'm the only persion to have used LLTRACE recently > (debugging this patch). > b) This will cause obfuscation, so I'd prefer to do it last. > > No change here either. I'm not too attached to LLTRACE. As long as it's usable for debugging massive changes to the VM implementation I'm okay with it. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Aug 1 18:23:03 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 01 Aug 2002 13:23:03 -0400 Subject: [Python-Dev] Weird error handling in os._execvpe In-Reply-To: Your message of "Thu, 01 Aug 2002 09:19:46 PDT." <20020801161946.GA32076@codesourcery.com> References: <20020801161946.GA32076@codesourcery.com> Message-ID: <200208011723.g71HN4025731@odiug.zope.com> > While testing my tempfile.py rewrite I ran into this mess in os.py: > > def _execvpe(file, args, env=None): > # ... > if not _notfound: > if sys.platform[:4] == 'beos': > # Process handling (fork, wait) under BeOS (up to 5.0) > # doesn't interoperate reliably with the thread interlocking > # that happens during an import. The actual error we need > # is the same on BeOS for posix.open() et al., ENOENT. > try: unlink('/_#.# ## #.#') > except error, _notfound: pass > else: > import tempfile > t = tempfile.mktemp() > # Exec a file that is guaranteed not to exist > try: execv(t, ('blah',)) > except error, _notfound: pass > exc, arg = error, _notfound > for dir in PATH: > fullname = path.join(dir, file) > try: > apply(func, (fullname,) + argrest) > except error, (errno, msg): > if errno != arg[0]: > exc, arg = error, (errno, msg) > raise exc, arg > > This appears to be an overcomplicated, unreliable way of writing > > import errno > > def _execvpe(file, args, env=None): > # ... 
> for dir in PATH: > fullname = path.join(dir, file) > try: > apply(func, (fullname,) + argrest) > except error, (err, msg): > if err != errno.ENOENT: # and err != errno.ENOTDIR, maybe > raise > raise error, (err, msg) > > Can anyone explain why it is done this way? Because not all systems report the same error for this error condition (attempting to execute a file that doesn't exist). --Guido van Rossum (home page: http://www.python.org/~guido/) From mwh@python.net Thu Aug 1 18:11:32 2002 From: mwh@python.net (Michael Hudson) Date: Thu, 1 Aug 2002 18:11:32 +0100 (BST) Subject: [Python-Dev] seeing off SET_LINENO In-Reply-To: <200208011626.g71GQAK24757@odiug.zope.com> Message-ID: On Thu, 1 Aug 2002, Guido van Rossum wrote: > > I think leaving f_lineno there but useless is the way to go. If we > > actually make incompatible changes for other reasons, then it can > > disappear. > > Agreed. Good. > > 8) I haven't measured the performance impact of the changes to code > > that is tracing or code that isn't. There's a possible > > optimization mentioned in the patch for traced code. For not > > traced code it MAY be worthwhile putting the tracing support code > > in a static function somewhere so there's less code to jump over in > > the main loop (for i-caches and such). > > > > Still haven't done this. > > I don't care if it slows down tracing, but I'd like it not to slow > down regular operation. Of course, since SET_LINENO is gone, it > should speed things up dramatically; but how does it do compared to > previous -O mode? Currently compiling up two interpreters for pybench testing... Here goes. Everything is relative to 221-base, which is 2.2.1 from Sean's RPM. This is the slowest, so all percentages are negative, and more negative is better. I hope the names are obvious. 221-base +0.00% (obviously) 221-O-base: -9.69% CVS-base: -15.43% CVS-O-base: -23.56% CVS-hacked: -23.66% CVS-O-hacked: -23.70% (Nearly 25% speed up since 221? Boggle. 
Some of this may be compilation options, I guess) Anyway, it seems I haven't slowed -O down. At some point I might try moving the trace code out of line and see if that has any effect. Not today. If you want to look at where the improvements are in more detail, I've put the pybench files here: http://starship.python.net/crew/mwh/hacks/pybench-files.tar.gz > (I guess the only difference that -O makes now is that asserts aren't > compiled. :-) I think so, yes. > > 9) This patch stops LLTRACE telling you when execution moves onto a > > different line. This could be restored, but > > > > a) I expect I'm the only persion to have used LLTRACE recently > > (debugging this patch). > > b) This will cause obfuscation, so I'd prefer to do it last. > > > > No change here either. > > I'm not too attached to LLTRACE. As long as it's usable for debugging > massive changes to the VM implementation I'm okay with it. Good. I don't suppose you'd actually LLTRACE something without dis output in front of you anyway, so this isn't much of a loss. Something I just remembered: I turned off LLTRACE for trace functions. I guess this isn't really worth caring about either. Cheers, M. From guido@python.org Thu Aug 1 19:16:04 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 01 Aug 2002 14:16:04 -0400 Subject: [Python-Dev] Re: Speed of test_sort.py In-Reply-To: Your message of "Thu, 01 Aug 2002 12:14:25 EDT." Message-ID: <200208011816.g71IG4V25873@odiug.zope.com> > [Tim, in python-checkins] > > Bizarre: this takes 11x longer to run if and only if test_longexp is > > run before it, on my box. The bigger REPS is in test_longexp, the > > slower this gets. What happens on your box? It's not gc on my box > > (which is good, because gc isn't a plausible candidate here). > > > > The slowdown is massive in the parts of test_sort that implicitly > > invoke a new-style class's __lt__ or __cmp__ methods. 
If I boost > > REPS large enough in test_longexp, even the test_sort tests on an array > > of size 64 visibly c-r-a-w-l. The relative slowdown is even worse in > > a debug build. And if I reduce REPS in test_longexp, the slowdown in > > test_sort goes away. > > > > test_longexp does do horrid things to Win98's management of user > > address space, but I thought I had made that a whole lot better a month > > or so ago (by overallocating aggressively in the parser). > > It's about the same on my Linux box (system time is CPU time spent in > the kernel): > > test_longexp alone takes 1.92 user + 0.22 system seconds. > test_sort alone takes 1.71 user + 0.01 system seconds. > test_sort + test_longexp takes 3.62 user + 0.18 system seconds. > test_longexp + test_sort takes 38.05 user and 0.34 system seconds!!! > > I'll see if I can get this to run under a profiler. The profiler shows that in the latter run, 86% of the time (39 seconds -- the profiler slows things down :-) was spent in PyFrame_New, for 188923 calls. I note that the longexp-only profile has only 593 calls to that function, and the sort-only profile has 183075 calls to it, but took only 0.39 seconds for those altogether! The numbers don't quite add up to 188923 (it misses 5255), but it's close enough, and the rest is probably because regrtest does extra stuff when it runs two tests, or there's some randomness in the sort test. So why would 180,000 calls to PyFrame_New take 38 seconds in one case and 0.39 seconds in the other? I checked the call tree in the profiler output, and only very few of the calls to PyFrame_New call something else (mostly PyType_IsSubtype and PyDict_GetItem), and besides, the two cases have an almost identical call profile. Suggestion: doesn't test_longexp create some frames with a very large number of local variables? 
Then PyFrame_New could spend a lot of time in this loop: while (--extras >= 0) f->f_localsplus[extras] = NULL; There's a free list of frames, and PyFrame_New picks the first frame on the free list. It grows the space for locals if necessary, but it never shrinks it. Back to Tim -- does this make sense? Should we attempt to fix it? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Aug 1 19:19:11 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 01 Aug 2002 14:19:11 -0400 Subject: [Python-Dev] seeing off SET_LINENO In-Reply-To: Your message of "Thu, 01 Aug 2002 18:11:32 BST." References: Message-ID: <200208011819.g71IJBR25893@odiug.zope.com> > Here goes. Everything is relative to 221-base, which is 2.2.1 from Sean's > RPM. This is the slowest, so all percentages are negative, and more > negative is better. I hope the names are obvious. > > 221-base +0.00% (obviously) > 221-O-base: -9.69% > CVS-base: -15.43% > CVS-O-base: -23.56% > CVS-hacked: -23.66% > CVS-O-hacked: -23.70% > > (Nearly 25% speed up since 221? Boggle. Some of this may be compilation > options, I guess) No, pymalloc sped us up quite a bit. > Anyway, it seems I haven't slowed -O down. At some point I might try > moving the trace code out of line and see if that has any effect. Not > today. Fine. > If you want to look at where the improvements are in more detail, I've put > the pybench files here: > > http://starship.python.net/crew/mwh/hacks/pybench-files.tar.gz > > > (I guess the only difference that -O makes now is that asserts aren't > > compiled. :-) > > I think so, yes. Ah well. So much -O. :-) > > > 9) This patch stops LLTRACE telling you when execution moves onto a > > > different line. This could be restored, but > > > > > > a) I expect I'm the only persion to have used LLTRACE recently > > > (debugging this patch). > > > b) This will cause obfuscation, so I'd prefer to do it last. > > > > > > No change here either. 
> > > > I'm not too attached to LLTRACE. As long as it's usable for debugging > > massive changes to the VM implementation I'm okay with it. > > Good. I don't suppose you'd actually LLTRACE something without dis output > in front of you anyway, so this isn't much of a loss. Something I just > remembered: I turned off LLTRACE for trace functions. I guess this isn't > really worth caring about either. Fine. What's the next step? I haven't had time to review your code. Do you want to check it in without further review, or do you want to wait until someone can give it a serious look? (Tim's on vacation this week so it might be a while.) --Guido van Rossum (home page: http://www.python.org/~guido/) From zack@codesourcery.com Thu Aug 1 19:23:40 2002 From: zack@codesourcery.com (Zack Weinberg) Date: Thu, 1 Aug 2002 11:23:40 -0700 Subject: [Python-Dev] Weird error handling in os._execvpe In-Reply-To: <200208011723.g71HN4025731@odiug.zope.com> References: <20020801161946.GA32076@codesourcery.com> <200208011723.g71HN4025731@odiug.zope.com> Message-ID: <20020801182340.GB27575@codesourcery.com> On Thu, Aug 01, 2002 at 01:23:03PM -0400, Guido van Rossum wrote: > > Can anyone explain why it is done this way? > > Because not all systems report the same error for this error condition > (attempting to execute a file that doesn't exist). That's unfortunate. The existing code is buggy on at least three grounds: First and most important, it's currently trivial to cause any program that uses os.execvp[e] to invoke a program of the attacker's choice, rather than the intended one, on any platform that supports symbolic links and has predictable PIDs. My tempfile rewrite will make this much harder, but still not impossible. Second, the BeOS code will silently delete the file '/_#.# ## #.#' if it exists, which is unlikely, but not impossible. A user who had created such a file would certainly be surprised to discover it gone after running an apparently-innocuous Python program. 
Third, if an error other than the expected one comes back, the loop clobbers the saved exception info and keeps going. Consider the situation where PATH=/bin:/usr/bin, /bin/foobar exists but is not executable by the invoking user, and /usr/bin/foobar does not exist. The exception thrown will be 'No such file or directory', not the expected 'Permission denied'. Also, I'm not certain what will happen if two threads go through the if not _notfound: block at the same time, but it could be bad, depending on how much implicit locking there is in the interpreter. I see three possible fixes. In order of personal preference: 1. Make os.execvp[e] just call the C library's execvp[e]; it has to get this stuff right anyway. We are already counting on it for execv - I would be surprised to find a system that had execv and not execvp, as long as PATH was a meaningful concept (it isn't, for instance, on classic MacOS). 2. Enumerate all the platform-specific errno values for this failure mode, and check them all. On Unix, ENOENT and arguably ENOTDIR. I don't know about others. 3. If we must do the temporary file thing, create a temporary _directory_; we control the contents of that directory, so we can be sure that the file name we choose does not exist. Cleanup is messier than the other two possibilities. zw From tim.one@comcast.net Thu Aug 1 19:11:42 2002 From: tim.one@comcast.net (Tim Peters) Date: Thu, 01 Aug 2002 14:11:42 -0400 Subject: [Python-Dev] Speed of test_sort.py In-Reply-To: <200208011614.g71GEPZ14933@odiug.zope.com> Message-ID: [Guido, mixing test_longexp w/ the new test_sort] > It's about the same on my Linux box (system time is CPU time spent in > the kernel): Dang! I was more than half hoping it was a Windows glitch. > test_longexp alone takes 1.92 user + 0.22 system seconds. > test_sort alone takes 1.71 user + 0.01 system seconds. > test_sort + test_longexp takes 3.62 user + 0.18 system seconds. 
> test_longexp + test_sort takes 38.05 user and 0.34 system seconds!!! > > I'll see if I can get this to run under a profiler. It's intriguing, but I have to do other things today. Here's a self-contained test case that lets you vary REPS from the command line: """ import sys from time import clock as now def do_shuffle(x): import random random.shuffle(x) def do_longexp(REPS): l = eval("[" + "2," * REPS + "]") assert len(l) == REPS def do_sort(x): x.sort(lambda x, y: cmp(x, y)) # Doing x.sort(cmp) instead, there's no slowdown, so it's not just # that there's an explicit comparison function. x = range(1000) do_shuffle(x) REPS = 65580 if len(sys.argv) > 1: REPS = int(sys.argv[1]) t1 = now() do_longexp(REPS) t2 = now() do_sort(x) t3 = now() print "At REPS=%d, longexp %.2g sort %.2g" % (REPS, t2-t1, t3-t2) """ On my box, the time it takes for the sort appears, after a certain point, to grow quadratically(!) in the REPS value (these timings were hasty and not on a quiet box, so only gross conclusions are justified): C:\Code\python\PCbuild>python temp.py 1 At REPS=1, longexp 0.00021 sort 0.027 At REPS=1, longexp 0.00018 sort 0.036 At REPS=10, longexp 0.00035 sort 0.053 At REPS=100, longexp 0.002 sort 0.028 At REPS=1000, longexp 0.039 sort 0.073 At REPS=10000, longexp 0.47 sort 0.45 At REPS=20000, longexp 0.89 sort 0.44 At REPS=40000, longexp 1.5 sort 1.3 At REPS=80000, longexp 2.5 sort 5.9 At REPS=160000, longexp 5 sort 22 From guido@python.org Thu Aug 1 19:38:17 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 01 Aug 2002 14:38:17 -0400 Subject: [Python-Dev] Re: Speed of test_sort.py In-Reply-To: Your message of "Thu, 01 Aug 2002 14:16:04 EDT." <200208011816.g71IG4V25873@odiug.zope.com> References: <200208011816.g71IG4V25873@odiug.zope.com> Message-ID: <200208011838.g71IcIL06718@odiug.zope.com> > Suggestion: doesn't test_longexp create some frames with a very large > number of local variables? 
Then PyFrame_New could spend a lot of time > in this loop: > > while (--extras >= 0) > f->f_localsplus[extras] = NULL; > > There's a free list of frames, and PyFrame_New picks the first frame > on the free list. It grows the space for locals if necessary, but it > never shrinks it. Jeremy made me think about it some more. Deleting two lines from PyFrame_New() made the timing behavior much more reasonable: *** frameobject.c 20 Apr 2002 04:46:55 -0000 2.62 --- frameobject.c 1 Aug 2002 18:32:40 -0000 *************** *** 265,272 **** if (f == NULL) return NULL; } - else - extras = f->ob_size; _Py_NewReference((PyObject *)f); } if (builtins == NULL) { --- 265,270 ---- This means that the while loop only clears that part of the stack that we plan to *use*, not all that's available. I've run the whole test suite in debug mode with this change and it showed no failures, so I'll check this in now. Should we fix this in 2.2.2 too? --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Thu Aug 1 20:05:16 2002 From: tim.one@comcast.net (Tim Peters) Date: Thu, 01 Aug 2002 15:05:16 -0400 Subject: [Python-Dev] Re: Speed of test_sort.py In-Reply-To: <200208011816.g71IG4V25873@odiug.zope.com> Message-ID: [Guido, pins the blame on PyFrame_New -- cool!] > ... > Suggestion: doesn't test_longexp create some frames with a very large > number of local variables? Then PyFrame_New could spend a lot of time > in this loop: > > while (--extras >= 0) > f->f_localsplus[extras] = NULL; In my poor man's profiling, I ran the self-contained test case posted earlier under the debugger with REPS=120000, and since the "sort" part takes 20 seconds then, there was lots of opportunity to break at random times (the MSVC debugger lets you do that, i.e. click a button that means "I don't care where you are, break *now*"). It was always in that loop when it broke, and extras always started life at 120000 before that loop. Yikes! 
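[Editorial note: the effect Tim is sampling here is easy to model outside CPython. The sketch below is a toy simulation with made-up names — not the real PyFrame_New or frame free list — that just counts how many slots get cleared when one huge frame (test_longexp's) sits on the free list and many tiny frames (test_sort's comparison calls) reuse it, with and without the two lines the patch deletes.]

```python
# Toy model of the frame free list behaviour under discussion.
# A frame's slot array only ever grows.  Pre-patch, reuse cleared the
# whole array (extras = f->ob_size); post-patch, only the slots the
# new code object actually needs.  Hypothetical names throughout.

def reuse_frame(slots, extras, clear_whole_array):
    """Reuse the free-listed `slots` for a code object needing `extras`."""
    if len(slots) < extras:                    # grow, never shrink
        slots.extend([None] * (extras - len(slots)))
    n = len(slots) if clear_whole_array else extras
    for i in range(n):                         # the while (--extras >= 0) loop
        slots[i] = None
    return n                                   # slots actually cleared

def total_cleared(big, small, calls, clear_whole_array):
    slots = []
    # One huge frame (stand-in for test_longexp's giant expression) ...
    cleared = reuse_frame(slots, big, clear_whole_array)
    # ... then many tiny frames (stand-ins for test_sort's __lt__ calls).
    for _ in range(calls):
        cleared += reuse_frame(slots, small, clear_whole_array)
    return cleared

before = total_cleared(big=65580, small=10, calls=1000, clear_whole_array=True)
after = total_cleared(big=65580, small=10, calls=1000, clear_whole_array=False)
print(before, after)  # every small call pays for the big frame vs. not
```

With `clear_whole_array=True`, every one of the 1000 small reuses clears all 65580 slots; with it off, each clears only 10 — which is why the sort's cost tracked test_longexp's REPS value.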
> There's a free list of frames, and PyFrame_New picks the first frame > on the free list. It grows the space for locals if necessary, but it > never shrinks it. > > Back to Tim -- does this make sense? Should we attempt to fix it? I can't make sufficient time to think about this, but I suspect a principled fix is simply to delete these two lines: else extras = f->ob_size; The number of extras the code object actually needs was already computed correctly earlier, via extras = code->co_stacksize + code->co_nlocals + ncells + nfrees; and there's no point clearing any more than that original value. IOW, I don't think it hurts to have a big old frame left on the freelist, the pain comes from clearing out more slots in it than the *current* code object uses. A quick test of this showed it cured the test_longexp + test_sort speed problem, and the regression suite ran without problems. If someone understands this code well enough to finish thinking about whether that's a correct thing to do, please do! From guido@python.org Thu Aug 1 20:12:29 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 01 Aug 2002 15:12:29 -0400 Subject: [Python-Dev] Re: Speed of test_sort.py In-Reply-To: Your message of "Thu, 01 Aug 2002 15:05:16 EDT." References: Message-ID: <200208011912.g71JCTi10213@odiug.zope.com> > If someone understands this code well enough to finish thinking about > whether that's a correct thing to do, please do! Please do a cvs update. :-) Jeremy & I independently came up with the same solution, so I consider this resolved. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Aug 1 20:27:43 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 01 Aug 2002 15:27:43 -0400 Subject: [Python-Dev] Weird error handling in os._execvpe In-Reply-To: Your message of "Thu, 01 Aug 2002 11:23:40 PDT." 
<20020801182340.GB27575@codesourcery.com> References: <20020801161946.GA32076@codesourcery.com> <200208011723.g71HN4025731@odiug.zope.com> <20020801182340.GB27575@codesourcery.com> Message-ID: <200208011927.g71JRhm10269@odiug.zope.com> > > > Can anyone explain why it is done this way? > > > > Because not all systems report the same error for this error condition > > (attempting to execute a file that doesn't exist). > > That's unfortunate. The existing code is buggy on at least three > grounds: > > First and most important, it's currently trivial to cause any program > that uses os.execvp[e] to invoke a program of the attacker's choice, > rather than the intended one, on any platform that supports symbolic > links and has predictable PIDs. My tempfile rewrite will make this > much harder, but still not impossible. That's important. > Second, the BeOS code will silently delete the file '/_#.# ## #.#' if > it exists, which is unlikely, but not impossible. A user who had > created such a file would certainly be surprised to discover it gone > after running an apparently-innocuous Python program. I really don't care about that. :-) > Third, if an error other than the expected one comes back, the loop > clobbers the saved exception info and keeps going. Consider the > situation where PATH=/bin:/usr/bin, /bin/foobar exists but is not > executable by the invoking user, and /usr/bin/foobar does not exist. > The exception thrown will be 'No such file or directory', not the > expected 'Permission denied'. Hm, you're right. The code (which I believe I wrote, except for the BeOS bit) was attempting to get the opposite effect, but seems to be broken. :-( > Also, I'm not certain what will happen if two threads go through the > if not _notfound: block at the same time, but it could be bad, > depending on how much implicit locking there is in the interpreter. > > I see three possible fixes. In order of personal preference: > > 1. 
Make os.execvp[e] just call the C library's execvp[e]; it has to > get this stuff right anyway. We are already counting on it for > execv - I would be surprised to find a system that had execv and > not execvp, as long as PATH was a meaningful concept (it isn't, for > instance, on classic MacOS). Probably agreed for execvpe(). All the non-env versions must call the env version because not all platforms have putenv, and there, changes to os.environ don't get reflected in the process's environment. > 2. Enumerate all the platform-specific errno values for this failure > mode, and check them all. On Unix, ENOENT and arguably ENOTDIR. I > don't know about others. > > 3. If we must do the temporary file thing, create a temporary > _directory_; we control the contents of that directory, so we can > be sure that the file name we choose does not exist. Cleanup is > messier than the other two possibilities. I'd like to agree with this, but I don't recall exactly why we ended up in this situation in the first place. It's possible that it's an unnecessary sacrifice of a dead chicken, but it's also possible that there are platforms where this addressed a real need. I'd like to think that it was because I didn't want to add more cruft to posixmodule.c (I've long given up on that :-). Can you post a patch to SF? Then we can ask for volunteers to test it on various platforms. --Guido van Rossum (home page: http://www.python.org/~guido/) From zack@codesourcery.com Thu Aug 1 23:06:41 2002 From: zack@codesourcery.com (Zack Weinberg) Date: Thu, 1 Aug 2002 15:06:41 -0700 Subject: [Python-Dev] Weird error handling in os._execvpe In-Reply-To: <200208011927.g71JRhm10269@odiug.zope.com> References: <20020801161946.GA32076@codesourcery.com> <200208011723.g71HN4025731@odiug.zope.com> <20020801182340.GB27575@codesourcery.com> <200208011927.g71JRhm10269@odiug.zope.com> Message-ID: <20020801220641.GA17902@codesourcery.com> On Thu, Aug 01, 2002 at 03:27:43PM -0400, Guido van Rossum wrote: > > 1. 
Make os.execvp[e] just call the C library's execvp[e]; it has to > > get this stuff right anyway. We are already counting on it for > > execv - I would be surprised to find a system that had execv and > > not execvp, as long as PATH was a meaningful concept (it isn't, for > > instance, on classic MacOS). > > Probably agreed for execvpe(). All the non-env versions must call the > env version because not all platforms have putenv, and there changes > to os.environ don't get reflected in the process's environment. execvp could be just def execvp(file, args): return execvpe(file, args, environ) yes? > > 2. Enumerate all the platform-specific errno values for this failure > > mode, and check them all. On Unix, ENOENT and arguably ENOTDIR. I > > don't know about others. > > > > 3. If we must do the temporary file thing, create a temporary > > _directory_; we control the contents of that directory, so we can > > be sure that the file name we choose does not exist. Cleanup is > > messier than the other two possibilities. > > I like to agree with this, but I don't recall exactly why we ended up > in this situation in the first place. It's possible that it's an > unnecessary sacrifice of a dead chicken, but it's also possible that > there are platforms where this addressed a real need. I'd like to > think that it was because I didn't want to add more cruft to > posixmodule.c (I've long given up on that :-). > > Can you post a patch to SF? Then we can ask for volunteers to test it > on various platforms. I will write such a patch, however, I keep getting lost in the Python source tree. In addition to Modules/posixmodule.c, I would need to update the nt, dos, os2, mac, ce, and riscos modules also, yes? Where are their sources kept? I don't see an ntmodule.c, etc anywhere. 
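[Editorial note: fix #2 from this exchange — enumerating the "not found" errno values and surfacing the first interesting error instead of clobbering it — can be sketched in a few lines. This is a simplified, hypothetical model, not the stdlib's os._execvpe: `try_exec` stands in for the real execv call, and the errno set covers only the Unix values Zack names.]

```python
import errno

# Errno values that mean "keep searching PATH" on Unix; anything else
# (e.g. EACCES from a non-executable file) should win over "not found".
NOT_FOUND = frozenset({errno.ENOENT, errno.ENOTDIR})

def search_path(path_dirs, try_exec):
    saved = OSError(errno.ENOENT, "No such file or directory")
    for d in path_dirs:
        try:
            return try_exec(d)      # a real exec would never return here
        except OSError as e:
            if e.errno not in NOT_FOUND:
                saved = e           # remember the interesting error
    raise saved                     # nothing on PATH worked

# Zack's scenario: /bin/foobar exists but isn't executable, and
# /usr/bin/foobar doesn't exist -- the user should see EACCES.
def fake_exec(d):
    raise OSError(errno.EACCES if d == "/bin" else errno.ENOENT, "boom")
```

With this shape there is nothing to probe at import time, so the BeOS unlink hack and the mktemp "guaranteed not to exist" exec both become unnecessary — which is the point of option 2.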
zw From greg@cosc.canterbury.ac.nz Thu Aug 1 23:43:41 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 02 Aug 2002 10:43:41 +1200 (NZST) Subject: [Python-Dev] pre-PEP: The Safe Buffer Interface In-Reply-To: <002701c23935$e5831c20$e000a8c0@thomasnotebook> Message-ID: <200208012243.g71MhfIk020726@kuku.cosc.canterbury.ac.nz> > Backward compatibility. > If we change the array object to enter a locked state > when getreadbuffer() is called, it would be surprising. Yes, I understand now. I hadn't realised that the list includes both the old and new routines. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From barry@python.org Thu Aug 1 23:51:16 2002 From: barry@python.org (Barry A. Warsaw) Date: Thu, 1 Aug 2002 18:51:16 -0400 Subject: [Python-Dev] Docutils/reStructuredText is ready to process PEPs References: Message-ID: <15689.47972.953407.418801@anthem.wooz.org> >>>>> "DG" == David Goodger writes: DG> I hereby formally request permission to deploy Docutils for DG> PEPs on Python.org. Here's a deployment plan for your DG> consideration: I'm sympathetic to your aims, but I have reservations. As lightweight as reST is, it's still too heavy for me. Ka-Ping described some of my feelings quite well so I won't repeat what he said. I like that PEPs are 70-odd column plain text, with just a few style guidelines to aid in the html generation tool, and to promote consistency. I think of PEPs as our RFCs and I'm dinosaurically attached to the RFC format, which has served standards bodies well for so long. I like that the plain text sources are readable and consistent, with virtually no rules that are hard to remember. More importantly for me, I find it easy to do editing passes on submitted PEPs in order to ensure consistency.
The noisy markup in reST bothers me, although you've done a good job in minimizing the impact compared to other markup languages. Magical double colons, trailing underscores, etc. are jarring to me. I wonder how tools like ispell will handle some of it (I haven't tried it on your reST source versions). I made this suggestion privately to David, but I'll repeat it here. I'd be willing to accept that PEPs /may/ be written in reST as an alternative to plaintext, but not require it. I'd like for PEP authors to explicitly choose one or the other, preferably by file extension (e.g. .txt for plain text, .rst or .rest for reST). I'd also like for there to be two tools for generating derivative forms from the original source. I would leave pep2html.py alone. That's the tool that generates .html from .txt. I'd write a different tool that took a .rst file and generated both a .html file and a .txt file. The generated .txt file would have no markup and would conform to .txt PEP style as closely as possible. reST-generated html would then have a link both to the original reST source, and to the plain text form. A little competition never hurt anyone. :) So I'd open it up and let PEP authors decide, and we can do a side-by-side comparison of which format folks prefer to use. -Barry From mhammond@skippinet.com.au Thu Aug 1 23:42:11 2002 From: mhammond@skippinet.com.au (Mark Hammond) Date: Fri, 2 Aug 2002 08:42:11 +1000 Subject: [Python-Dev] Weird error handling in os._execvpe In-Reply-To: <20020801220641.GA17902@codesourcery.com> Message-ID: > update the nt, dos, os2, mac, ce, and riscos modules also, yes? Where > are their sources kept? I don't see an ntmodule.c, etc anywhere. The nt module is built from the posixmodule.c sources. It is not called posix to prevent flame wars ;) Mark. From greg@cosc.canterbury.ac.nz Fri Aug 2 00:19:13 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 02 Aug 2002 11:19:13 +1200 (NZST) Subject: [Python-Dev] PEP 298, final (?)
version In-Reply-To: Message-ID: <200208012319.g71NJDEH021069@kuku.cosc.canterbury.ac.nz> > > Failure to call > > this function (if it is != NULL) is a programming error > Not sure I like this. I would prefer to put the burden of "you must provide > a (possibly empty) release function" on the few buffer interface > implementers than the many (ie, potentially any extension author) buffer > interface consumers. The test for whether the release routine is NULL or not (if one is needed at all) surely belongs inside PyObject_ReleaseFixedBuffer. Clients should be required to always call this routine. I say "if one is needed at all" because PyType_Ready could fill it in with a default one if required. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido@python.org Fri Aug 2 00:36:50 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 01 Aug 2002 19:36:50 -0400 Subject: [Python-Dev] Weird error handling in os._execvpe In-Reply-To: Your message of "Thu, 01 Aug 2002 15:06:41 PDT." <20020801220641.GA17902@codesourcery.com> References: <20020801161946.GA32076@codesourcery.com> <200208011723.g71HN4025731@odiug.zope.com> <20020801182340.GB27575@codesourcery.com> <200208011927.g71JRhm10269@odiug.zope.com> <20020801220641.GA17902@codesourcery.com> Message-ID: <200208012336.g71NaoK28203@pcp02138704pcs.reston01.va.comcast.net> > > > 1. Make os.execvp[e] just call the C library's execvp[e]; it has to > > > get this stuff right anyway. We are already counting on it for > > > execv - I would be surprised to find a system that had execv and > > > not execvp, as long as PATH was a meaningful concept (it isn't, for > > > instance, on classic MacOS). > > > > Probably agreed for execvpe(). 
All the non-env versions must call the
> > env version because not all platforms have putenv, and there changes
> > to os.environ don't get reflected in the process's environment.
>
> execvp could be just
>
>     def execvp(file, args):
>         return execvpe(file, args, environ)
>
> yes?

It already is, sort of:

    def execvp(file, args):
        """execp(file, args)

        Execute the executable file (which is searched for along $PATH)
        with argument list args, replacing the current process.
        args may be a list or tuple of strings.
        """
        _execvpe(file, args)

> > > 2. Enumerate all the platform-specific errno values for this failure
> > > mode, and check them all. On Unix, ENOENT and arguably ENOTDIR. I
> > > don't know about others.
> > >
> > > 3. If we must do the temporary file thing, create a temporary
> > > _directory_; we control the contents of that directory, so we can
> > > be sure that the file name we choose does not exist. Cleanup is
> > > messier than the other two possibilities.
> >
> > I'd like to agree with this, but I don't recall exactly why we ended up
> > in this situation in the first place. It's possible that it's an
> > unnecessary sacrifice of a dead chicken, but it's also possible that
> > there are platforms where this addressed a real need. I'd like to
> > think that it was because I didn't want to add more cruft to
> > posixmodule.c (I've long given up on that :-).
> >
> > Can you post a patch to SF? Then we can ask for volunteers to test it
> > on various platforms.
>
> I will write such a patch; however, I keep getting lost in the Python
> source tree. In addition to Modules/posixmodule.c, I would need to
> update the nt, dos, os2, mac, ce, and riscos modules also, yes? Where
> are their sources kept? I don't see an ntmodule.c, etc anywhere.

The nt module is built from the posixmodule.c source file. AFAIK the others don't support the exec* family at all, so don't worry about them; if something is needed the respective maintainers will have to provide it.
--Guido van Rossum (home page: http://www.python.org/~guido/) From python-dev@zesty.ca Fri Aug 2 00:58:40 2002 From: python-dev@zesty.ca (Ka-Ping Yee) Date: Thu, 1 Aug 2002 16:58:40 -0700 (PDT) Subject: [Python-Dev] Re: Docutils/reStructuredText is ready to process PEPs In-Reply-To: Message-ID: On Thu, 1 Aug 2002, David Goodger wrote: > Ka-Ping Yee wrote: > > It took a long time. Perhaps it seems not so big to others, but > > my personal opinion would be to recommend against this proposal > > until the specification fits in, say, 1000 lines and can be absorbed > > in ten minutes. > > The specification is, as its title says, a *specification*. It's a detailed > description of the markup, intended to guide the *developer* who is writing > a parser or other tool. It's not user documentation. Okay, i understand that it's a spec and not a user manual. I think the fact that it takes that much text to describe all of the rules does say something about its complexity, though. Other people may have different thresholds; it exceeds my threshold. But again i want to stress that i think the structured-text approach is good and i do not advocate abandoning the whole idea; i just want a simpler set of rules. > > For me, it violates the fits-in-my-brain principle: > > the spec is 2500 lines long, and supports six different kinds of > > references and five different kinds of lists (even lists with roman > > numerals!). It also violates the one-way-to-do-it principle: > > for example, there are a huge variety of ways to do headings, > > and two different syntaxes for drawing a table. > > How many times have we heard this? "All we need are paragraphs and bullet > lists." That line of argument has been going on for at least six years, and > has hampered progress all along. Well, that depends what you mean by "progress"! :) There might be something to that line of argument, if it has a habit of cropping up. One can separate two issues here: 1. too much functionality (YAGNI) 2. 
too many ways of expressing the same functionality (TMTOWTDI) As for the first, there's some room to argue here. I happen to feel there are quite a few YAGNI features in RST, like the Roman numerals and the RCS keyword processing. Auto-numbering in particular takes RST in a direction that makes me uncomfortable -- it means that RST now has the potential for a compile-debug cycle. But as for the second, i just don't see any justification for it. Reducing the multiple ways to do headers and lists and tables doesn't cripple anything; it only makes RST simpler and easier to understand. I acknowledge that there is some question of opinion as to what is the "same" functionality, causing issues to slush over from #1 to #2. To me, using "1.", "(1)", or "1)" to number a list makes no semantic difference at all, and so it counts as redundancy. If you already have definition lists, why also have option lists and field lists? If you already have literals, why have interpreted text? If you already have both footnotes *and* inline URLs, why also have anonymous inline hyperlink references? > OTOH, I have no problem with mandating standard uses, like a standard set of > section title adornments. If you're going to recommend certain ways, why not just decide what to use and be done with it? When designing a new standard, there's no point starting out with parts of it already deprecated. -- ?!ng From guido@python.org Fri Aug 2 01:17:44 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 01 Aug 2002 20:17:44 -0400 Subject: [Python-Dev] Docutils/reStructuredText is ready to process PEPs In-Reply-To: Your message of "Thu, 01 Aug 2002 18:51:16 EDT." <15689.47972.953407.418801@anthem.wooz.org> References: <15689.47972.953407.418801@anthem.wooz.org> Message-ID: <200208020017.g720Hi811597@odiug.zope.com> > I made this suggestion privately to David, but I'll repeat it here. 
> I'd be willing to accept that PEPs /may/ be written in reST as an > alternative to plaintext, but not require it. I'd like for PEP > authors to explicitly choose one or the other, preferrably by file > extension (e.g. .txt for plain text .rst or .rest for reST). I'd also > like for there to be two tools for generation derivative forms from > the original source. AFAICT that's all that David asked for. It's the only thing that makes sense; nobody's going to convert over 200 existing PEPs to reST. > I would leave pep2html.py alone. That's the tool that generates .html > from .txt. I'd write a different tool that took a .rst file and > generated both a .html file and a .txt file. The generated .txt file > would have no markup and would conform to .txt PEP style as closely as > possible. reST generated html would then have a link both to the > original reST source, and to the plain text form. I don't see why reST needs to produce .txt output. The reST source is readable enough. > A little competition never hurt anyone. :) So I'd open it up and let > PEP authors decide, and we can do a side-by-side comparison of which > format folks prefer to use. Exactly. Let's do it. --Guido van Rossum (home page: http://www.python.org/~guido/) From tdelaney@avaya.com Fri Aug 2 01:28:14 2002 From: tdelaney@avaya.com (Delaney, Timothy) Date: Fri, 2 Aug 2002 10:28:14 +1000 Subject: [Python-Dev] Re: Docutils/reStructuredText is ready to proces s PEPs Message-ID: > From: Ka-Ping Yee [mailto:python-dev@zesty.ca] > > One can separate two issues here: > > 1. too much functionality (YAGNI) > 2. too many ways of expressing the same functionality (TMTOWTDI) >From my reading of the reST docs at various times, I've come to the conclusion that YAGNI doesn't apply - that each of the features exists because someone *did* need it (i.e. they had a real use case for it). I do feel that the explanations of some of the constructs are somewhat confusing. 
I think all constructs should include at least one (and preferably more) use cases in their explanations. Personally, I'm in favour of having the complete reST specification, but have well-defined conventions for usage of reST within specific applications. So a "docstring convention" document would specify what the structure of a docstring should (or must) include, how it is parsed, what interpreted text means, etc. Fairly comprehensive examples should be included. Unless you had a very specific need you shouldn't go outside of the convention, but it should be available if you needed it for something which couldn't be expressed otherwise. If all you wanted to do was write docstrings, you would refer to the docstring convention document. If you wanted to write a PEP, you would refer to the PEP convention document. Because they use the same underlying syntax, knowing how to do one will help with learning how to do the other. One may normally use more (or different) constructs than the other, but there will be a lot of crossover. Tim Delaney From greg@cosc.canterbury.ac.nz Fri Aug 2 01:36:52 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 02 Aug 2002 12:36:52 +1200 (NZST) Subject: [Python-Dev] PEP 298, final (?) version In-Reply-To: <20020801143840.51753.qmail@web40105.mail.yahoo.com> Message-ID: <200208020036.g720aq905371@oma.cosc.canterbury.ac.nz> > > void PyObject_ReleaseFixedBuffer(PyObject *obj); > > > > Would it be useful to allow bf_releasefixedbuffer to return an int > indicating an exception? For instance, it could raise an exception if the > extension errantly releases more times than it has acquired. The code making the call might not be in an easy position to deal with an exception -- e.g. an asynchronous I/O routine called from a signal handler, another thread, etc. Maybe use the warning mechanism to produce a message?
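As a thought experiment, both ideas -- deferring close until the lock count reaches zero, and warning (rather than raising) on an errant extra release -- can be modeled in pure Python. This is purely illustrative; the class and method names are invented for the sketch, and the real interface is of course at the C level:

```python
import warnings

class LockedBuffer:
    """Toy model of PEP 298-style acquire/release counting
    (illustrative only; the real protocol lives in PyBufferProcs)."""

    def __init__(self, data):
        self._data = bytearray(data)
        self._locks = 0
        self._close_pending = False
        self.closed = False

    def acquire(self):
        if self.closed:
            raise ValueError("buffer is closed")
        self._locks += 1
        return self._data          # "pointer" stays valid while locked

    def release(self):
        if self._locks == 0:
            # Errant extra release: the caller may be in a signal
            # handler or another thread, so warn instead of raising.
            warnings.warn("release without matching acquire")
            return
        self._locks -= 1
        if self._locks == 0 and self._close_pending:
            self._do_close()       # the deferred close happens now

    def close(self):
        if self._locks:
            self._close_pending = True   # defer until count reaches 0
        else:
            self._do_close()

    def _do_close(self):
        self.closed = True
        self._close_pending = False
```

With this model, `close()` while a lock is held simply marks the buffer; the actual teardown is performed by the final `release()`, which matches the deferred-close suggestion earlier in the thread.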
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Fri Aug 2 01:38:52 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 02 Aug 2002 12:38:52 +1200 (NZST) Subject: [Python-Dev] Re: Docutils/reStructuredText is ready to process PEPs In-Reply-To: Message-ID: <200208020038.g720cqV05378@oma.cosc.canterbury.ac.nz> Ka-Ping Yee : > To me, using "1.", "(1)", or "1)" to number a list makes no semantic > difference at all, and so it counts as redundancy. I imagine the variations are there so that RST documents can be easily read in their own right. Having different styles of headings, list numbers, etc. for different levels aids readability. Each of these features is no doubt useful for one application or another, but we're talking about a fairly restricted application here. Docstrings are usually pretty short and not likely to require multiple levels of headings, lists, etc. So I'm in favour of choosing a subset to recommend, perhaps mandate. Maybe a slightly larger subset could be used for PEPs, since they're somewhat bigger. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From aahz@pythoncraft.com Fri Aug 2 03:07:19 2002 From: aahz@pythoncraft.com (Aahz) Date: Thu, 1 Aug 2002 22:07:19 -0400 Subject: [Python-Dev] PEP 298, final (?) version In-Reply-To: <00b101c2393a$e4a01ce0$e000a8c0@thomasnotebook> References: <00b101c2393a$e4a01ce0$e000a8c0@thomasnotebook> Message-ID: <20020802020719.GA10231@panix.com> I finally read all these threads today, cleaning out much of my OSCON backlog.
Now, maybe I'm stupid, but I'm not understanding the relationship between the new buffer protocol (PEP 298) and the new bytes object (PEP 296). Should this be something documented in one or both PEPs? -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From goodger@users.sourceforge.net Fri Aug 2 03:21:07 2002 From: goodger@users.sourceforge.net (David Goodger) Date: Thu, 01 Aug 2002 22:21:07 -0400 Subject: [Python-Dev] Docutils/reStructuredText is ready to process PEPs In-Reply-To: <15689.47972.953407.418801@anthem.wooz.org> Message-ID: Barry A. Warsaw wrote: > I like that PEPs are 70-odd column plain text, with just a few style > guidelines to aid in the html generation tool, and to promote > consistency. I think of PEPs as our RFCs and I'm dinosaurically > attached to the RFC format, which has served standards bodies well for > so long. I like that the plain text sources are readable and > consistent, with virtually no rules that are hard to remember. More > importantly for me, I find it easy to do editing passes on submitted > PEPs in order to ensure consistency. Why are PEPs converted to HTML at all then? (Semi-seriously :-) RFCs pre-date the Web, HTML, GUIs, and PCs. There is a great advantage in sticking to a text-based format, but the existing structure is very limited. RFCs are so 20th century; don't you think it's time to move on? ;-) Dinosaurs have a tendency to become extinct you know. Given a small amount of use, I think you'll find the rules easy to remember. There should be little effect on editing. At most, Emacs may need to be taught to recognize a bit more punctuation. > The noisy markup in reST bothers me, although you've done a good job > in minimizing the impact compared to other markup languages. It's a trade-off: functionality for markup intrusion. 
It's the functionality of the processed form that's important: inline live links; live links to & from footnotes; automatic tables of contents (with live links!); images (don't you just *cringe* when you see ASCII graphics?); pleasant, readable text. The markup is minimal, quickly and easily ignored. > I made this suggestion privately to David, but I'll repeat it here. > I'd be willing to accept that PEPs /may/ be written in reST as an > alternative to plaintext, but not require it. Sure. I thought I'd emphasized that in my original post: it'd be an alternative, the two styles can coexist. If you want to keep PEP 0 as it is, that's fine. I converted it to show that its special processing was also supported. > I'd like for PEP authors to explicitly choose one or the other, preferrably by > file extension (e.g. .txt for plain text .rst or .rest for reST). I'm not keen on a new file extension (this issue has come up before). There's so much in place on many platforms that says .txt means text files, and reStructuredText files *are* text files, with just a bit of formal structure sprinkled over. Browsers know what to do with .txt files; they wouldn't know what to do with .rest or .rtxt files. Near-universal file naming conventions are not the place to innovate IMHO. > I'd also like for there to be two tools for generation derivative forms from > the original source. > > I would leave pep2html.py alone. That's the tool that generates .html > from .txt. See http://docutils.sf.net/tools/pep2html.py (based on revision 1.37 of Python's nondist/peps/pep2html.py). Other than abstracting the file I/O and some minor changes for consistency & legibility, the reStructuredText-specific part is just two functions. One checks for the format of the PEP, and the other calls Docutils to do the work. Even without a new file extension, there's no need for a separate tool. > I'd write a different tool that took a .rst file and > generated both a .html file and a .txt file. 
The generated .txt file > would have no markup and would conform to .txt PEP style as closely as > possible. reST generated html would then have a link both to the > original reST source, and to the plain text form. Do we need a slightly less-structured text output? I don't think so, but I offered two alternative strategies in PEP 287: a) Keep the existing PEP section structure constructs (one-line section headers, indented body text). Subsections can either be forbidden, or supported with reStructuredText-style underlined headers in the indented body text. b) Replace the PEP section structure constructs with the reStructuredText syntax. Section headers will require underlines, subsections will be supported out of the box, and body text need not be indented (except for block quotes). Strategy (b) has been implemented; that's what the edited PEP 287 uses. I'd recommend against it, but if you insist on existing PEP structure, strategy (a) fits better although inconsistently (depending on the decision on subsections). > A little competition never hurt anyone. :) So I'd open it up and let > PEP authors decide, and we can do a side-by-side comparison of which > format folks prefer to use. Sure. Once authors see what the new markup gives them, I'm sure there will be some converts. -- David Goodger Open-source projects: - Python Docutils: http://docutils.sourceforge.net/ (includes reStructuredText: http://docutils.sf.net/rst.html) - The Go Tools Project: http://gotools.sourceforge.net/ From goodger@users.sourceforge.net Fri Aug 2 03:28:42 2002 From: goodger@users.sourceforge.net (David Goodger) Date: Thu, 01 Aug 2002 22:28:42 -0400 Subject: [Python-Dev] Docutils/reStructuredText is ready to process PEPs In-Reply-To: Message-ID: [Ping] >>> It took a long time. Perhaps it seems not so big to others, but >>> my personal opinion would be to recommend against this proposal >>> until the specification fits in, say, 1000 lines and can be absorbed >>> in ten minutes. 
[David] >> The specification is, as its title says, a *specification*. It's a detailed >> description of the markup, intended to guide the *developer* who is writing a >> parser or other tool. It's not user documentation. [Ping] > Okay, i understand that it's a spec and not a user manual. I think > the fact that it takes that much text to describe all of the rules > does say something about its complexity, though. I prefer the term "rich" over "complex". ;-) Seriously, any significant technology requires a significant spec. Did you look at the primer and quick reference? - Primer: http://docutils.sf.net/docs/rst/quickstart.html - Quick reference: http://docutils.sf.net/docs/rst/quickref.html I wouldn't recommend the Language or Library Reference to a Python newbie either; they're references! I'd point them to the Tutorial. The primer above is reStructuredText's tutorial: short & sweet. > But again i want to stress that i think the structured-text approach > is good and i do not advocate abandoning the whole idea; i just want > a simpler set of rules. Speaking from experience having hashed out all these issues over the last two years, "a simpler set of rules" won't work. Sure, a few conveniences could be trimmed from reStructuredText, and all we'd lose would be convenience. Go past that and the markup would become less useful. Cut everything you listed and the markup would be next to useless. >>> For me, it violates the fits-in-my-brain principle: >>> the spec is 2500 lines long, and supports six different kinds of >>> references and five different kinds of lists (even lists with roman >>> numerals!). It also violates the one-way-to-do-it principle: >>> for example, there are a huge variety of ways to do headings, >>> and two different syntaxes for drawing a table. >> >> How many times have we heard this? "All we need are paragraphs and bullet >> lists." That line of argument has been going on for at least six years, and >> has hampered progress all along. 
> Well, that depends what you mean by "progress"! :) There might be > something to that line of argument, if it has a habit of cropping up. "Progress" in auto-documentation tools for Python. "Progress" in a usable, successful structured plaintext markup. OTOH, there's been just as much pressure from the other direction: "The markup needs a construct for XYZ." reStructuredText is the result of working toward a practical, usable, and readable balance. > One can separate two issues here: > > 1. too much functionality (YAGNI) > 2. too many ways of expressing the same functionality (TMTOWTDI) > > As for the first, there's some room to argue here. I happen to feel > there are quite a few YAGNI features in RST, like the Roman numerals > and the RCS keyword processing. What's the big deal about Roman numerals? Human beings have many ways to count; our markup should allow us the freedom to choose the style we like. Ask a lawyer if Roman numerals for lists are expendable. If *you* don't like them, don't use them. RCS keyword processing is *not* a syntax feature; it's for readability, so readers don't have that $RCS: cruft$ shoved in their faces. > Auto-numbering in particular takes RST in a direction that makes me > uncomfortable -- it means that RST now has the potential for a > compile-debug cycle. That's true with *any* markup processing system. It's the price of the increased functionality and readability of the processed result. A small price, IMHO. The reStructuredText parser is very helpful with diagnostics, and can only improve with user feedback. I've volunteered to do the processing, so there should be no impact on anyone. > But as for the second, i just don't see any justification for it. > Reducing the multiple ways to do headers and lists and tables doesn't > cripple anything; it only makes RST simpler and easier to understand. Headers: by "multiple ways", are you referring to the author's choice of underline style?
Or to the choice for overline & underline versus underline-only? Perhaps, when reading the spec, it's overwhelming; so don't start with the spec! But I don't see the big deal in having variety. The true test is this: when you look at a reStructuredText title, in whatever style, does it scream out at you, "I am a title!"? Without knowing anything about the markup, most people would answer "yes, it does". The same is true for lists and tables too. I base this on reports from people who are using Docutils/reStructuredText in the real world, introducing it to non-technical users, and reporting nothing but positive experiences. Lists: see below. Tables: the "simple table" syntax was added recently, because although it's limited, it's much simpler to type and edit than the original "grid tables". But grid tables don't have the limitations, so it's practical to keep both constructs around. > I acknowledge that there is some question of opinion as to what is the > "same" functionality, causing issues to slush over from #1 to #2. > > To me, using "1.", "(1)", or "1)" to number a list makes no semantic > difference at all, and so it counts as redundancy. The variety of list styles is based on real-world usage. See "The Chicago Manual of Style", 14th edition, section 8.79 (page 315): every variation of list enumeration is right there in a single nested list. Any reasonable person looking at any of those list styles will understand what they mean. Different strokes for different folks. Variety is the spice of life, and a necessity for otherwise dry documentation. > If you already have definition lists, why also have option lists and field > lists? They're semantically different. Sure you could implement option & field lists with definition lists, just as you could implement definition lists with tables. Option lists are explicitly for command-line option descriptions. 
Field lists are for name-value pairs where the details matter, like database records or attributes of extension constructs (directives). > If you already have literals, why have interpreted text? They're very different things. Literals are for monospaced, *uninterpreted*, unprocessed, computer I/O text. From PEP 287: Text enclosed in single backquotes is recognized as "interpreted text", whose interpretation is application-dependent. In the context of a Python docstring, the default interpretation of interpreted text is as Python identifiers. The text will be marked up with a hyperlink connected to the documentation for the identifier given. In PEPs, there is no use for interpreted text currently (so they wouldn't be mentioned in the new-style-PEP guide, except perhaps in a footnote saying so). In the future auto-documentation tool, interpreted text will do explicitly what pydoc does auto-magically: link Python identifiers to their definitions elsewhere. But because it's explicit, interpreted text will not be accidentally misinterpreted (as can happen in pydoc). > If you already have both footnotes *and* inline URLs, why also have anonymous > inline hyperlink references? Because inline live links are useful, but nobody wants to trip over a three-line URL in the middle of a sentence. >> OTOH, I have no problem with mandating standard uses, like a standard set of >> section title adornments. > > If you're going to recommend certain ways, why not just decide > what to use and be done with it? When designing a new standard, > there's no point starting out with parts of it already deprecated. PEPs are just one application of Docutils/reStructuredText. I see no conflict here. Groups often use a technology in conjunction with a conventions guide limiting the local use of that technology, for the sake of consistency or simplicity. We have such guides for Python's C code and stdlib code. (Does the Python LaTeX documentation mandate a subset of LaTeX? 
I know it specifies *additional* macros to use.) In the case of PEPs, I think a guide recommending certain practices would be appropriate, rather than mandating that certain constructs *not* be used. Constructs not used in PEPs are useful in other applications. Nothing would be deprecated, just "not used in PEPs". -- David Goodger Open-source projects: - Python Docutils: http://docutils.sourceforge.net/ (includes reStructuredText: http://docutils.sf.net/rst.html) - The Go Tools Project: http://gotools.sourceforge.net/ From goodger@users.sourceforge.net Fri Aug 2 03:28:57 2002 From: goodger@users.sourceforge.net (David Goodger) Date: Thu, 01 Aug 2002 22:28:57 -0400 Subject: [Python-Dev] Docutils/reStructuredText is ready to process PEPs In-Reply-To: <000001c23925$37357a60$777ba8c0@ericlaptop> Message-ID: Eric and Timothy, thank you for putting quite clearly what I am sometimes unable to express myself. -- David From aahz@pythoncraft.com Fri Aug 2 04:04:41 2002 From: aahz@pythoncraft.com (Aahz) Date: Thu, 1 Aug 2002 23:04:41 -0400 Subject: [Python-Dev] Sorting In-Reply-To: References: Message-ID: <20020802030441.GA831@panix.com> On Mon, Jul 22, 2002, Tim Peters wrote: > > In an effort to save time on email (ya, right ...), I wrote up a pretty > detailed overview of the "timsort" algorithm. It's attached. It seems pretty clear by now that the new mergesort is going to replace samplesort, but since nobody else has said this, I figured I'd add one more comment: I actually understood your description of the new mergesort algorithm. Unless you can come up with similar docs for samplesort, the Beer Truck scenario dictates that mergesort be the new gold standard. Or to quote Tim Peters, "Complex is better than complicated." 
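For readers who haven't seen Tim's write-up: the core operation it describes is a stable merge of two sorted runs. A toy Python rendering of that one step (not the actual implementation, which merges runs in place, in C, with galloping optimizations) looks like this:

```python
def merge(a, b):
    """Stable merge of two sorted lists: a toy model of one timsort
    merge step (the real code works in place with galloping)."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if b[j] < a[i]:
            out.append(b[j])   # strictly smaller: take from b
            j += 1
        else:
            out.append(a[i])   # ties go to a, preserving stability
            i += 1
    out.extend(a[i:])          # at most one of these is non-empty
    out.extend(b[j:])
    return out
```

The strict `<` comparison is what makes the merge stable: on equal keys the element from the left run is emitted first, which is the property that distinguishes the new mergesort from samplesort.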
-- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From zack@codesourcery.com Fri Aug 2 06:24:33 2002 From: zack@codesourcery.com (Zack Weinberg) Date: Thu, 1 Aug 2002 22:24:33 -0700 Subject: [Python-Dev] Weird error handling in os._execvpe In-Reply-To: <200208011927.g71JRhm10269@odiug.zope.com> References: <20020801161946.GA32076@codesourcery.com> <200208011723.g71HN4025731@odiug.zope.com> <20020801182340.GB27575@codesourcery.com> <200208011927.g71JRhm10269@odiug.zope.com> Message-ID: <20020802052433.GA466@codesourcery.com> On Thu, Aug 01, 2002 at 03:27:43PM -0400, Guido van Rossum wrote: > ... I don't recall exactly why we ended up in this situation in the > first place. It's possible that it's an unnecessary sacrifice of a > dead chicken, but it's also possible that there are platforms where > this addressed a real need. I'd like to think that it was because I > didn't want to add more cruft to posixmodule.c (I've long given up > on that :-). I found out why it's done the way it is: There is no execvpe() in C, not even in the extended-to-hell-and-back GNU libc. I considered dinking around with the C-level environ pointer so that execvp() would do what we want, but this seems unreliable at best, given how many different ways to access the environment there are. So I think we're back to option 2 (enumerate the possible errors for each platform). ENOENT and ENOTDIR should cover it for Unix. Would other platform maintainers care to comment, please? zw From xscottg@yahoo.com Fri Aug 2 06:31:35 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Thu, 1 Aug 2002 22:31:35 -0700 (PDT) Subject: [Python-Dev] PEP 298, final (?) version In-Reply-To: <20020802020719.GA10231@panix.com> Message-ID: <20020802053135.432.qmail@web40111.mail.yahoo.com> --- Aahz wrote: > > I finally read all these threads today, cleaning out much of my > OSCON backlog. 
> Now, maybe I'm stupid, but I'm not understanding the
> relationship between the new buffer protocol (PEP 298) and the new bytes
> object (PEP 296). Should this be something documented in one or both
> PEPs?
>

In the course of examining PEP 296 (the one I'm working on), Thomas Heller
thought it would be a good idea to make some additions to PyBufferProcs and
abstract.h so that he, and others, could treat a wider class of objects
with one API. I was only proposing the bytes object, whereas he wanted to
be able to write code that works with bytes, string, mmap, array, and any
other buffer-like object uniformly (since they all make promises about the
lifetime of the pointer).

I liked his idea but was concerned that making additional changes to the
Python baseline might get received poorly. In other words, I'm an
overconservative worrywart, and wanted to make sure I didn't sink PEP 296
with features of PEP 298. As such, I encouraged him to submit a separate
PEP so that if the protocol part got sunk, the bytes object part could
remain. He was probably sick of arguing with me at that point, so PEP 298
got created.

Guido apparently likes both PEPs, so it looks like both will get in if our
implementations are timely and don't suck. If I could have channeled Guido
a week ago, there might be only one PEP. However, with the way this played
out, it has the benefit (to me at least) that now Thomas Heller is on the
hook for part of the implementation. :-)

As for documenting this, my next draft of PEP 296 (later tonight) will
refer to PEP 298 to indicate that the bytes objects will support the
"fixed/locked buffer protocol".

Cheers,
    -Scott

__________________________________________________
Do You Yahoo!?
Yahoo!
Health - Feel better, live better
http://health.yahoo.com

From xscottg@yahoo.com Fri Aug 2 06:54:12 2002
From: xscottg@yahoo.com (Scott Gilbert)
Date: Thu, 1 Aug 2002 22:54:12 -0700 (PDT)
Subject: [Python-Dev] PEP 298, __buffer__
Message-ID: <20020802055412.12716.qmail@web40106.mail.yahoo.com>

Tonight, I remembered another thought that I've had for a while.

There isn't currently a way for a class object created from Python script
to indicate that it wishes to implement the buffer interface. In the
Numeric source, I've seen them use self.__buffer__ for this purpose, but
this isn't actually an officially sanctioned magic name.

Now that classes can derive from builtin types, perhaps there is less of a
need for this, but I still think we would want it. There are times when
you want inheritance, and others when you want containment.

With a slight modification to the PyObject_*Buffer functions (in the
failure branches), an instance of a class could use containment of a
PyBufferProcs supporting object and publish the buffer interface as its
own. I'm thinking one of:

    class OneWay(object):
        def __init__(self):
            self.__buffer__ = bytes(1000)

Or:

    class SomeOther(object):
        def __init__(self):
            self._private = bytes(1000)
        def __buffer__(self):
            return self._private

I believe the first one is the way it's done in Numeric (Numarray too?).
(Maybe Todd Miller will comment on this and whether it's useful to him.)

If this is worthwhile, it could be added to PEP 298 or as a new mini PEP.
In either case, I'm willing to do the work.

Cheers,
    -Scott

__________________________________________________
Do You Yahoo!?
Yahoo!
Health - Feel better, live better http://health.yahoo.com From zack@codesourcery.com Fri Aug 2 07:45:39 2002 From: zack@codesourcery.com (Zack Weinberg) Date: Thu, 1 Aug 2002 23:45:39 -0700 Subject: [Python-Dev] tempfile.py rewrite, take two Message-ID: <20020802064539.GB466@codesourcery.com> Now at Sourceforge: http://sourceforge.net/tracker/index.php?func=detail&aid=589982&group_id=5470&atid=305470 zw From ville.vainio@swisslog.com Fri Aug 2 08:10:07 2002 From: ville.vainio@swisslog.com (Ville Vainio) Date: Fri, 02 Aug 2002 10:10:07 +0300 Subject: [Python-Dev] Docutils/reStructuredText is ready to process PEPs References: <20020802053301.18494.17285.Mailman@mail.python.org> Message-ID: <3D4A304F.1040307@swisslog.com> David G wrote: >PEPs are just one application of Docutils/reStructuredText. I see no > Exactly. I can't understand the motivation in crippling a useful markup in order to serve some niche application (however important). As it stands, restx appears to be general enough to be accepted as a kind of "standard" markup, to be used for authoring all documentation, python related or not - perhaps even motivating people to write an emacs mode for it. -- Ville From mwh@python.net Fri Aug 2 09:27:17 2002 From: mwh@python.net (Michael Hudson) Date: 02 Aug 2002 09:27:17 +0100 Subject: [Python-Dev] seeing off SET_LINENO In-Reply-To: Guido van Rossum's message of "Thu, 01 Aug 2002 14:19:11 -0400" References: <200208011819.g71IJBR25893@odiug.zope.com> Message-ID: <2mlm7pfyxm.fsf@starship.python.net> Guido van Rossum writes: > > Here goes. Everything is relative to 221-base, which is 2.2.1 from Sean's > > RPM. This is the slowest, so all percentages are negative, and more > > negative is better. I hope the names are obvious. > > > > 221-base +0.00% (obviously) > > 221-O-base: -9.69% > > CVS-base: -15.43% > > CVS-O-base: -23.56% > > CVS-hacked: -23.66% > > CVS-O-hacked: -23.70% > > > > (Nearly 25% speed up since 221? Boggle. 
> > Some of this may be compilation
> > options, I guess)
>
> No, pymalloc sped us up quite a bit.

Yes, this occurred to me after I posted.

pystone is a mystery. It's a fair bit slower but also much more variable
with my patch. Moving trace code out of line helps quite a bit but it's
still ~1% slower.

> > Anyway, it seems I haven't slowed -O down. At some point I might try
> > moving the trace code out of line and see if that has any effect. Not
> > today.

Did do this yesterday, in fact. As I said, it helped pystone a bit, so
I'll upload a separate patch to sf.

[...]

> What's the next step? I haven't had time to review your code. Do you
> want to check it in without further review, or do you want to wait
> until someone can give it a serious look? (Tim's on vacation this
> week so it might be a while.)

I think I'd like to wait for serious review. I'd be surprised if the
patch went out of date at all quickly.

Also, it seems Lib/compiler currently works by generating SET_LINENO and
then builds co_lnotab by scanning for them afterwards. That's not going
to work in the new world, so I should probably think about how to change
it...

Cheers,
M.

--
Finding a needle in a haystack is a lot easier if you burn down the
haystack and scan the ashes with a metal detector.
    -- the Silicon Valley Tarot (another one nicked from David Rush)

From tim.one@comcast.net Fri Aug 2 09:45:06 2002
From: tim.one@comcast.net (Tim Peters)
Date: Fri, 02 Aug 2002 04:45:06 -0400
Subject: [Python-Dev] seeing off SET_LINENO
In-Reply-To:
Message-ID:

[Michael Hudson]
> ...
> Here goes. Everything is relative to 221-base, which is 2.2.1
> from Sean's RPM. This is the slowest, so all percentages are negative,
> and more negative is better. I hope the names are obvious.
>
> 221-base       +0.00% (obviously)
> 221-O-base:     -9.69%
> CVS-base:      -15.43%
> CVS-O-base:    -23.56%
> CVS-hacked:    -23.66%
> CVS-O-hacked:  -23.70%

Great!!

> (Nearly 25% speed up since 221? Boggle.
Some of this may be compilation > options, I guess) No, it's the new sort implementation -- it's truly magical . I've been telling people at Zope Corp that getting rid of SET_LINENO would speed pystone (which is said to be a good predictor of Zope performance) by at least 7%. If you can fudge up a test showing that, your performance work will be complete. [Guido] > ... > What's the next step? I haven't had time to review your code. Do you > want to check it in without further review, or do you want to wait > until someone can give it a serious look? (Tim's on vacation this > week so it might be a while.) I'm really not the best person for this, since, e.g., I never use the debugger, so couldn't personally care less if it stopped working <0.9 wink>. The patch set looks very complete, so I'd encourage a checkin if nobody objects. I have one objection, but it's kind of vague: Michael, you're taking too much delight in how obscure this is! Two examples: + int instr_ub = -1, instr_lb = 0; /* for tracing */ It takes a lot of effort to reverse-engineer that the line number has changed if and only if not instr_lb <= current_bytecode_offset < instr_ub -- or at least to reverse-engineer that this is what you believe . Paste the above in as a comment and save the next person the pain. I got hung up the first 5 minutes guessing that "lb" and "ub" referred to "lower byte" and "upper byte". The other example: + /* I (mwh) will gladly buy anyone a beer who + can tell me off the top of their head why + the exception for POP_TOP is needed... */ That's not going to be amusing two years from now when your unstated reasoning is no longer true, and this code breaks. Then someone will have to guess what you thought you meant by this comment, whether your reasoning was correct at the time, and what may have changed to invalidate it. Rather than tease, just explain why POP_TOP must be an exception. If you don't know why, I'll buy *you* a beer . 
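[Editorial note: the instr_lb/instr_ub bounds Tim asks Michael to document come out of co_lnotab. Walking the table gives both the source line for a bytecode offset and the [lower, upper) offset range over which that line holds, so the tracing loop only needs to re-check the line number when execution leaves that range. The sketch below is a simplification: it uses a hand-built list of (byte_increment, line_increment) pairs, whereas the real co_lnotab packs those pairs into a byte string.]

```python
def addr_to_line(lnotab, firstlineno, addr):
    """Map a bytecode offset to (line, lower, upper).

    `lnotab` is a list of (byte_increment, line_increment) pairs, the
    logical content of co_lnotab.  The same line holds for every offset
    in [lower, upper), so a tracer can skip the lookup entirely while
    the current offset stays inside those bounds -- which is what the
    instr_lb/instr_ub pair in the eval loop is for.
    """
    line = firstlineno
    lower = 0
    upper = 0
    for byte_incr, line_incr in lnotab:
        upper += byte_incr
        if upper > addr:
            return line, lower, upper
        line += line_incr
        lower = upper
    # The last line runs to the end of the code string.
    return line, lower, float("inf")
```

For a table saying "after 6 bytes the line advances by 1, after 8 more bytes it advances by 1 again", offsets 0-5 map to the first line, 6-13 to the second, and everything beyond to the third.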
From mwh@python.net Fri Aug 2 10:29:57 2002
From: mwh@python.net (Michael Hudson)
Date: 02 Aug 2002 10:29:57 +0100
Subject: [Python-Dev] seeing off SET_LINENO
In-Reply-To: Tim Peters's message of "Fri, 02 Aug 2002 04:45:06 -0400"
References:
Message-ID: <2mptx1liay.fsf@starship.python.net>

Tim Peters writes:

> I've been telling people at Zope Corp that getting rid of SET_LINENO would
> speed pystone (which is said to be a good predictor of Zope performance) by
> at least 7%. If you can fudge up a test showing that, your performance work
> will be complete.

It's about 5%:

$ ../../build/python pystone.py
Pystone(1.1) time for 100000 passes = 8.11
This machine benchmarks at 12330.5 pystones/second
$ ../../build/python pystone.py
Pystone(1.1) time for 100000 passes = 7.69
This machine benchmarks at 13003.9 pystones/second

I can run the vanilla pystone whilst compiling or something if you like :)

As I said, my patched Python is much more variable in pystone than before.
I'm going to try invoking the Cache Effect Demon on this one, unless
someone can come up with a real explanation.

> [Guido]
> > ...
> > What's the next step? I haven't had time to review your code. Do you
> > want to check it in without further review, or do you want to wait
> > until someone can give it a serious look? (Tim's on vacation this
> > week so it might be a while.)
>
> I'm really not the best person for this, since, e.g., I never use the
> debugger, so couldn't personally care less if it stopped working <0.9 wink>.
>
> The patch set looks very complete, so I'd encourage a checkin if nobody
> objects.
>
> I have one objection, but it's kind of vague: Michael, you're taking too
> much delight in how obscure this is!

It's the old boys club effect: I worked damn hard to get to the point of
understanding this stuff, so everyone else should bloody well have to too!
> Two examples: > > + int instr_ub = -1, instr_lb = 0; /* for tracing */ > > It takes a lot of effort to reverse-engineer that the line number has > changed if and only if > > not instr_lb <= current_bytecode_offset < instr_ub > > -- or at least to reverse-engineer that this is what you believe . > Paste the above in as a comment and save the next person the pain. I got > hung up the first 5 minutes guessing that "lb" and "ub" referred to "lower > byte" and "upper byte". Ah, OK. Actually, taking the tracing code out of line makes me feel less uneasy about adding hundred+ line comments explaining what's going on. > The other example: > > + /* I (mwh) will gladly buy anyone a beer who > + can tell me off the top of their head why > + the exception for POP_TOP is needed... */ > > That's not going to be amusing two years from now when your unstated > reasoning is no longer true, and this code breaks. Then someone will have > to guess what you thought you meant by this comment, whether your reasoning > was correct at the time, and what may have changed to invalidate it. Rather > than tease, just explain why POP_TOP must be an exception. If you don't > know why, I'll buy *you* a beer . All I can say is that I'd been driven insane by co_lnotab and Python/compile.c when I wrote that comment . Cheers, M. -- I'm okay with intellegent buildings, I'm okay with non-sentient buildings. I have serious reservations about stupid buildings. -- Dan Sheppard, ucam.chat (from Owen Dunn's summary of the year) From python-dev@zesty.ca Fri Aug 2 11:14:04 2002 From: python-dev@zesty.ca (Ka-Ping Yee) Date: Fri, 2 Aug 2002 03:14:04 -0700 (PDT) Subject: [Python-Dev] Docutils/reStructuredText is ready to process PEPs In-Reply-To: <3D4A304F.1040307@swisslog.com> Message-ID: On Fri, 2 Aug 2002, Ville Vainio wrote: > Exactly. I can't understand the motivation in crippling a useful markup > in order to serve some niche application (however important). 
I just don't see how it is "crippling" to have one simple way to do
something instead of lots of different ways to do the same thing.

If anything, having more choices is worse, because it's not clear which
differences are meaningful or meaningless, and you may have to think
harder about which one to choose.

-- ?!ng

From jmiller@stsci.edu Fri Aug 2 11:41:05 2002
From: jmiller@stsci.edu (Todd Miller)
Date: Fri, 02 Aug 2002 06:41:05 -0400
Subject: [Python-Dev] PEP 298, __buffer__
References: <20020802055412.12716.qmail@web40106.mail.yahoo.com>
Message-ID: <3D4A61C1.8030800@stsci.edu>

Scott Gilbert wrote:

>Tonight, I remember another thought that I've had for a while.
>
>There isn't currently a way for a class object created from Python script
>to indicate that it wishes to implement the buffer interface. In the
>Numeric source, I've seen them use self.__buffer__ for this purpose, but
>this isn't actually an officially sanctioned magic name.
>
>I'm thinking one of:
>
> class OneWay(object):
>     def __init__(self):
>         self.__buffer__ = bytes(1000)
>
>Or:
>
> class SomeOther(object):
>     def __init__(self):
>         self._private = bytes(1000)
>     def __buffer__(self):
>         return self._private
>
>I believe the first one is the way it's done in Numeric (Numarray too?).
>
The numarray C-API essentially supports both usages, although we only use
the __buffer__ name in the second case.

>(Maybe Todd Miller will comment on this and whether it's useful to him.)
>
Yes, it is useful for prototyping. Numarray calls a __buffer__() method
to support python class wrappers around mmap. We use our class wrappers
around mmap to add the ability to chop a file up into non-overlapping
resizeable slices. Each slice can be used as the buffer of an independent
memory mapped array.
Todd

From mwh@python.net Fri Aug 2 11:34:55 2002
From: mwh@python.net (Michael Hudson)
Date: 02 Aug 2002 11:34:55 +0100
Subject: [Python-Dev] seeing off SET_LINENO
In-Reply-To: Guido van Rossum's message of "Thu, 01 Aug 2002 14:19:11 -0400"
References: <200208011819.g71IJBR25893@odiug.zope.com>
Message-ID: <2msn1xpn00.fsf@starship.python.net>

Guido van Rossum writes:

> What's the next step? I haven't had time to review your code. Do you
> want to check it in without further review, or do you want to wait
> until someone can give it a serious look? (Tim's on vacation this
> week so it might be a while.)

I've found another annoying problem. I'm not really expecting someone
here to solve it for me, but writing it down might help me think clearly.

This is about the function epilogues that always get generated. I.e:

>>> def f():
...     if a:
...         print 1
...
>>> import dis
>>> dis.dis(f)
  2           0 LOAD_GLOBAL              0 (a)
              3 JUMP_IF_FALSE            9 (to 15)
              6 POP_TOP

  3           7 LOAD_CONST               1 (1)
             10 PRINT_ITEM
             11 PRINT_NEWLINE
             12 JUMP_FORWARD             1 (to 16)
        >>   15 POP_TOP
        >>   16 LOAD_CONST               0 (None)
             19 RETURN_VALUE

You can see here that the epilogue gets associated with line 3, whereas it
shouldn't really be associated with any line at all. For why this is a
problem:

$ cat t.py
a = 0
def f():
    if a:
        print 1

>>> pdb.runcall(t.f)
> /home/mwh/src/sf/python/dist/src/build/t.py(3)f()
-> if a:
(Pdb) s
> /home/mwh/src/sf/python/dist/src/build/t.py(4)f()
-> print 1
(Pdb) --Return--
> /home/mwh/src/sf/python/dist/src/build/t.py(4)f()->None
-> print 1
(Pdb)

The debugger stopping on the "print 1" is confusing.

There's an "obvious" solution to this: check if we're less than 4 bytes
from the end of the code string and don't do anything if we are. This
would be easy, except that for some bonkers reason, we support arbitrary
buffer objects for code strings!
(see _PyCode_GETCODEPTR in Include/compile.h -- though at least you can't create a code object with an array code string from python, the getreadbuffer failing will cause the interpreter to unceremoniously crash and burn). I guess I can store the length somewhere -- _PyCode_GETCODEPTR returns this, more by accident than design I suspect -- or call bf_getsegcount(frame->f_code->co_code, &length) or something. Does anyone actually *use* this feature? I see Guido checked it in and the patch was written by Greg Stein. Anyone remember motivations from the time? Cheers, M. -- In general, I'd recommend injecting LSD directly into your temples, Syd-Barret-style, before mucking with Motif's resource framework. The former has far lower odds of leading directly to terminal insanity. -- Dan Martinez From python-dev@zesty.ca Fri Aug 2 11:42:42 2002 From: python-dev@zesty.ca (Ka-Ping Yee) Date: Fri, 2 Aug 2002 03:42:42 -0700 (PDT) Subject: [Python-Dev] Docutils/reStructuredText is ready to process PEPs In-Reply-To: Message-ID: On Thu, 1 Aug 2002, David Goodger wrote: > Speaking from experience having hashed out all these issues over the last > two years, "a simpler set of rules" won't work. Sure, a few conveniences > could be trimmed from reStructuredText, and all we'd lose would be > convenience. Go past that and the markup would become less useful. Cut > everything you listed and the markup would be next to useless. If you took reST and removed the features i listed, you would have a markup system with paragraphs, multi-level headings, nestable bullet lists and numbered lists, definition lists, literal blocks, block quotes, tables, inline emphasis, inline literals, footnotes, inline hyperlinks, and internal and external hyperlink targets. Sounds pretty powerful to me. I find it strange that you would call this "next to useless". 
> > Auto-numbering in particular takes RST in a direction that makes me
> > uncomfortable -- it means that RST now has the potential for a
> > compile-debug cycle.
>
> That's true with *any* markup processing system. It's the price of the
> increased functionality and readability of the processed result.

I don't want to have to debug my text. If the markup is simple enough,
determining the output requires very little context (a line or two); this
means i can be sure what i'm going to get. Auto-numbering expands the
context so any part of the entire document can affect the transformation
of another part of the document.

> A small price, IMHO.

The difference between writing a document once and *knowing* that it's
correct, and having to compile-test-debug a few times, is a big cost.

> I've volunteered to do the processing, so there should be no impact on
> anyone.

Huh? I don't know what you mean here. The design of reST impacts
everyone who has to read or write documents in it.

> > But as for the second, i just don't see any justification for it.
> > Reducing the multiple ways to do headers and lists and tables doesn't
> > cripple anything; it only makes RST simpler and easier to understand.
>
> Headers: by "multiple ways", are you referring to the author's choice of
> underline style? Or to the choice for overline & underline versus
> underline-only?

Both. Why have an assortment of 32 random punctuation characters, for a
total of 64 different ways to do a heading? Who's going to remember what
the characters are, anyway? Pick one or two and stick to them. There are
really only two obvious ones: '-' and '='.

You can differentiate heading levels by indenting the heading, one space
per level. It would be vastly easier to tell what level a heading was at
by looking at its position, rather than running all the way back to the
beginning of the document and counting the number of different heading
styles that appear.
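[Editorial note: Ping's indent-per-level scheme is concrete enough to pin down in a few lines. The sketch below is purely hypothetical -- this proposal was never adopted by reStructuredText -- and assumes his two constraints: one leading space per nesting level, and underlines drawn only from '-' or '='.]

```python
def heading_level(heading, underline):
    """Return (level, text) for a heading under Ping's proposed scheme.

    The nesting level is simply the number of leading spaces on the
    heading line, so no document-wide bookkeeping of underline styles
    is needed.  Only '-' and '=' underlines are accepted.
    """
    stripped = underline.strip()
    if not stripped or set(stripped) - set("-="):
        raise ValueError("underline must consist of '-' or '=' characters")
    level = len(heading) - len(heading.lstrip(" "))
    return level, heading.strip()
```

The point of the design is visible in the code: determining a heading's level needs only the heading line itself, never the rest of the document.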
> true test is this: when you look at a reStructuredText title, in whatever > style, does it scream out at you, "I am a title!"? Without knowing anything > about the markup, most people would answer "yes, it does". That's a backwards argument. It's good that reST titles look like titles. But that doesn't mean reST has to recognize all possible things that might look like titles, as titles. It's a lot easier to say "just underline your title with a row of hyphens" than "choose one of the following list of 32 random punctuation marks to underline your title; and optionally overline it; oh, but actually we think you should use only the following subset of the 32 punctuation marks..." > > If you already have definition lists, why also have option lists and field > > lists? > > They're semantically different. Sure you could implement option & field > lists with definition lists, just as you could implement definition lists > with tables. Option lists are explicitly for command-line option > descriptions. Field lists are for name-value pairs where the details > matter, like database records or attributes of extension constructs > (directives). All three are about associating a list of things with their corresponding definitions. Distinguishing whether the things being defined are options or not is just as unnecessary as distinguishing shopping lists, to-do lists, hit lists, etc. -- ?!ng From aahz@pythoncraft.com Fri Aug 2 14:25:56 2002 From: aahz@pythoncraft.com (Aahz) Date: Fri, 2 Aug 2002 09:25:56 -0400 Subject: [Python-Dev] PEP 298, final (?) version In-Reply-To: <20020802053135.432.qmail@web40111.mail.yahoo.com> References: <20020802020719.GA10231@panix.com> <20020802053135.432.qmail@web40111.mail.yahoo.com> Message-ID: <20020802132556.GA18086@panix.com> On Thu, Aug 01, 2002, Scott Gilbert wrote: > --- Aahz wrote: >> >> I finally read all these threads today, cleaning out much of my >> OSCON backlog. 
Now, maybe I'm stupid, but I'm not understanding the >> relationship between the new buffer protocol (PEP 298) and the new bytes >> object (PEP 296). Should this be something documented in one or both >> PEPs? > > In the course of examining PEP 296 (the one I'm working on), Thomas Heller > thought it would be a good idea to make some additions to PyBufferProcs and > abstract.h so that he, and others, could treat a wider class of objects > with one API. I was only proposing the bytes object, where as he wanted to > be able to write code that works with bytes, string, mmap, array, and any > other buffer-like object uniformly (since they all make promises about the > lifetime of the pointer). Seems to me that part of my confusion lies in the fact that PEP 296 says that the bytes object is suitable for implementing arrays, whereas the discussion surrounding PEP 298 coughed up the issue that pure fixed buffers without locking were insufficient for arrays. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From barry@python.org Fri Aug 2 15:18:02 2002 From: barry@python.org (Barry A. Warsaw) Date: Fri, 2 Aug 2002 10:18:02 -0400 Subject: [Python-Dev] Docutils/reStructuredText is ready to process PEPs References: <15689.47972.953407.418801@anthem.wooz.org> Message-ID: <15690.38042.441354.272859@anthem.wooz.org> >>>>> "DG" == David Goodger writes: DG> Why are PEPs converted to HTML at all then? (Semi-seriously DG> :-) To brand them with a Python banner and give them some hyperlinks. DG> RFCs pre-date the Web, HTML, GUIs, and PCs. There is a great DG> advantage in sticking to a text-based format, but the existing DG> structure is very limited. RFCs are so 20th century; don't DG> you think it's time to move on? ;-) Dinosaurs have a tendency DG> to become extinct you know. They also become the oil that drives our engines of industry, to twist an analogy. 
:)

And of course RFCs are also converted to html:
http://www.faqs.org/rfcs/rfc2822.html

DG> Given a small amount of use, I think you'll find the rules
DG> easy to remember. There should be little effect on editing.
DG> At most, Emacs may need to be taught to recognize a bit more
DG> punctuation.

We'll see!

>> The noisy markup in reST bothers me, although you've done a
>> good job in minimizing the impact compared to other markup
>> languages.

DG> It's a trade-off: functionality for markup intrusion. It's
DG> the functionality of the processed form that's important:
DG> inline live links; live links to & from footnotes; automatic
DG> tables of contents (with live links!); images (don't you just
DG> *cringe* when you see ASCII graphics?); pleasant, readable
DG> text. The markup is minimal, quickly and easily ignored.

Taken to the extreme, why do we even use a text-based format at all? We
could, of course, get all that by authoring the PEPs directly in HTML.

>> I made this suggestion privately to David, but I'll repeat it
>> here. I'd be willing to accept that PEPs /may/ be written in
>> reST as an alternative to plaintext, but not require it.

DG> Sure. I thought I'd emphasized that in my original post: it'd
DG> be an alternative, the two styles can coexist. If you want to
DG> keep PEP 0 as it is, that's fine. I converted it to show that
DG> its special processing was also supported.

Cool.

>> I'd like for PEP authors to explicitly choose one or the other,
>> preferably by file extension (e.g. .txt for plain text, .rst or
>> .rest for reST).

DG> I'm not keen on a new file extension (this issue has come up
DG> before). There's so much in place on many platforms that says
DG> .txt means text files, and reStructuredText files *are* text
DG> files, with just a bit of formal structure sprinkled over.
DG> Browsers know what to do with .txt files; they wouldn't know
DG> what to do with .rest or .rtxt files.
DG> Near-universal file
DG> naming conventions are not the place to innovate IMHO.

Don't most servers default to text/plain for types they don't know? I'm
pretty sure Apache does.

If a file extension isn't acceptable, then I'd still want the
determination of plaintext vs. reST to be explicit. The other alternative
is to add a PEP header to specify. I'd propose calling it Content-Type:
and use text/x-rest as the value.

>> I'd also like for there to be two tools for generating
>> derivative forms from the original source. I would leave
>> pep2html.py alone. That's the tool that generates .html from
>> .txt.

DG> See http://docutils.sf.net/tools/pep2html.py (based on
DG> revision 1.37 of Python's nondist/peps/pep2html.py). Other
DG> than abstracting the file I/O and some minor changes for
DG> consistency & legibility, the reStructuredText-specific part
DG> is just two functions. One checks for the format of the PEP,
DG> and the other calls Docutils to do the work. Even without a
DG> new file extension, there's no need for a separate tool.

Fair enough. Let's do this: send me a diff against v1.39 of pep2html.py.

I just downloaded docutils-0.2, but I'm not sure of the best way to
integrate this in the nondist/peps directory.

- If we do the normal setup.py install, that's fine for my machine but it
  means that everyone who will be pushing out peps will have to do the
  same.

- If we hack pep2html.py to put ./docutils-0.2 on sys.path, then we can
  just check this stuff into the peps directory and it should Just Work.
  We'd have to update it when new docutils releases are made.

Suggestions? Mostly I'd like to hear from others who push out new PEP
versions. Would you rather have to install a docutils package in the
normal way locally, or would you rather have everything you need in the
nondist/peps directory?

OTOH, if plaintext PEPs work without having access to the docutils
package, that would be fine too (another reason perhaps for an explicit
flag).
>> I'd write a different tool that took a .rst file and generated >> both a .html file and a .txt file. The generated .txt file >> would have no markup and would conform to .txt PEP style as >> closely as possible. reST generated html would then have a >> link both to the original reST source, and to the plain text >> form. DG> Do we need a slightly less-structured text output? Maybe not. I'd prefer to have it, but if I'm alone there then I'll give up that crusade (or at least call YAGNI for now). DG> I don't think so, but I offered two alternative strategies in DG> PEP 287: >> Keep the existing PEP section structure constructs (one-line >> section headers, indented body text). Subsections can either be >> forbidden, or supported with reStructuredText-style underlined >> headers in the indented body text. >> Replace the PEP section structure constructs with the >> reStructuredText syntax. Section headers will require underlines, >> subsections will be supported out of the box, and body text need >> not be indented (except for block quotes). DG> Strategy (b) has been implemented; that's what the edited PEP DG> 287 uses. I'd recommend against it, but if you insist on DG> existing PEP structure, strategy (a) fits better although DG> inconsistently (depending on the decision on subsections). a) might also mean you'd have to reflow paragraphs to fit in the column width restrictions. I'd prefer a) but it may be more problematic. Moot if YAGNI prevails. >> A little competition never hurt anyone. :) So I'd open it up >> and let PEP authors decide, and we can do a side-by-side >> comparison of which format folks prefer to use. DG> Sure. Once authors see what the new markup gives them, I'm DG> sure there will be some converts. Let's find out. -Barry From guido@python.org Fri Aug 2 15:19:53 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 02 Aug 2002 10:19:53 -0400 Subject: [Python-Dev] PEP 298, final (?) 
version In-Reply-To: Your message of "Fri, 02 Aug 2002 12:36:52 +1200." <200208020036.g720aq905371@oma.cosc.canterbury.ac.nz> References: <200208020036.g720aq905371@oma.cosc.canterbury.ac.nz> Message-ID: <200208021419.g72EJrB29608@pcp02138704pcs.reston01.va.comcast.net> > > > void PyObject_ReleaseFixedBuffer(PyObject *obj); > > > > Would it be useful to allow bf_releasefixedbuffer to return an int > > indicating an exception? For instance, it could raise an exception if the > > extension errantly releases more times than it has acquired > > The code making the call might not be in an easy position > to deal with an exception -- e.g. an asynchronous I/O > routine called from a signal handler, another thread, > etc. > > Maybe use the warning mechanism to produce a message? In an asynch I/O situation, calling PyErr_Warn() is out of the question (it invokes Python code!). I propose to make it a fatal error -- after all the only reason why bf_releasefixedbuffer could fail should be that the caller makes a mistake. Since that's a bug in C code, a fatal error is acceptable. --Guido van Rossum (home page: http://www.python.org/~guido/) From theller@python.net Fri Aug 2 15:23:42 2002 From: theller@python.net (Thomas Heller) Date: Fri, 2 Aug 2002 16:23:42 +0200 Subject: [Python-Dev] PEP 298, __buffer__ Message-ID: <011901c23a30$3449d3d0$e000a8c0@thomasnotebook> [Unfortunately my email-server seems to be on vacation even earlier than myself. It seems I have not received some posts/replies: I'm currently reading the archives. Hopefully this one gets through] [Not the first time.] Scott writes: > There isn't currently a way for a class object created from Python script > to indicate that it wishes to implement the buffer interface. In the > Numeric source, I've seen them use self.__buffer__ for this purpose, but > this isn't actually an officially sanctioned magic name. This is an idea I also had for quite some time (very vague, maybe).
I like it, but I haven't thought about it very carefully. Thomas From theller@python.net Fri Aug 2 15:31:17 2002 From: theller@python.net (Thomas Heller) Date: Fri, 2 Aug 2002 16:31:17 +0200 Subject: [Python-Dev] Email problems, PEP 298 Message-ID: <012d01c23a31$435493a0$e000a8c0@thomasnotebook> I'm currently having severe email-problems here: right in place for my vacation :-(. I cannot participate in the discussion anyway for two weeks, but hopefully I will be able to read it afterwards. Maybe I should have posted PEP 298 in its current form to python-dev. Some of the suggestions mentioned here are already included. Thomas From guido@python.org Fri Aug 2 15:38:14 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 02 Aug 2002 10:38:14 -0400 Subject: [Python-Dev] Weird error handling in os._execvpe In-Reply-To: Your message of "Thu, 01 Aug 2002 22:24:33 PDT." <20020802052433.GA466@codesourcery.com> References: <20020801161946.GA32076@codesourcery.com> <200208011723.g71HN4025731@odiug.zope.com> <20020801182340.GB27575@codesourcery.com> <200208011927.g71JRhm10269@odiug.zope.com> <20020802052433.GA466@codesourcery.com> Message-ID: <200208021438.g72EcE629794@pcp02138704pcs.reston01.va.comcast.net> > On Thu, Aug 01, 2002 at 03:27:43PM -0400, Guido van Rossum wrote: > > > ... I don't recall exactly why we ended up in this situation in the > > first place. It's possible that it's an unnecessary sacrifice of a > > dead chicken, but it's also possible that there are platforms where > > this addressed a real need. I'd like to think that it was because I > > didn't want to add more cruft to posixmodule.c (I've long given up > > on that :-). > > I found out why it's done the way it is: There is no execvpe() in C, > not even in the extended-to-hell-and-back GNU libc.
I considered > dinking around with the C-level environ pointer so that execvp() would > do what we want, but this seems unreliable at best, given how many > different ways to access the environment there are. :-) > So I think we're back to option 2 (enumerate the possible errors for > each platform). ENOENT and ENOTDIR should cover it for Unix. Would > other platform maintainers care to comment, please? Don't wait for them. Just submit a patch and assign it to me. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz@pythoncraft.com Fri Aug 2 15:48:30 2002 From: aahz@pythoncraft.com (Aahz) Date: Fri, 2 Aug 2002 10:48:30 -0400 Subject: [Python-Dev] PEP 1, PEP Purpose and Guidelines In-Reply-To: <15685.35726.678832.241665@anthem.wooz.org> References: <15685.35726.678832.241665@anthem.wooz.org> Message-ID: <20020802144830.GA4876@panix.com> All right, here are some suggested changes. Actual suggestions are indented; commentary and meta-information is not indented. On Mon, Jul 29, 2002, Barry A. Warsaw wrote: > > Kinds of PEPs > > There are two kinds of PEPs. A standards track PEP describes a > new feature or implementation for Python. An informational PEP > describes a Python design issue, or provides general guidelines or > information to the Python community, but does not propose a new > feature. Informational PEPs do not necessarily represent a Python > community consensus or recommendation, so users and implementors > are free to ignore informational PEPs or follow their advice. Add: Some informational PEPs become Meta-PEPs that describe the workflow of the Python project. Project contributions that fail to follow the prescriptions of Meta-PEPs are likely to be rejected. > If the PEP editor approves, he will assign the PEP a number, label > it as standards track or informational, give it status 'draft', > and create and check-in the initial draft of the PEP. The PEP > editor will not unreasonably deny a PEP. 
Reasons for denying PEP > status include duplication of effort, being technically unsound, > not providing proper motivation or addressing backwards > compatibility, or not in keeping with the Python philosophy. The > BDFL (Benevolent Dictator for Life, Guido van Rossum) can be > consulted during the approval phase, and is the final arbitrator > of the draft's PEP-ability. Substitute: If the PEP editor approves, he will assign the pre-PEP a number, label it as standards track or informational, give it status 'draft', and create and check-in the initial draft of the PEP. The PEP editor will not unreasonably deny a pre-PEP. Reasons for denying PEP status include duplication of effort, being technically unsound, not providing proper motivation or addressing backwards compatibility, or not in keeping with the Python philosophy. The BDFL (Benevolent Dictator for Life, Guido van Rossum) can be consulted during the approval phase, and is the final arbitrator of the draft's PEP-ability. Generally speaking, if a pre-PEP meets technical standards, it will be accepted as a PEP to provide a historical record even if likely to be rejected (see the later section on rejecting a PEP). (This is to clarify the distinction between denying a pre-PEP and rejecting a PEP later in the process.) > A PEP can also be `Rejected'. Perhaps after all is said and done > it was not a good idea. It is still important to have a record of > this fact. Add (not sure whether it should be a separate paragraph): The PEP author is responsible for recording summaries of all arguments in favor and opposition. This is particularly important for rejected PEPs to reduce the likelihood of rehashing the same debates. > 6. Rationale -- The rationale fleshes out the specification by > describing what motivated the design and why particular design > decisions were made. It should describe alternate designs that > were considered and related work, e.g. how the feature is > supported in other languages. 
> > The rationale should provide evidence of consensus within the > community and discuss important objections or concerns raised > during discussion. I'm thinking we should add a section 9) titled "Discussion summary" to make it clearer that the PEP author is required to include this information. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From gmcm@hypernet.com Fri Aug 2 16:32:20 2002 From: gmcm@hypernet.com (Gordon McMillan) Date: Fri, 2 Aug 2002 11:32:20 -0400 Subject: [Python-Dev] Docutils/reStructuredText is ready to process PEPs In-Reply-To: References: Message-ID: <3D4A6DC4.26978.2F61817D@localhost> On 1 Aug 2002 at 22:28, David Goodger wrote: > Seriously, any significant technology requires a > significant spec. The wheel? > Speaking from experience having hashed out all these > issues over the last two years, "a simpler set of > rules" won't work. Ah, but you've been hashing it out with a group of people who *care* about things like this. Welcome to the larger world (where dinosaurs still roam). -- Gordon http://www.mcmillan-inc.com/ From guido@python.org Fri Aug 2 16:34:33 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 02 Aug 2002 11:34:33 -0400 Subject: [Python-Dev] seeing off SET_LINENO In-Reply-To: Your message of "Fri, 02 Aug 2002 11:34:55 BST." <2msn1xpn00.fsf@starship.python.net> References: <200208011819.g71IJBR25893@odiug.zope.com> <2msn1xpn00.fsf@starship.python.net> Message-ID: <200208021534.g72FYXA11029@pcp02138704pcs.reston01.va.comcast.net> > I've found another annoying problem. I'm not really expecting someone > here to solve it for me, but writing it down might help me think > clearly. > > This is about the function epilogues that always get generated. I.e: [...] > The debugger stopping on the "print 1" is confusing. > > There's an "obvious" solution to this: check if we're less than 4 > bytes from the end of the code string and don't do anything if we are.
Um, I think that's less than reliable. I believe we just discussed this when Oren's patch for yield in try/finally did a similar thing (and weren't you the one who mentioned that your bytecodehacks can cause this assumption to fail? :-). I'm not actually sure that this needs fixing. Surely the --Return-- should be a sufficient hint. I note that without your patch it also stops at a confusing place, albeit a different one (on the "if a:" line). > This would be easy, except that for some bonkers reason, we support > arbitrary buffer objects for code strings! (see _PyCode_GETCODEPTR in > Include/compile.h -- though at least you can't create a code object > with an array code string from python, the getreadbuffer failing will > cause the interpreter to unceremoniously crash and burn). That went a little too fast. Can you explain that parenthetical remark more clearly? > I guess I can store the length somewhere -- _PyCode_GETCODEPTR returns > this, more by accident than design I suspect -- or call > bf_getsegcount(frame->f_code->co_code, &length) or something. > > Does anyone actually *use* this feature? I see Guido checked it in > and the patch was written by Greg Stein. Anyone remember motivations > from the time? Yes, Greg insisted that he might want to store Python bytecode in Flash ROM, and that this way the bytecode would not have to be copied to RAM. But I don't think this ever happened (well, maybe the now-dead Pippy port to PalmOS used it???). I'd be happy to kill it as a YAGNI. But that still doesn't mean I approve checking for "4 bytes from the end". --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Aug 2 16:35:55 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 02 Aug 2002 11:35:55 -0400 Subject: [Python-Dev] PEP 298, __buffer__ In-Reply-To: Your message of "Fri, 02 Aug 2002 06:41:05 EDT." 
<3D4A61C1.8030800@stsci.edu> References: <20020802055412.12716.qmail@web40106.mail.yahoo.com> <3D4A61C1.8030800@stsci.edu> Message-ID: <200208021535.g72FZtG11040@pcp02138704pcs.reston01.va.comcast.net> > Scott Gilbert wrote: > > >Tonight, I remember another thought that I've had for a while. > > > >There isn't currently a way for a class object created from Python script > >to indicate that it wishes to implement the buffer interface. In the > >Numeric source, I've seen them use self.__buffer__ for this purpose, but > >this isn't actually an officially sanctioned magic name. > > > > > >I'm thinking one of: > > > > class OneWay(object): > > def __init__(self): > > self.__buffer__ = bytes(1000) > > > >Or: > > > > class SomeOther(object): > > def __init__(self): > > self._private = bytes(1000) > > def __buffer__(self): > > return self._private > > > >I believe the first one is the way it's done in Numeric (Numarray too?). [Todd Miller] > The numarray C-API essentially supports both usages, although we only > use the __buffer__ name in the second case. > > > > >(Maybe Todd Miller will comment on this and whether it's useful to him.) > > > Yes, it is useful for prototyping. Numarray calls a __buffer__() > method to support python class wrappers around mmap. We use our class > wrappers around mmap to add the ability to chop a file up into > non-overlapping resizeable slices. Each slice can be used as the buffer > of an independent memory mapped array. This would be easy enough to add, I suppose, but (a) I don't think it's got much to do with PEP 298, and (b) let's wait until we have a real use case, so perhaps we can decide which form it should take. Until then, I call YAGNI. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Aug 2 16:43:06 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 02 Aug 2002 11:43:06 -0400 Subject: [Python-Dev] seeing off SET_LINENO In-Reply-To: Your message of "Fri, 02 Aug 2002 09:27:17 BST." 
<2mlm7pfyxm.fsf@starship.python.net> References: <200208011819.g71IJBR25893@odiug.zope.com> <2mlm7pfyxm.fsf@starship.python.net> Message-ID: <200208021543.g72Fh6011093@pcp02138704pcs.reston01.va.comcast.net> > pystone is a mystery. It's a fair bit slower but also much more > variable with my patch. Moving trace code out of line helps quite a > bit but it's still ~1% slower. Hm. For me (with your latest patch which moves the trace code out of line) pystone is actually *less* variable with your patch than without, and it's also faster with -O than before. So I wouldn't lose any sleep over pystone (leave that to Tim :-). Maybe you should increase LOOPS in pystone.py; I usually set it to 40K or even 100K. > I think I'd like to wait for serious review. I'd be surprised if the > patch went out of date at all quickly. Fair enough. > Also, it seems Lib/compiler currently works by generating SET_LINENO > and then builds co_lnotab by scanning for them afterwards. That's not > going to work in the new world, so I should probably think about how > to change it... Or wait for Jeremy. (I suppose you still *support* the SET_LINENO opcode?) BTW, you should change the .pyc magic number in your patch. --Guido van Rossum (home page: http://www.python.org/~guido/) From mwh@python.net Fri Aug 2 16:46:51 2002 From: mwh@python.net (Michael Hudson) Date: 02 Aug 2002 16:46:51 +0100 Subject: [Python-Dev] seeing off SET_LINENO In-Reply-To: Guido van Rossum's message of "Fri, 02 Aug 2002 11:43:06 -0400" References: <200208011819.g71IJBR25893@odiug.zope.com> <2mlm7pfyxm.fsf@starship.python.net> <200208021543.g72Fh6011093@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <2mofcluutw.fsf@starship.python.net> Guido van Rossum writes: > > pystone is a mystery. It's a fair bit slower but also much more > > variable with my patch. Moving trace code out of line helps quite a > > bit but it's still ~1% slower. > > Hm. 
For me (with your latest patch which moves the trace code out of > line) pystone is actually *less* variable with your patch than > without, and it's also faster with -O than before. > > So I wouldn't lose any sleep over pystone (leave that to Tim :-). I wasn't going to. > Maybe you should increase LOOPS in pystone.py; I usually set it to 40K > or even 100K. Did that. I thought measuring 0.8 secs was a bit on the thin side. [...] > > Also, it seems Lib/compiler currently works by generating SET_LINENO > > and then builds co_lnotab by scanning for them afterwards. That's not > > going to work in the new world, so I should probably think about how > > to change it... > > Or wait for Jeremy. Yes. > (I suppose you still *support* the SET_LINENO opcode?) No. Do you think I should? > BTW, you should change the .pyc magic number in your patch. Really? It's already changed since the last released Python. Easy enough to change again, though, and it makes testing easier. Cheers, M. -- You have run into the classic Dmachine problem: your machine has become occupied by a malevolent spirit. Replacing hardware or software will not fix this - you need an exorcist. -- Tim Bradshaw, comp.lang.lisp From guido@python.org Fri Aug 2 16:47:14 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 02 Aug 2002 11:47:14 -0400 Subject: [Python-Dev] Docutils/reStructuredText is ready to process PEPs In-Reply-To: Your message of "Fri, 02 Aug 2002 11:32:20 EDT." <3D4A6DC4.26978.2F61817D@localhost> References: <3D4A6DC4.26978.2F61817D@localhost> Message-ID: <200208021547.g72FlE711150@pcp02138704pcs.reston01.va.comcast.net> > > Speaking from experience having hashed out all these > > issues over the last two years, "a simpler set of > > rules" won't work. > > Ah, but you've been hashing it out with a group of > people who *care* about things like this. Welcome > to the larger world (where dinosaurs still roam). Funny, Ping doesn't strike me as a dinosaur. 
More as someone who enjoys a good argument. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From mwh@python.net Fri Aug 2 16:53:55 2002 From: mwh@python.net (Michael Hudson) Date: 02 Aug 2002 16:53:55 +0100 Subject: [Python-Dev] seeing off SET_LINENO In-Reply-To: Guido van Rossum's message of "Fri, 02 Aug 2002 11:34:33 -0400" References: <200208011819.g71IJBR25893@odiug.zope.com> <2msn1xpn00.fsf@starship.python.net> <200208021534.g72FYXA11029@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <2mlm7puui4.fsf@starship.python.net> Guido van Rossum writes: > > I've found another annoying problem. I'm not really expecting someone > > here to solve it for me, but writing it down might help me think > > clearly. > > > > This is about the function epilogues that always get generated. I.e: > [...] > > The debugger stopping on the "print 1" is confusing. > > > > There's an "obvious" solution to this: check if we're less than 4 > > bytes from the end of the code string and don't do anything if we are. > > Um, I think that's less than reliable. I believe we just discussed > this when Oren's patch for yield in try/finally did a similar thing > (and weren't you the one who mentioned that your bytecodehacks can > cause this assumption to fail? :-). Good point. > I'm not actually sure that this needs fixing. Surely the --Return-- > should be a sufficient hint. I note that without your patch it also > stops at a confusing place, albeit a different one (on the "if a:" > line). The problem is that when we jump into the epilogue, a 'line' trace event gets generated before the 'return' one. So there is no --Return-- hint. > > This would be easy, except that for some bonkers reason, we support > > arbitrary buffer objects for code strings!
(see _PyCode_GETCODEPTR in > > Include/compile.h -- though at least you can't create a code object > > with an array code string from python, the getreadbuffer failing will > > cause the interpreter to unceremoniously crash and burn). > > That went a little too fast. Can you explain that parenthetical > remark more clearly? 1) Don't you find the idea of type(co.co_code) == types.ArrayType at least a little scary? Mainly due to resizes -- having mutable code might be nice for development environments and such. 2) I thought it was possible for bf_getreadbuffer to fail (maybe I'm wrong here). _PyCode_GETCODEPTR does no error checking. > > I guess I can store the length somewhere -- _PyCode_GETCODEPTR returns > > this, more by accident than design I suspect -- or call > > bf_getsegcount(frame->f_code->co_code, &length) or something. > > > > Does anyone actually *use* this feature? I see Guido checked it in > > and the patch was written by Greg Stein. Anyone remember motivations > > from the time? > > Yes, Greg insisted that he might want to store Python bytecode in > Flash ROM, and that this way the bytecode would not have to be copied > to RAM. I see. > But I don't think this ever happened Gosh. > (well, maybe the now-dead Pippy port to PalmOS used it???). Maybe. Somehow doubt it, though. > I'd be happy to kill it as a YAGNI. That's nice, but if... > But that still doesn't mean I approve checking for "4 bytes from the > end". ...it doesn't actually help. Does anyone have any better ideas for not generating 'line' trace events in the epilogue? Cheers, M. -- I also feel it essential to note, [...], that Description Logics, non-Monotonic Logics, Default Logics and Circumscription Logics can all collectively go suck a cow. Thank you. 
-- http://advogato.org/person/Johnath/diary.html?start=4 From guido@python.org Fri Aug 2 16:51:24 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 02 Aug 2002 11:51:24 -0400 Subject: [Python-Dev] seeing off SET_LINENO In-Reply-To: Your message of "Fri, 02 Aug 2002 16:46:51 BST." <2mofcluutw.fsf@starship.python.net> References: <200208011819.g71IJBR25893@odiug.zope.com> <2mlm7pfyxm.fsf@starship.python.net> <200208021543.g72Fh6011093@pcp02138704pcs.reston01.va.comcast.net> <2mofcluutw.fsf@starship.python.net> Message-ID: <200208021551.g72FpOg11206@pcp02138704pcs.reston01.va.comcast.net> > > (I suppose you still *support* the SET_LINENO opcode?) > > No. Do you think I should? I like to be on the conservative side. Also, it would make life easier for the compiler package until Jeremy has time to fix it. :-) > > BTW, you should change the .pyc magic number in your patch. > > Really? It's already changed since the last released Python. Easy > enough to change again, though, and it makes testing easier. Given that each time I try your patches I get unknown opcode errors, please change it. --Guido van Rossum (home page: http://www.python.org/~guido/) From xscottg@yahoo.com Fri Aug 2 16:55:53 2002 From: xscottg@yahoo.com (Scott Gilbert) Date: Fri, 2 Aug 2002 08:55:53 -0700 (PDT) Subject: [Python-Dev] PEP 298, final (?) version In-Reply-To: <20020802132556.GA18086@panix.com> Message-ID: <20020802155553.5138.qmail@web40110.mail.yahoo.com> --- Aahz wrote: > > Seems to me that part of my confusion lies in the fact that PEP 296 says > that the bytes object is suitable for implementing arrays, whereas the > discussion surrounding PEP 298 coughed up the issue that pure fixed > buffers without locking were insufficient for arrays. > Theoretically, you could use the bytes object and the struct module to implement something that is functionally equivalent to arrays from arraymodule.c (at least from the Python scripting point of view). 
Let's call that hypothetical reimplementation "array.py". However, since arrays from arraymodule.c can be resized in place, the pointer is not necessarily constant for the lifetime of the array object. Cheers, -Scott __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com From skip@pobox.com Fri Aug 2 17:07:44 2002 From: skip@pobox.com (Skip Montanaro) Date: Fri, 2 Aug 2002 11:07:44 -0500 Subject: [Python-Dev] seeing off SET_LINENO In-Reply-To: <2mlm7puui4.fsf@starship.python.net> References: <200208011819.g71IJBR25893@odiug.zope.com> <2msn1xpn00.fsf@starship.python.net> <200208021534.g72FYXA11029@pcp02138704pcs.reston01.va.comcast.net> <2mlm7puui4.fsf@starship.python.net> Message-ID: <15690.44624.228485.503769@localhost.localdomain> Michael> Does anyone have any better ideas for not generating 'line' Michael> trace events in the epilogue? How about adding a field to the code object which holds the byte code offset of the epilogue? The code which emits line events (where is that, btw?) would not emit if the current instruction offset is >= the epilogue offset. Skip From guido@python.org Fri Aug 2 17:13:56 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 02 Aug 2002 12:13:56 -0400 Subject: [Python-Dev] seeing off SET_LINENO In-Reply-To: Your message of "Fri, 02 Aug 2002 16:53:55 BST." <2mlm7puui4.fsf@starship.python.net> References: <200208011819.g71IJBR25893@odiug.zope.com> <2msn1xpn00.fsf@starship.python.net> <200208021534.g72FYXA11029@pcp02138704pcs.reston01.va.comcast.net> <2mlm7puui4.fsf@starship.python.net> Message-ID: <200208021613.g72GDug11388@pcp02138704pcs.reston01.va.comcast.net> > The problem is that when we jump into the epilogue, a 'line' trace > event gets generated before the 'return' one. So there is no > --Return-- hint. Ah, I missed that detail in your transcript. > 1) Don't you find the idea of type(co.co_code) == types.ArrayType at
Mainly due to resizes -- having mutable code > might be nice for development environments and such. Yes, it's scary. But nobody does this, and as you say, it can't be done from Python. > 2) I thought it was possible for bf_getreadbuffer to fail (maybe I'm > wrong here). _PyCode_GETCODEPTR does no error checking. So one should only use objects whose bf_getreadbuffer won't fail (when invoked with segment index 0). > > I'd be happy to kill it as a YAGNI. > > That's nice, but if... > > > But that still doesn't mean I approve checking for "4 bytes from the > > end". > > ...it doesn't actually help. Well, it kills off a potentially unsafe feature. > Does anyone have any better ideas for not generating 'line' trace > events in the epilogue? Use a separate opcode for which you could check? --Guido van Rossum (home page: http://www.python.org/~guido/) From mwh@python.net Fri Aug 2 16:56:57 2002 From: mwh@python.net (Michael Hudson) Date: Fri, 2 Aug 2002 16:56:57 +0100 (BST) Subject: [Python-Dev] seeing off SET_LINENO In-Reply-To: <200208021551.g72FpOg11206@pcp02138704pcs.reston01.va.comcast.net> Message-ID: On Fri, 2 Aug 2002, Guido van Rossum wrote: > > > (I suppose you still *support* the SET_LINENO opcode?) > > > > No. Do you think I should? > > I like to be on the conservative side. Also, it would make life > easier for the compiler package until Jeremy has time to fix it. :-) OK. I guess it can go down the bottom of the eval loop now (if that makes any difference). > > > BTW, you should change the .pyc magic number in your patch. > > > > Really? It's already changed since the last released Python. Easy > > enough to change again, though, and it makes testing easier. > > Given that each time I try your patches I get unknown opcode errors, > please change it. OK, but it's 5pm on a Friday here :) Have a good weekend everyone. Cheers, M. 
From skip@pobox.com Fri Aug 2 18:19:52 2002 From: skip@pobox.com (Skip Montanaro) Date: Fri, 2 Aug 2002 12:19:52 -0500 Subject: [Python-Dev] dbm module, whichdb, test_whichdb Message-ID: <15690.48952.51439.449198@localhost.localdomain> Folks, I just checked in a modified dbmmodule.c, whichdb.py and a new regression test file, test_whichdb.py. The change to dbmmodule.c accommodates linkage with Berkeley DB. The change to whichdb catches this case (opening "foo" actually creates "foo.db"). The new test_whichdb.py file simply adds regression tests for the whole mess. Please have a look, try it out, and let me know if it gives your system heartburn. Having messed around with this stuff off-and-on for a while, I have no illusions about this going in without tickling some platform dependency. Jack, you're especially on alert, because I know you had problems with some earlier bsddb-related changes to setup.py. My iMac w/ MacOS X is still in Michigan with Ellen for the summer. Skip From nas@python.ca Fri Aug 2 19:15:27 2002 From: nas@python.ca (Neil Schemenauer) Date: Fri, 2 Aug 2002 11:15:27 -0700 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib heapq.py,NONE,1.1 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Fri, Aug 02, 2002 at 09:44:34AM -0700 References: Message-ID: <20020802111527.A13965@glacier.arctrix.com> gvanrossum@users.sourceforge.net wrote: > Adding the heap queue algorithm, per discussion in python-dev last > week. Cool. > __about__ = """Heap queues [...] Is this going to become a "blessed" special name or do you consider it harmless abuse of the namespace?
Neil From zack@codesourcery.com Fri Aug 2 19:23:34 2002 From: zack@codesourcery.com (Zack Weinberg) Date: Fri, 2 Aug 2002 11:23:34 -0700 Subject: [Python-Dev] Weird error handling in os._execvpe In-Reply-To: <200208021438.g72EcE629794@pcp02138704pcs.reston01.va.comcast.net> References: <20020801161946.GA32076@codesourcery.com> <200208011723.g71HN4025731@odiug.zope.com> <20020801182340.GB27575@codesourcery.com> <200208011927.g71JRhm10269@odiug.zope.com> <20020802052433.GA466@codesourcery.com> <200208021438.g72EcE629794@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020802182334.GC466@codesourcery.com> On Fri, Aug 02, 2002 at 10:38:14AM -0400, Guido van Rossum wrote: > > So I think we're back to option 2 (enumerate the possible errors for > > each platform). ENOENT and ENOTDIR should cover it for Unix. Would > > other platform maintainers care to comment, please? > > Don't wait for them. Just submit a patch and assign it to me. :-) Done: id 590294. zw From guido@python.org Fri Aug 2 19:31:44 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 02 Aug 2002 14:31:44 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib heapq.py,NONE,1.1 In-Reply-To: Your message of "Fri, 02 Aug 2002 11:15:27 PDT." <20020802111527.A13965@glacier.arctrix.com> References: <20020802111527.A13965@glacier.arctrix.com> Message-ID: <200208021831.g72IViY13196@pcp02138704pcs.reston01.va.comcast.net> > > __about__ = """Heap queues [...] > > Is this going to become a "blessed" special name or do you consider it > harmless abuse of the namespace? The latter. I figured François' treatise was too long for the docstring. I was originally going to make it an unnamed string literal -- maybe that's better? 
--Guido van Rossum (home page: http://www.python.org/~guido/) From goodger@users.sourceforge.net Fri Aug 2 21:47:52 2002 From: goodger@users.sourceforge.net (David Goodger) Date: Fri, 02 Aug 2002 16:47:52 -0400 Subject: [Python-Dev] Docutils/reStructuredText is ready to process PEPs In-Reply-To: <15690.38042.441354.272859@anthem.wooz.org> Message-ID: Barry A. Warsaw wrote: > An of course RFCs are also converted to html: > http://www.faqs.org/rfcs/rfc2822.html So they are. Pretty picture at the top, navigation bar at top & bottom, and a huge
<pre> in-between (with live RFC links at least).
Impressive.  ;-)

> Taken to the extreme, why do we even use a text based format at all?
> We could, of course, get all that by authoring the PEPs directly in
> HTML.

To answer your hypothetical (I assume), it's because raw HTML/XML/SGML
is unreadable to most people.  Plaintext is a common denominator,
useful because it's universally readable.  But texts like RFCs and
PEPs do have some structure; by formalizing that structure we can use
it.  Current PEPs are one up on RFCs, recognizing section titles for
HTML.  ReStructuredText just takes that further.

>     >> I'd like for PEP authors to explicitly choose one or the other,
>     >> preferably by file extension (e.g. .txt for plain text, .rst or
>     >> .rest for reST).
> 
>     DG> I'm not keen on a new file extension (this issue has come up
>     DG> before).  There's so much in place on many platforms that says
>     DG> .txt means text files, and reStructuredText files *are* text
>     DG> files, with just a bit of formal structure sprinkled over.
>     DG> Browsers know what to do with .txt files; they wouldn't know
>     DG> what to do with .rest or .rtxt files.  Near-universal file
>     DG> naming conventions are not the place to innovate IMHO.
> 
> Don't most servers default to text/plain for types they don't know?
> I'm pretty sure Apache does.

I don't know the answer to that.  There's still what the browser does
with it at the client end, and what apps like Windows Explorer and
Mac's File Exchange do with file extensions.  I think those
side-effects make keeping .txt worth it.

> If a file extension isn't acceptable, then I'd still want the
> determination of plaintext vs. reST to be explicit.  The other
> alternative is to add a PEP header to specify.  I'd propose calling
> it Content-Type: and use text/x-rest as the value.

[Already replied to in private email; repeated here to elicit
opinions.]

Good idea.  But the header "Content-Type: text/x-rest" seems to imply
much more than is intended.  PEP 258 proposes a __docformat__ variable
to contain the name of the format being used for docstrings; perhaps a
"Format:" header for PEPs?  For example:

    Format: reStructuredText

Alternatively:

    Format: RST

I prefer "RST" to "rest", which is already used as an acronym for the
"Representational State Transfer" protocol (see Paul Prescod's article
at http://www.xml.com/pub/a/2002/02/20/rest.html).

The existing format could be called "Plaintext" (or "PEP 1.0" ;-).
Without the "Format:" header, "Plaintext" would be the default.

[In his reply to the aforementioned private email,]

Barry pointed out:

    Since the PEP headers are modeled on RFC 2822, I say we stick with
    established standards rather than invent our own.  So
    "Content-Type: text/x-rest" seems natural, and for most related
    standards, if there is no Content-Type: header, text/plain is
    already the documented default.

Looking at the relevant standards (RFC 2616 etc.) I see his point.
Using "Content-type:" may seem like overkill now, but it's flexible
and future-proof (!).  The "charset:" part could also come in handy;
already, there are some PEPs (including PEP 0) which implicitly use
Latin-1.

But "text/x-rst" would be better. :-)

> Fair enough.  Let's do this: send me a diff against v1.39 of
> pep2html.py.

Will do.

> I just downloaded docutils-0.2, but I'm not sure of the
> best way to integrate this in the nondist/peps directory.
> 
> - If we do the normal setup.py install, that's fine for my machine but
>   it means that everyone who will be pushing out peps will have to do
>   the same.
> 
> - If we hack pep2html.py to put ./docutils-0.2 on sys.path, then we
>   can just check this stuff into the peps directory and it should Just
>   Work.  We'd have to update it when new docutils releases are made.

The "docutils" package could be a subdirectory of nondist/peps under
CVS.  When pep2html.py is run, the current working directory is
already on the path so "import docutils" should just work and no
sys.path manipulation would be necessary.  But Docutils is
substantial and evolving.  I don't mind keeping Python's repository in
sync but would others object to the added files and CVS traffic?
Eventually I hope for Docutils to go into the stdlib, but it's not
ready for consideration yet.

I agree with the direct email consensus that "python setup.py install"
is best.

> OTOH, if plaintext PEPs work without having access to the docutils
> package, that would be fine too (another reason perhaps for an
> explicit flag).

Your wish is my command.  If Docutils isn't installed and pep2html.py
is asked to process a reStructuredText PEP, it will report the problem
and move on gracefully (no traceback).
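The fallback described above amounts to a guarded import.  A minimal
sketch (the function name and HTML handling here are illustrative, not
pep2html.py's actual interface):

```python
import html


def render_pep(text, is_rest):
    """Render a PEP body to HTML, degrading gracefully without Docutils.

    Illustrative sketch only; pep2html.py's real code differs.
    """
    if not is_rest:
        # Plaintext PEPs never need Docutils.
        return "<pre>%s</pre>" % html.escape(text)
    try:
        from docutils.core import publish_string
    except ImportError:
        # Report the problem and move on -- no traceback.
        print("Docutils not installed; skipping reStructuredText PEP.")
        return None
    return publish_string(text, writer_name="html").decode("utf-8")
```

Plaintext PEPs thus work with no Docutils installed at all; only a
reStructuredText PEP triggers the import.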

>     DG> Sure.  Once authors see what the new markup gives them, I'm
>     DG> sure there will be some converts.
> 
> Let's find out.

Great.  I'll work on pep2html.py, a README, and a new "Template"
Meta-PEP including a recommended reStructuredText subset.

-- 
David Goodger    Open-source projects:
  - Python Docutils: http://docutils.sourceforge.net/
    (includes reStructuredText: http://docutils.sf.net/rst.html)
  - The Go Tools Project: http://gotools.sourceforge.net/



From goodger@users.sourceforge.net  Fri Aug  2 21:48:05 2002
From: goodger@users.sourceforge.net (David Goodger)
Date: Fri, 02 Aug 2002 16:48:05 -0400
Subject: [Python-Dev] Docutils/reStructuredText is ready to process
 PEPs
In-Reply-To: <3D4A6DC4.26978.2F61817D@localhost>
Message-ID: 

[David]
>> Seriously, any significant technology requires a
>> significant spec.

[Gordon]
> The wheel?

If the ISO were around at the time, yes!

> Ah, but you've been hashing it out with a group of
> people who *care* about things like this. Welcome
> to the larger world (where dinosaurs still roam).

[Guido]
> Funny, Ping doesn't strike me as a dinosaur.  More as someone who
> enjoys a good argument. :-)

So that wasn't Abuse?  What a relief!

-- 
David Goodger    Open-source projects:
  - Python Docutils: http://docutils.sourceforge.net/
    (includes reStructuredText: http://docutils.sf.net/rst.html)
  - The Go Tools Project: http://gotools.sourceforge.net/



From bckfnn@worldonline.dk  Fri Aug  2 21:52:19 2002
From: bckfnn@worldonline.dk (Finn Bock)
Date: Fri, 02 Aug 2002 22:52:19 +0200
Subject: [Python-Dev] timsort for jython
Message-ID: <3D4AF103.5030403@worldonline.dk>

Hi,

Here are some numbers for a javaport of the timsort code.

https://sourceforge.net/tracker/index.php?func=detail&aid=590360&group_id=12867&atid=312867

The old sorting code in jython was the 1.5 code from CPython with a 
quicksort implementation also inspired by Tim Peters.

Switching to timsort is obviously a no-brainer for us. You also don't 
need to hold back on giving stability guarantees in the documentation for 
jython's sake.

All numbers were taken using JDK 1.4.1 on Win2K & a 1300 MHz AMD. I gave 
up waiting for the i=20 line.

quicksort/insertionsort:

  i  *sort  \sort  /sort  3sort  +sort  %sort  ~sort  =sort  !sort
15   0.66   0.50   0.30   0.29   0.40   0.42   0.32   0.28   2.53
16   1.36   1.10   0.67   0.67   0.94   1.05   0.75   0.60   6.12
17   3.12   2.38   1.47   1.47   1.88   2.12   1.62   1.28  15.43
18   6.52   5.14   3.22   3.22   4.52   5.56   3.35   2.73  36.04
19  14.32  11.07   6.99   6.99   8.71  11.72   7.33   5.86  87.80

timsort:

  i  *sort  \sort  /sort  3sort  +sort  %sort  ~sort  =sort  !sort
15   0.44   0.05   0.03   0.03   0.04   0.06   0.17   0.02   0.06
16   0.82   0.08   0.06   0.07   0.07   0.11   0.32   0.05   0.11
17   1.76   0.18   0.13   0.13   0.13   0.23   0.64   0.11   0.22
18   3.87   0.34   0.26   0.29   0.27   0.49   1.29   0.21   0.45
19   8.91   0.70   0.53   0.54   0.54   1.07   2.62   0.43   0.90


regards,
finn



From guido@python.org  Fri Aug  2 22:07:09 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 02 Aug 2002 17:07:09 -0400
Subject: [Python-Dev] timsort for jython
In-Reply-To: Your message of "Fri, 02 Aug 2002 22:52:19 +0200."
 <3D4AF103.5030403@worldonline.dk>
References: <3D4AF103.5030403@worldonline.dk>
Message-ID: <200208022107.g72L79i16056@pcp02138704pcs.reston01.va.comcast.net>

> Here are some numbers for a javaport of the timsort code.
> 
> https://sourceforge.net/tracker/index.php?func=detail&aid=590360&group_id=12867&atid=312867
> 
> The old sorting code in jython was the 1.5 code from CPython with a 
> quicksort implementation also inspired by Tim Peters.
> 
> Switching to timsort is obviously a no-brainer for us. You also don't 
> need to hold back on giving stability guarantees in the documentation for 
> jython's sake.

Woo hoo!  Way to go, Finn.  Sounds like you'll be able to make the
stability guarantee in Jython 2.2, whereas we can only make it for
CPython 2.3. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From goodger@users.sourceforge.net  Fri Aug  2 22:56:55 2002
From: goodger@users.sourceforge.net (David Goodger)
Date: Fri, 02 Aug 2002 17:56:55 -0400
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib
 heapq.py,NONE,1.1
Message-ID: 

Guido wrote:
> I was originally going to make it an unnamed string
> literal -- maybe that's better?

In PEP 258 I call those "Additional Docstrings":

    Many programmers would like to make extensive use of docstrings
    for API documentation.  However, docstrings do take up space in
    the running program, so some of these programmers are reluctant to
    "bloat up" their code.  Also, not all API documentation is
    applicable to interactive environments, where __doc__ would be
    displayed.

    The docstring processing system's extraction tools will
    concatenate all string literal expressions which appear at the
    beginning of a definition or after a simple assignment.  Only the
    first strings in definitions will be available as __doc__, and can
    be used for brief usage text suitable for interactive sessions;
    subsequent string literals and all attribute docstrings are
    ignored by the Python bytecode compiler and may contain more
    extensive API information.

    Example::

        def function(arg):
            """This is __doc__, function's docstring."""
            """
            This is an additional docstring, ignored by the bytecode
            compiler, but extracted by the Docutils.
            """
            pass

(Original idea from Moshe Zadka.)

-- 
David Goodger    Open-source projects:
  - Python Docutils: http://docutils.sourceforge.net/
    (includes reStructuredText: http://docutils.sf.net/rst.html)
  - The Go Tools Project: http://gotools.sourceforge.net/



From Jack.Jansen@oratrix.com  Fri Aug  2 21:35:24 2002
From: Jack.Jansen@oratrix.com (Jack Jansen)
Date: Fri, 2 Aug 2002 22:35:24 +0200
Subject: [Python-Dev] dbm module, whichdb, test_whichdb
In-Reply-To: <15690.48952.51439.449198@localhost.localdomain>
Message-ID: <5F17B1A0-A657-11D6-AE39-003065517236@oratrix.com>

On vrijdag, augustus 2, 2002, at 07:19 , Skip Montanaro wrote:
>   Jack, you're especially on alert, because I know you had
> problems with some earlier bsddb-related changes to setup.py.

Both test_whichdb and test_anydbm pass without problems.
--
- Jack Jansen                http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -



From tim.one@comcast.net  Fri Aug  2 23:51:25 2002
From: tim.one@comcast.net (Tim Peters)
Date: Fri, 02 Aug 2002 18:51:25 -0400
Subject: [Python-Dev] timsort for jython
In-Reply-To: <3D4AF103.5030403@worldonline.dk>
Message-ID: 

[Finn Bock]
> ...
> The old sorting code in jython was the 1.5 code from CPython with a
> quicksort implementation also inspired by Tim Peters.

The sad thing is, that was a very good quicksort -- I thought I was done
when I wrote that.

> Switching to timsort is obviously a nobrainer for us.

Thanks for sharing this!  Made my day.  I noted in the Jython patch
that you should see a nice speedup by nuking the assert() calls once you're
confident in the port; Java is checking out-of-bound array indices for you,
and that's largely what the asserts are guarding against in the C
implementation.

> You also don't need to hold back on giving stability garanties in the
> documentation for jython's sake.

I didn't.  Stability doesn't come free, and for all I know, in
another 3 years a method will be discovered that's 3x faster but not stable.
For example, Splaysort is (as an email correspondent reminded me) provably
adaptive to all known measures of presortedness, but when I looked at the
code it "was obvious" that it wouldn't be competitive on random data; it
also requires two pointers per list element.  In coming years, researchers
may well dream up quicker ways to get the same goodness, but Splaysort isn't
stable, and very few fast algorithms are.  So I don't want to hobble future
implementations by holding Python to promises I don't care much about.
OTOH, I do expect that once code relies on stability, we'll have about as
much chance of taking that away as getting rid of list.append().
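For readers following along, the stability in question is directly
observable: elements that compare equal keep their original relative
order.  A minimal illustration (using the key= argument for brevity,
which only arrived later, in Python 2.4):

```python
# Sort (grade, name) records by grade only; a stable sort must keep
# the original relative order of records within each grade.
records = [(1, "carol"), (0, "alice"), (1, "dave"), (0, "bob")]
records.sort(key=lambda rec: rec[0])

# Stable result: alice still precedes bob, carol still precedes dave.
expected = [(0, "alice"), (0, "bob"), (1, "carol"), (1, "dave")]
assert records == expected
```

An unstable sort would be free to swap alice/bob or carol/dave, which
is exactly the behavior code starts to rely on once the guarantee is
documented.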



From aahz@pythoncraft.com  Sat Aug  3 00:41:45 2002
From: aahz@pythoncraft.com (Aahz)
Date: Fri, 2 Aug 2002 19:41:45 -0400
Subject: [Python-Dev] timsort for jython
In-Reply-To: 
References: <3D4AF103.5030403@worldonline.dk> 
Message-ID: <20020802234145.GA28343@panix.com>

On Fri, Aug 02, 2002, Tim Peters wrote:
>
> Stability doesn't come free, and for all I know, in another 3 years a
> method will be discovered that's 3x faster but not stable.

You're pulling our legs, right?  I thought you said this version of
mergesort was converging on the theoretical lower bound.
-- 
Aahz (aahz@pythoncraft.com)           <*>         http://www.pythoncraft.com/

Project Vote Smart: http://www.vote-smart.org/


From guido@python.org  Sat Aug  3 01:28:43 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 02 Aug 2002 20:28:43 -0400
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib heapq.py,NONE,1.1
In-Reply-To: Your message of "Fri, 02 Aug 2002 17:56:55 EDT."
 
References: 
Message-ID: <200208030028.g730Sm219662@pcp02138704pcs.reston01.va.comcast.net>

> > I was originally going to make it an unnamed string
> > literal -- maybe that's better?
> 
> In PEP 258 I call those "Additional Docstrings":
> 
>     Many programmers would like to make extensive use of docstrings
>     for API documentation.  However, docstrings do take up space in
>     the running program, so some of these programmers are reluctant to
>     "bloat up" their code.  Also, not all API documentation is
>     applicable to interactive environments, where __doc__ would be
>     displayed.
> 
>     The docstring processing system's extraction tools will
>     concatenate all string literal expressions which appear at the
>     beginning of a definition or after a simple assignment.  Only the
>     first strings in definitions will be available as __doc__, and can
>     be used for brief usage text suitable for interactive sessions;
>     subsequent string literals and all attribute docstrings are
>     ignored by the Python bytecode compiler and may contain more
>     extensive API information.
> 
>     Example::
> 
>         def function(arg):
>             """This is __doc__, function's docstring."""
>             """
>             This is an additional docstring, ignored by the bytecode
>             compiler, but extracted by the Docutils.
>             """
>             pass
> 
> (Original idea from Moshe Zadka.)

Ah, I thought there had to be something like that. :-)

Do you also recognize this if there are comments between?  Or blank
lines?  E.g.

   def f():
       """
       foo
       """

       # blah

       """
       bar
       """

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@comcast.net  Sat Aug  3 01:51:22 2002
From: tim.one@comcast.net (Tim Peters)
Date: Fri, 02 Aug 2002 20:51:22 -0400
Subject: [Python-Dev] timsort for jython
In-Reply-To: <20020802234145.GA28343@panix.com>
Message-ID: 

[Tim]
> Stability doesn't come free, and for all I know, in another 3 years a
> method will be discovered that's 3x faster but not stable.

[Aahz]
> You're pulling our legs, right?  I thought you said this version of
> mergesort was converging on the theoretical lower bound.

For # of comparisons done on randomly ordered data, yes, there's a hard
lower bound of lg(n!) comparisons, but the samplesort hybrid was already
close enough to it that there wouldn't have been much point to timsort.  For
various kinds of partially ordered data, the only catch-all hard lower bound
is n-1 comparisons (read timsort.txt attached to the patch on SF, or
Objects/listsort.txt in current CVS -- there's much more info in those than
in the text file I posted to Python-Dev).

Comparisons aren't the whole story, either, as ~sort showed dramatically in
the x-platform timings (see the patch).  I believe timsort is sometimes more
cache-friendly than the samplesort hybrid (&, e.g., I see no other way to
explain the wild ~sort x-platform behavior), but it's not doing anything
heroic for cache-friendliness.  The pending-run stack invariants
automatically implement what's called "tiling" in the literature, but that's
not the only cache trick it *could* play.

i'll-be-dead-before-the-sorting-story-ly y'rs  - tim



From skip@pobox.com  Sat Aug  3 03:01:34 2002
From: skip@pobox.com (Skip Montanaro)
Date: Fri, 2 Aug 2002 21:01:34 -0500
Subject: [Python-Dev] dbm module, whichdb, test_whichdb
In-Reply-To: <5F17B1A0-A657-11D6-AE39-003065517236@oratrix.com>
References: <15690.48952.51439.449198@localhost.localdomain>
 <5F17B1A0-A657-11D6-AE39-003065517236@oratrix.com>
Message-ID: <15691.14718.263409.776329@localhost.localdomain>

    Jack> Both test_whichdb and test_anydbm pass without problems.

Excellent...

S


From ping@zesty.ca  Sat Aug  3 07:21:20 2002
From: ping@zesty.ca (Ka-Ping Yee)
Date: Sat, 3 Aug 2002 01:21:20 -0500 (CDT)
Subject: [Python-Dev] Docutils/reStructuredText is ready to process PEPs
In-Reply-To: 
Message-ID: 

On Fri, 2 Aug 2002, David Goodger wrote:
> [Guido]
> > Funny, Ping doesn't strike me as a dinosaur.  More as someone who
> > enjoys a good argument. :-)
>
> So that wasn't Abuse?  What a relief!

Just to make sure you know: i don't argue only for the sake of arguing.
I argue when i think it will make Python better.


-- ?!ng



From aahz@pythoncraft.com  Sat Aug  3 07:27:14 2002
From: aahz@pythoncraft.com (Aahz)
Date: Sat, 3 Aug 2002 02:27:14 -0400
Subject: [Python-Dev] Docutils/reStructuredText is ready to process PEPs
In-Reply-To: 
References:  
Message-ID: <20020803062714.GA19878@panix.com>

On Sat, Aug 03, 2002, Ka-Ping Yee wrote:
> On Fri, 2 Aug 2002, David Goodger wrote:
>> [Guido]
>>>
>>> Funny, Ping doesn't strike me as a dinosaur.  More as someone who
>>> enjoys a good argument. :-)
>>
>> So that wasn't Abuse?  What a relief!
> 
> Just to make sure you know: i don't argue only for the sake of arguing.
> I argue when i think it will make Python better.

"That's what they all say."  ;-)
-- 
Aahz (aahz@pythoncraft.com)           <*>         http://www.pythoncraft.com/

Project Vote Smart: http://www.vote-smart.org/


From goodger@users.sourceforge.net  Sat Aug  3 15:34:43 2002
From: goodger@users.sourceforge.net (David Goodger)
Date: Sat, 03 Aug 2002 10:34:43 -0400
Subject: [Python-Dev] Docutils/reStructuredText is ready to process
 PEPs
In-Reply-To: 
Message-ID: 

Ka-Ping Yee wrote:
> On Fri, 2 Aug 2002, David Goodger wrote:
>> [Guido]
>>> Funny, Ping doesn't strike me as a dinosaur.  More as someone who
>>> enjoys a good argument. :-)
>> 
>> So that wasn't Abuse?  What a relief!
> 
> Just to make sure you know: i don't argue only for the sake of arguing.
> I argue when i think it will make Python better.

Understood and appreciated.  My comment was just a lame attempt at humor.
Apologies for omission of ";-)".

-- 
David Goodger    Open-source projects:
  - Python Docutils: http://docutils.sourceforge.net/
    (includes reStructuredText: http://docutils.sf.net/rst.html)
  - The Go Tools Project: http://gotools.sourceforge.net/



From rasjidw@openminddev.net  Sat Aug  3 16:01:05 2002
From: rasjidw@openminddev.net (Rasjid Wilcox)
Date: Sun, 4 Aug 2002 01:01:05 +1000
Subject: [Python-Dev] Adding popen2 like functionality to pty.py
Message-ID: <200208040101.05141.rasjidw@openminddev.net>

Dear Python Developers,

I have submitted a patch to add a popen2-like function to the pty.py library.

It is just a first draft, and I'm happy to develop it further if there is 
interest.  If so, I will do some docs and have a look at the test library for 
it.  I would also be looking for some guidance on the best way to resolve 
some issues.

I'm new to Python and its development process, so I'm hoping I have not broken 
any rules by not waiting for a response via the patch manager before posting to 
python-dev.

I would like to contribute to Python, as I think it is a truly delightful 
language.  I don't have a computer science background as such, more pure 
mathematics (set theory, group theory, logic etc). I don't know C (yet), so I 
would be looking to work on the pure Python libraries, or help create new 
ones.  I'm also willing to help with documentation.

Cheers,

Rasjid.



From xscottg@yahoo.com  Sat Aug  3 21:44:17 2002
From: xscottg@yahoo.com (Scott Gilbert)
Date: Sat, 3 Aug 2002 13:44:17 -0700 (PDT)
Subject: [Python-Dev] PEP 298, final (?) version
In-Reply-To: <200208021419.g72EJrB29608@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20020803204417.36857.qmail@web40107.mail.yahoo.com>

--- Guido van Rossum  wrote:
> > > >         void PyObject_ReleaseFixedBuffer(PyObject *obj);
> > > 
> > > Would it be useful to allow bf_releasefixedbuffer to return an int
> > > indicating an exception?  For instance, it could raise an exception
> > > if the extension errantly releases more times than it has acquired
> > 
> > The code making the call might not be in an easy position
> > to deal with an exception -- e.g. an asynchronous I/O
> > routine called from a signal handler, another thread,
> > etc.
> > 
> > Maybe use the warning mechanism to produce a message?
> 
> In an asynch I/O situation, calling PyErr_Warn() is out of the
> question (it invokes Python code!).
> 
> I propose to make it a fatal error -- after all the only reason why
> bf_releasefixedbuffer could fail should be that the caller makes a
> mistake.  Since that's a bug in C code, a fatal error is acceptable.
> 

I don't know if you guys are hinting at the possibility of the
PyObject_ReleaseFixedBuffer function being called without holding
the GIL or not, but I think the GIL should be necessary during
this call.  (As such, the code making the call *could* deal with
the exception...  even if we don't want it to have to.)

So while a fatal error is still a reasonable response, the 
asynchronous I/O routine or signal handler or whatever really
should acquire the GIL before doing the release.  For one thing
this protects the lock_count variable from race conditions, and
another, it allows the implementation of bf_releasefixedbuffer
to use other Python APIs.




Cheers,
    -Scott


__________________________________________________
Do You Yahoo!?
Yahoo! Health - Feel better, live better
http://health.yahoo.com


From guido@python.org  Sun Aug  4 01:22:41 2002
From: guido@python.org (Guido van Rossum)
Date: Sat, 03 Aug 2002 20:22:41 -0400
Subject: [Python-Dev] PEP 298, final (?) version
In-Reply-To: Your message of "Sat, 03 Aug 2002 13:44:17 PDT."
 <20020803204417.36857.qmail@web40107.mail.yahoo.com>
References: <20020803204417.36857.qmail@web40107.mail.yahoo.com>
Message-ID: <200208040022.g740Mfq26243@pcp02138704pcs.reston01.va.comcast.net>

> > > > >         void PyObject_ReleaseFixedBuffer(PyObject *obj);
> > > > 
> > > > Would it be useful to allow bf_releasefixedbuffer to return an int
> > > > indicating an exception?  For instance, it could raise an exception
> > > > if the extension errantly releases more times than it has acquired
> > > 
> > > The code making the call might not be in an easy position
> > > to deal with an exception -- e.g. an asynchronous I/O
> > > routine called from a signal handler, another thread,
> > > etc.
> > > 
> > > Maybe use the warning mechanism to produce a message?
> > 
> > In an asynch I/O situation, calling PyErr_Warn() is out of the
> > question (it invokes Python code!).
> > 
> > I propose to make it a fatal error -- after all the only reason why
> > bf_releasefixedbuffer could fail should be that the caller makes a
> > mistake.  Since that's a bug in C code, a fatal error is acceptable.
> 
> I don't know if you guys are hinting at the possibility of the
> PyObject_ReleaseFixedBuffer function being called without holding
> the GIL or not, but I think the GIL should be necessary during
> this call.  (As such, the code making the call *could* deal with
> the exception...  even if we don't want it to have to.)

Good point.

> So while a fatal error is still a reasonable response, the 
> asynchronous I/O routine or signal handler or whatever really
> should acquire the GIL before doing the release.  For one thing
> this protects the lock_count variable from race conditions, and
> another, it allows the implementation of bf_releasefixedbuffer
> to use other Python APIs.

Agreed.

Is the PEP clear that you have to hold the GIL for these calls?
(Can't hurt to be explicit, given the fact that one intention is to
*use* the buffer while the GIL is released...)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@comcast.net  Sun Aug  4 01:43:05 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sat, 03 Aug 2002 20:43:05 -0400
Subject: [Python-Dev] seeing off SET_LINENO
In-Reply-To: <2mptx1liay.fsf@starship.python.net>
Message-ID: 

[Michael Hudson]
> It's about 5%:
>
> $ ../../build/python pystone.py
> Pystone(1.1) time for 100000 passes = 8.11
> This machine benchmarks at 12330.5 pystones/second
> $ ../../build/python pystone.py
> Pystone(1.1) time for 100000 passes = 7.69
> This machine benchmarks at 13003.9 pystones/second

If I didn't know better, I'd think you ran the same python twice there.

> I can run the vanilla pystone whilst compiling or something if you
> like :)

That's OK, the speedup will be larger on Windows.  I can guarantee that,
since I'll be doing the Windows timings.

> ...
> It's the old boys club effect: I worked damn hard to get to the point
> of understanding this stuff, so everyone else should bloody well have
> to too!
> ...
> All I can say is that I'd been driven insane by co_lnotab and
> Python/compile.c when I wrote that comment.

I understand.  It was insanity that drove me to write the co_lnotab comments
that tempted you into believing it was possible to do something rational
with it, and I apologize for that.  I like the new comments!  Thank
you.



From tim.one@comcast.net  Sun Aug  4 01:57:45 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sat, 03 Aug 2002 20:57:45 -0400
Subject: [Python-Dev] seeing off SET_LINENO
In-Reply-To: <2msn1xpn00.fsf@starship.python.net>
Message-ID: 

[Michael Hudson]
> I've found another annoying problem.  I'm not really expecting someone
> here to sovle it for me, but writing it down might help me think
> clearly.
>
> This is about the function epilogues that always get generated.  I.e:
>
> >>> def f():
> ...     if a:
> ...         print 1
> ...
> >>> import dis
> >>> dis.dis(f)
>   2           0 LOAD_GLOBAL              0 (a)
>               3 JUMP_IF_FALSE            9 (to 15)
>               6 POP_TOP
>
>   3           7 LOAD_CONST               1 (1)
>              10 PRINT_ITEM
>              11 PRINT_NEWLINE
>              12 JUMP_FORWARD             1 (to 16)
>         >>   15 POP_TOP
>         >>   16 LOAD_CONST               0 (None)
>              19 RETURN_VALUE
>
> You can see here that the epilogue gets associated with line 3,
> whereas it shouldn't really be associated with any line at all.

It has to be associated with some line >= 3, as c_lnotab isn't capable of
expressing anything other than that.  It *could* associate it with "line 4",
though, if the compiler were changed to pump out another c_lnotab entry at
the epilogue.  That would be better than saying the time is charged to line
3, since it isn't on line 3 then.  I'd be happy to trade away total insanity
for partial insanity.
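For anyone decoding these tables by hand: c_lnotab (exposed as
co_lnotab on code objects) is a byte string of (bytecode-offset delta,
line-number delta) pairs relative to co_firstlineno.  A simplified
decoder sketch, ignoring the splitting of deltas larger than 255 that
real consumers must handle:

```python
def decode_lnotab(lnotab, firstlineno):
    """Return (bytecode offset, line number) pairs from a co_lnotab
    byte string.  Sketch of the classic format only; >255 deltas and
    the later signed line deltas are omitted for brevity."""
    offset, lineno = 0, firstlineno
    pairs = []
    # Bytes alternate: offset increment, then line increment.
    for addr_incr, line_incr in zip(lnotab[0::2], lnotab[1::2]):
        offset += addr_incr
        lineno += line_incr
        pairs.append((offset, lineno))
    return pairs
```

The epilogue fix discussed here would just mean emitting one more such
pair pointing past the last source line of the function.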

> For why this is a problem:
>
> $ cat t.py
> a = 0
> def f():
>     if a:
>         print 1
>
> >>> pdb.runcall(t.f)
> > /home/mwh/src/sf/python/dist/src/build/t.py(3)f()
> -> if a:
> (Pdb) s
> > /home/mwh/src/sf/python/dist/src/build/t.py(4)f()
> -> print 1
> (Pdb)
> --Return--
> > /home/mwh/src/sf/python/dist/src/build/t.py(4)f()->None
> -> print 1
> (Pdb)
>
> The debugger stopping on the "print 1" is confusing.

It stops on the "if a:" for me twice today, and I doubt that's any less
confusing.  If it were set to line 4 instead, an unaltered pdb would
presumably show a blank line (whatever) after the function body, and an
altered pdb could be taught that "the last line" c_lnotab claims exists is
really devoted to exit code not associated with any source-file line.



From tim.one@comcast.net  Sun Aug  4 02:06:47 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sat, 03 Aug 2002 21:06:47 -0400
Subject: [Python-Dev] Sorting
In-Reply-To: <20020802030441.GA831@panix.com>
Message-ID: 

[Aahz]
> ...
> It seems pretty clear by now that the new mergesort is going to replace
> samplesort, but since nobody else has said this, I figured I'd add one
> more comment:
>
> I actually understood your description of the new mergesort algorithm.
> Unless you can come up with similar docs for samplesort, the Beer Truck
> scenario dictates that mergesort be the new gold standard.
>
> Or to quote Tim Peters, "Complex is better than complicated."

Good observation!  I wish I'd thought of that.  The mergesort is more
complex, but it doesn't have so many fiddly little complications obscuring
it.  There was an extensive description of the samplesort hybrid in
listobject.c's comments, but you know you're in Complication Heaven when
you've got to document half a dozen distinct "tuning macros" in hand-wavy
terms.  The only tuning parameter in the mergesort is MIN_GALLOP, and the
tradeoff it makes is explainable.



From tismer@tismer.com  Sun Aug  4 02:27:06 2002
From: tismer@tismer.com (Christian Tismer)
Date: Sun, 04 Aug 2002 03:27:06 +0200
Subject: [Python-Dev] timsort for jython
References: 
Message-ID: <3D4C82EA.4050307@tismer.com>

Tim Peters wrote:
> [Finn Bock]
> 
>>...
>>The old sorting code in jython was the 1.5 code from CPython with a
>>quicksort implementaion also inspired by Tim Peters.
> 
> 
> The sad thing is, that was a very good quicksort -- I thought I was done
> when I wrote that.

I'd like to pet you for your new version, and your split personality
which manages to create so much creativeness out of being
you best own enemy.

- chris

-- 
Christian Tismer             :^)   
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/




From tim.one@comcast.net  Sun Aug  4 02:44:21 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sat, 03 Aug 2002 21:44:21 -0400
Subject: [Python-Dev] timsort for jython
In-Reply-To: <3D4C82EA.4050307@tismer.com>
Message-ID: 

[Christian Tismer]
> I'd like to pet you for your new version,

LOL -- that comes off a bit, umm, endearing to American English ears.  But I
understand the sentiment, and thank you for it.

> and your split personality which manages to create so much creativeness
> out of being you best own enemy.

Yes, it takes one to know one indeed.

let's-everyone-get-together-for-a-big-Group-Pet!-ly y'rs  - tim



From tim.one@comcast.net  Sun Aug  4 04:32:30 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sat, 03 Aug 2002 23:32:30 -0400
Subject: [Python-Dev] split('') revisited
In-Reply-To: <200208011314.g71DE4w07989@europa.research.att.com>
Message-ID: 

...

[Tim]
> It's the last line in the loop body that makes empty matches
> a wart if allowed:  they wouldn't advance the position at all, and an
> infinite loop would result.  In order to make them do what you think you
> want, we'd have to add, at the end of the loop body
>
>    ah, and if the match was empty, advance the position again, by, oh,
>    I don't know, how about 1?  That's close to 0.

[Andrew Koenig]
> Indeed, that's an arbitrary rule -- just about as arbitrary as the one
> that you abbreviated above, which should really be
>
> 	    find the next match, but if the match is empty, disregard it;
> 	    instead, find the next match with a length of at least,
> 	    oh, I don't know, how about 1?  That's close to 0 .

You really think so?  I expect almost all programmers would understand what
"find next non-empty match" means at first glance -- and especially
regexp-slingers, who are often burned in their matching lives by the
consequences of having large pieces of their patterns unexpectedly match an
empty string.  That makes "non-empty match" seem a natural concept to me.

> What I'm trying to do is come up with a useful example to convince
> myself that one is better than the other.

Have you found one yet?  I confess that re.findall() implements a "if the
match was empty, advance the position by 1" rule, as in

>>> re.findall("x?", "abc")
['', '', '', '']
>>>

But I don't think we're doing anyone a favor with stuff like that.  I think
it's a dubious idea that

>>> "abc".find('')
0
>>>

"works" too.  If a program does s1.find(s2) and s2 is an empty string, I
expect the chances are good it's a logic error in the program.  Analogies
to, e.g., i+j when j happens to be 0 leave me cold, since I can think of a
thousand reasons for why j might naturally be 0.  But I've had a hard time
thinking of a reasonable algorithm where the expression s1.find(s2) could be
expected to have s2=="" in normal operation (and am sure it would have been
a logic error elsewhere in any uses of string.find() I've made; ditto
searching for, or splitting on, empty strings via regexps).
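The "find next non-empty match" rule being debated could be sketched as a small helper; this is purely illustrative (the helper name is made up, not part of the re module):

```python
import re

def findall_nonempty(pattern, s):
    """Find all matches, disregarding zero-width ones instead of
    bumping the position by an arbitrary 1 as re.findall() does."""
    results = []
    pos = 0
    regex = re.compile(pattern)
    while pos <= len(s):
        m = regex.search(s, pos)
        if m is None:
            break
        if m.end() > m.start():      # non-empty match: keep it
            results.append(m.group())
            pos = m.end()
        else:                        # empty match: disregard it, move on
            pos = m.start() + 1
    return results

# Contrast with re.findall("x?", "abc") == ['', '', '', '']:
# findall_nonempty("x?", "abc") == []
```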



From aahz@pythoncraft.com  Sun Aug  4 07:53:12 2002
From: aahz@pythoncraft.com (Aahz)
Date: Sun, 4 Aug 2002 02:53:12 -0400
Subject: [Python-Dev] timsort for jython
In-Reply-To: 
References: <3D4C82EA.4050307@tismer.com> 
Message-ID: <20020804065312.GA10986@panix.com>

On Sat, Aug 03, 2002, Tim Peters wrote:
>
> let's-everyone-get-together-for-a-big-Group-Pet!-ly y'rs  - tim

Anything you say, Commodore.
-- 
Aahz (aahz@pythoncraft.com)           <*>         http://www.pythoncraft.com/

Project Vote Smart: http://www.vote-smart.org/


From skip@manatee.mojam.com  Sun Aug  4 13:00:22 2002
From: skip@manatee.mojam.com (Skip Montanaro)
Date: Sun, 4 Aug 2002 07:00:22 -0500
Subject: [Python-Dev] Weekly Python Bug/Patch Summary
Message-ID: <200208041200.g74C0MQb018932@manatee.mojam.com>

Bug/Patch Summary
-----------------

272 open / 2715 total bugs (+8)
131 open / 1632 total patches (-12)

New Bugs
--------

Invalid mmap crashes Python interpreter (2002-07-24)
	http://python.org/sf/585792
re.finditer (2002-07-24)
	http://python.org/sf/585882
macfs.FSSpec fails for "new" files (2002-07-24)
	http://python.org/sf/585923
Two corrects for weakref docs (2002-07-25)
	http://python.org/sf/586583
-S hides standard dynamic modules (2002-07-25)
	http://python.org/sf/586680
site-packages & build-dir python (2002-07-25)
	http://python.org/sf/586700
email package does not work with mailbox (2002-07-26)
	http://python.org/sf/586899
Empty genindex.html pages (2002-07-26)
	http://python.org/sf/586926
pydoc -w fails with path specified (2002-07-26)
	http://python.org/sf/586931
references to email package (2002-07-26)
	http://python.org/sf/586937
OSA Python integration (2002-07-26)
	http://python.org/sf/586998
IBCarbon module (2002-07-26)
	http://python.org/sf/587011
ur'\u' not handled properly (2002-07-26)
	http://python.org/sf/587087
python-mode and nested indents (2002-07-26)
	http://python.org/sf/587239
imaplib: prefix-quoted strings (2002-07-30)
	http://python.org/sf/588711
python should obey the FHS (2002-07-30)
	http://python.org/sf/588756
add main to py_pycompile (2002-07-30)
	http://python.org/sf/588768
unittest.py, better error message (2002-07-30)
	http://python.org/sf/588825
Memory leakage in SAX with exception (2002-07-31)
	http://python.org/sf/589149
socket.py wrapper needs a class (2002-07-31)
	http://python.org/sf/589262
shared libpython & dependant libraries (2002-07-31)
	http://python.org/sf/589422
standard include paths on command line (2002-07-31)
	http://python.org/sf/589427
"".split() ignores maxsplit arg (2002-08-01)
	http://python.org/sf/589965
PyMapping_Keys unexported in dll (2002-08-02)
	http://python.org/sf/590207
preconvert AppleSingle resource files (2002-08-02)
	http://python.org/sf/590456

New Patches
-----------

PEP 282 Implementation (2002-07-07)
	http://python.org/sf/578494
Adds Galeon support to webbrowser.py (2002-07-24)
	http://python.org/sf/585913
galeon support in webbrowser (2002-07-25)
	http://python.org/sf/586437
Better token-related error messages (2002-07-25)
	http://python.org/sf/586561
alternative SET_LINENO killer (2002-07-29)
	http://python.org/sf/587993
Cygwin _hotshot patch (2002-07-30)
	http://python.org/sf/588561
_locale library patch (2002-07-30)
	http://python.org/sf/588564
LDFLAGS support for build_ext.py (2002-07-30)
	http://python.org/sf/588809
Mindless editing, DL_EXPORT/IMPORT (2002-07-31)
	http://python.org/sf/588982
tempfile.py rewrite (2002-08-01)
	http://python.org/sf/589982
types.BoolType (2002-08-02)
	http://python.org/sf/590119
os._execvpe security fix (2002-08-02)
	http://python.org/sf/590294
py2texi.el update (2002-08-02)
	http://python.org/sf/590352
db4 include not found (2002-08-02)
	http://python.org/sf/590377
Add popen2 like functionality to pty.py. (2002-08-03)
	http://python.org/sf/590513
New codecs: html, asciihtml (2002-08-03)
	http://python.org/sf/590682

Closed Bugs
-----------

Need user-centered info for Windows users. (2000-11-27)
	http://python.org/sf/223599
profiling with xml parsing asserts (2002-03-25)
	http://python.org/sf/534864
Compile error _sre.c on Cray T3E (2002-05-19)
	http://python.org/sf/558153
DL_EXPORT on VC7 broken (2002-05-20)
	http://python.org/sf/558488
crash on gethostbyaddr (2002-06-07)
	http://python.org/sf/565747
asynchat module undocumented (2002-06-12)
	http://python.org/sf/568134
socket module htonl/ntohl bug (2002-06-12)
	http://python.org/sf/568322
.PYO files not imported unless -OO used (2002-06-18)
	http://python.org/sf/570640
CGIHTTPServer flushes read-only file. (2002-06-18)
	http://python.org/sf/570678
Negative __len__ provokes SystemError (2002-06-30)
	http://python.org/sf/575773
Sig11 in cPickle (stack overflow) (2002-07-01)
	http://python.org/sf/576084
Infinite recursion in Pickle (2002-07-02)
	http://python.org/sf/576419
GC Changes not mentioned in What's New (2002-07-12)
	http://python.org/sf/580462
pty.spawn - wrong error caught (2002-07-15)
	http://python.org/sf/581698
os.getlogin() fails (2002-07-21)
	http://python.org/sf/584566

Closed Patches
--------------

timestamp function for time module (2001-08-17)
	http://python.org/sf/452232
Unambiguous import for encodings (2001-09-06)
	http://python.org/sf/459381
no '_d' ending for mingw32 (2001-09-18)
	http://python.org/sf/462754
HTML version of the Idle "documentation" (2001-10-12)
	http://python.org/sf/470607
whichdb unittest (2002-04-09)
	http://python.org/sf/541694
s/Copyright/License/ in bdist_rpm.py (2002-04-13)
	http://python.org/sf/543498
merging sorted sequences (2002-04-15)
	http://python.org/sf/544113
Read/Write buffers from buffer() (2002-04-30)
	http://python.org/sf/550551
Better description in "python -h" for -u (2002-05-06)
	http://python.org/sf/552812
Cygwin make install patch (2002-05-08)
	http://python.org/sf/553702
__va_copy patches (2002-05-10)
	http://python.org/sf/554716
Ebcdic compliancy in stringobject source (2002-05-19)
	http://python.org/sf/557946
README additions for Cray T3E (2002-05-28)
	http://python.org/sf/561724
Fix bug in encodings.search_function (2002-06-20)
	http://python.org/sf/571603
Executable .pyc-files with hashbang (2002-06-23)
	http://python.org/sf/572796
Changing owner of symlinks (2002-06-25)
	http://python.org/sf/573770
Make python-mode.el use jython (2002-06-27)
	http://python.org/sf/574747
list.extend docstring fix (2002-06-27)
	http://python.org/sf/574867
SSL release GIL (2002-06-30)
	http://python.org/sf/575827
Extend PyErr_SetFromWindowsErr (2002-07-02)
	http://python.org/sf/576458
Remove PyArg_Parse() and METH_OLDARGS (2002-07-03)
	http://python.org/sf/577031
Merge xrange() into slice() (2002-07-05)
	http://python.org/sf/577875
fix for problems with test_longexp (2002-07-06)
	http://python.org/sf/578297
less restrictive HTML comments (2002-07-12)
	http://python.org/sf/580670
Canvas "select_item" always returns None (2002-07-14)
	http://python.org/sf/581396
info reader bug (2002-07-14)
	http://python.org/sf/581414
fix to pty.spawn error on Linux (2002-07-15)
	http://python.org/sf/581705
get python to link on OSF1 (Dec Unix) (2002-07-20)
	http://python.org/sf/584245


From tismer@tismer.com  Sun Aug  4 13:01:47 2002
From: tismer@tismer.com (Christian Tismer)
Date: Sun, 04 Aug 2002 14:01:47 +0200
Subject: [Python-Dev] On C inheritance
Message-ID: <3D4D17AB.9040704@tismer.com>

Hi Guido,

as you know, I love your new type/class implementation very
much as it is right now, probably not completely ready, but
performing great.
Yesterday, at the Berlin Python Community Meeting, we were
discussing several aspects of this.
A special issue was overloading of methods from C.
With the current design, it appears to be "correct" to call my
own methods always via the dictionary interface of the type,
since users might have derived from it and want their versions
to be called.

Now, this is a performance issue, and there are of course special
cased things like "getattr" already, which make use of an extra slot
in the type structure to speed it up.

Now, all my new stackless objects are made inheritable, and
I'd like to support this from C code as well, but I hesitate to
spend the extra dictionary lookup on what is probably a rare case.
Therefore, I intended to extend my types so that they
provide some extra type slots for overridden builtin methods.

Unfortunately, this is not supported at the moment, due to some
extension class compatibility issues. I'd like to patch this,
and allow metatypes to be extended with extra function fields.

Would you support this? Or is something similar already on your plate?

thanks - chris
-- 
Christian Tismer             :^)   
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?   http://www.stackless.com/





From tismer@tismer.com  Sun Aug  4 15:24:32 2002
From: tismer@tismer.com (Christian Tismer)
Date: Sun, 04 Aug 2002 16:24:32 +0200
Subject: [Python-Dev] timsort for jython
References: 
Message-ID: <3D4D3920.4030002@tismer.com>

Tim Peters wrote:
> [Christian Tismer]
> 
>>I'd like to pet you for your new version,
> 
> 
> LOL -- that comes off a bit, umm, endearing to American English ears.  But I
> understand the sentiment, and thank you for it.

I meant "to tap s.o. on the shoulder". Does this have the
meaning of encouraging and honoring?

-- 
Christian Tismer             :^)   
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/




From jeremy@alum.mit.edu  Sun Aug  4 11:52:18 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Sun, 4 Aug 2002 06:52:18 -0400
Subject: [Python-Dev] timsort for jython
In-Reply-To: <3D4D3920.4030002@tismer.com>
References: 
 <3D4D3920.4030002@tismer.com>
Message-ID: <15693.1890.186232.931091@slothrop.zope.com>

>>>>> "CT" == Christian Tismer  writes:

  CT> Tim Peters wrote:
  >> [Christian Tismer]
  >>
  >>> I'd like to pet you for your new version,
  >>
  >>
  >> LOL -- that comes off a bit, umm, endearing to American English
  >> ears.  But I understand the sentiment, and thank you for it.

  CT> I meant "to tap s.o. on the shoulder". Does this have the
  CT> meaning of encouraging and honoring?

The verb pet is most often used to mean stroking or caressing an
animal -- a pet dog or cat.

There is also a slang usage of pet that is straight out of the
Hungarian phrasebook.  Not "My hovercraft is full of eels," but "Drop
your panties, Sir William, I cannot wait 'til lunchtime."

Your meaning was clear, but it was impossible to suppress a wry grin.

Jeremy



From gmcm@hypernet.com  Sun Aug  4 16:41:14 2002
From: gmcm@hypernet.com (Gordon McMillan)
Date: Sun, 4 Aug 2002 11:41:14 -0400
Subject: [Python-Dev] timsort for jython
In-Reply-To: <3D4D3920.4030002@tismer.com>
Message-ID: <3D4D12DA.24979.39B66004@localhost>

On 4 Aug 2002 at 16:24, Christian Tismer wrote:

> >>I'd like to pet you for your new version,
> > 
> > 
> > LOL -- that comes off a bit, umm, endearing to
> > American English ears.  But I understand the
> > sentiment, and thank you for it.
> 
> I meant "to tap s.o. on the shoulder". Does this
> have the meaning of encouraging and honoring? 

For that you'd use "pat", as in "pat on the back".

"Pet" means (idiomatically) "stroke affectionately",
which is what you do to household animals & sexual
partners.

And, incidentally, "tap" is "light blow" as with
hammer or finger, where "blow" used in any other
context will likely be taken to mean "oral sex"
unless you're obviously discussing movement
of a gaseous medium or the act of setting off a
bomb.

And that just covers those words as verbs (and
worse, I've probably missed a few meanings).

Don't you wish German were so, er, expressive ?

-- Gordon
http://www.mcmillan-inc.com/



From goodger@users.sourceforge.net  Sun Aug  4 16:43:15 2002
From: goodger@users.sourceforge.net (David Goodger)
Date: Sun, 04 Aug 2002 11:43:15 -0400
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib
In-Reply-To: <20020803160008.8652.80237.Mailman@mail.python.org>
Message-ID: 

[Guido]
>>> I was originally going to make it an unnamed string
>>> literal -- maybe that's better?

[David]
>> In PEP 258 I call those "Additional Docstrings":

[Guido]
> Ah, I thought there had to be something like that. :-)
> 
> Do you also recognize this if there are comments between?  Or blank
> lines?  E.g.
> 
>    def f():
>        """
>        foo
>        """
> 
>        # blah
> 
>        """
>        bar
>        """

We haven't gotten that far yet.  I see no problems with blank lines, but
comments may block recognition, unless we choose to ignore them.  On the
other hand, comments themselves may be used in some circumstances; HappyDoc
recognizes comments *before* a def/class statement if there's no docstring.
There's still much to be thought out.

-- 
David Goodger    Open-source projects:
  - Python Docutils: http://docutils.sourceforge.net/
    (includes reStructuredText: http://docutils.sf.net/rst.html)
  - The Go Tools Project: http://gotools.sourceforge.net/



From tim.one@comcast.net  Sun Aug  4 19:09:02 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sun, 04 Aug 2002 14:09:02 -0400
Subject: [Python-Dev] New encoding error in debug build
Message-ID: 

This assert near the end of get_coding_spec() in tokenizer.c triggers when
running test_heapq in a debug build:

					assert(strlen(r) >= strlen(q));

It's a very new failure.  Note that heapq.py begins with the line

# -*- coding: Latin-1 -*-

I assume that's relevant, but Latin-1 is way beyond my personal experience
with strange character sets .



From oren-py-d@hishome.net  Sun Aug  4 19:30:46 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Sun, 4 Aug 2002 21:30:46 +0300
Subject: [Python-Dev] Re: [ python-Patches-590682 ] New codecs: html, asciihtml
In-Reply-To: ; from noreply@sourceforge.net on Sun, Aug 04, 2002 at 08:54:05AM -0700
References: 
Message-ID: <20020804213046.A1460@hishome.net>

(I'm moving this to python-dev)

On Sun, Aug 04, 2002 at 08:54:05AM -0700, noreply@sourceforge.net wrote:
> >Comment By: Martin v. Löwis (loewis)
> Date: 2002-08-04 17:54
> 
> I'm in favour of exposing this via a search function, for
> generated codec names, on top of PEP 293 (I would not like
> your codec to compete with the alternative mechanism). My
> dislike for the current patch also comes from the fact that
> it singles-out ASCII, which the search function would not.

I find PEP 293 too complex while my solution is, admittedly, too 
simplistic.

Some of my reservations about PEP 293:

It overloads the meaning of the error handling argument in an unintuitive
way.  It gets to the point where it's much more than just error handling - 
it's actually extending the functionality of the codec. 

Why implement yet another name-based registry?  There must be a simpler way 
to do it.

Generating an exception for each character that isn't handled by simple 
lookup probably adds quite a lot of overhead.

What are the use cases?  Maybe a simple extension to charmap would be enough 
for all the practical cases?

> In anycase, I'd encourage you to contribute to the progress
> of PEP 293 first - this has been an issue for several years
> now, and I would be sorry if it would fail.

Me too.  But if you really don't want it to be rejected you should try to
find a way to make it simpler.

> While you are waiting for PEP 293 to complete, please do
> consider cleaning up htmlentitydefs to provide mappings from
> and to Unicode characters.

No problem.  The question is whether anyone depends on its current form.  
My proposed changes:

1. Use all lowercase entity names as keys.
2. Map "entityname" to u"\uXXXX" (currently it's mapped to "&#nnnn;")

In its current form I find htmlentitydefs.py pretty useless. Names in the
input in arbitrary case will not match the MixedCase keys in the entitydefs 
dictionary and the decimal character reference isn't really more useful than 
the named entity reference. 
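A minimal sketch of the proposed shape, using a tiny hand-picked sample rather than the real htmlentitydefs table (the helper and sample names are illustrative only):

```python
import re

# Small sample of the existing table: entity name -> "&#nnnn;"
sample_entitydefs = {"nbsp": "&#160;", "copy": "&#169;", "Eacute": "&#201;"}

def to_unicode_map(defs):
    """Derive the proposed form: lowercase entity name -> Unicode char."""
    out = {}
    for name, value in defs.items():
        m = re.match(r"&#(\d+);\Z", value)
        # entitydefs values are either "&#nnnn;" or a literal character
        codepoint = int(m.group(1)) if m else ord(value)
        out[name.lower()] = chr(codepoint)  # unichr() in the Python 2 of the day
    return out

unicode_map = to_unicode_map(sample_entitydefs)
# unicode_map["eacute"] == u"\u00c9"
```

Note that lowercasing the keys collapses entity names that differ only in case (e.g. "Eacute"/"eacute" are distinct entities), which is exactly the case-sensitivity objection Martin raises in his reply.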

	Oren



From martin@v.loewis.de  Sun Aug  4 20:30:06 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 04 Aug 2002 21:30:06 +0200
Subject: [Python-Dev] Re: [ python-Patches-590682 ] New codecs: html, asciihtml
In-Reply-To: <20020804213046.A1460@hishome.net>
References: 
 <20020804213046.A1460@hishome.net>
Message-ID: 

Oren Tirosh  writes:

> It overloads the meaning of the error handling argument in an
> unintuitive way.  It gets to the point where it's much more than
> just error handling - it's actually extending the functionality of
> the codec.

Isn't that precisely the meaning of "to handle"?

3 : to act on or perform a required function with regard to 
   

It produces a replacement text, just in the same way as "ignore" or
"replace" produce replacement texts.

> Why implement yet another name-based registry?  

Namespaces are one honking great idea -- let's do more of those!

> There must be a simpler way to do it.

Propose one.

> What are the use cases?  Maybe a simple extension to charmap would
> be enough for all the practical cases?

The primary use case is XML: how do you efficiently use XML charrefs?
Notice that you can *not* use the charmap codec, since the underlying
encoding may not be based on the charmap codec.

In addition, it makes it possible to give a more detailed analysis of an
encoding error, as it exposes the string position where the error occurs.
This allows determining a "best" encoding (i.e. the one that needs the
fewest exceptions, or the one that has the longest runs of the same
encoding).
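The use case above maps directly onto the PEP 293 machinery: an error handler is just a callback that receives the exception and returns a replacement plus a resume position. A minimal sketch (the handler name here is invented; the codecs.register_error API used below is the one the PEP's patch proposes):

```python
import codecs

def charref_replace(exc):
    """Replace characters the target encoding can't express
    with XML numeric character references."""
    if isinstance(exc, UnicodeEncodeError):
        refs = "".join("&#%d;" % ord(ch)
                       for ch in exc.object[exc.start:exc.end])
        return (refs, exc.end)  # (replacement text, position to resume at)
    raise exc

codecs.register_error("demo-xmlcharref", charref_replace)

# Works with any underlying encoding, not only charmap-based ones:
encoded = u"gr\u00fc\u00dfe".encode("ascii", "demo-xmlcharref")
# encoded == b"gr&#252;&#223;e"
```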

> Me too.  But if you really don't want it to be rejected you should
> try to find a way to make it simpler.

Can you please elaborate why you think this is difficult? Is this a
concern about 
- the implementation of the PEP, or
- the implementation of error handlers, or
- the usage of error handlers?

I couldn't really believe that you find usage of this feature
difficult: just pass an error handling string to your codec just as
you currently do.

> 
> > While you are waiting for PEP 293 to complete, please do
> > consider cleaning up htmlentitydefs to provide mappings from
> > and to Unicode characters.
> 
> No problem.  The question is whether anyone depends on its current form.  
> My proposed changes:
> 
> 1. Use all lowercase entity names as keys.

That is probably a bad idea. At least for XHTML, the case of entity
references is normative. Even for HTML 4, it would be good if this
precisely matches the DTD.

You could provide a case-insensitive lookup function in addition.

> 2. Map "entityname" to u"\uXXXX" (currently it's mapped to "&#nnnn;")

I think htmlentitydefs.entitydefs must stay as-is, for
compatibility. Instead, I'd suggest to add additional
objects/functions. Of course, the data should be present only once -
all other functions/dictionaries could be derived.

> In its current form I find htmlentitydefs.py pretty useless. Names in the
> input in arbitrary case will not match the MixedCase keys in the entitydefs 
> dictionary and the decimal character reference isn't really more useful than 
> the named entity reference. 

Indeed. However, people probably rely on its specific contents, so any
more useful access to the data must preserve entitydefs in its current
form.

Regards,
Martin


From martin@v.loewis.de  Sun Aug  4 21:12:17 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 04 Aug 2002 22:12:17 +0200
Subject: [Python-Dev] New encoding error in debug build
In-Reply-To: 
References: 
Message-ID: 

Tim Peters  writes:

> # -*- coding: Latin-1 -*-
> 
> I assume that's relevant, but Latin-1 is way beyond my personal experience
> with strange character sets .

It was normalizing that to "iso-8859-1", and was then surprised that
it got longer.

Regards,
Martin
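In other words, the assert's assumption that normalization never lengthens a codec name fails for this common alias; a quick sanity check:

```python
# "Latin-1" is an alias whose canonical form is "iso-8859-1".
# The tokenizer asserted strlen(r) >= strlen(q), i.e. that the
# rewritten name never grows -- but here it does:
alias = "Latin-1"
canonical = "iso-8859-1"
assert len(canonical) > len(alias)  # 10 > 7
```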


From pobrien@orbtech.com  Sun Aug  4 22:07:08 2002
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Sun, 4 Aug 2002 16:07:08 -0500
Subject: [Python-Dev] Single- vs. Multi-pass iterability
In-Reply-To: <200207171503.g6HF3mW01047@odiug.zope.com>
Message-ID: 

[Guido van Rossum]
>
> - There really isn't anything "broken" about the current situation;
>   it's just that "next" is the only method name mapped to a slot in
>   the type object that doesn't have leading and trailing double
>   underscores.

I'm way behind on the email for this list, but I wanted to chime in with an
idea related to this old thread. I know we want to limit the rate of
language/feature changes for the business community. At the same time, this
situation with iterators is proof that even the best thought out new
features can still have a few blemishes that get discovered after they've
been incorporated into Python proper. It's just terribly difficult to get
anything "right" the very first time, and it would be nice to fix these
blemishes sooner, rather than later.

So perhaps we need some sort of concept of a "grace period" on brand-new
features during which blemishes can be polished off, even if the polishing
breaks backward compatibility. After the grace period, preserving backward
compatibility becomes a higher priority. Since we are talking about backward
compatibility only as it relates to the brand-new features themselves,
Python-In-A-Tie folks can avoid the issue altogether by not using the new
features during the grace period.

Would something like this be an acceptable compromise?

--
Patrick K. O'Brien
Orbtech
-----------------------------------------------
"Your source for Python programming expertise."
-----------------------------------------------
Web:  http://www.orbtech.com/web/pobrien/
Blog: http://www.orbtech.com/blog/pobrien/
Wiki: http://www.orbtech.com/wiki/PatrickOBrien
-----------------------------------------------



From pinard@iro.umontreal.ca  Mon Aug  5 02:34:26 2002
From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard)
Date: 04 Aug 2002 21:34:26 -0400
Subject: [Python-Dev] Re: Single- vs. Multi-pass iterability
In-Reply-To: 
References: 
Message-ID: 

[Patrick K. O'Brien]

> So perhaps we need some sort of concept of a "grace period" on brand-new
> features during which blemishes can be polished off, even if the polishing
> breaks backward compatibility.  [...]  Would something like this be an
> acceptable compromise?

I know it was not the original intent of importing from __future__,
but maybe this could be linked with __future__ as well.  People wanting
guaranteed stability should just never import from __future__ at all.
It's just an idea, I'm not pushing for it.  I do not even like it...

For one, I'm quite ready to adjust the things I'm responsible for, whenever
the need arises, not being part of the ball and chain tied to Python development.
On the other hand, I know administrators not far from me that get very
upset when they learn about specification changes for any software they
rely on for production, and I do understand their need for a peaceful life.
Surely, it's not easy to please everybody.  In French, there is this nice
proverb, which is in fact the last verses of one of LaFontaine's fables:

   "On ne peut, dit le meunier,
    plaire à tout le monde et à son père:
    bien faire et laisser braire!".

In a word, that means "do well and let them bray!" :-)

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard


From tim.one@comcast.net  Mon Aug  5 04:39:34 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sun, 04 Aug 2002 23:39:34 -0400
Subject: [Python-Dev] timsort for jython
In-Reply-To: <15693.1890.186232.931091@slothrop.zope.com>
Message-ID: 

[Jeremy Hylton]
> The verb pet is most often used to mean stroking or caressing an
> animal -- a pet dog or cat.

So *that's* what it means!  Boy, is my face red.  Christian can think of me
like that all he likes.  I was afraid he meant it in the other sense, and I
never drop my panties before lunchtime, stackless be damned .



From greg@cosc.canterbury.ac.nz  Mon Aug  5 04:59:16 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 05 Aug 2002 15:59:16 +1200 (NZST)
Subject: [Python-Dev] timsort for jython
In-Reply-To: <3D4D12DA.24979.39B66004@localhost>
Message-ID: <200208050359.g753xFH20970@oma.cosc.canterbury.ac.nz>

Gordon McMillan :

> where "blow" used in any other
> context will likely be taken to mean "oral sex"

Which is a very odd usage, when you think about it --
I mean, more of a sucking action is involved than
anything...

And I am *not* going to be the first person to
mention the song "Sit On My Face" in this thread.
Oops... dash it...


From oren-py-d@hishome.net  Mon Aug  5 06:51:51 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Mon, 5 Aug 2002 01:51:51 -0400
Subject: [Python-Dev] Single- vs. Multi-pass iterability
In-Reply-To: 
References: <200207171503.g6HF3mW01047@odiug.zope.com> 
Message-ID: <20020805055151.GA70679@hishome.net>

On Sun, Aug 04, 2002 at 04:07:08PM -0500, Patrick K. O'Brien wrote:
> [Guido van Rossum]
> >
> > - There really isn't anything "broken" about the current situation;
> >   it's just that "next" is the only method name mapped to a slot in
> >   the type object that doesn't have leading and trailing double
> >   underscores.
> 
> I'm way behind on the email for this list, but I wanted to chime in with an
> idea related to this old thread. I know we want to limit the rate of
> language/feature changes for the business community. At the same time, this
> situation with iterators is proof that even the best thought out new
> features can still have a few blemishes that get discovered after they've
> been incorporated into Python proper. 

I think I have a reasonable solution for the re-iteration blemish in the
iteration protocol without breaking backward compatibility:

http://mail.python.org/pipermail/python-dev/2002-July/026960.html

> So perhaps we need some sort of concept of a "grace period" on brand-new
> features during which blemishes can be polished off, even if the polishing
> breaks backward compatibility. After the grace period, preserving backward
> compatibility becomes a higher priority.

Giving more people a chance to play with new features before they are 
finalized is a very good idea. When a significant new feature is checked in 
to the CVS a preview version can be released in source and precompiled form
to encourage more people to test it. Most CVS snapshots seem stable enough 
for a programmer's daily use.

A good example of such a significant new feature is the source encoding 
just checked in.

	Oren


From mal@lemburg.com  Mon Aug  5 09:03:16 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 05 Aug 2002 10:03:16 +0200
Subject: [Python-Dev] Re: [ python-Patches-590682 ] New codecs: html,
 asciihtml
References:  <20020804213046.A1460@hishome.net>
Message-ID: <3D4E3144.8070704@lemburg.com>

Oren Tirosh wrote:
> (I'm moving this to python-dev)

I've already answered on the SF tracker. Won't repeat things
here.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From mal@lemburg.com  Mon Aug  5 09:12:30 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 05 Aug 2002 10:12:30 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
References: <3D1057E8.9090200@livinglogic.de>
Message-ID: <3D4E336E.8070700@lemburg.com>

I'd like to put the following PEP up for pronouncement. Walter
is currently on vacation, but he asked me to already go ahead
with the process.

	http://www.python.org/peps/pep-0293.html

I like the patch a lot and the implementation strategy is very
interesting as well (just wish that classes were new types --
then things could run a tad faster and the patch would be
simpler).

The basic idea of the patch is to provide a way to elegantly
handle error situations in codecs which go beyond the standard
cases 'ignore', 'replace' and 'strict', e.g. to automagically
escape problem cases, to log errors for later review or to
fetch additional information for the proper handling at coding
time (for example, fetching entity definitions from a URL).

Thanks,
-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From mwh@python.net  Mon Aug  5 09:54:22 2002
From: mwh@python.net (Michael Hudson)
Date: 05 Aug 2002 09:54:22 +0100
Subject: [Python-Dev] seeing off SET_LINENO
In-Reply-To: Tim Peters's message of "Sat, 03 Aug 2002 20:57:45 -0400"
References: 
Message-ID: <2mfzxt3cu9.fsf@starship.python.net>

Tim Peters  writes:

> [Michael Hudson]
> > I've found another annoying problem.  I'm not really expecting someone
> > here to solve it for me, but writing it down might help me think
> > clearly.
> >
> > This is about the function epilogues that always get generated.  I.e:
[snip example]
> > You can see here that the epilogue gets associated with line 3,
> > whereas it shouldn't really be associated with any line at all.
> 
> It has to be associated with some line >= 3, as c_lnotab isn't capable of
> expressing anything other than that.

Yes.

> It *could* associate it with "line 4", though, if the compiler were
> changed to pump out another c_lnotab entry at the epilogue.  That
> would be better than saying the time is charged to line 3, since it
> isn't on line 3 then.  I'd be happy to trade away total insanity for
> partial insanity .

This would be bad if you had

def f():
    print 1
def g():
    print 2

Anyway, I think I've found a way to get around this (see the patch).

> It stops on the "if a:" for me twice today, and I doubt that's any less
> confusing.  If it were set to line 4 instead, an unaltered pdb would
> presumably show a blank line (whatever) after the function body, and an
> altered pdb could be taught that "the last line" c_lnotab claims exists is
> really devoted to exit code not associated with any source-file line.

Yes.  I didn't really like the idea of heavily hacking pdb, as I don't
understand it.
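For the record, the mapping being argued about is observable from Python itself; on a modern interpreter, `dis.findlinestarts` reads out the same data that `c_lnotab` encodes, showing which source line each range of bytecode is charged to.  A small illustration, not tied to the patch under discussion:

```python
import dis

def f(a):
    if a:
        x = 1

# Each entry maps a bytecode offset to the source line it is charged to.
# Note that every instruction, including the function's exit code, gets
# charged to *some* line -- the table has no "no line" state, which is
# exactly the epilogue problem described above.
line_starts = list(dis.findlinestarts(f.__code__))
```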

Cheers,
M.

-- 
39. Re graphics:  A picture is worth 10K words - but only those
    to describe the picture.  Hardly any sets of 10K words can be
    adequately described with pictures.
  -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html


From tismer@tismer.com  Mon Aug  5 11:43:37 2002
From: tismer@tismer.com (Christian Tismer)
Date: Mon, 05 Aug 2002 12:43:37 +0200
Subject: [Python-Dev] Simpler reformulation of C inheritance Q.
References: <3D4D17AB.9040704@tismer.com>
Message-ID: <3D4E56D9.3090503@tismer.com>

Hi Guido:

here a simpler formulation of my question:

I would like to create types with overridable methods.
This is supported by the new type system.

But I'd also like to make this as fast as possible and
therefore to avoid extra dictionary lookups for methods,
especially if they are most likely not overridden.

This would mean to create an extra meta type which creates
types with a couple of extra slots, for caching overridden
methods.

My problem is now that type objects are already variable
sized and cannot support slots in the metatype.
Is there a workaround on the boilerplate, or is there
interest in a solution?
Any suggestion how to implement it?
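For illustration only, here is a rough Python-level analogue of the caching idea: a metatype that resolves (possibly overridden) methods once, at class-creation time, into flat attributes.  Modern class syntax, and of course this doesn't touch the real question of C-level slots in the type object:

```python
# Hypothetical sketch: the metatype copies the resolved method into a
# flat attribute so later lookups skip the MRO search.  The C version
# would use actual struct slots instead of an attribute.
class CachingMeta(type):
    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        cls._cached_frob = cls.frob   # resolved once, at type creation
        return cls

class Base(metaclass=CachingMeta):
    def frob(self):
        return "base"

class Sub(Base):
    def frob(self):                   # override is picked up when Sub is built
        return "sub"
```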

thanks - chris

-- 
Christian Tismer             :^)   
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?   http://www.stackless.com/





From tismer@tismer.com  Mon Aug  5 11:45:08 2002
From: tismer@tismer.com (Christian Tismer)
Date: Mon, 05 Aug 2002 12:45:08 +0200
Subject: [Python-Dev] timsort for jython
References: <3D4D12DA.24979.39B66004@localhost>
Message-ID: <3D4E5734.3070708@tismer.com>

Gordon McMillan wrote:

 > For that you'd use "pat", as in "pat on the back".

Should have looked into dict.leo.org, before sending :)

 > "Pet" means (idiomatically) "stroke affectionately",
 > which is what you do to household animals & sexual
 > partners.
 >
 > And, incidentally, "tap" is "light blow" as with
 > hammer or finger, where "blow" used in any other
 > context will likely be taken to mean "oral sex"
 > unless you're obviously discussing movement
 > of a gaseous medium or the act of setting off a
 > bomb.
 >
 > And that just covers those words as verbs (and
 > worse, I've probably missed a few meanings).

Any chance to learn all about that?

 > Don't you wish German were so, er, expressive ?

In fact, it is. But I guess you have to be born here,
to know about all+1 of the nuances.

-- 
Christian Tismer             :^)   
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?   http://www.stackless.com/





From sholden@holdenweb.com  Mon Aug  5 12:25:49 2002
From: sholden@holdenweb.com (Steve Holden)
Date: Mon, 5 Aug 2002 07:25:49 -0400
Subject: [Python-Dev] Single- vs. Multi-pass iterability
References: <200207171503.g6HF3mW01047@odiug.zope.com>  <20020805055151.GA70679@hishome.net>
Message-ID: <01c801c23c72$da6e3870$6300000a@holdenweb.com>

[Oren Tirosh]
> On Sun, Aug 04, 2002 at 04:07:08PM -0500, Patrick K. O'Brien wrote:
> > [Guido van Rossum]
> > >
> > > - There really isn't anything "broken" about the current situation;
> > >   it's just that "next" is the only method name mapped to a slot in
> > >   the type object that doesn't have leading and trailing double
> > >   underscores.
> >
But would you define it as __next__() if you had to do it again? A
__next__()/next() relationship does seem to fit more neatly.

> > I'm way behind on the email for this list, but I wanted to chime in
> > with an idea related to this old thread. I know we want to limit the
> > rate of language/feature changes for the business community. At the
> > same time, this situation with iterators is proof that even the best
> > thought out new features can still have a few blemishes that get
> > discovered after they've been incorporated into Python proper.
>
> I think I have a reasonable solution for the re-iteration blemish in the
> iteration protocol without breaking backward compatibility:
>
> http://mail.python.org/pipermail/python-dev/2002-July/026960.html
>

> > So perhaps we need some sort of concept of a "grace period" on
> > brand-new features during which blemishes can be polished off, even
> > if the polishing breaks backward compatibility. After the grace
> > period, breaking backward compatibility becomes a higher priority.
>
> Giving more people a chance to play with new features before they are
> finalized is a very good idea. When a significant new feature is
> checked in to the CVS a preview version can be released in source and
> precompiled form to encourage more people to test it. Most CVS
> snapshots seem stable enough for a programmer's daily use.
>
Given the general lack of alpha- and beta-testing there'd be very little
feedback. I seem to remember that the CVS snapshots went missing in action
recently without anyone noticing, which shows that they aren't much used,
and I guess the same would be true of preview versions. Tracking the CVS
repository will test such features, but getting more testing than that would
be difficult.

I *do* agree that such feature testing would be inestimably useful.

> A good example of such a significant new feature is the source encoding
> just checked in.
>
Indeed.

regards
-----------------------------------------------------------------------
Steve Holden                                 http://www.holdenweb.com/
Python Web Programming                http://pydish.holdenweb.com/pwp/
-----------------------------------------------------------------------






From oren-py-d@hishome.net  Mon Aug  5 12:46:36 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Mon, 5 Aug 2002 14:46:36 +0300
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
In-Reply-To: <3D4E336E.8070700@lemburg.com>
Message-ID: <20020805144636.A9355@hishome.net>

On Mon, Aug 05, 2002 at 10:12:30AM +0200, M.-A. Lemburg wrote:
> I'd like to put the following PEP up for pronouncement. Walter
> is currently on vacation, but he asked me to already go ahead
> with the process.
> 
> 	http://www.python.org/peps/pep-0293.html
> 
> I like the patch a lot and the implementation strategy is very
> interesting as well (just wish that classes were new types --
> then things could run a tad faster and the patch would be
> simpler).

Here's another implementation strategy:

Charmap entries can currently be None, an integer or a unicode string. I
suggest adding another option: a function or other callable. The function
will be called with the input string and current position as arguments and
return a 2-tuple of the replacement string and number of characters
consumed.  This will make it very easy to take the decoding charmap of an 
existing codec and patch it with a special-case for one character like '&'
to generate character references, for example. 

The function may raise an exception.  The error strategy argument will 
not be overloaded with new functionality - it will just determine whether 
this exception will be ignored or passed on.

An existing encoding charmap (usually a dictionary) can also be patched for 
special characters like <,>,&.  A special entry with a None key will be
the default entry used on a KeyError and will usually be mapped to a 
function.  If no None key is present the charmap will behave exactly the way 
it does now.  
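As a sketch, the lookup loop for such a callable-aware charmap might look like this (names are made up; the calling convention is the one proposed above, and for simplicity unknown characters pass through here rather than raising):

```python
# Charmap entries may be a string or a callable taking (text, pos) and
# returning (replacement, number_of_chars_consumed).  A None key, if
# present, is the default entry used when a character has no entry.
def escape_amp(text, pos):
    # special-case '&': emit a character reference instead
    return ("&#%d;" % ord(text[pos]), 1)

charmap = {"&": escape_amp}

def translate(text, charmap):
    out, pos = [], 0
    while pos < len(text):
        entry = charmap.get(text[pos], charmap.get(None))
        if callable(entry):
            replacement, consumed = entry(text, pos)
            out.append(replacement)
            pos += consumed
        elif entry is None:
            out.append(text[pos])     # no entry at all: pass through
            pos += 1
        else:
            out.append(entry)         # plain string replacement
            pos += 1
    return "".join(out)
```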

Tying it all together:

A codec that does both charmap and entity reference translations may be 
dynamically generated.  A function will be registered that intercepts 
any codec name that looks like 'xmlcharref.CODECNAME', import that codec, 
create patched charmaps and return the (enc, dec, reader, writer) tuple.
The dynamically created entry will be cached for later use. 

	Oren


From fredrik@pythonware.com  Mon Aug  5 14:13:08 2002
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Mon, 5 Aug 2002 15:13:08 +0200
Subject: [Python-Dev] Re: Docutils/reStructuredText is ready to process PEPs
References: 
Message-ID: <024401c23c84$1f8b07b0$05d141d5@hagrid>

Ka-Ping Yee wrote:

> I would be very unhappy about having to enter and edit inline
> documentation in an XML-based markup language.

have you tried it?

I suggest taking a look at 2.3's xmlrpclib.py module.

do the comments that start with a single ## line look scary
to you?

it's javadoc-style markup, which is based on HTML.  if you've
ever written a webpage, you can learn the rest in a couple of
minutes.
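for the curious, that ## style looks roughly like this (illustrative only; the @-tags are the javadoc-inherited ones, and the exact set pythondoc supports isn't shown here):

```python
##
# Returns the greatest common divisor of two integers.
# <p>
# @param a  The first integer.
# @param b  The second integer.
# @return   The greatest common divisor of a and b.

def gcd(a, b):
    # Euclid's algorithm
    while b:
        a, b = b, a % b
    return a
```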





From fredrik@pythonware.com  Mon Aug  5 14:29:19 2002
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Mon, 5 Aug 2002 15:29:19 +0200
Subject: [Python-Dev] timsort for jython
References: 
Message-ID: <024701c23c84$2037c270$05d141d5@hagrid>

tim wrote:

> > You also don't need to hold back on giving stability garanties in the
> > documentation for jython's sake.
> 
> I didn't .  Stability doesn't come free, and for all I know, in
> another 3 years a method will be discovered that's 3x faster but not
> stable.

sounds like yet another reason to add two methods; one that
guarantees stability, and one that doesn't.

the only counter-argument I've seen from you is code bloat, but
I cannot see what stops us from mapping *both* methods to a
single implementation in CPython 2.3.

an alternative would be to add a sortlib module:

    $ more Lib/sortlib.py

    def stablesort(list):
        list.sort() # 2.3's timsort is stable!

and a regression test script that makes sure that it really is stable
(can a test program ever be sure?)
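a randomized spot-check along those lines could look like this (a sketch only; as the parenthetical says, a test can refute stability but never prove it):

```python
import random

def looks_stable(records):
    # records are (key, original_index) pairs sorted by key only; within
    # each key, the original indices must still be in increasing order
    last_seen = {}
    for key, idx in records:
        if key in last_seen and idx < last_seen[key]:
            return False
        last_seen[key] = idx
    return True

random.seed(0)
data = [(random.randrange(10), i) for i in range(1000)]
out = sorted(data, key=lambda rec: rec[0])   # compare keys only
assert looks_stable(out)
```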





From dave@boost-consulting.com  Mon Aug  5 14:32:51 2002
From: dave@boost-consulting.com (David Abrahams)
Date: Mon, 05 Aug 2002 09:32:51 -0400
Subject: [Python-Dev] Simpler reformulation of C inheritance Q.
References: <3D4D17AB.9040704@tismer.com> <3D4E56D9.3090503@tismer.com>
Message-ID: <01d601c23c84$d2783c80$62a6accf@boostconsulting.com>

From: "Christian Tismer" 


> Hi Guido:
>
> here a simpler formulation of my question:
>
> I would like to create types with overridable methods.
> This is supported by the new type system.
>
> But I'd also like to make this as fast as possible and
> therefore to avoid extra dictionary lookups for methods,
> especially if they are most likely not overridden.
>
> This would mean to create an extra meta type which creates
> types with a couple of extra slots, for caching overridden
> methods.
>
> My problem is now that type objects are already variable
> sized and cannot support slots in the metatype.
> Is there a workaround on the boilerplate, or is there
> interest in a solution?
> Any suggestion how to implement it?

I believe this is roughly the same thing I was bugging Guido about just
before Python-dev. I wanted types which acted like new-style classes, but
with room for a 'C' int to store some extra information -- namely, whether
there were multiple 'C' extension classes being used as bases. IIRC the
verdict was, "you can't do that today, but there should be a way to do it".
Also if I remember anything about my hasty analysis at the time, the
biggest challenge would be getting code which accesses types to rely on
their tp_basicsize in order to find the beginning of the variable stuff.

FWIW, I'm still interested in seeing this addressed.

Thanks,
Dave


-----------------------------------------------------------
           David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com




From guido@python.org  Mon Aug  5 14:48:24 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 05 Aug 2002 09:48:24 -0400
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
In-Reply-To: Your message of "Mon, 05 Aug 2002 10:12:30 +0200."
 <3D4E336E.8070700@lemburg.com>
References: <3D1057E8.9090200@livinglogic.de>
 <3D4E336E.8070700@lemburg.com>
Message-ID: <200208051348.g75DmOv13530@pcp02138704pcs.reston01.va.comcast.net>

> I'd like to put the following PEP up for pronouncement. Walter
> is currently on vacation, but he asked me to already go ahead
> with the process.
> 
> 	http://www.python.org/peps/pep-0293.html
> 
> I like the patch a lot and the implementation strategy is very
> interesting as well (just wish that classes were new types --
> then things could run a tad faster and the patch would be
> simpler).
> 
> The basic idea of the patch is to provide a way to elegantly
> handle error situations in codecs which go beyond the standard
> cases 'ignore', 'replace' and 'strict', e.g. to automagically
> escape problem cases, to log errors for later review or to
> fetch additional information for the proper handling at coding
> time (for example, fetching entity definitions from a URL).

I know you want me to pronounce on this, but I'd like to abstain.

I have no experience in using codecs to have any kind of sense about
whether this is good or not.  If you feel confident that it's good,
you can make the decision on your own.  If you're not yet confident, I
suggest getting more review.  I do note that the patch is humungous
(isn't everything related to Unicode? :-) so might need more review
before it goes in.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From fredrik@pythonware.com  Mon Aug  5 14:57:10 2002
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Mon, 5 Aug 2002 15:57:10 +0200
Subject: [Python-Dev] Re: [ python-Patches-590682 ] New codecs: html, asciihtml
References:  <20020804213046.A1460@hishome.net>
Message-ID: <029301c23c88$02a287a0$05d141d5@hagrid>

Oren Tirosh wrote:

> In its current form I find htmlentitydefs.py pretty useless.

I use it a lot, and find it reasonably useful.  sure beats typing in
the HTML character tables myself, or writing a DTD parser.

> Names in the input in arbitrary case will not match the MixedCase
> keys in the entitydefs dictionary

people who use oddball characters may prefer to keep uppercase
letters separate from lowercase letters.  if I type "Linköping" using
a named entity, I don't want it to come out as "LinkÖping".

if you don't care, nothing stops you from using  the "lower" string
method.

> and the decimal character reference isn't really more useful than
> the named entity reference.

really?  converting a decimal character reference to a unicode character
is trivial, but how do you convert a named entity reference to a unicode
character?  (look it up in the htmlentitydefs?)

here's a trivial piece of code that converts the entitydefs dictionary to
a entity->unicode mapping:

    entitydefs_unicode = {}
    for entity, char in entitydefs.items():
        if char[:2] == "&#":
            char = unichr(int(char[2:-1]))
        else:
            char = unicode(char, "iso-8859-1")
        entitydefs_unicode[entity] = char





From guido@python.org  Mon Aug  5 15:08:04 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 05 Aug 2002 10:08:04 -0400
Subject: [Python-Dev] Simpler reformulation of C inheritance Q.
In-Reply-To: Your message of "Mon, 05 Aug 2002 09:32:51 EDT."
 <01d601c23c84$d2783c80$62a6accf@boostconsulting.com>
References: <3D4D17AB.9040704@tismer.com> <3D4E56D9.3090503@tismer.com>
 <01d601c23c84$d2783c80$62a6accf@boostconsulting.com>
Message-ID: <200208051408.g75E84113668@pcp02138704pcs.reston01.va.comcast.net>

[Christian Tismer]
> > I would like to create types with overridable methods.
> > This is supported by the new type system.
> >
> > But I'd also like to make this as fast as possible and
> > therefore to avoid extra dictionary lookups for methods,
> > especially if they are most likely not overridden.
> >
> > This would mean to create an extra meta type which creates
> > types with a couple of extra slots, for caching overridden
> > methods.
> >
> > My problem is now that type objects are already variable
> > sized and cannot support slots in the metatype.
> > Is there a workaround on the boilerplate, or is there
> > interest in a solution?
> > Any suggestion how to implement it?

[David Abrahams]
> I believe this is roughly the same thing I was bugging Guido about
> just before Python-dev. I wanted types which acted like new-style
> classes, but with room for a 'C' int to store some extra
> information -- namely, whether there were multiple 'C' extension
> classes being used as bases. IIRC the verdict was, "you can't do
> that today, but there should be a way to do it".  Also if I remember
> anything about my hasty analysis at the time, the biggest challenge
> would be getting code which accesses types to rely on their
> tp_basicsize in order to find the beginning of the variable stuff.

Yes, we need a solution for this, but I still haven't figured out how
to do it.  Help (best in the form of a suggested strategy) would be
appreciated.

From Christian's post I can't tell if he wants his types to be dynamic
or static (i.e. if he's creating an arbitrary number of them at
run-time or only a fixed number that's known at compile-time).

Here's a hack.

For static extensions, you could extend one of the extension structs,
e.g. PyMappingMethods (which is the smallest and also least likely to
grow new methods), with additional fields.  Then you'd have to know
whether you can access those extra fields; I suggest checking for the
metatype.  A few casts and you're done.

For dynamic extensions, you might be able to do the same: after
type_new() has given you an object, allocate memory for an extended
PyMappingMethods struct, copy the existing PyMappingMethods struct
into it (if it exists), and replace the pointer.  Then in your
deallocation function, make sure to free the pointer.

Hope this helps in the short run.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From aahz@pythoncraft.com  Mon Aug  5 15:15:24 2002
From: aahz@pythoncraft.com (Aahz)
Date: Mon, 5 Aug 2002 10:15:24 -0400
Subject: [Python-Dev] Re: Docutils/reStructuredText is ready to process PEPs
In-Reply-To: <024401c23c84$1f8b07b0$05d141d5@hagrid>
References:  <024401c23c84$1f8b07b0$05d141d5@hagrid>
Message-ID: <20020805141524.GA9513@panix.com>

On Mon, Aug 05, 2002, Fredrik Lundh wrote:
> Ka-Ping Yee wrote:
>> 
>> I would be very unhappy about having to enter and edit inline
>> documentation in an XML-based markup language.
> 
> have you tried it?

Yes.

> I suggest taking a look at 2.3's xmlrpclib.py module.
> 
> do the comments that start with a single ## line look scary
> to you?
>
> it's javadoc-style markup, which is based on HTML.  if you've
> ever written a webpage, you can learn the rest in a couple of
> minutes.

That's not XML, and I wouldn't even call it XML-based.  It's yet another
structured text markup that includes bits of XML (or HTML or whatever)
and can be converted to XML.  I don't know what exactly you're using in
xmlrpclib.py, but I took a look at the javadoc docs when the discussion
of reST came up because I wanted to know what reST had that javadoc
didn't (and vice-versa) -- it's clear to me that javadoc is at least
somewhat limited compared to reST, and that using javadoc for any kind of
heavily marked-up docs looks far uglier than reST.

The part of reST that's as limited as what you're using in xmlrpclib.py
can also be learned in a couple of minutes.
-- 
Aahz (aahz@pythoncraft.com)           <*>         http://www.pythoncraft.com/

Project Vote Smart: http://www.vote-smart.org/


From guido@python.org  Mon Aug  5 15:24:54 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 05 Aug 2002 10:24:54 -0400
Subject: [Python-Dev] Single- vs. Multi-pass iterability
In-Reply-To: Your message of "Sun, 04 Aug 2002 16:07:08 CDT."
 
References: 
Message-ID: <200208051424.g75EOsj13845@pcp02138704pcs.reston01.va.comcast.net>

[me]
> > - There really isn't anything "broken" about the current situation;
> >   it's just that "next" is the only method name mapped to a slot in
> >   the type object that doesn't have leading and trailing double
> >   underscores.

[Patrick]
> I'm way behind on the email for this list, but I wanted to chime in
> with an idea related to this old thread. I know we want to limit the
> rate of language/feature changes for the business community. At the
> same time, this situation with iterators is proof that even the best
> thought out new features can still have a few blemishes that get
> discovered after they've been incorporated into Python proper. It's
> just terribly difficult to get anything "right" the very first time,
> and it would be nice to fix these blemishes sooner, rather than
> later.
> 
> So perhaps we need some sort of concept of a "grace period" on
> brand-new features during which blemishes can be polished off, even
> if the polishing breaks backward compatibility. After the grace
> period, breaking backward compatibility becomes a higher
> priority. Since we are talking about backward compatibility only as
> it relates to the brand-new features themselves, Python-In-A-Tie
> folks can avoid the issue altogether by not using the new features
> during the grace period.
> 
> Would something like this be an acceptable compromise?

I guess we could explicitly label certain features as experimental in
2.3.  I don't think we can interpret 2.2 like this retroactively --
while the new type stuff was labeled experimental at some point, the
iterators and generators were not, and the new type stuff was pretty
much fixed by releasing 2.2.1 (and by declaring 2.2 the tie-wearing
Python).

--Guido van Rossum (home page: http://www.python.org/~guido/)


From oren-py-d@hishome.net  Mon Aug  5 15:47:03 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Mon, 5 Aug 2002 17:47:03 +0300
Subject: [Python-Dev] Re: [ python-Patches-590682 ] New codecs: html, asciihtml
In-Reply-To: <029301c23c88$02a287a0$05d141d5@hagrid>; from fredrik@pythonware.com on Mon, Aug 05, 2002 at 03:57:10PM +0200
References:  <20020804213046.A1460@hishome.net> <029301c23c88$02a287a0$05d141d5@hagrid>
Message-ID: <20020805174703.A11301@hishome.net>

On Mon, Aug 05, 2002 at 03:57:10PM +0200, Fredrik Lundh wrote:
> > and the decimal character reference isn't really more useful than
> > the named entity reference.
>
> really?  converting a decimal character reference to a unicode character
> is trivial, but how do you convert a named entity reference to a unicode
> character?  (look it up in the htmlentitydefs?)
>
> here's a trivial piece of code that converts the entitydefs dictionary to
> a entity->unicode mapping:
>
>     entitydefs_unicode = {}
>     for entity, char in entitydefs.items():
>         if char[:2] == "&#":
>             char = unichr(int(char[2:-1]))
>         else:
>             char = unicode(char, "iso-8859-1")
>         entitydefs_unicode[entity] = char

Sure it's trivial but why should I be forced to do this conversion? I'm
sorry if I didn't explain myself so well. What I meant is not that the
entitydefs dictionary is useless but that decimal character references are
not useful by themselves - they are just another intermediate form.  Why
does the dictionary convert from "&alpha;" to "&#945;" and not to the
fully decoded form which is the single unicode character u'\u03b1'?

I can't think of a case where numeric references are really useful by
themselves and not as some intermediate form.  Browsers understand
"&#945;" and "&alpha;" equally well. Humans find the named references
easier to understand. Processing programs can't understand "&#945;"
without first isolating the digits and converting them to a number. 

About case sensitivity you're right - smashing case does lose some
information. If a parser needs to understand sloppy manually-generated
HTML with tags like "&GT;" it should be a little smarter than that.

	Oren



From mal@lemburg.com  Mon Aug  5 16:01:34 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 05 Aug 2002 17:01:34 +0200
Subject: [Python-Dev] Re: [ python-Patches-590682 ] New codecs: html,
 asciihtml
References:  <20020804213046.A1460@hishome.net> <029301c23c88$02a287a0$05d141d5@hagrid> <20020805174703.A11301@hishome.net>
Message-ID: <3D4E934E.8030302@lemburg.com>

Oren Tirosh wrote:
> On Mon, Aug 05, 2002 at 03:57:10PM +0200, Fredrik Lundh wrote:
> 
>>>and the decimal character reference isn't really more useful than
>>>the named entity reference.
>>
>>really?  converting a decimal character reference to a unicode character
>>is trivial, but how do you convert a named entity reference to a unicode
>>character?  (look it up in the htmlentitydefs?)
>>
>>here's a trivial piece of code that converts the entitydefs dictionary to
>>a entity->unicode mapping:
>>
>>    entitydefs_unicode = {}
>>    for entity, char in entitydefs.items():
>>        if char[:2] == "&#":
>>            char = unichr(int(char[2:-1]))
>>        else:
>>            char = unicode(char, "iso-8859-1")
>>        entitydefs_unicode[entity] = char
> 
> 
> Sure it's trivial but why should I be forced to do this conversion? 

Maybe because users of htmlentitydefs don't want to pay for
the extra table even though they don't use it ?

 > I'm
> sorry if I didn't explain myself so well. What I meant is not that the
> entitydefs dictionary is useless but that decimal character references are
> not useful by themselves - they are just another intermediate form.  Why
> does the dictionary convert from "&alpha;" to "&#945;" and not to the
> fully decoded form which is the single unicode character u'\u03b1'?

Because that only works for Unicode and not all applications
are written to work with Unicode. The table maps entities to
Latin-1 which is HTML's default encoding.

> I can't think of a case where numeric references are really useful by
> themselves and not as some intermediate form.  Browsers understand
> "&#945;" and "&alpha;" equally well. Humans find the named references
> easier to understand. Processing programs can't understand "&#945;"
> without first isolating the digits and converting them to a number. 
> 
> About case sensitivity you're right - smashing case does lose some
> information. If a parser needs to understand sloppy manually-generated
> HTML with tags like "&GT;" it should be a little smarter than that.

That is application specific. The htmlentitydefs were generated
from the HTML spec files themselves, so they provide the basics
needed to work from. It's easy enough for you to write a function
which translates the basic table into anything you need.
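for example, the fully decoded name -> unicode mapping Oren is after is a couple of lines on top of the shipped table (shown here with the modern html.entities module, which exposes the same data as a name2codepoint table):

```python
# Build an entity-name -> unicode-character mapping from the stock table.
from html.entities import name2codepoint

entity_to_unicode = {name: chr(cp) for name, cp in name2codepoint.items()}
```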

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From fredrik@pythonware.com  Mon Aug  5 16:06:50 2002
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Mon, 5 Aug 2002 17:06:50 +0200
Subject: [Python-Dev] Re: Docutils/reStructuredText is ready to process PEPs
References:  <024401c23c84$1f8b07b0$05d141d5@hagrid> <20020805141524.GA9513@panix.com>
Message-ID: <035801c23c91$bcaf6150$05d141d5@hagrid>

Aahz wrote:

> > have you tried it?
> 
> Yes.

Details, please.

We've recently used javadoc/pythondoc in a relatively large Python
project (currently 30ksloc python, about 350 pages extracted docs)
with good results.  Most people involved had some exposure to html,
but not javadoc.  I don't think we've seen any markup errors at all.

> > it's javadoc-style markup, which is based on HTML.  if you've
> > ever written a webpage, you can learn the rest in a couple of
> > minutes.
> 
> That's not XML, and I wouldn't even call it XML-based.

It all ends up in an XML infoset, and the mapping can be
described in a single sentence.  Close enough for me.

> using javadoc for any kind of heavily marked-up docs looks far
> uglier than reST.

Why would anyone put heavily marked-up documentation in
docstrings?  Are you doing that?  Any reason you cannot use
a word processor (interactive or batch) for those parts?

> The part of reST that's as limited as what you're using in xmlrpclib.py
> can also be learned in a couple of minutes.

Perhaps, but I already know HTML and JavaDoc; why waste brain
cells on learning yet another homebrewed markup language?  





From mal@lemburg.com  Mon Aug  5 16:20:07 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 05 Aug 2002 17:20:07 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
References: <3D1057E8.9090200@livinglogic.de>              <3D4E336E.8070700@lemburg.com> <200208051348.g75DmOv13530@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <3D4E97A7.7000904@lemburg.com>

Guido van Rossum wrote:
>>I'd like to put the following PEP up for pronouncement. Walter
>>is currently on vacation, but he asked me to already go ahead
>>with the process.
>>
>>	http://www.python.org/peps/pep-0293.html
>>
>>I like the patch a lot and the implementation strategy is very
>>interesting as well (just wish that classes were new types --
>>then things could run a tad faster and the patch would be
>>simpler).
>>
>>The basic idea of the patch is to provide a way to elegantly
>>handle error situations in codecs which go beyond the standard
>>cases 'ignore', 'replace' and 'strict', e.g. to automagically
>>escape problem cases, to log errors for later review or to
>>fetch additional information for the proper handling at coding
>>time (for example, fetching entity definitions from a URL).
> 
> 
> I know you want me to pronounce on this, but I'd like to abstain.

Ok.

> I have no experience in using codecs to have any kind of sense about
> whether this is good or not.  If you feel confident that it's good,
> you can make the decision on your own.  If you're not yet confident, I
> suggest getting more review.  I do note that the patch is humungous
> (isn't everything related to Unicode? :-) so might need more review
> before it goes in.

Walter has written a pretty good test suite for the patch
and I have a good feeling about it. I'd like Walter to check
it into CVS and then see whether the alpha tests bring up any
quirks. The patch only touches the codecs and adds some new
exceptions. There are no other changes involved.

I think that together with PEP 263 (source code encoding) this
is a great step forward in Python's i18n capabilities.

BTW, the test script contains some examples of how to put the
error callbacks to use:

http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=27815&aid=432401

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From guido@python.org  Mon Aug  5 16:27:00 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 05 Aug 2002 11:27:00 -0400
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
In-Reply-To: Your message of "Mon, 05 Aug 2002 17:20:07 +0200."
 <3D4E97A7.7000904@lemburg.com>
References: <3D1057E8.9090200@livinglogic.de> <3D4E336E.8070700@lemburg.com> <200208051348.g75DmOv13530@pcp02138704pcs.reston01.va.comcast.net>
 <3D4E97A7.7000904@lemburg.com>
Message-ID: <200208051527.g75FR1814634@pcp02138704pcs.reston01.va.comcast.net>

> Walter has written a pretty good test suite for the patch
> and I have a good feeling about it. I'd like Walter to check
> it into CVS and then see whether the alpha tests bring up any
> quirks. The patch only touches the codecs and adds some new
> exceptions. There are no other changes involved.
> 
> I think that together with PEP 263 (source code encoding) this
> is a great step forward in Python's i18n capabilities.
> 
> BTW, the test script contains some examples of how to put the
> error callbacks to use:
> 
> http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=27815&aid=432401

Sounds like a plan then.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal@lemburg.com  Mon Aug  5 16:27:05 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 05 Aug 2002 17:27:05 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
References: <20020805144636.A9355@hishome.net>
Message-ID: <3D4E9949.9090906@lemburg.com>

Oren Tirosh wrote:
> Here's another implementation strategy:
> [hacking charmap codec]
> 
> Tying it all together:
> 
> A codec that does both charmap and entity reference translations may be 
> dynamically generated.  A function will be registered that intercepts 
> any codec name that looks like 'xmlcharref.CODECNAME', import that codec, 
> create patched charmaps and return the (enc, dec, reader, writer) tuple.
> The dynamically created entry will be cached for later use. 

Even though that's possible, why add more magic to the codec registry ?
u.encode('latin-1', 'xmlcharrefreplace') looks much clearer to me.

You are of course free to write a codec which implements this
directly. No change to the core is needed for that.

However, PEP 293 addresses a much wider application space than
just escaping unmappable characters.
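A sketch of the preferred spelling alongside a user-registered handler, assuming the PEP 293 API (the alias 'questionreplace' is made up for illustration):

```python
import codecs

# Built-in handler proposed by PEP 293: escape unmappable characters
# as XML character references instead of raising UnicodeEncodeError.
encoded = u'abc\u20ac'.encode('latin-1', 'xmlcharrefreplace')
# encoded == b'abc&#8364;'  (U+20AC is decimal 8364)

# A custom handler is registered once under an alias and then selected
# by name in any encode() call, exactly like the built-in ones.
def question_replace(exc):
    # Replace the whole unencodable run with one '?' per character.
    return (u'?' * (exc.end - exc.start), exc.end)

codecs.register_error('questionreplace', question_replace)
replaced = u'abc\u20ac'.encode('latin-1', 'questionreplace')
# replaced == b'abc?'
```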

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From Jack.Jansen@cwi.nl  Mon Aug  5 16:31:45 2002
From: Jack.Jansen@cwi.nl (Jack Jansen)
Date: Mon, 5 Aug 2002 17:31:45 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
In-Reply-To: <200208051348.g75DmOv13530@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <73092160-A888-11D6-B799-0030655234CE@cwi.nl>

Having to register the error handler first and then finding it by name 
smells like a very big hack to me. I understand the reasoning (that you 
don't want to modify the API of a gazillion C routines to add an error 
object argument) but it still seems like a hack....
--
- Jack Jansen                http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -



From mal@lemburg.com  Mon Aug  5 16:47:21 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 05 Aug 2002 17:47:21 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
References: <73092160-A888-11D6-B799-0030655234CE@cwi.nl>
Message-ID: <3D4E9E09.9070102@lemburg.com>

Jack Jansen wrote:
> Having to register the error handler first and then finding it by name 
> smells like a very big hack to me. I understand the reasoning (that you 
> don't want to modify the API of a gazillion C routines to add an error 
> object argument) but it still seems like a hack....

Well, in that case, you would have to call the whole codec registry
a hack ;-)

I find having the callback available by an alias name very user
friendly, but YMMV. The main reason behind this way of doing it
is to maintain C API compatibility without adding a complete
new b/w compatibility layer (Walter started out this way; see the
SF patch page).
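The registration pattern in miniature (the alias 'ignore-run' is invented here for illustration):

```python
import codecs

# Python code registers a callable once under a string alias; C-level
# codec routines resolve the alias at call time, so no C function
# signature has to grow an extra callback-object argument.
def ignore_run(exc):
    if isinstance(exc, UnicodeEncodeError):
        return (u'', exc.end)   # drop the unencodable run entirely
    raise exc

codecs.register_error('ignore-run', ignore_run)

# Lookup by alias returns the registered callable itself.
assert codecs.lookup_error('ignore-run') is ignore_run
assert u'a\u20acb'.encode('ascii', 'ignore-run') == b'ab'
```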

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From tim.one@comcast.net  Mon Aug  5 17:06:28 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 05 Aug 2002 12:06:28 -0400
Subject: [Python-Dev] timsort for jython
In-Reply-To: <024701c23c84$2037c270$05d141d5@hagrid>
Message-ID: 

[/F]
> sounds like yet another reason to add two methods; one that
> guarantees stability, and one that doesn't.

I haven't heard the first reason, only people latching on to that a
distinction *can* be drawn, "so therefore it must be" (or something like
that ...).  The only portable way you can get stability is to do the DSU
business anyway.

> the only counter-argument I've seen from you is code bloat, but
> I cannot see what stops us from mapping *both* methods to a
> single implementation in CPython 2.3.

I passed that suggestion on in the patch report, when I asked Guido to
Pronounce, and he didn't want that.  Perl 5.8 "has a sort pragma for limited
control of the sort ... [which] may not persist into future perls",
according to .  Maybe
we should do that too .

> an alternative would be to add a sortlib module:
>
>     $ more Lib/sortlib.py
>
>     def stablesort(list):
>         list.sort() # 2.3's timsort is stable!
>
> and a regression test script that makes sure that it really is stable
> (can a test program ever be sure?)

I've suggested before that you may very well want to use DSU indices even if
you *know* the underlying sort is stable, in order to prevent massive
increase in sort time due to equal keys falling back to comparing records
(some sorts from Kevin Altis's database showed that dramatically).  So the
use cases for relying on stability *in Python* aren't all that clear:
passing an explicit comparison function is way slower, but sorting (key,
record) tuples instead is also prone to major slowdown surprises.  Sorting
(key, index, record) tuples remains your safest bet (unless you don't care
about speed).
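The (key, index, record) decoration just described, as a minimal sketch:

```python
# Decorate-sort-undecorate with an index tiebreaker: (key, index) is
# unique, so comparison never falls through to the records themselves
# (which may be expensive to compare, or not comparable at all), and
# records with equal keys keep their original order.
records = [{'name': 'ada', 'score': 2},
           {'name': 'bob', 'score': 1},
           {'name': 'cy',  'score': 2}]
decorated = [(rec['score'], i, rec) for i, rec in enumerate(records)]
decorated.sort()
result = [rec for key, i, rec in decorated]
# bob first (score 1), then ada before cy (original order preserved)
```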

So I'd like to see some real use cases.  An appropriate design for a sortlib
module may (or may not) suggest itself then.

BTW, list.sort() is stable in CPython iff

    [].sort.__doc__.find('stable')

is true.  Short of that, the stability test in Lib/test/test_sort.py will
almost certainly determine whether it's stable (not 100% certain, but
99.999999% easy ).
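A stability probe along the lines of the one in Lib/test/test_sort.py, sketched in present-day Python (sorted() and key= postdate this thread) with a fixed seed so it is repeatable:

```python
import random

# Sort many records on a small key that collides often, tracking each
# record's original position with a unique index.  A stable sort on
# the key alone must then agree with full lexicographic order on
# (key, index), since the index encodes original order among equal keys.
random.seed(42)
pairs = [(random.randrange(10), i) for i in range(1000)]
by_key_only = sorted(pairs, key=lambda p: p[0])
is_stable = by_key_only == sorted(pairs)
```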



From guido@python.org  Mon Aug  5 17:16:49 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 05 Aug 2002 12:16:49 -0400
Subject: [Python-Dev] timsort for jython
In-Reply-To: Your message of "Mon, 05 Aug 2002 12:06:28 EDT."
 
References: 
Message-ID: <200208051616.g75GGn422821@pcp02138704pcs.reston01.va.comcast.net>

> BTW, list.sort() is stable in CPython iff
> 
>     [].sort.__doc__.find('stable')
> 
> is true.

Um, you meant "is >= 0".  The find() method doesn't return a bool, it
returns the first index where the string is found, and -1 if it is not
found.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@comcast.net  Mon Aug  5 17:28:25 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 05 Aug 2002 12:28:25 -0400
Subject: [Python-Dev] timsort for jython
In-Reply-To: <200208051616.g75GGn422821@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

[Tim]
>> BTW, list.sort() is stable in CPython iff
>>
>>     [].sort.__doc__.find('stable')
>>
>> is true^H^H^H^H> 0.

[Guido]
> Um, you meant "is >= 0".  The find() method doesn't return a bool, it
> returns the first index where the string is found, and -1 if it is not
> found.

What, you mean you haven't retroactively redefined -1 to be False yet?  For
shame .



From aahz@pythoncraft.com  Mon Aug  5 17:30:36 2002
From: aahz@pythoncraft.com (Aahz)
Date: Mon, 5 Aug 2002 12:30:36 -0400
Subject: [Python-Dev] timsort for jython
In-Reply-To: <200208051616.g75GGn422821@pcp02138704pcs.reston01.va.comcast.net>
References:  <200208051616.g75GGn422821@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20020805163036.GA14290@panix.com>

On Mon, Aug 05, 2002, Guido van Rossum wrote:
>Tim Peters:
>>
>> BTW, list.sort() is stable in CPython iff
>> 
>>     [].sort.__doc__.find('stable')
>> 
>> is true.
> 
> Um, you meant "is >= 0".  The find() method doesn't return a bool, it
> returns the first index where the string is found, and -1 if it is not
> found.

Which only goes to prove that the people who've been whining about that
characteristic of find() were right all along.  ;-)
-- 
Aahz (aahz@pythoncraft.com)           <*>         http://www.pythoncraft.com/

Project Vote Smart: http://www.vote-smart.org/


From barry@python.org  Mon Aug  5 17:35:43 2002
From: barry@python.org (Barry A. Warsaw)
Date: Mon, 5 Aug 2002 12:35:43 -0400
Subject: [Python-Dev] timsort for jython
References: <200208051616.g75GGn422821@pcp02138704pcs.reston01.va.comcast.net>
 
Message-ID: <15694.43359.859181.718287@anthem.wooz.org>

>>>>> "TP" == Tim Peters  writes:

    TP> What, you mean you haven't retroactively redefined -1 to be
    TP> False yet?  For shame .

Have you checked current cvs?

Python 2.3a0 (#1, Aug  5 2002, 12:06:31) 
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> int(True)
1
>>> int(False)
0
>>> int(Maybe)
-1
>>> 

-Barry


From aahz@pythoncraft.com  Mon Aug  5 17:38:35 2002
From: aahz@pythoncraft.com (Aahz)
Date: Mon, 5 Aug 2002 12:38:35 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <200208051633.g75GXCe23884@pcp02138704pcs.reston01.va.comcast.net>
References:  <200208051616.g75GGn422821@pcp02138704pcs.reston01.va.comcast.net> <20020805163036.GA14290@panix.com> <200208051633.g75GXCe23884@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20020805163835.GB14290@panix.com>

On Mon, Aug 05, 2002, Guido van Rossum wrote:
>Aahz:
>>Guido:
>>>
>>> Um, you meant "is >= 0".  The find() method doesn't return a bool, it
>>> returns the first index where the string is found, and -1 if it is not
>>> found.
>> 
>> Which only goes to prove that the people who've been whining about that
>> characteristic of find() were right all along.  ;-)
> 
> So what would you like it to return?  True/False, with no possibility
> of finding where the substring starts?  That defeats a common use
> case.

Well, of course it can't be changed, but if Tim of all people made that
mistake, I think it's a good indicator that something's wrong.  I believe
the suggestion has been made to add an exists() method or something
similar; it's probably better to have that in the core under some
standard name instead of each person who needs it implementing the
one-liner under different names.
-- 
Aahz (aahz@pythoncraft.com)           <*>         http://www.pythoncraft.com/

Project Vote Smart: http://www.vote-smart.org/


From barry@python.org  Mon Aug  5 17:41:35 2002
From: barry@python.org (Barry A. Warsaw)
Date: Mon, 5 Aug 2002 12:41:35 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
References: 
 <200208051616.g75GGn422821@pcp02138704pcs.reston01.va.comcast.net>
 <20020805163036.GA14290@panix.com>
 <200208051633.g75GXCe23884@pcp02138704pcs.reston01.va.comcast.net>
 <20020805163835.GB14290@panix.com>
Message-ID: <15694.43711.180767.22408@anthem.wooz.org>

>>>>> "A" == Aahz   writes:

    A> Well, of course it can't be changed, but if Tim of all people
    A> made that mistake, I think it's a good indicator that
    A> something's wrong.  I believe the suggestion has been made to
    A> add an exists() method or something similar; it's probably
    A> better to have that in the core under some standard name
    A> instead of each person who needs it implementing the one-liner
    A> under different names.

What about extending `in' to allow strings longer than a single
character?  E.g.

>>> 'lo' in 'hello world'

?  That seems like the most natural way to want to spell it, and is an
extension of what you can already do.
-Barry


From guido@python.org  Mon Aug  5 17:43:20 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 05 Aug 2002 12:43:20 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: Your message of "Mon, 05 Aug 2002 12:38:35 EDT."
 <20020805163835.GB14290@panix.com>
References:  <200208051616.g75GGn422821@pcp02138704pcs.reston01.va.comcast.net> <20020805163036.GA14290@panix.com> <200208051633.g75GXCe23884@pcp02138704pcs.reston01.va.comcast.net>
 <20020805163835.GB14290@panix.com>
Message-ID: <200208051643.g75GhLH24525@pcp02138704pcs.reston01.va.comcast.net>

> Well, of course it can't be changed, but if Tim of all people made that
> mistake, I think it's a good indicator that something's wrong.

I'm not arguing with that, but I'm not sure how to fix it.  We've
already got two substring test methods (index() and find()).  Do we
really need a third?

> I believe
> the suggestion has been made to add an exists() method or something
> similar; it's probably better to have that in the core under some
> standard name instead of each person who needs it implementing the
> one-liner under different names.

Nobody writes the one-liner, everybody tries to remember to use
.find()>=0.

I don't like exists().  Maybe we should finally implement "s1 in s2"
as "s2.find(s1) >= 0", i.e. add a __contains__ method to strings?
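What that __contains__ would amount to, sketched on a str subclass (FindableStr is a made-up name for illustration):

```python
class FindableStr(str):
    # 's1 in s2' answers the yes/no question directly; find() remains
    # available when the position is actually needed.
    def __contains__(self, sub):
        return self.find(sub) >= 0

s = FindableStr('hello world')
assert 'lo' in s
assert 'xy' not in s
assert '' in s        # find('') returns 0, so '' tests as contained
```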

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@comcast.net  Mon Aug  5 17:47:37 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 05 Aug 2002 12:47:37 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <200208051643.g75GhLH24525@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

[Guido]
> ...
> I don't like exists().  Maybe we should finally implement "s1 in s2"
> as "s2.find(s1) >= 0", i.e. add a __contains__ method to strings?

I asked you about that a few weeks ago, and you were agreeable.  I posted
that info to c.l.py, saying that if anyone cared enough to submit a patch,
the idea was pre-approved.  AFAIK, nobody bit (but I didn't pay much
attention to patches last week -- hope springs eternal ).



From tim.one@comcast.net  Mon Aug  5 17:48:53 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 05 Aug 2002 12:48:53 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <20020805163835.GB14290@panix.com>
Message-ID: 

[Aahz]
> Well, of course it can't be changed, but if Tim of all people made that
> mistake, I think it's a good indicator that something's wrong.

Na, I make a lot of mistakes at these ungodly early hours.  "str1 in str2"
is the right solution now.



From skip@pobox.com  Mon Aug  5 17:52:56 2002
From: skip@pobox.com (Skip Montanaro)
Date: Mon, 5 Aug 2002 11:52:56 -0500
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <20020805163835.GB14290@panix.com>
References: 
 <200208051616.g75GGn422821@pcp02138704pcs.reston01.va.comcast.net>
 <20020805163036.GA14290@panix.com>
 <200208051633.g75GXCe23884@pcp02138704pcs.reston01.va.comcast.net>
 <20020805163835.GB14290@panix.com>
Message-ID: <15694.44392.580198.319793@localhost.localdomain>

    aahz> Well, of course it can't be changed, but if Tim of all people made
    aahz> that mistake, I think it's a good indicator that something's
    aahz> wrong.

I don't think that means any such thing.  First, bot or not, Tim is allowed
to make the occasional mistake.  Everybody does.  Making a mistake doesn't
mean the language is flawed in this case.  "Find" seems like the perfect
name ("tell me where this is") and its return value is absolutely correct
for further operation on the found substring (where it was found).  I don't
believe strings need to grow an .exists() method which in effect does

    def exists(self, sub, start=None, end=None):
        return self.find(sub, start, end) >= 0

which would probably be used a lot less than .find() anyway.

Skip




From guido@python.org  Mon Aug  5 17:33:12 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 05 Aug 2002 12:33:12 -0400
Subject: [Python-Dev] timsort for jython
In-Reply-To: Your message of "Mon, 05 Aug 2002 12:30:36 EDT."
 <20020805163036.GA14290@panix.com>
References:  <200208051616.g75GGn422821@pcp02138704pcs.reston01.va.comcast.net>
 <20020805163036.GA14290@panix.com>
Message-ID: <200208051633.g75GXCe23884@pcp02138704pcs.reston01.va.comcast.net>

> > Um, you meant "is >= 0".  The find() method doesn't return a bool, it
> > returns the first index where the string is found, and -1 if it is not
> > found.
> 
> Which only goes to prove that the people who've been whining about that
> characteristic of find() were right all along.  ;-)

So what would you like it to return?  True/False, with no possibility
of finding where the substring starts?  That defeats a common use
case.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From esr@thyrsus.com  Mon Aug  5 18:22:54 2002
From: esr@thyrsus.com (Eric S Raymond)
Date: Mon, 5 Aug 2002 13:22:54 -0400
Subject: [Python-Dev] timsort for jython
In-Reply-To: <200208051633.g75GXCe23884@pcp02138704pcs.reston01.va.comcast.net>
References:  <200208051616.g75GGn422821@pcp02138704pcs.reston01.va.comcast.net> <20020805163036.GA14290@panix.com> <200208051633.g75GXCe23884@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20020805172254.GA31517@thyrsus.com>

Guido van Rossum :
> > > Um, you meant "is >= 0".  The find() method doesn't return a bool, it
> > > returns the first index where the string is found, and -1 if it is not
> > > found.
> > 
> > Which only goes to prove that the people who've been whining about that
> > characteristic of find() were right all along.  ;-)
> 
> So what would you like it to return?  True/False, with no possibility
> of finding where the substring starts?  That defeats a common use
> case.

True.  On the other hand, this is a very common gotcha.  I've been bitten by 
it three times in the last week, and I should know better.  Fact is that
missing > -1 is hard to spot.

I think the right answer is to leave find() as it is and have a different
notation that returns bool.  How about `a in b' whenever a and b are
both string-valued?  Seems the most natural candidate.
-- 
		Eric S. Raymond


From damien.morton@acm.org  Mon Aug  5 18:23:06 2002
From: damien.morton@acm.org (Damien Morton)
Date: Mon, 5 Aug 2002 13:23:06 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
Message-ID: <000a01c23ca4$c32a5630$6a906c42@damien>

There was a thread on this a while back on c.l.py

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&safe=off&th=c47dfccc1410c7c7&seekm=4abd9ce7.0203081538.6ee9a2cc%40posting.google.com




From tim.one@comcast.net  Mon Aug  5 18:39:06 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 05 Aug 2002 13:39:06 -0400
Subject: [Python-Dev] timsort for jython
In-Reply-To: <20020805172254.GA31517@thyrsus.com>
Message-ID: 

[Eric S Raymond, breaking a too-long silence]
> ...
> I think the right answer is to leave find() as it is and have a different
> notation that returns bool.  How about `a in b' whenever a and b are
> both string-valued?  Seems the most natural candidate.

I want to raise one other issue here:  should

    '' in 'xyz'

return True or raise an exception?  I've been burned, e.g., by

    >>> 'xyz'.startswith('')
    True
    >>>

when '' was computed by an expression that didn't "expect to" reduce to
nothingness, and I expect *everyone* here has been saved more than once by
that

    '' in 'xyz'

currently raises an exception.  If we make __contains__ act like

    'xyz'.find('') >= 0

that (very probable) error will pass silently in the future:

    >>> 'xyz'.find('')
    0
    >>>

IOW, do we follow find() rigidly, or retain "str1 in str2"'s current
behavior when str1 is empty?



From barry@python.org  Mon Aug  5 18:44:35 2002
From: barry@python.org (Barry A. Warsaw)
Date: Mon, 5 Aug 2002 13:44:35 -0400
Subject: [Python-Dev] timsort for jython
References: <20020805172254.GA31517@thyrsus.com>
 
Message-ID: <15694.47491.933128.953764@anthem.wooz.org>

>>>>> "TP" == Tim Peters  writes:

    TP> IOW, do we follow find() rigidly, or retain "str1 in str2"'s
    TP> current behavior when str1 is empty?

Is the nothing part of the everything?

I'm not sure what the natural interpretation should be, but why would
you ever want to know if '' is in somestring?  Usually I think you'd
only want to know if '' == somestring, so perhaps we should break the
symmetry here.

yin-yang-ly y'rs,
-Barry


From esr@thyrsus.com  Mon Aug  5 18:49:42 2002
From: esr@thyrsus.com (Eric S Raymond)
Date: Mon, 5 Aug 2002 13:49:42 -0400
Subject: [Python-Dev] Defanging the find() gotcha
In-Reply-To: 
References: <20020805172254.GA31517@thyrsus.com> 
Message-ID: <20020805174942.GA17014@thyrsus.com>

Tim Peters :
> [Eric S Raymond, breaking a too-long silence]

Thank you, Tim!

> > I think the right answer is to leave find() as it is and have a different
> > notation that returns bool.  How about `a in b' whenever a and b are
> > both string-valued?  Seems the most natural candidate.
> 
> I want to raise one other issue here:  should
> 
>     '' in 'xyz'
> 
> return True or raise an exception?

> IOW, do we follow find() rigidly, or retain "str1 in str2"'s current
> behavior when str1 is empty?

Raise an exception.  Definitely.  There is no reason to follow find() 
rigidly when the whole point is to have semantics different from find().  
Besides, you're right to point out that changing this behavior could 
break existing code, and that is a big no-no.
-- 
		Eric S. Raymond


From tim.one@comcast.net  Mon Aug  5 18:53:42 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 05 Aug 2002 13:53:42 -0400
Subject: [Python-Dev] timsort for jython
In-Reply-To: <15694.47491.933128.953764@anthem.wooz.org>
Message-ID: 

[Barry, on  '' in 'xyz']
> Is the nothing part of the everything?

That's right, and if Python is a programming language for mystics that's
clearly the best answer .

> I'm not sure what the natural interpretation should be,

    s1 in s2

if and only if there exists an int i such that

    s2[i : i+len(s1)] == s1

is the acade^H^H^H^H^Hmystic meaning, and that's true of every i
in -sys.maxint .. sys.maxint when s1 is empty.
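That definition transcribes directly into code:

```python
def contains(s2, s1):
    # s1 occurs in s2 iff some slice of s2 of length len(s1) equals s1;
    # for s1 == '' every slice qualifies, hence the mystic answer True.
    return any(s2[i:i + len(s1)] == s1 for i in range(len(s2) + 1))

assert contains('hello world', 'lo')
assert not contains('hello world', 'xy')
assert contains('xyz', '')
```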

> but why would you ever want to know if '' is in somestring?

That's the practical rub indeed.  You never want to know that, so if you end
up asking it it's almost certainly a logic error in preceding code.

> Usually I think you'd only want to know if '' == somestring, so perhaps
> we should break the symmetry here.

That is the question.



From guido@python.org  Mon Aug  5 18:56:42 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 05 Aug 2002 13:56:42 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: Your message of "Mon, 05 Aug 2002 13:39:06 EDT."
 
References: 
Message-ID: <200208051756.g75Hug426210@pcp02138704pcs.reston01.va.comcast.net>

> I want to raise one other issue here:  should
> 
>     '' in 'xyz'
> 
> return True or raise an exception?  I've been burned, e.g., by
> 
>     >>> 'xyz'.startswith('')
>     True
>     >>>
> 
> when '' was computed by an expression that didn't "expect to" reduce to
> nothingness, and I expect *everyone* here has been saved more than once by
> that
> 
>     '' in 'xyz'
> 
> currently raises an exception.

I dunno.  The exception has annoyed me too.

> If we make __contains__ act like
> 
>     'xyz'.find('') >= 0
> 
> that (very probable) error will pass silently in the future:
> 
>     >>> 'xyz'.find('')
>     0
>     >>>
> 
> IOW, do we follow find() rigidly, or retain "str1 in str2"'s current
> behavior when str1 is empty?

I expect that Andrew Koenig would delight in this question. :-)

I personally see no way to defend ('' in 'x') returning false; it's so
clearly a substring that any definition of substring-ness that
excludes this seems mathematically wrong, despite your good
intentions.

I guess we'll have to cope in the same way as we cope with the
behavior of find() and startswith() in similar cases.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Mon Aug  5 19:03:58 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 05 Aug 2002 14:03:58 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: Your message of "Mon, 05 Aug 2002 13:56:42 EDT."
 <200208051756.g75Hug426210@pcp02138704pcs.reston01.va.comcast.net>
References: 
 <200208051756.g75Hug426210@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <200208051803.g75I3wG26804@pcp02138704pcs.reston01.va.comcast.net>

I wrote:
> I personally see no way to defend ('' in 'x') returning false; it's so
> clearly a substring that any definition of substring-ness that
> excludes this seems mathematically wrong, despite your good
> intentions.

However, the backwards compatibility argument makes sense.  It used to
raise an exception and it would probably break code if it stopped
doing so; longer strings are much less likely to be passed by accident
so the need for the exception there is less strong.  I'm of two minds
on this now...

--Guido van Rossum (home page: http://www.python.org/~guido/)


From nas@python.ca  Mon Aug  5 19:13:02 2002
From: nas@python.ca (Neil Schemenauer)
Date: Mon, 5 Aug 2002 11:13:02 -0700
Subject: [Python-Dev] timsort for jython
In-Reply-To: ; from tim.one@comcast.net on Mon, Aug 05, 2002 at 01:39:06PM -0400
References: <20020805172254.GA31517@thyrsus.com> 
Message-ID: <20020805111302.A28557@glacier.arctrix.com>

Tim Peters wrote:
> IOW, do we follow find() rigidly, or retain "str1 in str2"'s current
> behavior when str1 is empty?

I vote for the former.

  Neil


From nas@python.ca  Mon Aug  5 19:17:11 2002
From: nas@python.ca (Neil Schemenauer)
Date: Mon, 5 Aug 2002 11:17:11 -0700
Subject: [Python-Dev] timsort for jython
In-Reply-To: <20020805111302.A28557@glacier.arctrix.com>; from nas@python.ca on Mon, Aug 05, 2002 at 11:13:02AM -0700
References: <20020805172254.GA31517@thyrsus.com>  <20020805111302.A28557@glacier.arctrix.com>
Message-ID: <20020805111711.B28557@glacier.arctrix.com>

Neil Schemenauer wrote:
> Tim Peters wrote:
> > IOW, do we follow find() rigidly, or retain "str1 in str2"'s current
> > behavior when str1 is empty?
> 
> I vote for the former.

D'oh.  I meant the LATTER (i.e. raise an error for an empty LHS).

  Neil


From ark@research.att.com  Mon Aug  5 19:51:45 2002
From: ark@research.att.com (Andrew Koenig)
Date: 05 Aug 2002 14:51:45 -0400
Subject: [Python-Dev] Defanging the find() gotcha
In-Reply-To: <20020805174942.GA17014@thyrsus.com>
References: <20020805172254.GA31517@thyrsus.com>
 
 <20020805174942.GA17014@thyrsus.com>
Message-ID: 

Eric> Raise an exception.  Definitely.  There is no reason to follow
Eric> find() rigidly when the whole point is to have semantics
Eric> different from find().  Besides, you're right to point out that
Eric> changing this behavior could break existing code, and that is a
Eric> big no-no.

Changing the meaning of ('ab' in 'abc') might also break existing code.

-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark


From BPettersen@NAREX.com  Mon Aug  5 20:00:27 2002
From: BPettersen@NAREX.com (Bjorn Pettersen)
Date: Mon, 5 Aug 2002 13:00:27 -0600
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
Message-ID: <60FB8BB7F0EFC7409B75EEEC13E201922151F4@admin56.narex.com>

> From: Tim Peters [mailto:tim.one@comcast.net]
>
> [Guido]
> > ...
> > I don't like exists().  Maybe we should finally implement "s1 in s2"
> > as "s2.find(s1) >= 0", i.e. add a __contains__ method to strings?
>
> I asked you about that a few weeks ago, and you were
> agreeable.  I posted that info to c.l.py, saying that if
> anyone cared enough to submit a patch, the idea was
> pre-approved.  AFAIK, nobody bit (but I didn't pay much
> attention to patches last week -- hope springs eternal ).

Well, there was
http://groups.google.com/groups?q=Bjorn+PyUnicode+group:comp.lang.python&hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=mailman.1024100114.13008.python-list%40python.org&rnum=1.

I'll see if I can find time next weekend to figure out the compilation
warnings, adding back special casing for single char containment, and
adding test and doc patches...

-- bjorn


From sholden@holdenweb.com  Mon Aug  5 19:44:50 2002
From: sholden@holdenweb.com (Steve Holden)
Date: Mon, 5 Aug 2002 14:44:50 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
References:               <200208051756.g75Hug426210@pcp02138704pcs.reston01.va.comcast.net>  <200208051803.g75I3wG26804@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <042501c23cb0$302a62b0$6300000a@holdenweb.com>

[GvR]
> I wrote:
> > I personally see no way to defend ('' in 'x') returning false; it's so
> > clearly a substring that any definition of substring-ness that
> > excludes this seems mathematically wrong, despite your good
> > intentions.
>
If you are serious about this proposal then clearly it would be as well to
have "in" agree with find(), and currently anystring.find('') returns zero,
suggesting the null string first appears at the beginning.

> However, the backwards compatibility argument makes sense.  It used to
> raise an exception and it would probably break code if it stopped
> doing so; longer strings are much less likely to be passed by accident
> so the need for the exception there is less strong.  I'm of two minds
> on this now...
>

However, I'm somewhat horrified to see this being discussed seriously. You
can take pragmatism too far, you know ;-)

Are you also proposing to allow

    if [2, 3] in [1, 2, 3, 4]

which is effectively the meaning you seem to be proposing for strings? Where
else in the language does the keyword "in" refer to anything other than
membership? Why do we need another way to do what find() and index() already
do?

Should we also ensure that

    for s in "abc":
        print s

prints

    a
    ab
    abc
    b
    bc
    c

Should it also print a blank line because "'' in anystring" is true? I can
see why users might want to be able to use a "string in string" construct,
but it would seem to confuse the "for" semantics. Is there some other
construct for which

    for v in object_or_instance:

does not assign to v all x such that "x in object_or_instance" is true? I
can see a few teaching problems here.
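For comparison, what 'in' means on lists today: element membership, never subsequence matching, with iteration yielding exactly those members:

```python
# Membership on lists is element-wise ...
assert 2 in [1, 2, 3, 4]
assert [2, 3] not in [1, 2, 3, 4]   # a list is 'in' only as an element
assert [2, 3] in [1, [2, 3], 4]
# ... and 'for v in obj' assigns to v exactly the x with 'x in obj' true.
assert list('abc') == ['a', 'b', 'c']
```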

my-god-*am*-i-really-a-bigot-ly y'rs  - steve
-----------------------------------------------------------------------
Steve Holden                                 http://www.holdenweb.com/
Python Web Programming                http://pydish.holdenweb.com/pwp/
-----------------------------------------------------------------------






From guido@python.org  Mon Aug  5 20:03:55 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 05 Aug 2002 15:03:55 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: Your message of "Mon, 05 Aug 2002 13:00:27 MDT."
 <60FB8BB7F0EFC7409B75EEEC13E201922151F4@admin56.narex.com>
References: <60FB8BB7F0EFC7409B75EEEC13E201922151F4@admin56.narex.com>
Message-ID: <200208051903.g75J3uC00672@pcp02138704pcs.reston01.va.comcast.net>

> Well, there was
> http://groups.google.com/groups?q=Bjorn+PyUnicode+group:comp.lang.python
> &hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=mailman.1024100114.13008.python-list%4
> 0python.org&rnum=1.
> 
> I'll see if I can find time next weekend to figure out the compilation
> warnings, adding back special casing for single char containment, and
> adding test and doc patches...

Cool.  Please use the SourceForge patch manager!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From sholden@holdenweb.com  Mon Aug  5 20:05:43 2002
From: sholden@holdenweb.com (Steve Holden)
Date: Mon, 5 Aug 2002 15:05:43 -0400
Subject: [Python-Dev] Defanging the find() gotcha
References: <20020805172254.GA31517@thyrsus.com><20020805174942.GA17014@thyrsus.com> 
Message-ID: <043901c23cb3$1b680690$6300000a@holdenweb.com>

[Andrew Koenig]
> Eric> Raise an exception.  Definitely.  There is no reason to follow
> Eric> find() rigidly when the whole point is to have semantics
> Eric> different from find().  Besides, you're right to point out that
> Eric> changing this behavior could break existing code, and that is a
> Eric> big no-no.
>
> Changing the meaning of ('ab' in 'abc') might also break existing code.
>

True, but it does seem unlikely (though not impossible) that many are
relying on "ab" in "abc" raising an exception.

regards
-----------------------------------------------------------------------
Steve Holden                                 http://www.holdenweb.com/
Python Web Programming                http://pydish.holdenweb.com/pwp/
-----------------------------------------------------------------------






From guido@python.org  Mon Aug  5 20:11:15 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 05 Aug 2002 15:11:15 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: Your message of "Mon, 05 Aug 2002 14:44:50 EDT."
 <042501c23cb0$302a62b0$6300000a@holdenweb.com>
References:  <200208051756.g75Hug426210@pcp02138704pcs.reston01.va.comcast.net> <200208051803.g75I3wG26804@pcp02138704pcs.reston01.va.comcast.net>
 <042501c23cb0$302a62b0$6300000a@holdenweb.com>
Message-ID: <200208051911.g75JBGJ00739@pcp02138704pcs.reston01.va.comcast.net>

> [GvR]
> > > I personally see no way to defend ('' in 'x') returning false;
> > > it's so clearly a substring that any definition of
> > > substring-ness that excludes this seems mathematically wrong,
> > > despite your good intentions.

[SteveH]
> If you are serious about this proposal then clearly it would be as
> well to have "in" agree with find(), and currently
> anystring.find('') returns zero, suggesting the null string first
> appears at the beginning.

Yes, consistency strongly suggests that.

> > However, the backwards compatibility argument makes sense.  It used to
> > raise an exception and it would probably break code if it stopped
> > doing so; longer strings are much less likely to be passed by accident
> > so the need for the exception there is less strong.  I'm of two minds
> > on this now...
> 
> However, I'm somewhat horrified to see this being discussed
> seriously. You can take pragmatism too far, you know ;-)
> 
> Are you also proposing to allow
> 
>     if [2, 3] in [1, 2, 3, 4]
> 
> which is effectively the meaning you seem to be proposing for
> strings?

No, since it's not a common thing to need.

> Where else in the language does the keyword "in" refer to anything
> other than membership?

Dictionary keys?  That's certainly something very different from
sequence membership!

> Why do we need another way to do what find() and index() already do?

You must've missed the earlier thread -- it's because a substring test
is a common operation and the way to spell it with find() requires you
to tack on ">= 0" which many people accidentally leave out when in a
hurry.
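
The gotcha Guido refers to can be sketched as follows (illustrative values; the bare substring test shown at the end is the semantics this thread led to, adopted in Python 2.3):

```python
s = 'spam and eggs'

# The hurried (buggy) spelling: find() returns an index, not a bool.
# 0 means "found at the start" but tests false; -1 means "absent" but tests true.
assert s.find('spam') == 0 and not s.find('spam')
assert s.find('xyzzy') == -1 and s.find('xyzzy')

# The correct find() spelling needs the ">= 0" that people forget:
assert s.find('spam') >= 0
assert not (s.find('xyzzy') >= 0)

# The substring test under discussion:
assert 'spam' in s
assert 'xyzzy' not in s
```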

> Should we also ensure that
> 
>     for s in "abc":
>         print s
> 
> prints
> 
>     a
>     ab
>     abc
>     b
>     bc
>     c
> 
> Should it also print a blank line because "'' in anystring" is true? I can
> see why users might want to be able to use a "string in string" construct,
> but it would seem to confuse the "for" semantics. Is there some other
> construct for which
> 
>     for v in object_or_instance:
> 
> does not assign to v all x such that "x in object_or_instance" is true? I
> can see a few teaching problems here.

To this latter example I can only say, "A foolish consistency is the
hobgoblin of little minds."

At least this still holds (unless x is an iterator or otherwise
mutated by access :-):

  for v in x:
     assert v in x

--Guido van Rossum (home page: http://www.python.org/~guido/)


From neal@metaslash.com  Mon Aug  5 20:12:03 2002
From: neal@metaslash.com (Neal Norwitz)
Date: Mon, 05 Aug 2002 15:12:03 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
References: 
 <200208051756.g75Hug426210@pcp02138704pcs.reston01.va.comcast.net> <200208051803.g75I3wG26804@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <3D4ECE03.D8653061@metaslash.com>

Guido van Rossum wrote:
> 
> I wrote:
> > I personally see no way to defend ('' in 'x') returning false; it's so
> > clearly a substring that any definition of substring-ness that
> > excludes this seems mathematically wrong, despite your good
> > intentions.
> 
> However, the backwards compatibility argument makes sense.  It used to
> raise an exception and it would probably break code if it stopped
> doing so; longer strings are much less likely to be passed by accident
> so the need for the exception there is less strong.  I'm of two minds
> on this now...

Here's a patch:  http://python.org/sf/591250

In testing this patch, I ran across this:

	>>> 's' in 's'
	True
	>>> 's' in 's' == True
	False
	>>> 's' in 's' is True
	False
	>>> id('s' in 's')
	135246792
	>>> id(True)
	135246792

What's up with that?  Am I missing something?  
Note: this occurs before the patch too.

Neal


From ark@research.att.com  Mon Aug  5 20:14:30 2002
From: ark@research.att.com (Andrew Koenig)
Date: Mon, 5 Aug 2002 15:14:30 -0400 (EDT)
Subject: [Python-Dev] Defanging the find() gotcha
In-Reply-To: <043901c23cb3$1b680690$6300000a@holdenweb.com>
 (sholden@holdenweb.com)
References: <20020805172254.GA31517@thyrsus.com><20020805174942.GA17014@thyrsus.com>  <043901c23cb3$1b680690$6300000a@holdenweb.com>
Message-ID: <200208051914.g75JEUh20157@europa.research.att.com>

>> Changing the meaning of ('ab' in 'abc') might also break existing code.

Steve> True, but it does seem unlikely (though not impossible) that many are
Steve> relying on "ab" in "abc" raising an exception.

How many are relying on '' in 'abc' raising an exception?


From guido@python.org  Mon Aug  5 20:14:37 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 05 Aug 2002 15:14:37 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: Your message of "Mon, 05 Aug 2002 15:11:15 EDT."
 <200208051911.g75JBGJ00739@pcp02138704pcs.reston01.va.comcast.net>
References:  <200208051756.g75Hug426210@pcp02138704pcs.reston01.va.comcast.net> <200208051803.g75I3wG26804@pcp02138704pcs.reston01.va.comcast.net> <042501c23cb0$302a62b0$6300000a@holdenweb.com>
 <200208051911.g75JBGJ00739@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <200208051914.g75JEbt02451@pcp02138704pcs.reston01.va.comcast.net>

> > Are you also proposing to allow
> > 
> >     if [2, 3] in [1, 2, 3, 4]
> > 
> > which is effectively the meaning you seem to be proposing for
> > strings?
> 
> No, since it's not a common thing to need.

Of course, there's another reason why that can't be done even if it
*was* a common need: [2, 3] could be a list item, e.g. [1, [2, 3], 4].
This kind of thing can't happen for strings.

--Guido van Rossum (home page: http://www.python.org/~guido/)



From jepler@unpythonic.net  Mon Aug  5 20:15:28 2002
From: jepler@unpythonic.net (Jeff Epler)
Date: Mon, 5 Aug 2002 14:15:28 -0500
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <3D4ECE03.D8653061@metaslash.com>
References:  <200208051756.g75Hug426210@pcp02138704pcs.reston01.va.comcast.net> <200208051803.g75I3wG26804@pcp02138704pcs.reston01.va.comcast.net> <3D4ECE03.D8653061@metaslash.com>
Message-ID: <20020805191521.GB10926@unpythonic.net>

On Mon, Aug 05, 2002 at 03:12:03PM -0400, Neal Norwitz wrote:
> 	>>> 's' in 's' == True
> 	False

>>> ('s' in 's') == True
True
>>> ('s' in 's') and ('s' == True)
False

Comparison chaining is at work here: 'in', '==' and 'is' are all comparison operators, so they chain (and short-circuit).


From guido@python.org  Mon Aug  5 20:16:18 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 05 Aug 2002 15:16:18 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: Your message of "Mon, 05 Aug 2002 15:12:03 EDT."
 <3D4ECE03.D8653061@metaslash.com>
References:  <200208051756.g75Hug426210@pcp02138704pcs.reston01.va.comcast.net> <200208051803.g75I3wG26804@pcp02138704pcs.reston01.va.comcast.net>
 <3D4ECE03.D8653061@metaslash.com>
Message-ID: <200208051916.g75JGJs03471@pcp02138704pcs.reston01.va.comcast.net>

> In testing this patch, I ran across this:
> 
> 	>>> 's' in 's'
> 	True
> 	>>> 's' in 's' == True
> 	False
> 	>>> 's' in 's' is True
> 	False
> 	>>> id('s' in 's')
> 	135246792
> 	>>> id(True)
> 	135246792
> 
> What's up with that?  Am I missing something?  

Yes, 'is' and 'in' and '==' are all comparison operators, and the
chaining syntax makes this interpreted as (roughly)

    ('s' in 's') and ('s' == True)
    ('s' in 's') and ('s' is True)
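
(Editor's sketch of the chaining rule, with the grouping made explicit:)

```python
# a OP1 b OP2 c chains as (a OP1 b) and (b OP2 c)
chained = 's' in 's' == True      # ('s' in 's') and ('s' == True)
grouped = ('s' in 's') == True    # the grouping Neal expected

assert chained is False   # 's' == True is False, so the chain is False
assert grouped is True
```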

--Guido van Rossum (home page: http://www.python.org/~guido/)


From neal@metaslash.com  Mon Aug  5 20:19:37 2002
From: neal@metaslash.com (Neal Norwitz)
Date: Mon, 05 Aug 2002 15:19:37 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
References:  <200208051756.g75Hug426210@pcp02138704pcs.reston01.va.comcast.net> <200208051803.g75I3wG26804@pcp02138704pcs.reston01.va.comcast.net>
 <3D4ECE03.D8653061@metaslash.com> <200208051916.g75JGJs03471@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <3D4ECFC9.5214DEBE@metaslash.com>

Guido van Rossum wrote:
> 
> > In testing this patch, I ran across this:
> >
> >       >>> 's' in 's' is True
> >       False
> >
> > What's up with that?  Am I missing something?
> 
> Yes, 'is' and 'in' and '==' are all comparison operators, and the
> chaining syntax makes this interpreted as (roughly)

Thanks (to Jeff too).  I knew I had to be missing something.
Well, there's still the patch with a working test.

Neal


From guido@python.org  Mon Aug  5 20:25:51 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 05 Aug 2002 15:25:51 -0400
Subject: [Python-Dev] Defanging the find() gotcha
In-Reply-To: Your message of "Mon, 05 Aug 2002 15:14:30 EDT."
 <200208051914.g75JEUh20157@europa.research.att.com>
References: <20020805172254.GA31517@thyrsus.com>  <20020805174942.GA17014@thyrsus.com>  <043901c23cb3$1b680690$6300000a@holdenweb.com>
 <200208051914.g75JEUh20157@europa.research.att.com>
Message-ID: <200208051925.g75JPpB03666@pcp02138704pcs.reston01.va.comcast.net>

> How many are relying on '' in 'abc' raising an exception?

That's impossible to know.

The case that I am familiar with is roughly as follows.  Suppose you
want to check whether a string begins with a certain character, and
you write something like this:

  c = s[0]
  ...do stuff with c...
  if c in string.letters:
     ...parse it further...

The first time this is called with s being empty, the assignment to c
fails because the empty string doesn't have a first item.

So you "fix" that by changing it to this:

  c = s[:1]

But the code is still broken.  Currently, the "if c in string.letters"
will raise an exception, and you'll figure out that s=="" should be
special-cased earlier on.  With the proposed "in" semantics, this
failure is only detected when the "parse it further" code does the
wrong thing -- either it raises another exception, or it produces the
wrong result without raising an exception.  I expect that that will be
harder to debug because the source of the error is farther away from
the detection.
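
(A runnable sketch of that failure mode, using string.ascii_letters, the modern spelling of string.letters; the permissive semantics were in fact adopted, so the silent branch is what today's Python does:)

```python
import string

def classify(s):
    c = s[:1]                        # the "fix" that stopped the IndexError
    if c in string.ascii_letters:    # with the new semantics, '' is "in" any string
        return 'letter'
    return 'other'

# The empty input now silently takes the letter branch; the error
# surfaces only later, far from its cause.
assert classify('') == 'letter'
assert classify('a') == 'letter'
assert classify('1') == 'other'
```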

Note that we make similar exceptions for the empty string in other
places:

  >>> "xxx".islower()
  True
  >>> "xx".islower()
  True
  >>> "x".islower()
  True
  >>> "".islower()
  False
  >>> 

Somehow this reminds me of the 0**0 debate recently in edu-sig...

--Guido van Rossum (home page: http://www.python.org/~guido/)


From zack@codesourcery.com  Mon Aug  5 20:27:46 2002
From: zack@codesourcery.com (Zack Weinberg)
Date: Mon, 5 Aug 2002 12:27:46 -0700
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <3D4ECE03.D8653061@metaslash.com>
References:  <200208051756.g75Hug426210@pcp02138704pcs.reston01.va.comcast.net> <200208051803.g75I3wG26804@pcp02138704pcs.reston01.va.comcast.net> <3D4ECE03.D8653061@metaslash.com>
Message-ID: <20020805192746.GN466@codesourcery.com>

On Mon, Aug 05, 2002 at 03:12:03PM -0400, Neal Norwitz wrote:
> 
>	>>> 's' in 's'
>	True
>	>>> 's' in 's' == True
>	False

The comparison chaining is not what you expect.

	>>> ('s' in 's') == True
	True

zw


From tim.one@comcast.net  Mon Aug  5 20:27:11 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 05 Aug 2002 15:27:11 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <200208051756.g75Hug426210@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

>> and I expect *everyone* here has been saved more than once by that
>>
>>     '' in 'xyz'
>> currently raises an exception.

[Guido]
> I dunno.  The exception has annoyed me too.

Annoyed because it pointed out an error in your code, or because True would
have been a useful result?  It's annoyed me too, but it was always for the
former reason.

> I expect that Andrew Koenig would delight in this question. :-)

Believe me, he already did.

> I personally see no way to defend ('' in 'x') returning false;

The suggestion is not that it return False, but that it raise an exception,
as in "errors should never pass silently".

> it's so clearly a substring that any definition of substring-ness that
> excludes this seems mathematically wrong, despite your good intentions.

I'd like to see a plausible use case for

    '' in str

returning True, then.  Do keep in mind that nobody can be more anal about
mathematical consistency than me <0.9 wink>, but the real world isn't much
impressed with our abstractions.



From tim.one@comcast.net  Mon Aug  5 20:30:52 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 05 Aug 2002 15:30:52 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <60FB8BB7F0EFC7409B75EEEC13E201922151F4@admin56.narex.com>
Message-ID: 

[Bjorn Pettersen]
> Well, there was
> http://groups.google.com/groups?q=Bjorn+PyUnicode+group:comp.lang.python
> &hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=mailman.1024100114.13008.python-list%4
> 0python.org&rnum=1.

Nice URL.  Please put patches on SourceForge -- anywhere else and
they may as well not exist:

    http://sf.net/patch/?group_id=5470

> I'll see if I can find time next weekend to figure out the compilation
> warnings, adding back special casing for single char containment, and
> adding test and doc patches...

Cool!



From tim.one@comcast.net  Mon Aug  5 20:43:18 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 05 Aug 2002 15:43:18 -0400
Subject: [Python-Dev] Defanging the find() gotcha
In-Reply-To: <200208051925.g75JPpB03666@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

[Guido]
> Somehow this reminds me of the 0**0 debate recently in edu-sig...

Not quite yet:  there are good domain-specific reasons for wanting 0**0 to
do one of {return 0, return 1, raise an exception}.  In the

    '' in str

case we know that returning True can cause problems, but haven't seen an
example where returning True is useful.



From sholden@holdenweb.com  Mon Aug  5 20:43:49 2002
From: sholden@holdenweb.com (Steve Holden)
Date: Mon, 5 Aug 2002 15:43:49 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
References:  <200208051756.g75Hug426210@pcp02138704pcs.reston01.va.comcast.net> <200208051803.g75I3wG26804@pcp02138704pcs.reston01.va.comcast.net>              <042501c23cb0$302a62b0$6300000a@holdenweb.com>  <200208051911.g75JBGJ00739@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <048101c23cb8$6e0e86d0$6300000a@holdenweb.com>

[GvR]
> > [GvR]
> > > > I personally see no way to defend ('' in 'x') returning false;
> > > > it's so clearly a substring that any definition of
> > > > substring-ness that excludes this seems mathematically wrong,
> > > > despite your good intentions.
>
> [SteveH]
> > If you are serious about this proposal then clearly it would be as
> > well to have "in" agree with find(), and currently
> > anystring.find('') returns zero, suggesting the null string first
> > appears at the beginning.
>
> Yes, consistency strongly suggests that.
>
And of course, that wouldn't be a foolish consistency :-)

> > > However, the backwards compatibility argument makes sense.  It used to
> > > raise an exception and it would probably break code if it stopped
> > > doing so; longer strings are much less likely to be passed by accident
> > > so the need for the exception there is less strong.  I'm of two minds
> > > on this now...
> >
[ ... ]
> > Why do we need another way to do what find() and index() already do?
>
> You must've missed the earlier thread -- it's because a substring test
> is a common operation and the way to spell it with find() requires you
> to tack on ">= 0" which many people accidentally leave out when in a
> hurry.
>
Nope, I didn't miss it. As I said, I just found it hard to believe this was
a serious discussion.

> > Should we also ensure that
> >
> >     for s in "abc":
> >         print s
> >
> > prints
> >
> >     a
> >     ab
> >     abc
> >     b
> >     bc
> >     c
> >
> > Should it also print a blank line because "'' in anystring" is true? I can
> > see why users might want to be able to use a "string in string" construct,
> > but it would seem to confuse the "for" semantics. Is there some other
> > construct for which
> >
> >     for v in object_or_instance:
> >
> > does not assign to v all x such that "x in object_or_instance" is true? I
> > can see a few teaching problems here.
>
> To this latter example I can only say, "A foolish consistency is the
> hobgoblin of little minds."
>
Of course the string has always been an anomalous sequence anyway, but it
seems to be becoming less of a sequence.

> At least this still holds (unless x is an iterator or otherwise
> mutated by access :-):
>
>   for v in x:
>      assert v in x
>
Indeed. A rather weaker assertion, though. Anyhoo, no other arguments
against s1 in s2, so I'll make one parting comment. While I understand
perfectly well the pragmatic case for this change, it appears to blur the
borders between set membership and subsetting; if it's so desirable, why
didn't the need arise earlier?

regards
-----------------------------------------------------------------------
Steve Holden                                 http://www.holdenweb.com/
Python Web Programming                http://pydish.holdenweb.com/pwp/
-----------------------------------------------------------------------





From guido@python.org  Mon Aug  5 20:54:57 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 05 Aug 2002 15:54:57 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: Your message of "Mon, 05 Aug 2002 15:43:49 EDT."
 <048101c23cb8$6e0e86d0$6300000a@holdenweb.com>
References:  <200208051756.g75Hug426210@pcp02138704pcs.reston01.va.comcast.net> <200208051803.g75I3wG26804@pcp02138704pcs.reston01.va.comcast.net> <042501c23cb0$302a62b0$6300000a@holdenweb.com> <200208051911.g75JBGJ00739@pcp02138704pcs.reston01.va.comcast.net>
 <048101c23cb8$6e0e86d0$6300000a@holdenweb.com>
Message-ID: <200208051954.g75JsvI04367@pcp02138704pcs.reston01.va.comcast.net>

[SteveH]
> While I understand perfectly well the pragmatic case for this
> change, it appears to blur the borders between set membership and
> subsetting; if it's so desirable, why didn't the need arise
> earlier?

Couldn't be done before __contains__ was a separately overloadable
operator.  That's relatively recent (Python 2.0).  And playing with it
in innovative ways is even more recent (Python 2.2, for "has_key").
But the satisfaction that spelling "has_key" as "in" gives me suggests
that there's more potential to it.
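
(The spelling Guido mentions, for reference; "in" on a mapping tests keys, not (key, value) pairs, which is already a case where membership isn't plain sequence membership:)

```python
d = {'spam': 1, 'eggs': 2}

# Modern spelling of d.has_key('spam'):
assert 'spam' in d
assert 'ham' not in d
assert 1 not in d   # values don't count, only keys
```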

--Guido van Rossum (home page: http://www.python.org/~guido/)


From gmcm@hypernet.com  Mon Aug  5 21:03:09 2002
From: gmcm@hypernet.com (Gordon McMillan)
Date: Mon, 5 Aug 2002 16:03:09 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: 
References: <200208051756.g75Hug426210@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <3D4EA1BD.11180.3FCC867A@localhost>

On 5 Aug 2002 at 15:27, Tim Peters wrote:

> >> and I expect *everyone* here has been saved more
> >> than once by that
> >>
> >>     '' in 'xyz'
> >> currently raises an exception.

Not that I can recall.

The exception, however, is a TypeError saying the left
operand isn't a character. It's not a
TrueButYoureProbablyMakingAMistakeException.
 
> I'd like to see a plausible use case for
> 
>     '' in str
> 
> returning True, then.  

Any code that currently does 
 str.find(x) >= 0

I tend to use:

 pos = str.find(x)
 if pos > -1:

because I'm normally interested in where.
If it's a pure membership test, I tend not
to use a string but a tuple:

 if c in ('a', 'b', 'c'):

This is, at least partially, because "character"
is not an official Python type so I always expect
 str1 in str2 
to work when it only sometimes does.

-- Gordon
http://www.mcmillan-inc.com/



From tim.one@comcast.net  Mon Aug  5 21:03:39 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 05 Aug 2002 16:03:39 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <048101c23cb8$6e0e86d0$6300000a@holdenweb.com>
Message-ID: 

[Steve Holden]
> ...
> While I understand perfectly well the pragmatic case for this change, it
> appears to blur the borders between set membership and subsetting; if
> it's so desirable, why didn't the need arise earlier?

Mostly because the possibility for a type to define a __contains__
implementation didn't used to exist.  Now that any type can define "x in y"
to do what makes most sense for its instances, the rationale for strings
retaining strained (for strings) "I'm just a sequence, you see, exactly like
any other sequence" __contains__ semantics has grown much weaker.
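
(A sketch of the overloading Tim describes; the class name and behavior are invented for illustration:)

```python
class CaseInsensitiveText:
    """Give 'x in y' a type-specific meaning via __contains__."""
    def __init__(self, text):
        self.text = text.lower()

    def __contains__(self, sub):
        # membership means case-insensitive substring containment
        return sub.lower() in self.text

t = CaseInsensitiveText('Hello World')
assert 'WORLD' in t
assert 'o w' in t
assert 'xyz' not in t
```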



From martin@v.loewis.de  Mon Aug  5 21:08:43 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 05 Aug 2002 22:08:43 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
In-Reply-To: <20020805144636.A9355@hishome.net>
References: <20020805144636.A9355@hishome.net>
Message-ID: 

Oren Tirosh  writes:

> Charmap entries can currently be None, an integer or a unicode string. I
> suggest adding another option: a function or other callable.

That helps only for a subset of all codecs (the charmap based ones),
and thus is unacceptable. I want it to work for, say, big5 also.

Regards,
Martin


From tim.one@comcast.net  Mon Aug  5 21:15:41 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 05 Aug 2002 16:15:41 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <3D4EA1BD.11180.3FCC867A@localhost>
Message-ID: 

[Tim]
> I'd like to see a plausible use case for
>
>     '' in str
>
> returning True, then.

[Gordon McMillan]
> Any code that currently does
>  str.find(x) >= 0

You're saying that you actually do that in cases where x may be an empty
string, and that it's useful to get a True result in at least one such case?
If you are saying that, it needs more details; but if you're not saying
that, it's not a relevant use case.

> I tend to use:
>
>  pos = str.find(x)
>  if pos > -1:
>
> because I'm normally interested in where.

Sure -- that's what .find() is for, after all.  But you're also saying that
your algorithms expect to search for empty strings?  Like in:

    index = option_letter_string.find(letter)
    if index >= 0:
        list_of_option_functions[index]()
    else:
        raise UnknownOptionLetter(letter)

you make sure that list_of_option_functions[0] is suitable for processing
both the first option in option_letter_string and an empty "option letter"?



From tim.one@comcast.net  Mon Aug  5 21:18:37 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 05 Aug 2002 16:18:37 -0400
Subject: [Python-Dev] timsort for jython
In-Reply-To: <20020805111711.B28557@glacier.arctrix.com>
Message-ID: 

[Neil Schemenauer]
> I vote for the former.

[Another Neil Schemenauer]
> D'oh.  I meant the LATTER (i.e. raise an error for an empty LHS).

Damn -- too bad your votes cancelled out.  Next time just give me your proxy.



From oren-py-d@hishome.net  Mon Aug  5 21:44:04 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Mon, 5 Aug 2002 23:44:04 +0300
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
In-Reply-To: ; from martin@v.loewis.de on Mon, Aug 05, 2002 at 10:08:43PM +0200
References: <20020805144636.A9355@hishome.net> 
Message-ID: <20020805234404.A24400@hishome.net>

On Mon, Aug 05, 2002 at 10:08:43PM +0200, Martin v. Loewis wrote:
> Oren Tirosh  writes:
> 
> > Charmap entries can currently be None, an integer or a unicode string. I
> > suggest adding another option: a function or other callable.
> 
> That helps only for a subset of all codecs (the charmap based ones),
> and thus is unacceptable. I want it to work for, say, big5 also.

With the ability to embed functions inside a charmap, big5 and other encodings
could be converted to be charmap based, too :-)

I just feel that there must be *some* simpler way. A patch with 87k of code 
scares the hell out of me. 

"There are no complex things. Only things that I haven't yet understood 
why they are really simple."

	Oren




From gmcm@hypernet.com  Mon Aug  5 21:50:20 2002
From: gmcm@hypernet.com (Gordon McMillan)
Date: Mon, 5 Aug 2002 16:50:20 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: 
References: <3D4EA1BD.11180.3FCC867A@localhost>
Message-ID: <3D4EACCC.31934.3FF7B782@localhost>

On 5 Aug 2002 at 16:15, Tim Peters wrote:

> [Tim]
> > I'd like to see a plausible use case for
> >
> >     '' in str
> >
> > returning True, then.
> 
> [Gordon McMillan]
> > Any code that currently does
> >  str.find(x) >= 0
> 
> You're saying that you actually do that in cases
> where x may be an empty string, and that it's useful
> to get a True result in at least one such case? 

What I'm really saying is that I almost never use
 x in str
because its semantics have always been peculiar.
Thus, I don't *really* care whether '' in str raises
an exception, because if it does, I won't train myself
to use it.

[...]

> Sure -- that's what .find() is for, after all.  But
> you're also saying that your algorithms expect to
> search for empty strings?  Like in:
> 
>     index = option_letter_string.find(letter)
>     if index >= 0:
>         list_of_option_functions[index]()
>     else:
>         raise UnknownOptionLetter(letter)
> 
> you make sure that list_of_option_functions[0] is
> suitable for processing both the first option in
> option_letter_string and an empty "option letter"?

Say we have a sequence of objects where obj.options
uses a string to hold (orthogonal) option codes. We're
selecting a subset based on the user's criteria, and
empty means "don't care".

Then
 for obj in seq:
   if obj.options.find(criteria) >= 0:
     rslt.append(obj)

makes perfect sense.

I rather doubt I have code in that exact
form, because I'd probably special case
it if it were that obvious.

if not criteria:
  return seq
for obj in seq:
  .... 
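
(Gordon's filter filled out as a runnable sketch; the class and names are invented, and note the explicit ">= 0" on the find() test:)

```python
class Obj:
    def __init__(self, options):
        self.options = options

def select(seq, criteria):
    # empty criteria means "don't care": every object matches
    if not criteria:
        return list(seq)
    return [obj for obj in seq if obj.options.find(criteria) >= 0]

objs = [Obj('abc'), Obj('xyz'), Obj('bcd')]
assert [o.options for o in select(objs, 'bc')] == ['abc', 'bcd']
assert len(select(objs, '')) == 3
```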

OTOH, I use find() a lot, and since I can't
recall having been bit by find('') returning 0, I
have to conclude that the mystically / mathematically
correct answer is, in my case at least, also
the pragmatically correct one.

But you solved a similar problem once
already, by noting that a large quantity had
to have at least 537 objects in it.

-- Gordon
http://www.mcmillan-inc.com/



From barry@python.org  Mon Aug  5 21:59:54 2002
From: barry@python.org (Barry A. Warsaw)
Date: Mon, 5 Aug 2002 16:59:54 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
References: 
 <200208051756.g75Hug426210@pcp02138704pcs.reston01.va.comcast.net>
 <200208051803.g75I3wG26804@pcp02138704pcs.reston01.va.comcast.net>
 <3D4ECE03.D8653061@metaslash.com>
Message-ID: <15694.59210.871941.1027@anthem.wooz.org>

>>>>> "NN" == Neal Norwitz  writes:

    NN> Here's a patch: http://python.org/sf/591250

Updated with a few fixes and nits, and some additional tests.

All that's left is the documentation.
-Barry


From esr@thyrsus.com  Mon Aug  5 22:03:33 2002
From: esr@thyrsus.com (Eric S Raymond)
Date: Mon, 5 Aug 2002 17:03:33 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <200208051954.g75JsvI04367@pcp02138704pcs.reston01.va.comcast.net>
References:  <200208051756.g75Hug426210@pcp02138704pcs.reston01.va.comcast.net> <200208051803.g75I3wG26804@pcp02138704pcs.reston01.va.comcast.net> <042501c23cb0$302a62b0$6300000a@holdenweb.com> <200208051911.g75JBGJ00739@pcp02138704pcs.reston01.va.comcast.net> <048101c23cb8$6e0e86d0$6300000a@holdenweb.com> <200208051954.g75JsvI04367@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20020805210333.GA16397@thyrsus.com>

Guido van Rossum :
> But the satisfaction that spelling "has_key" as "in" gives me suggests
> that there's more potential to it.

Not a trivial datum.  Tools that feel good in the hand are not mere
self-indulgence; they promote relaxation and creativity in the user. 
-- 
		Eric S. Raymond


From martin@v.loewis.de  Mon Aug  5 22:06:25 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 05 Aug 2002 23:06:25 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
In-Reply-To: <20020805234404.A24400@hishome.net>
References: <20020805144636.A9355@hishome.net>
 
 <20020805234404.A24400@hishome.net>
Message-ID: 

Oren Tirosh  writes:

> With the ability to embed functions inside a charmap big5 and other
> encodings could be converted to be charmap based, too :-)

This is precisely what PEP 293 does: it allows embedding functions in
any codec.

> I just feel that there must be *some* simpler way. 

Why do you think so? It is not difficult.

> A patch with 87k of code scares the hell out of me.

Ah, so it is the size of the patch? Some of it could be moved to
Python perhaps, thus reducing the size of the patch (e.g. the registry
comes to mind).

If you look at the patch, you see that it precisely does what you
propose to do: add a callback to the charmap codec:

- it deletes charmap_decoding_error
- it adds state to feed the callback function
- it replaces the old call to charmap_decoding_error by

! 	    outpos = p-PyUnicode_AS_UNICODE(v);
! 	    startinpos = s-starts;
! 	    endinpos = startinpos+1;
! 	    if (unicode_decode_call_errorhandler(
! 		 errors, &errorHandler,
! 		 "charmap", "character maps to <undefined>",
! 		 starts, size, &startinpos, &endinpos, &exc, &s,
! 		 (PyObject **)&v, &outpos, &p)) {

  (original code was)

! 	    if (charmap_decoding_error(&s, &p, errors, 
! 				       "character maps to <undefined>")) {

- likewise for encoding.

Now, apply the same change to all other codecs (as you propose to do
for big5), and you obtain the patch for PEP 293.

In doing so, you find that the modifications needed for each codec are
so similar that you add some supporting infrastructure, and correct
errors in the existing codecs that you spot, and so on. 

The diffstat is

 Include/codecs.h        |   37
 Include/pyerrors.h      |   67 +
 Lib/codecs.py           |    5
 Modules/_codecsmodule.c |   61 +
 Objects/stringobject.c  |    7
 Objects/unicodeobject.c | 1794 +++++++++++++-------!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 Python/codecs.c         |  399 ++++++++++
 Python/exceptions.c     |  603 ++++++++++++++++
 8 files changed, 1678 insertions(+), 236 deletions(-), 1059 modifications(!)

If you look at the large blocks of new code, you find that it is in

- charmap_encoding_error, which insists on implementing known error
  handling algorithms inline,

- the default error handlers, of which atleast
  PyCodec_XMLCharRefReplaceErrors should be pure-Python

- PyCodec_BackslashReplaceErrors, likewise,

- the UnicodeError exception methods (which could be omitted, IMO).

So, if you look at the patch, it isn't really that large.
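
(For the record, this callback machinery landed as codecs.register_error in Python 2.3; a minimal sketch of the resulting API, with the handler name invented:)

```python
import codecs

def hex_escape(exc):
    # Replace each unencodable character with \xNN escapes of its UTF-8 bytes.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    repl = ''.join('\\x%02x' % b for b in bad.encode('utf-8'))
    return repl, exc.end   # resume encoding after the bad run

codecs.register_error('hex_escape', hex_escape)

assert 'abc\u00e9'.encode('ascii', errors='hex_escape') == b'abc\\xc3\\xa9'
```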

Regards,
Martin


From esr@thyrsus.com  Mon Aug  5 22:04:53 2002
From: esr@thyrsus.com (Eric S Raymond)
Date: Mon, 5 Aug 2002 17:04:53 -0400
Subject: [Python-Dev] Defanging the find() gotcha
In-Reply-To: 
References: <20020805172254.GA31517@thyrsus.com>  <20020805174942.GA17014@thyrsus.com> 
Message-ID: <20020805210453.GB16397@thyrsus.com>

Andrew Koenig :
> Eric> Raise an exception.  Definitely.  There is no reason to follow
> Eric> find() rigidly when the whole point is to have semantics
> Eric> different from find().  Besides, you're right to point out that
> Eric> changing this behavior could break existing code, and that is a
> Eric> big no-no.
> 
> Changing the meaning of ('ab' in 'abc') might also break existing code.

I could construct a try/except case that would change, yes.  Are you
being pedantic, or is this intended as a serious objection?
-- 
		Eric S. Raymond


From ark@research.att.com  Mon Aug  5 22:07:12 2002
From: ark@research.att.com (Andrew Koenig)
Date: Mon, 5 Aug 2002 17:07:12 -0400 (EDT)
Subject: [Python-Dev] Dafanging the find() gotcha
In-Reply-To: <20020805210453.GB16397@thyrsus.com> (message from Eric S Raymond
 on Mon, 5 Aug 2002 17:04:53 -0400)
References: <20020805172254.GA31517@thyrsus.com>  <20020805174942.GA17014@thyrsus.com>  <20020805210453.GB16397@thyrsus.com>
Message-ID: <200208052107.g75L7CA21732@europa.research.att.com>

>> Changing the meaning of ('ab' in 'abc') might also break existing code.

Eric> I could construct a try/except case that would change, yes.  Are you
Eric> being pedantic, or is this intended as a serious objection?

I think it's very nearly as serious as the objection that changing
the meaning of ('' in 'abc') might break code.

The reason for the "very nearly" is that it is easier to obtain empty
strings by accident than it is to obtain nonempty ones.



From guido@python.org  Mon Aug  5 22:23:27 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 05 Aug 2002 17:23:27 -0400
Subject: [Python-Dev] Dafanging the find() gotcha
In-Reply-To: Your message of "Mon, 05 Aug 2002 17:04:53 EDT."
 <20020805210453.GB16397@thyrsus.com>
References: <20020805172254.GA31517@thyrsus.com>  <20020805174942.GA17014@thyrsus.com> 
 <20020805210453.GB16397@thyrsus.com>
Message-ID: <200208052123.g75LNRV22589@pcp02138704pcs.reston01.va.comcast.net>

> > Eric> Raise an exception.  Definitely.  There is no reason to follow
> > Eric> find() rigidly when the whole point is to have semantics
> > Eric> different from find().  Besides, you're right to point out that
> > Eric> changing this behavior could break existing code, and that is a
> > Eric> big no-no.

> Andrew Koenig :
> > Changing the meaning of ('ab' in 'abc') might also break existing code.

> I could construct a try/except case that would change, yes.  Are you
> being pedantic, or is this intended as a serious objection?

Andrew appears to say that if you object against '' in 'abc' not
raising an exception, you should also object against the other one;
but his real point is the corollary: since you don't object against
giving 'ab' in 'abc' new meaning, you shouldn't object against a new
meaning for '' in 'abc' either -- at least not based on the argument
of breaking code.  Whenever we say that a change doesn't break code,
we almost always imply "except code that depends on a particular thing
raising an exception".

That '' in 'abc' or 'ab' in 'abc' raises TypeError tells me that it is
okay to change this behavior into doing something useful, *if* we have
a useful thing to substitute for the exception.
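Concretely, the semantics under discussion look like this (in modern syntax; in Python 2.2 the multi-character and empty cases raised TypeError):

```python
# 'x in s' as a substring test -- the proposed new meaning:
assert 'ab' in 'abc'
assert 'bc' in 'abc'
assert 'ac' not in 'abc'   # not contiguous, so not a substring
# The contested corner case: the empty string is a substring of everything.
assert '' in 'abc'
```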

Tim is arguing that '' in 'abc' is not a useful question to ask.  The
usefulness of the exception is not that it's a feature on which
correct programs depend, but that it's an early warning that your
program is broken.  Losing that early warning sign would mean more
time wasted debugging.

OTOH I'm worried that some code doing some mathematical proof using
substring relationships would find it irritating to have to work
around the irregularity.  But I admit that this is a purely
theoretical fear for now.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From Jack.Jansen@oratrix.com  Mon Aug  5 22:24:39 2002
From: Jack.Jansen@oratrix.com (Jack Jansen)
Date: Mon, 5 Aug 2002 23:24:39 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
In-Reply-To: <3D4E9E09.9070102@lemburg.com>
Message-ID: 

On maandag, augustus 5, 2002, at 05:47 , M.-A. Lemburg wrote:

> Jack Jansen wrote:
>> Having to register the error handler first and then finding it 
>> by name smells like a very big hack to me. I understand the 
>> reasoning (that you don't want to modify the API of a 
>> gazillion C routines to add an error object argument) but it 
>> still seems like a hack....
>
> Well, in that case, you would have to call the whole codec registry
> a hack ;-)

No, not really. For codecs I think that there needn't be much of 
a connection between the codec-supplier and the codec-user. 
Conceivably the encoding-identifying string being passed to 
encode() could even have been read from a data file or something.

For error handling this is silly: the code calling encode() or 
decode() will know how it wants errors handled. And if you argue 
that it isn't really error handling but an extension to the 
encoding name then maybe it should be treated as such (by 
appending it to the codec name in the string, as in 
"ascii;xmlentitydefs" or so?).
--
- Jack Jansen                
http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- 
Emma Goldman -



From ark@research.att.com  Mon Aug  5 22:29:35 2002
From: ark@research.att.com (Andrew Koenig)
Date: 05 Aug 2002 17:29:35 -0400
Subject: [Python-Dev] Dafanging the find() gotcha
In-Reply-To: <200208052123.g75LNRV22589@pcp02138704pcs.reston01.va.comcast.net>
References: <20020805172254.GA31517@thyrsus.com>
 
 <20020805174942.GA17014@thyrsus.com>
 
 <20020805210453.GB16397@thyrsus.com>
 <200208052123.g75LNRV22589@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

Guido> Andrew appears to say that if you object against '' in 'abc' not
Guido> raising an exception, you should also object against the other one;
Guido> but his real point is the corollary: since you don't object against
Guido> giving 'ab' in 'abc' new meaning, you shouldn't object against a new
Guido> meaning for '' in 'abc' either -- at least not based on the argument
Guido> of breaking code.  Whenever we say that a change doesn't break code,
Guido> we almost always imply "except code that depends on a particular thing
Guido> raising an exception".

Exactly.

Guido> Tim is arguing that '' in 'abc' is not a useful question to ask.  The
Guido> usefulness of the exception is not that it's a feature on which
Guido> correct programs depend, but that it's an early warning that your
Guido> program is broken.  Losing that early warning sign would mean more
Guido> time wasted debugging.

Yes.

Guido> OTOH I'm worried that some code doing some mathematical proof using
Guido> substring relationships would find it irritating to have to work
Guido> around the irregularity.  But I admit that this is a purely
Guido> theoretical fear for now.

Also yes.

On the other hand, I have a practical fear: There are lots of
different ways of asking whether a string s contains a substring s1.
If those ways behave in diverse manners when s1 is empty, I am going
to have to remember which way to obtain which behavior.  I would
really like to avoid having to do that.

-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark


From esr@thyrsus.com  Mon Aug  5 22:38:09 2002
From: esr@thyrsus.com (Eric S Raymond)
Date: Mon, 5 Aug 2002 17:38:09 -0400
Subject: [Python-Dev] Dafanging the find() gotcha
In-Reply-To: <200208052123.g75LNRV22589@pcp02138704pcs.reston01.va.comcast.net>
References: <20020805172254.GA31517@thyrsus.com>  <20020805174942.GA17014@thyrsus.com>  <20020805210453.GB16397@thyrsus.com> <200208052123.g75LNRV22589@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20020805213809.GA17896@thyrsus.com>

Guido van Rossum :
> Andrew appears to say that if you object against '' in 'abc' not
> raising an exception, you should also object against the other one;
> but his real point is the corollary: since you don't object against
> giving 'ab' in 'abc' new meaning, you shouldn't object against a new
> meaning for '' in 'abc' either -- at least not based on the argument
> of breaking code.

I understand.  But there is a difference between changes that seem likely to 
silently break a lot of things and changes for which one almost has to
contrive an example that would break.  I think this one is in the latter
category.

>                Whenever we say that a change doesn't break code,
> we almost always imply "except code that depends on a particular thing
> raising an exception".

Agreed.

> That '' in 'abc' or 'ab' in 'abc' raises TypeError tells me that it is
> okay to change this behavior into doing something useful, *if* we have
> a useful thing to substitute for the exception.

Also agreed; I parallel your reasoning as well as your conclusion, and
in fact thought this issue through before I raised the possibility
again.

> Tim is arguing that '' in 'abc' is not a useful question to ask.  The
> usefulness of the exception is not that it's a feature on which
> correct programs depend, but that it's an early warning that your
> program is broken.  Losing that early warning sign would mean more
> time wasted debugging.

Yes.  Best for things to fail noisily if they're going to fail.
 
> OTOH I'm worried that some code doing some mathematical proof using
> substring relationships would find it irritating to have to work
> around the irregularity.  But I admit that this is a purely
> theoretical fear for now.

This doesn't concern me, and I used to be a mathematical logician
myself.  Don't worry about my ex-colleagues -- you're designing a tool
for programming, not a formalism for doing proof theory.
-- 
		Eric S. Raymond


From tim.one@comcast.net  Mon Aug  5 22:49:14 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 05 Aug 2002 17:49:14 -0400
Subject: [Python-Dev] Dafanging the find() gotcha
In-Reply-To: 
Message-ID: 

[Andrew Koenig]
> On the other hand, I have a practical fear: There are lots of
> different ways of asking whether a string s contains a substring s1.
> If those ways behave in diverse manners when s1 is empty, I am going
> to have to remember which way to obtain which behavior.  I would
> really like to avoid having to do that.

I don't count that as "a practical fear" unless you actually search for
empty strings, and I don't believe that you do (or at least not on
purpose -- you can change my mind in a hurry by posting your Python code
that does do so, though!).  If searching for empty strings isn't something
you do, then all methods of asking about substrings yield the same outcome.

This isn't, e.g., SNOBOL4, where matching against a pattern variable that
sometimes contains a null pattern can be useful for its control-flow side
effects.  These are "just strings" in Python, and searching is just
searching.



From esr@thyrsus.com  Mon Aug  5 23:10:27 2002
From: esr@thyrsus.com (Eric S Raymond)
Date: Mon, 5 Aug 2002 18:10:27 -0400
Subject: [Python-Dev] Dafanging the find() gotcha
In-Reply-To: 
References:  
Message-ID: <20020805221027.GC17896@thyrsus.com>

Tim Peters :
> This isn't, e.g., SNOBOL4, where matching against a pattern variable that
> sometimes contains a null pattern can be useful for its control-flow side
> effects.  These are "just strings" in Python, and searching is just
> searching.

 And sometimes, a cigar is just a cigar. 
-- 
		Eric S. Raymond


From ark@research.att.com  Mon Aug  5 23:30:56 2002
From: ark@research.att.com (Andrew Koenig)
Date: Mon, 5 Aug 2002 18:30:56 -0400 (EDT)
Subject: [Python-Dev] Dafanging the find() gotcha
In-Reply-To:  (message from
 Tim Peters on Mon, 05 Aug 2002 17:49:14 -0400)
References: 
Message-ID: <200208052230.g75MUu922234@europa.research.att.com>

Tim> I don't count that as "a practical fear" unless you actually
Tim> search for empty strings, and I don't believe that you do (or at
Tim> least not on purpose -- you can change my mind in a hurry by
Tim> posting your Python code that does do so, though!).  If searching
Tim> for empty strings isn't something you do, then all methods of
Tim> asking about substrings yield the same outcome.

Unless you're trying to teach the language to someone else, in which
case you have to explain the behavior regardless of whether you've
written programs that depend on it.

I doubt you've ever written a program that searches for the string
'asoufnyqcynreqywrycq98746qwh', yet I imagine that you would still
object to a search function that throws an exception when presented
with that particular string.

I'm not trying to be flip here -- I'm trying to make the point that in
my opinion, having a uniform rule is preferable to catching particular
cases that are sometimes mistakes.


From aahz@pythoncraft.com  Mon Aug  5 23:49:03 2002
From: aahz@pythoncraft.com (Aahz)
Date: Mon, 5 Aug 2002 18:49:03 -0400
Subject: [Python-Dev] Dafanging the find() gotcha
In-Reply-To: <200208052230.g75MUu922234@europa.research.att.com>
References:  <200208052230.g75MUu922234@europa.research.att.com>
Message-ID: <20020805224903.GB8242@panix.com>

On Mon, Aug 05, 2002, Andrew Koenig wrote:
>
> I'm not trying to be flip here -- I'm trying to make the point that in
> my opinion, having a uniform rule is preferable to catching particular
> cases that are sometimes mistakes.

It's not so much that '' in 'abc' is a mistake as that there's no
sensible answer to be given.  When Python can't figure out how to
deliver a sensible answer, it raises an exception: "In the face of
ambiguity, refuse the temptation to guess."
-- 
Aahz (aahz@pythoncraft.com)           <*>         http://www.pythoncraft.com/

Project Vote Smart: http://www.vote-smart.org/


From jeremy@alum.mit.edu  Mon Aug  5 19:14:07 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Mon, 5 Aug 2002 14:14:07 -0400
Subject: [Python-Dev] framer tool
In-Reply-To: <15694.63696.655874.808626@localhost.localdomain>
References: 
 
 <15694.63696.655874.808626@localhost.localdomain>
Message-ID: <15694.49263.520792.939798@slothrop.zope.com>

>>>>> "SM" == Skip Montanaro  writes:

  [From a checkin that I made recently of Tools/framer]
  >>> framer is a tool to generate boilerplate code for C extension
  >>> types.

  Jack> how does framer relate to modulator? Is it a replacement?
  Jack> Should modulator be adapted to framer? (And, if so, who's
  Jack> going to do it? :-)

Framer could be a replacement for modulator.  The original impetus for
framer came from Jim Fulton, who suggested that modulator be updated
so that it could be used for C extension types.

I thought that Zope-style interfaces would be a nice way to specify
the signatures of the extension module and types.  Since modulator
didn't handle the specifications or the new 2.2/2.3 features, I didn't
really look at it.

Should I try to make framer a modulator replacement?  I've got some
time to work on it, but checked in the current progress in hopes of
finding more help.

  SM> How does framer relate to Pyrex?

Pyrex is a tool to generate a complete C module from a variant of
Python source.  Framer is a tool to generate just the boilerplate --
the frame.  Framer is intended to support people who are going to
maintain a C extension by hand.  The code it generates is easy to read
and edit.  I wouldn't want to read the Pyrex-generated C code.

Pyrex is intended for converting existing Python code to C, for
performance.  (I think.)  Framer is intended for C programmers who
don't want to type all the boilerplate for an extension.  In some
ways, it's closer to SWIG than to Pyrex.

I think there is a common subset of functionality to Pyrex, SWIG, and
Framer -- namely generating the basic wrapper code to make C code
callable from Python.  It might be worthwhile to share that code among
the projects; Greg certainly seems to have covered a lot of ground
handling __methods__ with Pyrex.

Jeremy




From tim.one@comcast.net  Tue Aug  6 00:17:04 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 05 Aug 2002 19:17:04 -0400
Subject: [Python-Dev] Dafanging the find() gotcha
In-Reply-To: <200208052230.g75MUu922234@europa.research.att.com>
Message-ID: 

[Andrew Koenig]
> Unless you're trying to teach the language to someone else, in which
> case you have to explain the behavior regardless of whether you've
> written programs that depend on it.

I would tell them that searching for an empty string is silly, that they're
never going to need to do it, but if they do then they should consider
whatever happens an accident.

> I doubt you've ever written a program that searches for the string
> 'asoufnyqcynreqywrycq98746qwh', yet I imagine that you would still
> object to a search function that throws an exception when presented
> with that particular string.

Of course, but I can easily conceive of *wanting* to search for
'asoufnyqcynreqywrycq98746qwh'.  Indeed, I just did a grep over my Python
code to be sure that I never had searched for it before .  But I can't
conceive of wanting to search for an empty string, despite effort after
suspension of disbelief.

> I'm not trying to be flip here -- I'm trying to make the point that in
> my opinion, having a uniform rule is preferable to catching particular
> cases that are sometimes mistakes.

The distinction between empty and non-empty is the only one being made here,
and (unlike picking on 'asouf'etc) is a natural distinction in its domain.



From pobrien@orbtech.com  Tue Aug  6 00:27:50 2002
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Mon, 5 Aug 2002 18:27:50 -0500
Subject: [Python-Dev] Dafanging the find() gotcha
In-Reply-To: <20020805224903.GB8242@panix.com>
Message-ID: 

[Aahz]
> >
> > I'm not trying to be flip here -- I'm trying to make the point that in
> > my opinion, having a uniform rule is preferable to catching particular
> > cases that are sometimes mistakes.
>
> It's not so much that '' in 'abc' is a mistake as that there's no
> sensible answer to be given.  When Python can't figure out how to
> deliver a sensible answer, it raises an exception: "In the face of
> ambiguity, refuse the temptation to guess."

In what way does find('') return a sensible answer?

>>> 'help'.find('')
0
>>> 'help'.find('h')
0
>>> 'help'.find('e')
1
>>> 'help'.find('l')
2
>>> 'help'[0]
'h'
>>> 'help'[1]
'e'
>>> 'help'[2]
'l'
>>> s = 'help'
>>> s[s.find('')]
'h'
>>> s[s.find('h')]
'h'

I don't see the logic in this and I couldn't find anything in the docs to
explain this behavior. I'm guessing this is old hat for most of you, but I
find this a bit surprising myself.

--
Patrick K. O'Brien
Orbtech
-----------------------------------------------
"Your source for Python programming expertise."
-----------------------------------------------
Web:  http://www.orbtech.com/web/pobrien/
Blog: http://www.orbtech.com/blog/pobrien/
Wiki: http://www.orbtech.com/wiki/PatrickOBrien
-----------------------------------------------



From tim.one@comcast.net  Tue Aug  6 00:27:30 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 05 Aug 2002 19:27:30 -0400
Subject: [Python-Dev] Dafanging the find() gotcha
In-Reply-To: <20020805224903.GB8242@panix.com>
Message-ID: 

[Aahz]
> It's not so much that '' in 'abc' is a mistake as that there's no
> sensible answer to be given.

    s2.find(s1)

returns the smallest non-negative int i such that

    s2[i : i+len(s1)] == s1

provided such an i exists.  That's as sensible as any answer, and more
sensible than most , if you have to give a meaning when s1 is empty.
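Tim's formula translates directly into a naive reference implementation (a sketch for clarity, not the actual CPython code):

```python
def find(s2, s1):
    # Smallest non-negative i such that s2[i : i+len(s1)] == s1, else -1.
    for i in range(len(s2) - len(s1) + 1):
        if s2[i : i + len(s1)] == s1:
            return i
    return -1

print(find('help', 'l'))   # 2
print(find('help', ''))    # 0 -- '' matches at i == 0 (and everywhere else)
print(find('help', 'x'))   # -1
```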

> When Python can't figure out how to deliver a sensible answer,

Well, i==0 isn't a *compelling* answer when s1=="".  "It falls out of the
formula" is about the best that can be said for it.

> it raises an exception: "In the face of ambiguity, refuse the temptation
> to guess."

That's pretty much my view.  The user has just given us reason to doubt they
know what their program is doing, and I'd rather be *helpful* then than push
on in the interest of purity.

The most plausible use case I've been able to dream up is representing small
finite sets as sorted strings of characters.  Then having

    s1 in s2

raise an exception when s1 is "" doesn't do the right thing for "the empty
set".  OTOH, it doesn't do the right thing in most other cases either, like

    "ac" in "abc" -> False

so it's hard to get too upset about the empty set failing .



From pobrien@orbtech.com  Tue Aug  6 00:36:48 2002
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Mon, 5 Aug 2002 18:36:48 -0500
Subject: [Python-Dev] Dafanging the find() gotcha
In-Reply-To: 
Message-ID: 

[Patrick K. O'Brien]
>
> I don't see the logic in this and I couldn't find anything in the docs to
> explain this behavior. I'm guessing this is old hat for most of you, but I
> find this a bit surprising myself.

This one is even more fun. (Apologies in advance if I'm pouring salt on old
wounds.)

>>> s = 'help'
>>> s.rfind('')
4
>>> s[s.rfind('')]
Traceback (most recent call last):
  File "", line 1, in ?
IndexError: string index out of range
>>>
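The result is self-consistent if rfind's answer is read as a slice position rather than a character index; a sketch of what rfind('') is actually reporting:

```python
s = 'help'
i = s.rfind('')          # 4 == len(s): '' matches in the slice s[4:4]
assert i == len(s)
assert s[i:i] == ''      # slicing at the match position works...
# ...but s[i] raises IndexError: len(s) is a valid slice *boundary*,
# not a valid character index.
```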

--
Patrick K. O'Brien
Orbtech
-----------------------------------------------
"Your source for Python programming expertise."
-----------------------------------------------
Web:  http://www.orbtech.com/web/pobrien/
Blog: http://www.orbtech.com/blog/pobrien/
Wiki: http://www.orbtech.com/wiki/PatrickOBrien
-----------------------------------------------



From tim.one@comcast.net  Tue Aug  6 00:33:37 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 05 Aug 2002 19:33:37 -0400
Subject: [Python-Dev] Dafanging the find() gotcha
In-Reply-To: 
Message-ID: 

[Patrick K. O'Brien]
> In what way does find('') return a sensible answer?
>
> >>> 'help'.find('')
> 0
> >>> 'help'.find('h')
> 0
> >>> 'help'.find('e')
> 1
> >>> 'help'.find('l')
> 2
> >>> 'help'[0]
> 'h'
> >>> 'help'[1]
> 'e'
> >>> 'help'[2]
> 'l'
> >>> s = 'help'
> >>> s[s.find('')]
> 'h'
> >>> s[s.find('h')]
> 'h'
>
> I don't see the logic in this

In what?  The meaning of "this" isn't clear.  Do you mean that not a single
one of those results makes sense to you, or that some particular ones don't
make sense to you?  If the latter case, which particular ones?

Note that searching for any prefix of 'help' returns 0:

>>> 'help'.find('help')
0
>>> 'help'.find('hel')
0
>>> 'help'.find('he')
0
>>> 'help'.find('h')
0
>>> 'help'.find('')
0
>>>

Of course '' is a prefix of any string whatsoever, so it's not like the
final result is of much use (it's more like "no information in, no
information out").



From ark@research.att.com  Tue Aug  6 00:35:01 2002
From: ark@research.att.com (Andrew Koenig)
Date: Mon, 5 Aug 2002 19:35:01 -0400 (EDT)
Subject: [Python-Dev] Dafanging the find() gotcha
In-Reply-To:  (message from
 Tim Peters on Mon, 05 Aug 2002 19:17:04 -0400)
References: 
Message-ID: <200208052335.g75NZ1l22651@europa.research.att.com>

Tim> The distinction between empty and non-empty is the only one being made here,
Tim> and (unlike picking on 'asouf'etc) is a natural distinction in its domain.

Fair enough.

Nevertheless, you have not convinced me that this distinction
is useful in this context.

I will agree with you that (a) Many times, people search for literals
in strings, and (b) it is hard to imagine why an empty literal would
be useful.

However, that says nothing to me about why an expression of the form
(s in t) should be considered an error when s has no characters.
And it says even less about why it would be a good idea to have
the result of such a search yield different results in different contexts.



From pobrien@orbtech.com  Tue Aug  6 00:49:45 2002
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Mon, 5 Aug 2002 18:49:45 -0500
Subject: [Python-Dev] Dafanging the find() gotcha
In-Reply-To: 
Message-ID: 

[Tim Peters]
>
> Of course '' is a prefix of any string whatsoever, so it's not like the
> final result is of much use (it's more like "no information in, no
> information out").

I just never thought of a Python string as beginning and ending with a null.
So the fact that find('') and rfind('') both return something other than -1
was surprising to me. My "plain English" interpretation of the docstring for
find gave me the impression that if I searched for a single character and
got back an index other than -1 that I could then retrieve that character
from the string and it would equal the character used in the original
search.

--
Patrick K. O'Brien
Orbtech
-----------------------------------------------
"Your source for Python programming expertise."
-----------------------------------------------
Web:  http://www.orbtech.com/web/pobrien/
Blog: http://www.orbtech.com/blog/pobrien/
Wiki: http://www.orbtech.com/wiki/PatrickOBrien
-----------------------------------------------



From greg@cosc.canterbury.ac.nz  Tue Aug  6 00:58:27 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Tue, 06 Aug 2002 11:58:27 +1200 (NZST)
Subject: [Python-Dev] Simpler reformulation of C inheritance Q.
In-Reply-To: <200208051408.g75E84113668@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <200208052358.g75NwRt28374@oma.cosc.canterbury.ac.nz>

> Here's a hack.
> 
> For static extensions, you could extend one of the extension structs,
> e.g. PyMappingMethods

This perhaps suggests a way of handling this in a more
general way in the future:

Add a slot to the typeobject which points to a variable-sized
array of pointers. There is one entry in the array for each
level of inheritance, and it points to a struct containing
whatever extra stuff you want to add at that level.

This would only handle single inheritance, but I think
that's all you can have at the C level anyway, isn't
it?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From pobrien@orbtech.com  Tue Aug  6 01:05:53 2002
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Mon, 5 Aug 2002 19:05:53 -0500
Subject: [Python-Dev] Dafanging the find() gotcha
In-Reply-To: 
Message-ID: 

[Tim Peters]
>
>     s2.find(s1)
>
> returns the smallest non-negative int i such that
>
>     s2[i : i+len(s1)] == s1  # Fixed typo in original.
>
> provided such an i exists.  That's as sensible as any answer, and more
> sensible than most , if you have to give a meaning when s1 is empty.

That clarified things for me, thanks.

(But if you squint while thinking about finding single characters and using
the result to access the original string via index notation, rather than
slice, and you ignore the fact that '' isn't even a single character and you
do this late in the day... you might see why I wasn't seeing the logic of
'whatever'.find('') returning 0.)

--
Patrick K. O'Brien
Orbtech
-----------------------------------------------
"Your source for Python programming expertise."
-----------------------------------------------
Web:  http://www.orbtech.com/web/pobrien/
Blog: http://www.orbtech.com/blog/pobrien/
Wiki: http://www.orbtech.com/wiki/PatrickOBrien
-----------------------------------------------



From guido@python.org  Tue Aug  6 01:06:04 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 05 Aug 2002 20:06:04 -0400
Subject: [Python-Dev] framer tool
In-Reply-To: Your message of "Mon, 05 Aug 2002 14:14:07 EDT."
 <15694.49263.520792.939798@slothrop.zope.com>
References:   <15694.63696.655874.808626@localhost.localdomain>
 <15694.49263.520792.939798@slothrop.zope.com>
Message-ID: <200208060006.g76064C24429@pcp02138704pcs.reston01.va.comcast.net>

>   Jack> how does framer relate to modulator? Is it a replacement?
>   Jack> Should modulator be adapted to framer? (And, if so, who's
>   Jack> going to do it? :-)
> 
> Framer could be a replacement for modulator.  The original impetus for
> framer came from Jim Fulton, who suggested that modulator be updated
> so that it could be used for C extension types.
> 
> I thought that Zope-style interfaces would be a nice way to specify
> the signatures of the extension module and types.  Since modulator
> didn't handle the specifications or the new 2.2/2.3 features, I didn't
> really look at it.

Jeremy points out that framer's *output* is different.  I'd like to
mention that framer's *input* is also different; Modulator is a GUI
tool, framer reads .py files.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@comcast.net  Tue Aug  6 01:11:21 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 05 Aug 2002 20:11:21 -0400
Subject: [Python-Dev] Dafanging the find() gotcha
In-Reply-To: 
Message-ID: 

[Patrick K. O'Brien]
> I just never thought of a Python string as beginning and ending
> with a null.

Oh, it's worse than just *that*.  There's a null string at s[i:i] for every
value of i, although the *implementation* of find() seems flawed in this
respect; e.g.,

>>> 'abc'.find('', 3)
3
>>>

violates the doc's promise that the result returned (when not -1) is an
"index in s" (but 3 is not an index in 'abc'), while

>>> 'abc'.find('', 4)
-1
>>>

is anybody's guess ('' is certainly a substring of 'abc'[4:]).

However, when you're in the business of returning results that don't have
concrete meaning, things like this happen .
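A sketch of the boundary behavior Tim describes, which is still reproducible:

```python
s = 'abc'
# An empty substring sits at every slice position 0 <= i <= len(s):
assert all(s[i:i] == '' for i in range(len(s) + 1))
assert s.find('', 3) == 3    # len(s) is accepted as a start position...
assert s.find('', 4) == -1   # ...but one past it yields -1, not 4
```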

> So the fact that find('') and rfind('') both return something
> other than -1 was surprising to me.

Then you'll be glad to hear that we're going to make

    '' in 'abc'

return True too to help you build on your now-clear understanding .



From greg@cosc.canterbury.ac.nz  Tue Aug  6 01:31:33 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Tue, 06 Aug 2002 12:31:33 +1200 (NZST)
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <200208051954.g75JsvI04367@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <200208060031.g760VXu28532@oma.cosc.canterbury.ac.nz>

> But the satisfaction that spelling "has_key" as "in" gives me suggests
> that there's more potential to it.

I thought you'd always argued against this before, on
the grounds that the convenience wasn't worth the
inconsistency. Are you starting to change your mind?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From gmcm@hypernet.com  Tue Aug  6 02:36:48 2002
From: gmcm@hypernet.com (Gordon McMillan)
Date: Mon, 5 Aug 2002 21:36:48 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <3D4EACCC.31934.3FF7B782@localhost>
References: 
Message-ID: <3D4EEFF0.27145.40FDFEA8@localhost>

On 5 Aug 2002 at 16:50, Gordon McMillan wrote:

> What I'm really saying is that I almost never use x
> in str because its semantics have always been
> peculiar. Thus, I don't *really* care whether '' in
> str raises an exception, because if it does, I
> won't train myself to use it . 

Turns out that's not true. When I want set membership,
I first write "char in ('a', 'b', 'c')", then
sometimes change it because "char in 'abc'" is more
efficient.

So whether '' in 'abc' will work or not is a red
herring. The real issue is that membership gets
conflated with subsetting.
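The conflation in a sketch: the same operator reads as element membership on a tuple but as a substring test on a string:

```python
char = 'b'
assert char in ('a', 'b', 'c')   # tuple: true element membership
assert char in 'abc'             # string: same effect for one character...
assert 'ab' in 'abc'             # ...but a multi-character operand means
                                 # "substring", not "is one of the elements"
```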

-- Gordon
http://www.mcmillan-inc.com/



From tim.one@comcast.net  Tue Aug  6 03:18:31 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 05 Aug 2002 22:18:31 -0400
Subject: [Python-Dev] Dafanging the find() gotcha
In-Reply-To: <200208052335.g75NZ1l22651@europa.research.att.com>
Message-ID: 

[Andrew Koenig]
> ...
> Nevertheless, you have not convinced me that this distinction
> is useful in this context.

That will have to wait until it burns you in practice.

> I will agree with you that (a) Many times, people search for literals
> in strings, and (b) it is hard to imagine why an empty literal would
> be useful.
>
> However, that says nothing to me about why an expression of the form
> (s in t) should be considered an error when s has no characters.

Nor should it -- literals are the simplest form of expression, used here
just for concreteness.  The question for you is, *however* the value of s
was obtained, if you end up doing "s in t" when s happens to be an empty
string, is it more likely that your program has strayed from your intent, or
that a result of True *was* your intent?

    if s[j+k1:j+k2] in t:

Assuming type correctness, if I know that raises an exception whenever k1 >=
k2, then I have confidence I know what the code is trying to do, and rest
easy knowing it won't do something nuts if the index expressions go crazy.
If instead it never(!) raises an exception, no matter what the values of j,
k1 and k2, this code scares me.
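Tim's worry can be made concrete with a sketch (the values are invented, and it uses the always-True empty-substring semantics under discussion, which Python eventually adopted):

```python
# Sketch: how an always-True empty-substring test can hide an index bug
# that an exception would have caught.
t = "some haystack"
s = "needle"
j, k1, k2 = 0, 4, 2          # k1 >= k2 by mistake: the slice comes out empty
piece = s[j + k1:j + k2]     # '' -- the index expressions have gone crazy
assert piece == ""
# Under "'' in t is always True", the test silently succeeds anyway:
assert piece in t
```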

When Python switched to allowing negative indices as sequence subscripts (it
didn't always -- they used to raise exceptions), it introduced a nasty class
of bug caused by conceptually non-negative indices going negative by
mistake, but no longer complaining.  Overall I think negative indices added
enough expressiveness to outweigh that drawback, but it was far from a pure
win.  This is a case where we're also keen to make a formerly exceptional
operation "mean something", but there's one particular case of it where I
know doing so will create similar new problems -- and it's a case that's of
no *real* use to allow.

> And it says even less about why it would be a good idea to have
> the result of such a search yield different results in different
> contexts.

I agree that's not a good thing at all, and it may well win Guido in the
end.  I just hope he feels rotten about it, because the children will suffer
as a result .



From tim.one@comcast.net  Tue Aug  6 03:24:35 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 05 Aug 2002 22:24:35 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <200208060031.g760VXu28532@oma.cosc.canterbury.ac.nz>
Message-ID: 

[Guido]
> But the satisfaction that spelling "has_key" as "in" gives me suggests
> that there's more potential to it.

[Greg Ewing]
> I thought you'd always argued against this before, on
> the grounds that the convenience wasn't worth the
> inconsistency. Are you starting to change your mind?

This one's a done deal; it was released in 2.2:

>>> 2 in {2: 3}
1
>>>

Similarly, "for k in dict" is like "for k in dict.iterkeys()" in 2.2.  Guido
never changes his mind.
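The 2.2 behaviour Tim cites still holds in current Python (where the result prints as True rather than 1); a quick check:

```python
d = {2: 3, 5: 8}
assert 2 in d                           # "has_key" spelled as "in"
assert sorted(k for k in d) == [2, 5]   # "for k in dict" iterates the keys
```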



From tim.one@comcast.net  Tue Aug  6 03:33:04 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 05 Aug 2002 22:33:04 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <3D4EEFF0.27145.40FDFEA8@localhost>
Message-ID: 

[Gordon McMillan]
> Turns out that's not true. When I want set membership,
> I first write "char in ('a', 'b', 'c')", then
> sometimes change it because "char in 'abc'" is more
> efficient.

"char in dict_acting_as_a_set" is faster still, if you're really keen to
speed it.
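Before the built-in set type existed, a dict with throwaway values was the standard fast-membership idiom Tim alludes to; a sketch (modern spelling):

```python
# A dict acting as a set: O(1) average-case membership, versus a scan
# for "char in 'abc'".  (Today one would simply write set('abc').)
members = dict.fromkeys('abc')
assert 'b' in members
assert 'z' not in members
```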

> So whether '' in 'abc' will work or not is a red
> herring.

For your particular use, possibly.  If "char" is computed and may become
empty by mistake, then it's not a red herring (it's the difference between
getting True and getting an exception).

> The real issue is that membership gets conflated with subsetting.

For strings, yes, if you change it to "the membership meaning goes away
entirely in general, and a substring meaning replaces it".  If "char" is
computed and may become longer than one character by mistake, then in your
use something that used to raise an exception would instead return True or
False, depending on the data values.



From greg@cosc.canterbury.ac.nz  Tue Aug  6 03:33:45 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Tue, 06 Aug 2002 14:33:45 +1200 (NZST)
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: 
Message-ID: <200208060233.g762Xjm29305@oma.cosc.canterbury.ac.nz>

Tim:
> [Greg Ewing]
> > I thought you'd always argued against this before, on
> > the grounds that the convenience wasn't worth the
> > inconsistency. Are you starting to change your mind?
> 
> This one's a done deal; it was released in 2.2:
> 
> >>> 2 in {2: 3}
> 1

I'm talking about making "for x in string" do a substring
test. This is different from "for x in dict", because at
least the latter is still a kind of membership test.

I thought Guido was against having "in" do anything
other than membership tests, but his last message sounded
like he was changing his mind.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From pinard@iro.umontreal.ca  Tue Aug  6 04:25:18 2002
From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard)
Date: 05 Aug 2002 23:25:18 -0400
Subject: [Python-Dev] Pyrex praise [was: Re: framer tool]
In-Reply-To: <15694.49263.520792.939798@slothrop.zope.com>
References: 
 
 <15694.63696.655874.808626@localhost.localdomain>
 <15694.49263.520792.939798@slothrop.zope.com>
Message-ID: 

[Jeremy Hylton]

> Pyrex is intended for converting existing Python code to C, for
> performance.  (I think.)

Here is how I see Pyrex, and why I have much interest in it.

I do not see Pyrex as a tool whose main goal is converting existing Python
code to C, yet to some extent it could be used with this goal in mind.
It is a tool for the programmer to express various interfaces between C and
Python, using a variant of the Python language augmented with C declarations,
_instead_ of the more usual C language augmented with macros and an API.
The fact that Pyrex produces C code along the way is only part of the
internal mechanics, and is not of much practical interest to the programmer.

Pyrex can be used to extend Python with C code, written in C for the
circumstance, or to wrap pre-existing C libraries.  Pyrex can also be used
to embed Python functions, written and interpreted by Python, within what
would otherwise be a pure C application.  I would guess that Pyrex could
also be used with Python only or (with proper care) with C only, and not
to build a Python-C interface, but these cases are probably not going to
be usual for me.

Just as it is generally easier and more comfortable to develop and
debug an algorithm or program in Python than in C, if only because
C forces you into many details of memory management intricacies, it is
much easier and more comfortable developing and debugging a Python-C
interface using Pyrex than using more traditional ways: you concentrate
on the interface without having to cautiously swim among reference counts
and the various and numerous API functions or macros.

A neat advantage of using Python instead of C to write your interface is that
you are much less likely to have bugs.  Pyrex knows how to break apart Python
structures and how to rebuild them, and it takes care of properly maintaining
reference count invariants, so as long as Pyrex is not itself buggy,
your interface is really on the safe side, bug-wise.  As Python-C interface
bugs can be painful to track down, this is a big incentive towards Pyrex.
Being allowed to forget (or avoid learning) all the details of the C API
is yet another good selling point for Pyrex: it would be a waste of
resources to have many members of a development team learn the C API for
Python, while I can expect everybody on a programming team to know Pyrex,
because the learning curve is so small.  Pyrex is more democratic! :-)

A final point, which looks important to me, is that any good wrapping of a
pre-existing C library is best done while giving a Python flavour to the
interface, if only for a nicer and more natural object orientation.
If the wrapping is done using C to express the interface, the effort of
programming the interface in C while adding more Python-typical paradigms
is complicated by the language distance between C and Python.  But as Pyrex
is very close to Python, it allows natural and speedy building of more
proper interfaces.  The Pyrex code itself, which holds the glue between
C and Python, is exactly the right place for implementing the necessary
layer meant to reshape the C facilities into Python ways.

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard


From David Abrahams" 
Message-ID: <01fa01c23cfa$373cd7a0$62f0fc0c@boostconsulting.com>

From: "Greg Ewing" 


> > Here's a hack.
> >
> > For static extensions, you could extend one of the extension structs,
> > e.g. PyMappingMethods
>
> This perhaps suggests a way of handling this in a more
> general way in the future:
>
> Add a slot to the typeobject which points to a variable-sized
> array of pointers. There is one entry in the array for each
> level of inheritance, and it points to a struct containing
> whatever extra stuff you want to add at that level.
>
> This would only handle single inheritance, but I think
> that's all you can have at the C level anyway, isn't
> it?

1. I'm pretty sure the answer to the above question is no
2. The scheme you propose is more costly in memory and cycles than I'd like
(FWIW)

-Dave

-----------------------------------------------------------
           David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com





From ark@research.att.com  Tue Aug  6 05:12:07 2002
From: ark@research.att.com (Andrew Koenig)
Date: 06 Aug 2002 00:12:07 -0400
Subject: [Python-Dev] Dafanging the find() gotcha
In-Reply-To: 
References: 
Message-ID: 

Tim>     s2.find(s1)

Tim> returns the smallest non-negative int i such that

Tim>     s2[i : i+len(s1] == s1

Tim> provided such an i exists.  That's as sensible as any answer, and more
Tim> sensible than most, if you have to give a meaning when s1 is empty.

Well, no -- you have to put the missing parenthesis in first.

Here it is.....   :-)

-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark


From ark@research.att.com  Tue Aug  6 05:19:53 2002
From: ark@research.att.com (Andrew Koenig)
Date: 06 Aug 2002 00:19:53 -0400
Subject: [Python-Dev] Dafanging the find() gotcha
In-Reply-To: 
References: 
Message-ID: 

Tim> The question for you is, *however* the value of s was obtained,
Tim> if you end up doing "s in t" when s happens to be an empty
Tim> string, is it more likely that your program has strayed from your
Tim> intent, or that a result of True *was* your intent?

It isn't *the* question; it's *a* question.

Another question is whether adding additional complexity to the rules
helps or hurts in general.

Tim>     if s[j+k1:j+k2] in t:

Tim> Assuming type correctness, if I know that raises an exception whenever k1 >=
Tim> k2, then I have confidence I know what the code is trying to do, and rest
Tim> easy knowing it won't do something nuts if the index expressions go crazy.
Tim> If instead it never(!) raises an exception, no matter what the values of j,
Tim> k1 and k2, this code scares me.

Then why not remove your fear by executing

        assert k1 < k2

first?

Tim> When Python switched to allowing negative indices as sequence
Tim> subscripts (it didn't always -- they used to raise exceptions),
Tim> it introduced a nasty class of bug caused by conceptually
Tim> non-negative indices going negative by mistake, but no longer
Tim> complaining.  Overall I think negative indices added enough
Tim> expressiveness to outweigh that drawback, but it was far from a
Tim> pure win.  This is a case where we're also keen to make a
Tim> formerly exceptional operation "mean something", but there's one
Tim> particular case of it where I know doing so will create similar
Tim> new problems -- and it's a case that's of no *real* use to allow.

Well, we don't know that yet.  We just know that you haven't seen one.
And I must say that I don't expect (s1 in s2) to be all that common
an operation anyway when s1 and s2 are strings.

>> And it says even less about why it would be a good idea to have
>> the result of such a search yield different results in different
>> contexts.

Tim> I agree that's not a good thing at all, and it may well win Guido
Tim> in the end.  I just hope he feels rotten about it, because the
Tim> children will suffer as a result.

This whole issue feels to me like the way APL behaves when you ask it
for the number of elements in a scalar:  Instead of giving the obvious
answer (a scalar has 1 element), it gives a much deeper answer (the
number of elements in a scalar is an empty vector, because a scalar
has no dimensions).  That behavior bites novices all the time, but
I have encountered programs that become much simpler as a result.

-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark


From greg@cosc.canterbury.ac.nz  Tue Aug  6 05:24:52 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Tue, 06 Aug 2002 16:24:52 +1200 (NZST)
Subject: [Python-Dev] Simpler reformulation of C inheritance Q.
In-Reply-To: <01fa01c23cfa$373cd7a0$62f0fc0c@boostconsulting.com>
Message-ID: <200208060424.g764Oqa29771@oma.cosc.canterbury.ac.nz>

David Abrahams :

> > This would only handle single inheritance, but I think
> > that's all you can have at the C level anyway, isn't
> > it?
> 
> 1. I'm pretty sure the answer to the above question is no

Er, you mean it *is* possible to inherit from multiple
extension types? How?

> 2. The scheme you propose is more costly in memory and cycles than I'd
> like

It's only one memory cycle more than it takes to access
the existing sub-structures. And it's a lot better than
the alternative, which is doing a Python dict lookup!

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From oren-py-d@hishome.net  Tue Aug  6 05:34:22 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Tue, 6 Aug 2002 07:34:22 +0300
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
In-Reply-To: ; from martin@v.loewis.de on Mon, Aug 05, 2002 at 11:06:25PM +0200
References: <20020805144636.A9355@hishome.net>  <20020805234404.A24400@hishome.net> 
Message-ID: <20020806073422.A28801@hishome.net>

On Mon, Aug 05, 2002 at 11:06:25PM +0200, Martin v. Loewis wrote:
> If you look at the patch, you see that it precisely does what you
> propose to do: add a callback to the charmap codec:
> 
> - it deletes charmap_decoding_error
> - it adds state to feed the callback function
> - it replaces the old call to charmap_decoding_error by

But it's NOT an error. It's new encoding functionality.  What if the new 
functionality you've added this way has an error of its own? Perhaps you
would like to have a flag to tell it whether to ignore errors or raise an
exception?  Sorry, that argument has been taken over for another purpose.  

The real problem was some missing functionality in codecs. Here are two 
approaches to solve the problem:

1. Add the missing functionality.

2. Keep the old, limited functionality, let it fail, catch the error,
re-use an argument originally intended for an error handling strategy to 
shoehorn a callback that can implement the missing functionality, add a new 
name-based registry to overcome the fact that the argument must be a string.
Since this approach is conceptually stuck on treating it as an error it 
actually creates and discards a new exception object for each character 
converted via this path.

Ummm..., tough choice.

	Oren



From David Abrahams" 
Message-ID: <022a01c23d09$65999840$62f0fc0c@boostconsulting.com>

----- Original Message -----
From: "Greg Ewing" 


> David Abrahams :
>
> > > This would only handle single inheritance, but I think
> > > that's all you can have at the C level anyway, isn't
> > > it?
> >
> > 1. I'm pretty sure the answer to the above question is no
>
> Er, you mean it *is* possible to inherit from multiple
> extension types? How?

One way is by invoking the metatype with a bases tuple which includes the
extension types.
I think you can also fill in tp_bases explicitly in a new extension type,
but it's been a long time since I crawled through that code and discussed
it with Guido.

> > 2. The scheme you propose is more costly in memory and cycles than I'd
> > like
>
> It's only one memory cycle more than it takes to access
> the existing sub-structures. And it's a lot better than
> the alternative, which is doing a Python dict lookup!

When I spoke of memory, I was talking about the extra pointer per level of
inheritance.
When I spoke of cycles, I was talking about the cycles to manage that
memory (probably moot).

It's not too terrible, but I'd like it a lot better if types would just use
tp_basicsize to find the beginning of the variable stuff so we could embed
the memory in the type itself. 'Course, I've forgotten more than I knew
about that code, so I might be barking up the wrong banyan.

-Dave

-----------------------------------------------------------
           David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com




From mal@lemburg.com  Tue Aug  6 08:36:40 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 06 Aug 2002 09:36:40 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
References: 
Message-ID: <3D4F7C88.4070407@lemburg.com>

Jack Jansen wrote:
> 
> On maandag, augustus 5, 2002, at 05:47 , M.-A. Lemburg wrote:
> 
>> Jack Jansen wrote:
>>
>>> Having to register the error handler first and then finding it by 
>>> name smells like a very big hack to me. I understand the reasoning 
>>> (that you don't want to modify the API of a gazillion C routines to 
>>> add an error object argument) but it still seems like a hack....
>>
>>
>> Well, in that case, you would have to call the whole codec registry
>> a hack ;-)
> 
> 
> No, not really. For codecs I think that there needn't be much of a 
> connection between the codec-supplier and the codec-user. Conceivably 
> the encoding-identifying string being passed to encode() could even have 
> been read from a data file or something.
> 
> For error handling this is silly: the code calling encode() or decode() 
> will know how it wants errors handled. And if you argue that it isn't 
> really error handling but an extension to the encoding name then maybe 
> it should be treated as such (by appending it to the codec name in the 
> string, as in "ascii;xmlentitydefs" or so?).

You are omitting the fact, though, that different codecs may need
different implementations of a specific error handler. Now the
error handler will always implement the same logic, so to the users
it's all the same thing. And by using the string alias he needn't
worry about where to get the error handler from (it typically
lives with the codec itself).

Note that error handling is not really an extension to the encoding
itself. It just happens that it can be put to use that way for
e.g. escaping non-representable characters. Other applications
like fetching extra information from external sources or logging
the positions of coding problems do not fall into this category.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From mal@lemburg.com  Tue Aug  6 09:06:13 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 06 Aug 2002 10:06:13 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
References: <20020805144636.A9355@hishome.net>  <20020805234404.A24400@hishome.net>  <20020806073422.A28801@hishome.net>
Message-ID: <3D4F8375.6030302@lemburg.com>

Oren Tirosh wrote:
> On Mon, Aug 05, 2002 at 11:06:25PM +0200, Martin v. Loewis wrote:
> 
>>If you look at the patch, you see that it precisely does what you
>>propose to do: add a callback to the charmap codec:
> 
> But it's NOT an error. It's new encoding functionality.  What if the new 
> functionality you've added this way has an error of its own? Perhaps you
> would like to have a flag to tell it whether to ignore error or raise an
> exception?  Sorry, that argument has been taken over for another purpose.  
> 
> The real problem was some missing functionality in codecs. Here are two 
> approaches to solve the problem:
> 
> 1. Add the missing functionality.
> 
> 2. Keep the old, limited functionality, let it fail, catch the error,
> re-use an argument originally intended for an error handling strategy to 
> shoehorn a callback that can implement the missing functionality, add a new 
> name-based registry to overcome the fact that the argument must be a string.
> Since this approach is conceptually stuck on treating it as an error it 
> actually creates and discards a new exception object for each character 
> converted via this path.
> 
> Ummm... , tough choice.

Oren, if you just want a codec which encodes and decodes
HTML entities, then this can be done easily by writing a codec
which works on Unicode only and is stacked on top of the other
existing codecs, e.g. if you first encode all non-printable
and non-ASCII code points using entity escapes and then pass
this Unicode string to one of the other codecs, you have
a solution to your problem.
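MAL's stacking suggestion can be sketched as follows (the helper name is mine, not a real codec, and this uses modern Python spelling):

```python
# First escape non-ASCII code points as character references on the
# Unicode side, then hand the result to an ordinary codec.
def escape_non_ascii(u):
    return ''.join(c if ord(c) < 128 else '&#%d;' % ord(c) for c in u)

data = escape_non_ascii('caf\xe9')       # u'caf\xe9' in 2002 spelling
assert data == 'caf&#233;'
assert data.encode('ascii') == b'caf&#233;'
```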

Note that this is different from trying to
provide a work-around for encoding code points from Unicode
for which there are no corresponding mappings in a given
encoding. These situations would normally result in an
exception. Now HTML and XML offer you the possibility to
use special escapes for these, so that you can still encode
the complete Unicode set into e.g. ASCII, but only under the
premise that the encoded data is HTML or XML text.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From martin@v.loewis.de  Tue Aug  6 09:25:34 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 06 Aug 2002 10:25:34 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
In-Reply-To: <20020806073422.A28801@hishome.net>
References: <20020805144636.A9355@hishome.net>
 
 <20020805234404.A24400@hishome.net>
 
 <20020806073422.A28801@hishome.net>
Message-ID: 

Oren Tirosh  writes:

> > If you look at the patch, you see that it precisely does what you
> > propose to do: add a callback to the charmap codec:
> > 
> > - it deletes charmap_decoding_error
> > - it adds state to feed the callback function
> > - it replaces the old call to charmap_decoding_error by
> 
> But it's NOT an error. It's new encoding functionality.  

What is not an error? The handling? Certainly: the error and the error
handler are different things; error handlers are not errors. "ignore"
and "replace" are not errors, either, they are also new encoding
functionality. That is the very nature of handlers: they add
functionality.
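For reference, the name-based registry under discussion did ship with PEP 293 in Python 2.3 as codecs.register_error; a minimal handler in that style (the handler name here is mine):

```python
import codecs

# An encode-error handler adds functionality: it is handed the
# UnicodeEncodeError and returns (replacement, resume position).
def charref(exc):
    if isinstance(exc, UnicodeEncodeError):
        refs = ''.join('&#%d;' % ord(c) for c in exc.object[exc.start:exc.end])
        return refs, exc.end
    raise exc

codecs.register_error('demo-charref', charref)
assert 'caf\xe9'.encode('ascii', 'demo-charref') == b'caf&#233;'
# The same behaviour is built in as the 'xmlcharrefreplace' handler:
assert 'caf\xe9'.encode('ascii', 'xmlcharrefreplace') == b'caf&#233;'
```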

> The real problem was some missing functionality in codecs. Here are two 
> approaches to solve the problem:
> 
> 1. Add the missing functionality.

That is not feasible, since you want that functionality also for
codecs you haven't heard of.

> 2. Keep the old, limited functionality, let it fail, catch the error,
> re-use an argument originally intended for an error handling strategy to 
> shoehorn a callback that can implement the missing functionality, add a new 
> name-based registry to overcome the fact that the argument must be a string.

That is possible, but inefficient. It is also the approach that people
use today, and the reason for this PEP to exist. The current
UnicodeError does not report any detail on the state that the codec
was in.

> Since this approach is conceptually stuck on treating it as an error it 
> actually creates and discards a new exception object for each character 
> converted via this path.

It's worse: if you find that the entire string cannot be encoded, you
have typically two choices:
- you perform a binary search. That may cause log n exceptions.
- you encode every character on its own. That reduces the number of
  exceptions to the number of unencodable characters, but it will also
  mean that the encoding is wrong for some encodings: you will always
  get the shift-in/shift-out sequences that your encoding may specify.
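The binary-search option can be sketched as follows (the function name is mine; it assumes a stateless, character-wise encoding, which is exactly the caveat Martin raises for shift-state codecs):

```python
# Locate an unencodable character knowing only "this substring failed",
# using about log n exception-raising encode attempts.
def first_bad(s, encoding):
    try:
        s.encode(encoding)
        return None                    # everything encodes
    except UnicodeEncodeError:
        pass
    lo, hi = 0, len(s)                 # invariant: s[lo:hi] fails to encode
    while hi - lo > 1:
        mid = (lo + hi) // 2
        try:
            s[lo:mid].encode(encoding)
            lo = mid                   # the failure is in the right half
        except UnicodeEncodeError:
            hi = mid                   # the failure is in the left half
    return lo

assert first_bad('abc\xe9def', 'ascii') == 3
assert first_bad('abc', 'ascii') is None
```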

On decoding, this is worse: feeding a byte at a time may fail
altogether if you happen to break a multibyte character - when feeding
the entire string happily consumes long sequences of characters, and
only runs into a single problem byte.

Regards,
Martin


From oren-py-d@hishome.net  Tue Aug  6 10:20:12 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Tue, 6 Aug 2002 12:20:12 +0300
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
In-Reply-To: ; from martin@v.loewis.de on Tue, Aug 06, 2002 at 10:25:34AM +0200
References: <20020805144636.A9355@hishome.net>  <20020805234404.A24400@hishome.net>  <20020806073422.A28801@hishome.net> 
Message-ID: <20020806122012.A14803@hishome.net>

On Tue, Aug 06, 2002 at 10:25:34AM +0200, Martin v. Loewis wrote:
> > 2. Keep the old, limited functionality, let it fail, catch the error,
> > re-use an argument originally intended for an error handling strategy to 
> > shoehorn a callback that can implement the missing functionality, add a new 
> > name-based registry to overcome the fact that the argument must be a string.
> 
> That is possible, but inefficient. 

I'm confused.

I have just described what PEP 293 is proposing and you say that it's 
inefficient :-? I find it hard to believe that this is what you really meant 
since you are presumably in favor of this PEP in its current form. 

I can't tell if we actually disagree because apparently we don't 
understand each other.

> > Since this approach is conceptually stuck on treating it as an error it 
> > actually creates and discards a new exception object for each character 
> > converted via this path.
> 
> It's worth: If you find that the entire string cannot be encoded, you
> have typically two choices:
...

Instead of treating it as a problem ("the string cannot be encoded") and 
getting trapped in the mindset of error handling I suggest approaching it 
from a positive point of view: "how can I make the encoding work the
way I want it to work?".  Let's leave the error handling for real errors.

Treating this as an error-handling issue was so counter-intuitive to me 
that until recently I never bothered to read PEP 293. The title made me 
think that it's completely irrelevant to my needs. After all, what I 
wanted was to translate HTML to/from Unicode, not find a better way to 
handle errors.

	Oren



From pedroni@inf.ethz.ch  Tue Aug  6 10:28:42 2002
From: pedroni@inf.ethz.ch (Samuele Pedroni)
Date: Tue, 6 Aug 2002 11:28:42 +0200
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
Message-ID: <001f01c23d2b$a81a9e40$6d94fea9@newmexico>

[Greg Ewing]
>I thought Guido was against having "in" do anything
>other than membership tests, but his last message sounded
>like he was changing his mind.

If

"thon" in "python"

then why not

[1,2] in [0,1,2,3]

(it's a purely rhetorical question)

in general I don't think it is a good idea
to have "in" be a membership vs subset/subseq
operator depending on non-ambiguity, convenience,
or simply implementer taste,
because there truly are data types (e.g. sets)
that would need both, disambiguated.
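Samuele's rhetorical question can be made concrete: a list analogue of substring-"in" would have to be a separate helper along these lines (names mine, not a proposal from the thread):

```python
# A contiguous-subsequence test for lists, analogous to "thon" in "python".
def contains_subseq(needle, haystack):
    n = len(needle)
    return any(haystack[i:i + n] == needle
               for i in range(len(haystack) - n + 1))

assert contains_subseq([1, 2], [0, 1, 2, 3])
assert not contains_subseq([2, 1], [0, 1, 2, 3])
assert contains_subseq([], [0])   # the empty case: the contested semantics
```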

Either Python grows a new subset/subseq operator,
but probably this is overkill (keyword issue, new
__magic__ method, not meaningful or convenient for
a lot of types),

or strings (etc) should simply grow a new
method with an appropriate name.

"py"-in-"python"-is-dark-side-sexy-ly y'rs - Samuele Pedroni.



From Jack.Jansen@oratrix.com  Tue Aug  6 10:46:22 2002
From: Jack.Jansen@oratrix.com (Jack Jansen)
Date: Tue, 6 Aug 2002 11:46:22 +0200
Subject: [Python-Dev] Re: framer tool
In-Reply-To: <15694.49263.520792.939798@slothrop.zope.com>
Message-ID: <5D8C900C-A921-11D6-BE0E-0030655234CE@oratrix.com>

On Monday, August 5, 2002, at 08:14 , Jeremy Hylton wrote:
>
> Should I try to make framer a modulator replacement?  I've got some
> time to work on it, but checked in the current progress in hopes of
> finding more help.

I think that would be a good idea. Modulator was something I quickly 
threw together years ago, I think that it may even have been the first 
Tkinter program I did (that may even have been the main reason for 
writing it:-). The code quality shows this, and it hasn't been 
maintained in aeons. Still, because it's such a quick and dirty tool it 
has its place. It would be good if framer could grow similar 
functionality (a GUI where you tap a couple of buttons to create objects 
and methods, plus a couple of switches to select the protocols the 
objects should adhere to) so we can lay modulator to rest.
--
- Jack Jansen                
http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma 
Goldman -



From tismer@tismer.com  Tue Aug  6 10:51:38 2002
From: tismer@tismer.com (Christian Tismer)
Date: Tue, 06 Aug 2002 11:51:38 +0200
Subject: [Python-Dev] Simpler reformulation of C inheritance Q.
References: <200208060424.g764Oqa29771@oma.cosc.canterbury.ac.nz> <022a01c23d09$65999840$62f0fc0c@boostconsulting.com>
Message-ID: <3D4F9C2A.2040704@tismer.com>

David Abrahams wrote:

[Greg Ewing, adding a level of indirection]

[David Abrahams]
>>>2. The scheme you propose is more costly in memory and cycles than I'd
>>>like
>>

[Greg]
>>It's only one memory cycle more than it takes to access
>>the existing sub-structures. And it's a lot better than
>>the alternative, which is doing a Python dict lookup!

Of course it is better. But since it is possible to
do better still, a sub-optimal solution will not
make me forget about it.

[David]
> When I spoke of memory, I was talking about the extra pointer per level of
> inheritance.
> When I spoke of cycles, I was talking about the cycles to manage that
> memory (probably moot).

Since we are talking of types and meta-types, I believe
memory issues are of minor interest.
There will not be more than a few hundred classes,
and they will be created just once.
The reason why I want to have extra data and function
caches in the types is that this is *very* memory
efficient, in comparison to stuffing things into the
instances (which would be easy to implement).

> It's not too terrible, but I'd like it a lot better if types would just use
> tp_basicsize to find the beginning of the variable stuff so we could embed
> the memory in the type itself. 'Course, I've forgotten more than I knew
> about that code, so I might be barking up the wrong banyan.

That's exactly what I want to do, but I have to find
out how the variable part of types is used at the moment,
and I admit I didn't understand it yet.

The place where user stuff should go is where instances
have their slots. With meta-types, it now happens that
types become instances, but types refuse to have slots.
This needs to be changed, everything else is a workaround.

regards - chris

-- 
Christian Tismer             :^)   
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/




From Jack.Jansen@cwi.nl  Tue Aug  6 10:54:22 2002
From: Jack.Jansen@cwi.nl (Jack Jansen)
Date: Tue, 6 Aug 2002 11:54:22 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
In-Reply-To: <20020806122012.A14803@hishome.net>
Message-ID: <7C1726B2-A922-11D6-BE0E-0030655234CE@cwi.nl>

On Tuesday, August 6, 2002, at 11:20 , Oren Tirosh wrote:
> Treating this as an error-handling issue was so counter-intuitive to me
> that until recently I never bothered to read PEP 293. The title made me
> think that it's completely irrelevant to my needs. After all, what I
> wanted was to translate HTML to/from Unicode, not find a better way to
> handle errors.

I think that this is really also the gist of my misgiving about the 
design: enhancing a codec/adding extra filtering is a different thing 
than error handling. The PEP uses "error handling" in the prose, but the 
API is geared towards adding extra filtering.
--
- Jack Jansen                
http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma 
Goldman -



From tismer@tismer.com  Tue Aug  6 11:02:14 2002
From: tismer@tismer.com (Christian Tismer)
Date: Tue, 06 Aug 2002 12:02:14 +0200
Subject: [Python-Dev] Simpler reformulation of C inheritance Q.
References: <3D4D17AB.9040704@tismer.com> <3D4E56D9.3090503@tismer.com>              <01d601c23c84$d2783c80$62a6accf@boostconsulting.com> <200208051408.g75E84113668@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <3D4F9EA6.9010802@tismer.com>

Guido van Rossum wrote:
...

> From Christian's post I can't tell if he wants his types to be dynamic
> or static (i.e. if he's creating an arbitrary number of them at
> run-time or only a fixed number that's known at compile-time).

I'm not absolutely sure what I meant. Actually, I wanted to
cache existing methods, which are known at compile-time.
At run-time, they would be replaced for derived types.
But a run-time solution might make sense, to generate
very fast class variables, maybe.

> Here's a hack.
> 
> For static extensions, you could extend one of the extension structs,
> e.g. PyMappingMethods (which is the smallest and also least likely to
> grow new methods), with additional fields.  Then you'd have to know
> whether you can access those extra fields; I suggest checking for the
> metatype.  A few casts and you're done.
> 
> For dynamic extensions, you might be able to do the same: after
> type_new() has given you an object, allocate memory for an extended
> PyMappingMethods struct, copy the existing PyMappingMethods struct
> into it (if it exists), and replace the pointer.  Then in your
> deallocation function, make sure to free the pointer.
> 
> Hope this helps in the short run.

Thanks a lot. Yes, it helps in the short run, but stays
a hack. I'm trying to find a way that allows meta-types
to support slots for its type instances without introducing
too much special-casing.
What I do not understand yet is who uses the variable type
part, and in which way; I'd like to interoperate with it.

ciao - chris

-- 
Christian Tismer             :^)   
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/




From martin@v.loewis.de  Tue Aug  6 11:12:54 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 06 Aug 2002 12:12:54 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
In-Reply-To: <20020806122012.A14803@hishome.net>
References: <20020805144636.A9355@hishome.net>
 <20020805234404.A24400@hishome.net>
 <20020806073422.A28801@hishome.net>
 <20020806122012.A14803@hishome.net>
Message-ID: 

Oren Tirosh  writes:

> > > 2. Keep the old, limited functionality, let it fail, catch the
> > > error, re-use an argument originally intended for an error
> > > handling strategy to shoehorn a callback that can implement the
> > > missing functionality, add a new name-based registry to overcome
> > > the fact that the argument must be a string.

> > That is possible, but inefficient. 
> 
> I'm confused.
> 
> I have just described what PEP 293 is proposing and you say that it's 
> inefficient :-? 

Perhaps I have misunderstood your description. I was assuming an
algorithm like

dispatch = {}

def new_encode(str, encoding, errors):
  return dispatch[errors](str, encoding)

def xml_encode(str, encoding):
  try:
    return str.encode(encoding, "strict")
  except UnicodeError:
    if len(str) == 1:
      return "&#%d;" % ord(str)
    return xml_encode(str[:len(str)//2], encoding) + \
           xml_encode(str[len(str)//2:], encoding)

dispatch['xmlcharref'] = xml_encode

This seems to match the description "keep the old, limited
functionality, let it fail, catch the error", and it has all the
deficiencies I mentioned. 

It also is not the meaning of PEP 293. The whole idea is that the
handler is invoked *before* something has failed.
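(As an editorial aside: the callback protocol the PEP describes can be sketched with the error-handler registry that later shipped in the codecs module; the handler name "demo-xmlcharref" below is invented for the illustration.)

```python
import codecs

def xmlcharref_demo(exc):
    # PEP 293-style callback: invoked with the UnicodeError before the
    # encode operation fails; returns (replacement, position to resume at).
    if isinstance(exc, UnicodeEncodeError):
        repl = "".join("&#%d;" % ord(ch) for ch in exc.object[exc.start:exc.end])
        return (repl, exc.end)
    raise exc

# "demo-xmlcharref" is an arbitrary name chosen for this sketch
codecs.register_error("demo-xmlcharref", xmlcharref_demo)

print("caf\u00e9".encode("ascii", "demo-xmlcharref"))  # prints b'caf&#233;'
```

Note that the handler is called once per unencodable run, in place, rather than retrying whole slices as in the divide-and-conquer fallback above.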

> Instead of treating it as a problem ("the string cannot be encoded") and 
> getting trapped in the mindset of error handling I suggest approaching it 
> from a positive point of view: "how can I make the encoding work the
> way I want it to work?".  Let's leave the error handling for real errors.

Sounds good, but how does this help in finding a solution?

> Treating this as an error-handling issue was so counter-intuitive to me 
> that until recently I never bothered to read PEP 293. The title made me 
> think that it's completely irrelevant to my needs. After all, what I 
> wanted was to translate HTML to/from Unicode, not find a better way to 
> handle errors.

If you think this is a documentation issue - I'm fine with documenting
the feature differently.

Regards,
Martin


From mal@lemburg.com  Tue Aug  6 11:33:30 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 06 Aug 2002 12:33:30 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
References: <7C1726B2-A922-11D6-BE0E-0030655234CE@cwi.nl>
Message-ID: <3D4FA5FA.6090200@lemburg.com>

Jack Jansen wrote:
> 
> On Tuesday, August 6, 2002, at 11:20 , Oren Tirosh wrote:
> 
>> Treating this as an error-handling issue was so counter-intuitive to me
>> that until recently I never bothered to read PEP 293. The title made me
>> think that it's completely irrelevant to my needs. After all, what I
>> wanted was to translate HTML to/from Unicode, not find a better way to
>> handle errors.
> 
> I think that this is really also the gist of my misgiving about the 
> design: enhancing a codec/adding extra filtering is a different thing 
> than error handling. The PEP uses "error handling" in the prose, but the 
> API is geared towards adding extra filtering.

That's a wrong impression. The new error handling API allows
you to do many different things based on the current position
of the codec in the input stream.

The fact that this can be used to apply escaping to otherwise
illegal mappings stems from the basics behind this new API. It's
an application, not its main purpose. Filtering can also be
achieved by other techniques, such as stacking codecs.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From mwh@python.net  Tue Aug  6 12:06:45 2002
From: mwh@python.net (Michael Hudson)
Date: 06 Aug 2002 12:06:45 +0100
Subject: [Python-Dev] Simpler reformulation of C inheritance Q.
In-Reply-To: Christian Tismer's message of "Mon, 05 Aug 2002 12:43:37 +0200"
References: <3D4D17AB.9040704@tismer.com> <3D4E56D9.3090503@tismer.com>
Message-ID: <2mhei8gsai.fsf@starship.python.net>

Christian Tismer  writes:

> Hi Guido:
> 
> here a simpler formulation of my question:
> 
> I would like to create types with overridable methods.
> This is supported by the new type system.
> 
> But I'd also like to make this as fast as possible and
> therefore to avoid extra dictionary lookups for methods,
> especially if they are most likely not overridden.

I would wonder how much this saves.

How many more instructions does

  PyDict_GetItem(ob->ob_type->tp_dict, interned_string)

take than

  ob->ob_type->tp_my_field->mf_my_method

?  Sure, *some* but not all that many esp. if the called function is
actually doing significant work.

Of course, the first gets you a PyCFunctionObject* (or similar) not a
function pointer and that adds a layer of overhead.  In fact, this is
probably the greater source of overhead (you might have to box up the
arguments, allocate & deallocate the argument tuple, etc).

I doubt my opinion counts here, but I think I'd prefer to see *less*,
not more, methods in type object in future.  Particularly if there's
some way to call functions with known signatures efficiently.
Unfortunately, that seems pretty hard after five minutes thinking.

Cheers,
M.

-- 
  I wouldn't trust the Anglo-Saxons for much anything else.  Given
  they way English is spelled, who could trust them on _anything_ that
  had to do with writing things down, anyway?
                                        -- Erik Naggum, comp.lang.lisp


From guido@python.org  Tue Aug  6 12:39:49 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 06 Aug 2002 07:39:49 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: Your message of "Tue, 06 Aug 2002 12:31:33 +1200."
 <200208060031.g760VXu28532@oma.cosc.canterbury.ac.nz>
References: <200208060031.g760VXu28532@oma.cosc.canterbury.ac.nz>
Message-ID: <200208061139.g76Bdn525636@pcp02138704pcs.reston01.va.comcast.net>

[me]
> > But the satisfaction that spelling "has_key" as "in" gives me suggests
> > that there's more potential to it.

[GregE]
> I thought you'd always argued against this before, on
> the grounds that the convenience wasn't worth the
> inconsistency. Are you starting to change your mind?

Yes.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Tue Aug  6 12:44:17 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 06 Aug 2002 07:44:17 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: Your message of "Mon, 05 Aug 2002 21:36:48 EDT."
 <3D4EEFF0.27145.40FDFEA8@localhost>
References: 
 <3D4EEFF0.27145.40FDFEA8@localhost>
Message-ID: <200208061144.g76BiHn25666@pcp02138704pcs.reston01.va.comcast.net>

> On 5 Aug 2002 at 16:50, Gordon McMillan wrote:
> 
> > What I'm really saying is that I almost never use x
> > in str because it's semantics have always been
> > peculiar. Thus, I don't *really* care whether '' in
> > str raises an exception, because if it does, I
> > won't train myself to use it.
> 
> Turns out that's not true. When I want set membership,
> I first write "char in ('a', 'b', 'c')", then
> sometimes change it because "char in 'abc'" is more
> efficient.
> 
> So whether '' in 'abc' will work or not is a red
> herring. The real issue is that membership gets
> conflated with subsetting.

Well, in current Python you can only safely make that transformation
when you're damn sure that char is a string of length one, otherwise
you'd risk a TypeError.  So this code (if correct) will continue to
work, assuming you're not catching TypeError (which is often an
assumption when we say that a new feature "won't break old code").

--Guido van Rossum (home page: http://www.python.org/~guido/)



From sholden@holdenweb.com  Tue Aug  6 12:56:21 2002
From: sholden@holdenweb.com (Steve Holden)
Date: Tue, 6 Aug 2002 07:56:21 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
References: <001f01c23d2b$a81a9e40$6d94fea9@newmexico>
Message-ID: <05d501c23d40$49043910$6300000a@holdenweb.com>

[Samuele Pedroni]
>
> [Greg Ewing]
> >I thought Guido was against having "in" do anything
> >other than membership tests, but his last message sounded
> >like he was changing his mind.
>
> If
>
> "thon" in "python"
>
> then why not
>
> [1,2] in [0,1,2,3]
>
> (it's a purely rhetorical question)
>
Which I also asked. But Guido pointed out that [1, 2] may well be a member
of a list such as [0, [1, 2], [3, 4], 5].
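Guido's counterexample can be spelled out in two lines (a minimal sketch, not from the thread itself):

```python
# A list can literally contain another list as an element, so 'in' on
# lists has to keep meaning membership, not subsequence.
outer = [0, [1, 2], [3, 4], 5]
assert [1, 2] in outer              # True: [1, 2] is an element of outer
assert [1, 2] not in [0, 1, 2, 3]   # 'in' does not test for a subsequence
```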

> in general I don't think it is a good idea
> to have "in" be a membership vs subset/subseq
> operator depending on non ambiguity, convenience
> or simply implementer taste,
> because truly there are data types (ex. sets)
> that would need both and disambiguated.
>
Well, it looks like you lose!

> Either python grows a new subset/subseq operator
> but probably this is overkill (keyword issue, new
> __magic__ method, not meaningful, con
> venient for a lot of types)
>
> or strings (etc) should simply grow a new
> method with an appropriate name.
>
> "py"-in-"python"-is-dark-side-sexy-ly y'rs - Samuele Pedroni.
>
>
Consistency apparently loses out to pragmatism in this case.

regards
-----------------------------------------------------------------------
Steve Holden                                 http://www.holdenweb.com/
Python Web Programming                http://pydish.holdenweb.com/pwp/
-----------------------------------------------------------------------






From pedroni@inf.ethz.ch  Tue Aug  6 13:12:27 2002
From: pedroni@inf.ethz.ch (Samuele Pedroni)
Date: Tue, 6 Aug 2002 14:12:27 +0200
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
Message-ID: <006901c23d42$87b64340$6d94fea9@newmexico>

[Steve Holden]
>> (it's a purely rhetorical question)
>>
>Which I also asked. But Guido pointed out htat [1, 2] may well be a member
>of a list such as [0, [1, 2], [3, 4], 5].

which just reinforces my point below;
anyway, I knew that before, even
without Guido. You were not supposed
to answer a rhetorical question anyway.
I have not read the entire
unbearably long thread.

>> in general I don't think it is a good idea
>> to have "in" be a membership vs subset/subseq
>> operator depending on non ambiguity, convenience
>> or simply implementer taste,
>> because truly there are data types (ex. sets)
>> that would need both and disambiguated.
>>
>Well, it looks like you lose!

I'm not taking this personally,
the problem one operator, two potential
semantics remains.

>Consistency apparently loses out to pragmatism in this case.

What do you want "in" to do for you today? .

That's my last input on the matter.

regards.

PS: these days I read python-dev through the archives,
it seems that this time I have added to the redundancy
department myself, oh well...



From guido@python.org  Tue Aug  6 13:30:35 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 06 Aug 2002 08:30:35 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: Your message of "Tue, 06 Aug 2002 11:28:42 +0200."
 <001f01c23d2b$a81a9e40$6d94fea9@newmexico>
References: <001f01c23d2b$a81a9e40$6d94fea9@newmexico>
Message-ID: <200208061230.g76CUZ025786@pcp02138704pcs.reston01.va.comcast.net>

[Samuele]
> If
> 
> "thon" in "python"
> 
> then why not
> 
> [1,2] in [0,1,2,3]
> 
> (it's a purely rhetorical question)
> 
> in general I don't think it is a good idea
> to have "in" be a membership vs subset/subseq
> operator depending on non ambiguity, convenience
> or simply implementer taste,
> because truly there are data types (ex. sets)
> that would need both and disambiguated.
> 
> Either python grows a new subset/subseq operator
> but probably this is overkill (keyword issue, new
> __magic__ method, not meaningful, con
> venient for a lot of types)
> 
> or strings (etc) should simply grow a new
> method with an appropriate name.

I recognize this as related to the argument that Ping was (still is?)
making against "for x in <dict>"; but not because the same
operator "in" is involved.

It has to do with polymorphism (functions that accept different types
of arguments; it's somewhat different from operator overloading).

Suppose we have an operator @.  (Take operator in a wide enough sense,
including other bits of grammar, like "for".)  If there's only one
type (or one narrow set or related types) for which @ makes sense,
human readers of a program will use @ as a clue about the type of the
arguments, and (if correct) that will help reasoning about the
expression in which it occurs.

ABC uses this property of operators to do type inference: if an ABC
expression contains "a+b", a and b must be numbers; and so on.

Python chose to allow operators to be overloaded by different types
with different meanings, and the language gives a+b a very different
meaning for numbers than for sequences, for example.  (And an
important invariant is lost in this example: for numbers, a+b == b+a,
but not so for sequences!)

Is this a problem?

The ease with which we get used to "key in dict" makes me think it is
not.  While Python doesn't require you to declare the types of your
arguments, the type (or set of allowed types) for arguments is usually
strongly known in the mind of the programmer, and most often strong
hints are given either by the choice of argument name or by
documentation.

While it's possible in theory, in practice nobody writes polymorphic
code that uses + and * on its arguments and yet accepts both numbers
and strings.

The reality is that some types are more related than others, and the
substitutability property only makes sense for types that are
sufficiently related.  We *do* write code that accepts any kind of
sequence, including strings.  We do *not* write code that accepts any
kind of container (sequence or mapping), even though some operations
apply to both kinds of container (len, a[b], and since 2.2, x in a).

In code that applies to all (or even just some) kinds of sequences,
the 'in' operator will continue to stand for membership.  This won't
cause a problem with strings: correct code using 'in' for membership
will never use seq1 in seq2, it will use item in seq, where the type
of item is "whatever the type of seq[0] is, if it exists."  When the
seq is a string, item will be a one-char string -- not a "type" in
Python's type system, but certainly a useful concept.

But there's also lots of code that deals only with strings.  This is
normally completely clear to the casual reader: either because
string literals are used, compared, etc., or because values are
obtained from functions known to return strings (such as
file.readline()), or because methods unique to strings (e.g. s.lower())
are used, and so on.  Strings are very important in lots of programs,
and we want our notations for string operations to be readable and
expressive.  (Regular expressions are extreme in expressiveness, but
lack readability, which is why they're relegated to an imported module
in Python.)  Substring containment testing is a common operation on
strings, so being able to write it as 's1 in s2' rather than
's2.find(s1) >= 0' is a big win, IMO.
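The two spellings being compared, as a minimal sketch:

```python
s1, s2 = "thon", "python"
# The proposed 'in' spelling versus the older find() idiom: for any
# non-pathological pair of strings they agree.
assert (s1 in s2) == (s2.find(s1) >= 0)
assert ("xyz" not in s2) and (s2.find("xyz") == -1)
```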


PS. Sets are a different case again.  They are containers but neither
sequences nor mappings (though depending on what you want to do they
can resemble either).  We will have to think about which operators
make sense for them.  I'd say that 'elem in set' is an appropriate way
to spell set membership; how to spell subset is a matter of discussion
(maybe 'set1 <= set2' is a good idea; maybe not).
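(Later Python did adopt exactly this spelling for its built-in set type; a sketch:)

```python
a = {1, 2}
b = {0, 1, 2, 3}
assert 1 in b              # 'elem in set' is membership
assert a <= b              # subset spelled with '<=', as suggested above
assert not ({1, 4} <= b)   # 4 is not in b, so {1, 4} is not a subset
```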

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Tue Aug  6 13:31:40 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 06 Aug 2002 08:31:40 -0400
Subject: [Python-Dev] Re: framer tool
In-Reply-To: Your message of "Tue, 06 Aug 2002 11:46:22 +0200."
 <5D8C900C-A921-11D6-BE0E-0030655234CE@oratrix.com>
References: <5D8C900C-A921-11D6-BE0E-0030655234CE@oratrix.com>
Message-ID: <200208061231.g76CVe925799@pcp02138704pcs.reston01.va.comcast.net>

> It would be good if framer could grow similar 
> functionality (a GUI where you tap a couple of buttons to create objects 
> and methods, plus a couple of switches to select the protocols the 
> objects should adhere to) so we can lay modulator to rest.

But modulator is such a cool name!  Maybe that part of framer could be
called modulator 2, in honor of the original.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From gmcm@hypernet.com  Tue Aug  6 13:37:32 2002
From: gmcm@hypernet.com (Gordon McMillan)
Date: Tue, 6 Aug 2002 08:37:32 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <200208061144.g76BiHn25666@pcp02138704pcs.reston01.va.comcast.net>
References: Your message of "Mon, 05 Aug 2002 21:36:48 EDT." <3D4EEFF0.27145.40FDFEA8@localhost>
Message-ID: <3D4F8ACC.18886.435AE7E6@localhost>

On 6 Aug 2002 at 7:44, Guido van Rossum wrote:

> > So whether '' in 'abc' will work or not is a red
> > herring. The real issue is that membership gets
> > conflated with subsetting.
> 
> Well, in current Python you can only safely make
> that transformation when you're damn sure that char
> is a string of length one, otherwise you'd risk a
> TypeError. So this code (if correct) will continue
> to work, assuming you're not catching TypeError
> (which is often an assumption when we say that a new
> feature "won't break old code"). 

I agree that x in str meaning "subset of" is more
intuitive. I believe you are correct (at least most
old code will still work), but this one makes me
uneasy (I admit possibly because x in str has
always made me uneasy).

And finally, I vote that testing for subset should
work in the mathematically correct way (when
testing for the empty subset). This does not
affect your argument. (In fact, Tim is arguing to
have half[1] the code that catches TypeErrors
continue to work, while the other half doesn't.)

'Nuff said.

-- Gordon
http://www.mcmillan-inc.com/

[1]No, probably not by lines of code.


From pedroni@inf.ethz.ch  Tue Aug  6 13:54:50 2002
From: pedroni@inf.ethz.ch (Samuele Pedroni)
Date: Tue, 6 Aug 2002 14:54:50 +0200
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
References: <001f01c23d2b$a81a9e40$6d94fea9@newmexico>  <200208061230.g76CUZ025786@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <008901c23d48$73b86840$6d94fea9@newmexico>

Thanks for the detailed argument.

[GvR]
> In code that applies to all (or even just some) kinds of sequences,
> the 'in' operator will continue to stand for membership.  This won't
> cause a problem with strings: correct code using 'in' for membership
> will never use seq1 in seq2, it will use item in seq, where the type
> of item is "whatever the type of seq[0] is, if it exists."  When the
> seq is a string, item will be a one-char string -- not a "type" in
> Python's type system, but certainly a useful concept.
> 
> But there's also lots of code that deals only with strings.  This is
> normally completely clear to the casual reader: either because
> string literals are used, compared, etc., or because values are
> obtained from functions known to return strings (such as
> file.readline()), or because methods unique to strings (e.g. s.lower())
> are used, and so on.  Strings are very important in lots of programs,
> and we want our notations for string operations to be readable and
> expressive.  (Regular expressions are extreme in expressiveness, but
> lack readability, which is why they're relegated to an imported module
> in Python.)  Substring containment testing is a common operation on
> strings, so being able to write it as 's1 in s2' rather than
> 's2.find(s1) >= 0' is a big win, IMO.
> 
> 

My only remark is that this opens the temptation for someone
to subclass, say, UserList and define "in" as subseq
because it is convenient for the application, for some
value of convenient, and write "seq1 in seq2".
One can generalize saying that it is OK for sequences
that are not full-fledged containers and in particular
do not accept (per contract) subseqs as elements.
All the subtle explanation shows that this is indeed a subtle
point.

Thanks again.

PS: is pure substring testing such a common idiom?
I have not found so many
matches for   find\(.*\)\s*>  in the std lib,
but maybe the re is not general enough or
the std lib is not typical in this respect. Or some
op error.




From ark@research.att.com  Tue Aug  6 15:01:56 2002
From: ark@research.att.com (Andrew Koenig)
Date: 06 Aug 2002 10:01:56 -0400
Subject: [Python-Dev] Defanging the find() gotcha
In-Reply-To: 
References: 
Message-ID: 

Tim> I don't count that as "a practical fear" unless you actually
Tim> search for empty strings, and I don't believe that you do (or at
Tim> least not on purpose -- you can change my mind in a hurry by
Tim> posting your Python code that does do so, though!).

A hypothetical example for you.

Imagine an interactive program that rummages through a pile of
files to find files with particular properties.  Such a program
might allow one to request a search by presenting a form to fill
out.  Suppose that form has a fragment that looks like this:

                                                +-------------------+
     Search only files containing this string:  |                   |
                                                +-------------------+

If the user doesn't type anything into this part of the form, we would
like the search to cover all files.

If (s in t) yields true whenever s is null, this example just works.
Otherwise, the code needs a special case.
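A sketch of the hypothetical filter (the function and data names are invented for the illustration):

```python
def search(needle, files):
    # With '' in text being True, an empty form field naturally matches
    # every file; no special case for "user typed nothing" is needed.
    return [name for name, text in files if needle in text]

files = [("a.txt", "spam and eggs"), ("b.txt", "just spam")]
assert search("eggs", files) == ["a.txt"]
assert search("", files) == ["a.txt", "b.txt"]   # empty query selects all
```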

-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark


From barry@python.org  Tue Aug  6 15:16:07 2002
From: barry@python.org (Barry A. Warsaw)
Date: Tue, 6 Aug 2002 10:16:07 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
References: <001f01c23d2b$a81a9e40$6d94fea9@newmexico>
 <200208061230.g76CUZ025786@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <15695.55847.503943.847004@anthem.wooz.org>

Great analysis Guido, thanks.

    GvR> Strings are very important in lots of programs, and we want
    GvR> our notations for string operations to be readable and
    GvR> expressive.  (Regular expressions are extreme in
    GvR> expressiveness, but lack readability, which is why they're
    GvR> relegated to an imported module in Python.)  Substring
    GvR> containment testing is a common operation on strings, so
    GvR> being able to write it as 's1 in s2' rather than 's2.find(s1)
    GvR> >= 0' is a big win, IMO.

I agree completely.  The other thing about strings is that they are of
a dual nature, being both a sequence of characters, and an atomic
object.  At least, /I/ usually think about strings as whole units,
except when I want to slice and dice them.  And "substr in str" is
just such a natural extension of "char in str" because when I do the
former, I'm still thinking about looking for a substring, just one of
a single character in length.

-Barry


From guido@python.org  Tue Aug  6 15:34:34 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 06 Aug 2002 10:34:34 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: Your message of "Tue, 06 Aug 2002 14:54:50 +0200."
 <008901c23d48$73b86840$6d94fea9@newmexico>
References: <001f01c23d2b$a81a9e40$6d94fea9@newmexico> <200208061230.g76CUZ025786@pcp02138704pcs.reston01.va.comcast.net>
 <008901c23d48$73b86840$6d94fea9@newmexico>
Message-ID: <200208061434.g76EYYx01789@odiug.zope.com>

> My only remark is that this opens the temptation for someone
> to subclass say UserList and define "in" as subseq
> because it is convenient for the application, for some
> value of convenient. And write "seq1 in seq2".

Yeah, once you allow overloading, you can't prevent abuse.  I've heard
of bad C++ programmers who write A+B meaning an assignment to A.

> One can generalize saying that it is OK for sequences
> that are not full-fledged containers and in particular
> do not accept (per contract) subseqs as elements.

In the context of a particular application it can be very useful and
completely unambiguous.

> All the subtle explanation shows that this is indeed a subtle
> point.

Yes!

> Thanks again.

You're welcome.  And thanks for your question -- it made me see this
issue in a different light (the correct one :-).

> PS: is pure substring testing such a common idiom?
> I have not found so many
> matches for   find\(.*\)\s*>  in the std lib,
> but maybe the re is not general enough or
> the std lib is not typical in this respect. Or some
> op error.

The std lib is probably low on string processing ops compared to many
real apps.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From skip@pobox.com  Tue Aug  6 15:47:57 2002
From: skip@pobox.com (Skip Montanaro)
Date: Tue, 6 Aug 2002 09:47:57 -0500
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <001f01c23d2b$a81a9e40$6d94fea9@newmexico>
References: <001f01c23d2b$a81a9e40$6d94fea9@newmexico>
Message-ID: <15695.57757.655927.698893@localhost.localdomain>

    Samuele> If

    Samuele> "thon" in "python"

    Samuele> then why not

    Samuele> [1,2] in [0,1,2,3]

    Samuele> (it's a purely rhetorical question)

    Samuele> in general I don't think it is a good idea to have "in" be a
    Samuele> membership vs subset/subseq operator depending on non
    Samuele> ambiguity, convenience or simply implementer taste, because
    Samuele> truly there are data types (ex. sets) that would need both and
    Samuele> disambiguated.

Perhaps it makes sense to allow "'thon' in 'python'" to return True, but
still have "[1,2] in [0,1,2,3]" return False if we loosen the steadfast
requirement that strings and lists be as much alike as possible.  That is,
while both are sequences, we take advantage of the distinction between their
basic structures (sequence of characters vs. sequence of arbitrary
objects).

Skip


From mcherm@destiny.com  Tue Aug  6 15:46:52 2002
From: mcherm@destiny.com (Michael Chermside)
Date: Tue, 06 Aug 2002 10:46:52 -0400
Subject: [Python-Dev] Re: Defanging the find() gotcha
Message-ID: <3D4FE15C.6060302@destiny.com>

Tim> I don't count that as "a practical fear" unless you actually
Tim> search for empty strings, and I don't believe that you do (or at
Tim> least not on purpose -- you can change my mind in a hurry by
Tim> posting your Python code that does do so, though!).

Andrew> A hypothetical example for you.
Andrew>
Andrew> Imagine an interactive program that [...] present[s] a form to 
Andrew> fill out [which] looks like this:
Andrew>
Andrew>     Search only files containing this string:
Andrew> If the user doesn't type anything into this part of the form, we 
Andrew> would like the search to cover all files.

I think this is an extremely unconvincing example. You have pushed the 
API up to the user of a program and supposed that they expect the 
behavior which you are trying to defend. In practice, what users expect 
in cases where a field is left blank is for that field to be IGNORED, 
not for it to be processed, but its contents treated as containing an 
empty string.

If you had an algorithm which worked on strings generally but only if 
the null string behavior was as desired, that would be convincing. But 
saying that the user might expect this behavior seems a poor argument... 
users' expectations are usually Do-What-I-Mean, not Do-It-Right. 
Programming languages, though, work better when designed to Do-It-Right.

Perl-being-the-exception-that-proves-the-rule -lly yours,

-- Michael Chermside



From guido@python.org  Tue Aug  6 15:59:28 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 06 Aug 2002 10:59:28 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: Your message of "Tue, 06 Aug 2002 09:47:57 CDT."
 <15695.57757.655927.698893@localhost.localdomain>
References: <001f01c23d2b$a81a9e40$6d94fea9@newmexico>
 <15695.57757.655927.698893@localhost.localdomain>
Message-ID: <200208061459.g76ExS109232@odiug.zope.com>

> Perhaps it makes sense to allow "'thon' in 'python'" to return True,
> but still have "[1,2] in [0,1,2,3]" return False if we loosen the
> steadfast requirement that strings and lists be as much alike as
> possible.

That was never a requirement.  Strings and lists are merely similar
insofar as they have very similar needs for a slicing and subscripting
notation, and to a lesser extent for concatenation, repetition and
comparison.

Note that the sets of methods supported are almost entirely distinct
(only count and index are shared).

--Guido van Rossum (home page: http://www.python.org/~guido/)
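[The overlap Guido mentions is easy to check in current Python -- count and
index are shared, while the rest of the method sets are disjoint:]

```python
# count and index are the methods strings and lists have in common
s, l = 'banana', list('banana')
assert s.count('a') == l.count('a') == 3
assert s.index('n') == l.index('n') == 2

# beyond those two, the public method sets are distinct, e.g.:
assert hasattr(s, 'upper') and not hasattr(l, 'upper')
assert hasattr(l, 'append') and not hasattr(s, 'append')
```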


From guido@python.org  Tue Aug  6 16:19:18 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 06 Aug 2002 11:19:18 -0400
Subject: [Python-Dev] Simpler reformulation of C inheritance Q.
In-Reply-To: Your message of "Tue, 06 Aug 2002 11:51:38 +0200."
 <3D4F9C2A.2040704@tismer.com>
References: <200208060424.g764Oqa29771@oma.cosc.canterbury.ac.nz> <022a01c23d09$65999840$62f0fc0c@boostconsulting.com>
 <3D4F9C2A.2040704@tismer.com>
Message-ID: <200208061519.g76FJIp13244@odiug.zope.com>

[David A]
> > It's not too terrible, but I'd like it a lot better if types would
> > just use tp_basicsize to find the beginning of the variable stuff
> > so we could embed the memory in the type itself. 'Course, I've
> > forgotten more than I knew about that code, so I might be barking
> > up the wrong banyan.

[Chris T]
> That's exactly what I want to do, but I have to find
> out how the variable part of types is used at the moment,
> and I admit I didn't understand it, yet.
> 
> The place where user stuff should go is where instances
> have their slots. With meta-types, it now happens that
> types become instances, but types refuse to have slots.
> This needs to be changed, everything else is a workaround.

You're right.  And David has the right idea.  The problem is that for
convenience I defined the variable part of a type object as a private
structure (etype).  It's a lot of work to change that -- not very deep
perhaps, but a lot of refactoring code that does deep things.

To remind myself of this task, I've added a new SF bug:
python.org/sf/591586

--Guido van Rossum (home page: http://www.python.org/~guido/)


From ark@research.att.com  Tue Aug  6 16:26:31 2002
From: ark@research.att.com (Andrew Koenig)
Date: 06 Aug 2002 11:26:31 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <200208061434.g76EYYx01789@odiug.zope.com>
References: <001f01c23d2b$a81a9e40$6d94fea9@newmexico>
 <200208061230.g76CUZ025786@pcp02138704pcs.reston01.va.comcast.net>
 <008901c23d48$73b86840$6d94fea9@newmexico>
 <200208061434.g76EYYx01789@odiug.zope.com>
Message-ID: 

Guido> Yeah, once you allow overloading, you can't prevent abuse.  I've heard
Guido> of bad C++ programmers who write A+B meaning an assignment to A.

That kind of thing is uncommon, partly because it can't be done for
built-in types.  Such practices are widely derided in the C++
community, too.

-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark


From ark@research.att.com  Tue Aug  6 16:28:57 2002
From: ark@research.att.com (Andrew Koenig)
Date: 06 Aug 2002 11:28:57 -0400
Subject: [Python-Dev] Re: Dafanging the find() gotcha
In-Reply-To: <3D4FE15C.6060302@destiny.com>
References: <3D4FE15C.6060302@destiny.com>
Message-ID: 

Michael> I think this is an extremely unconvincing example. You have
Michael> pushed the API up to the user of a program and supposed that
Michael> they expect the behavior which you are trying to defend. In
Michael> practice, what users expect in cases where a field is left
Michael> blank is for that field to be IGNORED, not for it to be
Michael> processed, but its contents treated as containing an empty
Michael> string.

I understand.  My point is that in this particular example, what the
user perceives as ignoring the request is obtained by the
implementation technique of treating it as an empty string.  The user
doesn't have to know about this implementation technique, of course.


-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark


From guido@python.org  Tue Aug  6 16:31:45 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 06 Aug 2002 11:31:45 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: Your message of "Tue, 06 Aug 2002 11:26:31 EDT."
 
References: <001f01c23d2b$a81a9e40$6d94fea9@newmexico> <200208061230.g76CUZ025786@pcp02138704pcs.reston01.va.comcast.net> <008901c23d48$73b86840$6d94fea9@newmexico> <200208061434.g76EYYx01789@odiug.zope.com>
 
Message-ID: <200208061531.g76FVjR13306@odiug.zope.com>

> Guido> Yeah, once you allow overloading, you can't prevent abuse.  I've heard
> Guido> of bad C++ programmers who write A+B meaning an assignment to A.
> 
> That kind of thing is uncommon, partly because it can't be done for
> built-in types.  Such practices are widely derided in the C++
> community, too.

And that's exactly my answer in the Python case, too.  You can't
prevent people from writing bad code, but you can make them look
foolish. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Tue Aug  6 16:36:13 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 06 Aug 2002 11:36:13 -0400
Subject: [Python-Dev] Re: Dafanging the find() gotcha
In-Reply-To: Your message of "Tue, 06 Aug 2002 11:28:57 EDT."
 
References: <3D4FE15C.6060302@destiny.com>
 
Message-ID: <200208061536.g76FaDc13376@odiug.zope.com>

> Michael> I think this is an extremely unconvincing example. You have
> Michael> pushed the API up to the user of a program and supposed that
> Michael> they expect the behavior which you are trying to defend. In
> Michael> practice, what users expect in cases where a field is left
> Michael> blank is for that field to be IGNORED, not for it to be
> Michael> processed, but its contents treated as containing an empty
> Michael> string.
> 
> I understand.  My point is that in this particular example, what the
> user perceives as ignoring the request is obtained by the
> implementation technique of treating it as an empty string.  The user
> doesn't have to know about this implementation technique, of course.

I think it's a poor implementation technique. :-)  Opening the file to
search for an empty string is very inefficient.

My own potential example was some kind of graph traversal algorithm,
representing paths by sequences of letters (the letters labeling
edges), and involving paths that are subpaths of other paths.
Certainly the empty path should be considered a valid subpath of other
paths.

BTW, a more fool-proof (though unfortunately slower) way of testing
for substring containment in existing Python would be s2.count(s1) --
this returns the number of occurrences.  And of course,
'abc'.count('') returns 4.

--Guido van Rossum (home page: http://www.python.org/~guido/)
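[The behaviour being settled on here, spelled out in runnable form:]

```python
s = 'abc'
assert 'thon' in 'python'         # substring containment
assert '' in s                    # the empty string is always found
assert s.count('') == len(s) + 1  # == 4: before each char and at the end

# the count()-based containment test Guido describes:
def contains(haystack, needle):
    return haystack.count(needle) > 0

assert contains('abc', '')
assert contains('abc', 'bc')
assert not contains('abc', 'x')
```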


From ark@research.att.com  Tue Aug  6 16:38:21 2002
From: ark@research.att.com (Andrew Koenig)
Date: Tue, 6 Aug 2002 11:38:21 -0400 (EDT)
Subject: [Python-Dev] Re: Dafanging the find() gotcha
In-Reply-To: <200208061536.g76FaDc13376@odiug.zope.com> (message from Guido
 van Rossum on Tue, 06 Aug 2002 11:36:13 -0400)
References: <3D4FE15C.6060302@destiny.com>
  <200208061536.g76FaDc13376@odiug.zope.com>
Message-ID: <200208061538.g76FcLT14559@europa.research.att.com>

>> I understand.  My point is that in this particular example, what the
>> user perceives as ignoring the request is obtained by the
>> implementation technique of treating it as an empty string.  The user
>> doesn't have to know about this implementation technique, of course.

Guido> I think it's a poor implementation technique. :-)  Opening the file to
Guido> search for an empty string is very inefficient.

I'm assuming that the file is going to be opened anyway, possibly
to check for other search criteria.

Guido> My own potential example was some kind of graph traversal algorithm,
Guido> representing paths by sequences of letters (the letters labeling
Guido> edges), and involving paths that are subpaths of other paths.
Guido> Certainly the empty path should be considered a valid subpath of other
Guido> paths.

I can imagine similar applications that deal with file names.

Guido> BTW, a more fool-proof (though unfortunately slower) way of testing
Guido> for substring containment in existing Python would be s2.count(s1) --
Guido> this returns the number of occurrences.  And of course,
Guido> 'abc'.count('') returns 4.

That could be much slower, of course.


Incidentally, one other argument that might be relevant is that in
every other programming language I've ever seen that supports string
searching, the null string is accepted as a search argument and is
always found.




From guido@python.org  Tue Aug  6 16:42:37 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 06 Aug 2002 11:42:37 -0400
Subject: [Python-Dev] Re: Dafanging the find() gotcha
In-Reply-To: Your message of "Tue, 06 Aug 2002 11:38:21 EDT."
 <200208061538.g76FcLT14559@europa.research.att.com>
References: <3D4FE15C.6060302@destiny.com>  <200208061536.g76FaDc13376@odiug.zope.com>
 <200208061538.g76FcLT14559@europa.research.att.com>
Message-ID: <200208061542.g76Fgb913553@odiug.zope.com>

> Incidentally, one other argument that might be relevant is that in
> every other programming language I've ever seen that supports string
> searching, the null string is accepted as a search argument and is
> always found.

Same for Python, until now.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tismer@tismer.com  Tue Aug  6 16:58:43 2002
From: tismer@tismer.com (Christian Tismer)
Date: Tue, 06 Aug 2002 17:58:43 +0200
Subject: [Python-Dev] Simpler reformulation of C inheritance Q.
References: <3D4D17AB.9040704@tismer.com> <3D4E56D9.3090503@tismer.com> <2mhei8gsai.fsf@starship.python.net>
Message-ID: <3D4FF233.5000100@tismer.com>

Michael Hudson wrote:

[slots in types]

> I would wonder how much this saves.
> 
> How many more instructions does
> 
>   PyDict_GetItem(ob->ob_type->tp_dict, interned_string)
> 
> take than
> 
>   ob->ob_type->tp_my_field->mf_my_method
> 
> ?  Sure, *some* but not all that many esp. if the called function is
> actually doing significant work.

The comparison doesn't quite hit the nail on the head (as you
note yourself), since what I do right now is to call
a highly optimized C function, directly, and the
speed concerns are mainly for my C API, which is
supposed to be much faster than the Python interface.

Having to call anything but my builtin stuff hurts.
So I want at least to 'know' that my function is
not overridden, and be able to call the builtin stuff.
Doing the call all the time via

ob->ob_type->tp_my_field->mf_my_method

would be nice, but I'd even be pleased with some flag.
But there is no room for anything extra in a type.

Second, this is most time critical code, since my
tasklet switching is now very fast (half the time
of a function call from Python) for my CFrames.
And now people ask for overriding there, which hurts
me the most. I will either find the solution,
or leave it as it is and ask C programmers to
"grab the thing if you want the overridden method".

...

> I doubt my opinion counts here, but I think I'd prefer to see *less*,
> not more, methods in type object in future.  Particularly if there's
> some way to call functions with known signatures efficiently.
> Unfortunately, that seems pretty hard after five minutes thinking.

I'm not going to introduce masses of new methods for
type objects, but a generic way to introduce private
stuff.

not-easy-to-stop-me-anyway-at-all-ly y'rs -- chris

-- 
Christian Tismer             :^)   
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/




From: "David Abrahams" <dave@boost-consulting.com>
In-Reply-To: <200208061536.g76FaDc13376@odiug.zope.com>
Message-ID: <03a701c23d62$41eaa020$62f0fc0c@boostconsulting.com>

A relevant discussion came up recently on the boost list. In our regex
library there's a function which tells you whether a string was a partial
match to the beginning of a pattern. The example used in the docs was a
credit-card number validator which watches what you type and beeps at you
if there's a mistake. Unfortunately, the implementation would return false
if the input string was empty. Of course that required special-casing for
the empty string. Eventually complaints from users caused the library
maintainer to change his mind about the response to the empty string.

http://aspn.activestate.com/ASPN/search?query=another+regex+partial+match+bug&type=Archive_boost_list&x=0&y=0

FWIW-ly yr's,
Dave

-----------------------------------------------------------
           David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com


----- Original Message -----
From: "Guido van Rossum" 
To: "Andrew Koenig" 
Cc: "Michael Chermside" ; "python-dev"

Sent: Tuesday, August 06, 2002 11:36 AM
Subject: Re: [Python-Dev] Re: Dafanging the find() gotcha


> > Michael> I think this is an extremely unconvincing example. You have
> > Michael> pushed the API up to the user of a program and supposed that
> > Michael> they expect the behavior which you are trying to defend. In
> > Michael> practice, what users expect in cases where a field is left
> > Michael> blank is for that field to be IGNORED, not for it to be
> > Michael> processed, but its contents treated as containing an empty
> > Michael> string.
> >
> > I understand.  My point is that in this particular example, what the
> > user perceives as ignoring the request is obtained by the
> > implementation technique of treating it as an empty string.  The user
> > doesn't have to know about this implementation technique, of course.
>
> I think it's a poor implementation technique. :-)  Opening the file to
> search for an empty string is very inefficient.
>
> My own potential example was some kind of graph traversal algorithm,
> representing paths by sequences of letters (the letters labeling
> edges), and involving paths that are subpaths of other paths.
> Certainly the empty path should be considered a valid subpath of other
> paths.
>
> BTW, a more fool-proof (though unfortunately slower) way of testing
> for substring containment in existing Python would be s2.count(s1) --
> this returns the number of occurrences.  And of course,
> 'abc'.count('') returns 4.
>
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev
>



From esr@thyrsus.com  Tue Aug  6 17:29:21 2002
From: esr@thyrsus.com (Eric S Raymond)
Date: Tue, 6 Aug 2002 12:29:21 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <200208061434.g76EYYx01789@odiug.zope.com>
References: <001f01c23d2b$a81a9e40$6d94fea9@newmexico> <200208061230.g76CUZ025786@pcp02138704pcs.reston01.va.comcast.net> <008901c23d48$73b86840$6d94fea9@newmexico> <200208061434.g76EYYx01789@odiug.zope.com>
Message-ID: <20020806162921.GB6683@thyrsus.com>

Guido van Rossum :
> The std lib is probably low on string processing ops compared to many
> real apps.

Yes, it is.  I've noticed this myself.
-- 
		Eric S. Raymond


From guido@python.org  Tue Aug  6 18:09:15 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 06 Aug 2002 13:09:15 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: Your message of "Tue, 06 Aug 2002 10:16:07 EDT."
 <15695.55847.503943.847004@anthem.wooz.org>
References: <001f01c23d2b$a81a9e40$6d94fea9@newmexico> <200208061230.g76CUZ025786@pcp02138704pcs.reston01.va.comcast.net>
 <15695.55847.503943.847004@anthem.wooz.org>
Message-ID: <200208061709.g76H9FD19499@odiug.zope.com>

I think we've argued about '' in 'abc' long enough.  Tim has failed to
convince me, so '' in 'abc' returns True.  Barry has checked it all
in.

(In other news, I've checked in Oren's latest patch for making a file
its own iterator.  In the process, the xreadlines module has become
deprecated.)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From martin@v.loewis.de  Tue Aug  6 20:35:50 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 06 Aug 2002 21:35:50 +0200
Subject: [Python-Dev] The memo of pickle
Message-ID: 

pickle currently puts tuples into the memo on pickling, but only ever
uses the position field ([0]), never the object itself ([1]).

I understand that the reference to the object is needed to keep it
alive while pickling.

Unfortunately, this means one needs to allocate 36 bytes for the
tuple.

I think this memory consumption could be reduced by saving the objects
in a list, and only saving the position in the memo dictionary. That
would save roughly 32 bytes per memoized object, assuming there is no
malloc overhead.

What do you think?

Regards,
Martin

P.S. It would be even more efficient if there was an identity
dictionary.
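[A minimal sketch of the two memo layouts Martin compares, using plain
dicts keyed on id() -- which is how pickle's memo is keyed; the names here
are illustrative only:]

```python
# Current layout: id(obj) -> (position, obj); the second field of the
# tuple exists only to keep obj alive for the duration of the pickling.
memo_now = {}

# Proposed layout: id(obj) -> position, with a separate list holding
# the references that keep the objects alive.
memo_new = {}
keepalive = []

def memoize_now(obj):
    memo_now[id(obj)] = (len(memo_now), obj)

def memoize_new(obj):
    memo_new[id(obj)] = len(memo_new)
    keepalive.append(obj)

a, b = [1], [2]
for o in (a, b):
    memoize_now(o)
    memoize_new(o)

# both layouts recover the same position for a memoized object:
assert memo_now[id(b)][0] == memo_new[id(b)] == 1
```

The saving comes from dropping one 2-tuple allocation per memoized object;
the list appends cost amortized one pointer each.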


From guido@python.org  Tue Aug  6 20:49:03 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 06 Aug 2002 15:49:03 -0400
Subject: [Python-Dev] The memo of pickle
In-Reply-To: Your message of "Tue, 06 Aug 2002 21:35:50 +0200."
 
References: 
Message-ID: <200208061949.g76Jn3n20783@odiug.zope.com>

> pickle currently puts tuples into the memo on pickling, but only ever
> uses the position field ([0]), never the object itself ([1]).
> 
> I understand that the reference to the object is needed to keep it
> alive while pickling.
> 
> Unfortunately, this means one needs to allocate 36 bytes for the
> tuple.
> 
> I think this memory consumption could be reduced by saving the objects
> in a list, and only saving the position in the memo dictionary. That
> would save roughly 32 bytes per memoized object, assuming there is no
> malloc overhead.
> 
> What do you think?

Is it worth it?  Have you made a patch?  What use case are you
thinking of?

> Regards,
> Martin
> 
> P.S. It would be even more efficient if there was an identity
> dictionary.

Sorry, what's an identity dict?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal@lemburg.com  Tue Aug  6 21:31:41 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 06 Aug 2002 22:31:41 +0200
Subject: [Python-Dev] The memo of pickle
References: 
Message-ID: <3D50322D.1070107@lemburg.com>

Martin v. Loewis wrote:
> pickle currently puts tuples into the memo on pickling, but only ever
> uses the position field ([0]), never the object itself ([1]).
> 
> I understand that the reference to the object is needed to keep it
> alive while pickling.
> 
> Unfortunately, this means one needs to allocate 36 bytes for the
> tuple.
> 
> I think this memory consumption could be reduced by saving the objects
> in a list, and only saving the position in the memo dictionary. That
> would save roughly 32 bytes per memoized object, assuming there is no
> malloc overhead.

While that may save you some bytes, wouldn't it break pickle
subclasses using the memo as well?

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From martin@v.loewis.de  Tue Aug  6 22:21:37 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 06 Aug 2002 23:21:37 +0200
Subject: [Python-Dev] The memo of pickle
In-Reply-To: <3D50322D.1070107@lemburg.com>
References: 
 <3D50322D.1070107@lemburg.com>
Message-ID: 

"M.-A. Lemburg"  writes:

> While that may save you some bytes, wouldn't it break pickle
> subclasses using the memo as well ?

Yes. Are there such things?

Regards,
Martin


From mal@lemburg.com  Tue Aug  6 22:35:13 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 06 Aug 2002 23:35:13 +0200
Subject: [Python-Dev] The memo of pickle
References: 	<3D50322D.1070107@lemburg.com> 
Message-ID: <3D504111.3080104@lemburg.com>

Martin v. Loewis wrote:
> "M.-A. Lemburg"  writes:
> 
> 
>>While that may save you some bytes, wouldn't it break pickle
>>subclasses using the memo as well ?
> 
> 
> Yes. Are there such things?

Sure. I use pickle subclasses with hooks for various special
object types a lot in my applications... it would be nice if
I could start subclassing cPickle sometime in the future :-)

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From Jack.Jansen@oratrix.com  Tue Aug  6 22:49:12 2002
From: Jack.Jansen@oratrix.com (Jack Jansen)
Date: Tue, 6 Aug 2002 23:49:12 +0200
Subject: [Python-Dev] cvs crash while updating Doc
Message-ID: <5877597A-A986-11D6-8D1F-003065517236@oratrix.com>

Folks,
I've stared at this for a while now, and I'm out of good ideas, 
so if anyone has any idea how to debug this please let me know.

As of recently I can't do a full checkout of the Python sources 
anymore *with MacCVS Pro on Mac OS 9*. (distant rumbling and 
cursing of MacCVS Pro is heard)

What happens is that the cvs *server* aborts with a signal 11 
while trying to check out Doc/pyexpat.tex. Of course, if I try 
with a different CVS client the server happily checks the file 
out, otherwise I wouldn't be bothering you. And I inspected the 
last few revisions of pyexpat.tex and there are no obvious changes 
that I can imagine would blow up a cvs server.

I can get rid of MacCVS Pro and switch back to the 
much-more-pro-in-my-mind MacCVS (as it supports ssh nowadays, 
finally) but that'll be a hassle, so if anyone has any bright 
ideas please fire away!
--
- Jack Jansen                
http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- 
Emma Goldman -



From martin@v.loewis.de  Tue Aug  6 22:52:22 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 06 Aug 2002 23:52:22 +0200
Subject: [Python-Dev] The memo of pickle
In-Reply-To: <3D504111.3080104@lemburg.com>
References: 
 <3D50322D.1070107@lemburg.com>
 
 <3D504111.3080104@lemburg.com>
Message-ID: 

"M.-A. Lemburg"  writes:

> Sure. I use pickle subclasses with hooks for various special
> object types a lot in my applications... 

Can you provide the source of one such subclass?

TIA,
Martin


From martin@v.loewis.de  Tue Aug  6 23:07:01 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 07 Aug 2002 00:07:01 +0200
Subject: [Python-Dev] cvs crash while updating Doc
In-Reply-To: <5877597A-A986-11D6-8D1F-003065517236@oratrix.com>
References: <5877597A-A986-11D6-8D1F-003065517236@oratrix.com>
Message-ID: 

Jack Jansen  writes:

> What happens is that the cvs *server* aborts with a signal 11 while
> trying to check out Doc/pyexpat.tex. Of course, if I try with a
> different CVS client the server happily checks the file out, otherwise
> I wouldn't be bothering you. And I inspected the last few revisions of
> pyexpat.tex and there's no obvious changes that I can imagine would
> blow up a cvs server.

If you want to investigate this in detail, you can download the CVS
archive from SF, and try to replicate the problem locally.

Regards,
Martin


From mal@lemburg.com  Tue Aug  6 23:20:20 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 07 Aug 2002 00:20:20 +0200
Subject: [Python-Dev] The memo of pickle
References: 	<3D50322D.1070107@lemburg.com>		<3D504111.3080104@lemburg.com> 
Message-ID: <3D504BA4.7080409@lemburg.com>

Martin v. Loewis wrote:
> "M.-A. Lemburg"  writes:
> 
> 
>>Sure. I use pickle subclasses with hooks for various special
>>object types a lot in my applications... 
> 
> 
> Can you provide the source of one such subclass?

No, they are closed-source. But the idea should be obvious:
I want to pickle the various mx types faster than by
relying on the reduce mechanism.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From martin@v.loewis.de  Tue Aug  6 23:30:18 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 07 Aug 2002 00:30:18 +0200
Subject: [Python-Dev] The memo of pickle
In-Reply-To: <3D504BA4.7080409@lemburg.com>
References: 
 <3D50322D.1070107@lemburg.com>
 
 <3D504111.3080104@lemburg.com>
 
 <3D504BA4.7080409@lemburg.com>
Message-ID: 

"M.-A. Lemburg"  writes:

> >>Sure. I use pickle subclasses with hooks for various special
> >> object types a lot in my applications...
> > Can you provide the source of one such subclass?
> 
> No, they are closed-source. But the idea should be obvious:
> I want to pickle the various mx types faster than by
> relying on the reduce mechanism.

Ok. I think I could make this change without breaking your code:
Subclasses won't read the memo; they will only write to it -
Pickler.save is the only place that ever reads the memo.

So subclasses could safely put tuples into the dictionary; the base
class would then look for either tuples or numbers.

Regards,
Martin



From sholden@holdenweb.com  Tue Aug  6 23:36:52 2002
From: sholden@holdenweb.com (Steve Holden)
Date: Tue, 6 Aug 2002 18:36:52 -0400
Subject: [Python-Dev] cvs crash while updating Doc
References: <5877597A-A986-11D6-8D1F-003065517236@oratrix.com>
Message-ID: <012f01c23d99$c37d3ad0$6300000a@holdenweb.com>

----- Original Message -----
From: "Jack Jansen" 
To: 
Sent: Tuesday, August 06, 2002 5:49 PM
Subject: [Python-Dev] cvs crash while updating Doc


> Folks,
> I've stared at this for a while now, and I'm out of good ideas,
> so if anyone has any idea how to debug this please let me know.
>
> As of recently I can't do a full checkout of the Python sources
> anymore *with MacCVS Pro on Mac OS 9*. (distant rumbling and
> cursing of MacCVS Pro is heard)
>
> What happens is that the cvs *server* aborts with a signal 11
> while trying to check out Doc/pyexpat.tex. Of course, if I try
> with a different CVS client the server happily checks the file
> out, otherwise I wouldn't be bothering you. And I inspected the
> last few revisions of pyexpat.tex and there's no obvious changes
> that I can imagine would blow up a cvs server.
>
> I can get rid of MacCVS Pro and switch back to the
> much-more-pro-in-my-mind MacCVS (as it supports ssh nowadays,
> finally) but that'll be a hassle, so if anyone has any bright
> ideas please fire away!

I wonder if this could be the reason I'm currently seeing

cvs server: [15:36:32] waiting for jackjansen's lock in
/cvsroot/python/python/dist/src/Doc/lib
cvs server: [15:37:02] waiting for jackjansen's lock in
/cvsroot/python/python/dist/src/Doc/lib
cvs server: [15:37:32] waiting for jackjansen's lock in
/cvsroot/python/python/dist/src/Doc/lib

as I try to check a small change in to the library documentation? Looks
like I'll have to try later.

regards
-----------------------------------------------------------------------
Steve Holden                                 http://www.holdenweb.com/
Python Web Programming                http://pydish.holdenweb.com/pwp/
-----------------------------------------------------------------------






From greg@cosc.canterbury.ac.nz  Wed Aug  7 03:03:25 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed, 07 Aug 2002 14:03:25 +1200 (NZST)
Subject: [Python-Dev] Simpler reformulation of C inheritance Q.
In-Reply-To: <3D4F9C2A.2040704@tismer.com>
Message-ID: <200208070203.g7723PN07448@oma.cosc.canterbury.ac.nz>

Christian Tismer :

> The reason why I want to have extra data and function
> caches in the types is that this is *very* memory
> efficient, in comparison to stuffing things into the
> instances (which would be easy to implement).

Maybe you misunderstood -- the stuff I was talking
about *would* go in the type, not in the instances.
I was suggesting a generalisation of the way the
type object keeps some of its slots in extra
structures, and allowing you to add more such
structures.

> With meta-types, it now happens that
> types become instances, but types refuse to have slots.
> This needs to be changed, everything else is a workaround.

Yes, that would be more elegant, if it could be done.
I haven't looked closely enough at exactly why types
can't have slots to know how difficult it would be.
Maybe it's not difficult, in which case my suggestion
is unnecessary.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From greg@cosc.canterbury.ac.nz  Wed Aug  7 03:08:43 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed, 07 Aug 2002 14:08:43 +1200 (NZST)
Subject: [Python-Dev] Simpler reformulation of C inheritance Q.
In-Reply-To: <3D4F9EA6.9010802@tismer.com>
Message-ID: <200208070208.g7728hk07536@oma.cosc.canterbury.ac.nz>

Christian Tismer :

> I'm not absolutely sure what I meant.

It sounds to me like Christian wants to be able to extend
the typeobject with new built-in method slots. Ideally
these would behave just like the existing ones, to the
extent of PyType_Ready generating Python wrappers for
them automatically.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From greg@cosc.canterbury.ac.nz  Wed Aug  7 03:27:04 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed, 07 Aug 2002 14:27:04 +1200 (NZST)
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <008901c23d48$73b86840$6d94fea9@newmexico>
Message-ID: <200208070227.g772R4i07678@oma.cosc.canterbury.ac.nz>

> PS: is pure substring testing such a common idiom?
> I have not found so many
> matches for   find\(.*\)\s*>  in the std lib

For more generality, maybe

  re in string

should be made to work too, where re is a regular
expression object?

Or would that be starting on a slippery slope
towards Perl...?-)

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From jepler@unpythonic.net  Wed Aug  7 03:57:05 2002
From: jepler@unpythonic.net (jepler@unpythonic.net)
Date: Tue, 6 Aug 2002 21:57:05 -0500
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <200208070227.g772R4i07678@oma.cosc.canterbury.ac.nz>
References: <008901c23d48$73b86840$6d94fea9@newmexico> <200208070227.g772R4i07678@oma.cosc.canterbury.ac.nz>
Message-ID: <20020806215659.A988@unpythonic.net>

On Wed, Aug 07, 2002 at 02:27:04PM +1200, Greg Ewing wrote:
> > PS: is pure substring testing such a common idiom?
> > I have not found so many
> > matches for   find\(.*\)\s*>  in the std lib
> 
> For more generality, maybe
> 
>   re in string
> 
> should be made to work too, where re is a regular
> expression object?

Surely the re is the thing that expresses a set of strings ...
    string in re
would be the same as
    re.match(string)

Oh, so
    re in string
would be
    re.search(string)
right?

Clear to me!
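
Spelled out against the actual re API (just a sketch — neither `in` form exists, these are the methods they would map to):

```python
import re

pat = re.compile(r'\d+')
s = 'abc 123'

# The proposed "re in string": does the pattern occur anywhere in s?
assert pat.search(s) is not None   # search() scans the whole string

# The proposed "string in re": is s in the language the re describes?
# match() only anchors at the start, so 'abc 123' fails here.
assert pat.match(s) is None
```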

> Or would that be starting on a slippery slope
> towards Perl...?-)

Nah.

Jeff


From greg@cosc.canterbury.ac.nz  Wed Aug  7 03:56:47 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed, 07 Aug 2002 14:56:47 +1200 (NZST)
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <20020806215659.A988@unpythonic.net>
Message-ID: <200208070256.g772ulB07791@oma.cosc.canterbury.ac.nz>

jepler@unpythonic.net:

> Surely the re is the thing that expresses a set of strings ...
>     string in re
> would be the same as
>     re.match(string)
> 
> Oh, so
>     re in string
> would be
>     re.search(string)
> right?

That distinction might just be a tad too subtle...

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From greg@cosc.canterbury.ac.nz  Wed Aug  7 03:58:10 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed, 07 Aug 2002 14:58:10 +1200 (NZST)
Subject: [Python-Dev] The memo of pickle
In-Reply-To: 
Message-ID: <200208070258.g772wAN07798@oma.cosc.canterbury.ac.nz>

> I think this memory consumption could be reduced by saving the objects
> in a list, and only saving the position in the memo dictionary.

Do you need the list at all? Won't the object be kept
alive by the fact that it's a key in the dictionary?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From tim_one@email.msn.com  Wed Aug  7 04:31:04 2002
From: tim_one@email.msn.com (Tim Peters)
Date: Tue, 6 Aug 2002 23:31:04 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <200208070256.g772ulB07791@oma.cosc.canterbury.ac.nz>
Message-ID: 

    re1 in re2

should be True iff the language accepted by re1 is a subset of the language
accepted by re2.  In this case, it's OK to consider the empty language a
subset of all others, since nobody will be able to make head or tail out of
the code anyway.

flexible-to-a-fault-ly y'rs  - tim



From tim_one@email.msn.com  Wed Aug  7 04:32:26 2002
From: tim_one@email.msn.com (Tim Peters)
Date: Tue, 6 Aug 2002 23:32:26 -0400
Subject: [Python-Dev] The memo of pickle
In-Reply-To: <200208070258.g772wAN07798@oma.cosc.canterbury.ac.nz>
Message-ID: 

[Greg Ewing]
> Do you need the list at all? Won't the object be kept
> alive by the fact that it's a key in the dictionary?

The object's id() (address) is the key.  Else only hashable objects could be
pickled.



From greg@cosc.canterbury.ac.nz  Wed Aug  7 04:51:47 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed, 07 Aug 2002 15:51:47 +1200 (NZST)
Subject: [Python-Dev] The memo of pickle
In-Reply-To: 
Message-ID: <200208070351.g773plu07973@oma.cosc.canterbury.ac.nz>

> The object's id() (address) is the key.  Else only hashable objects
> could be pickled.

Hmmm, I see.

It occurs to me that what you really want here is a special
kind of dictionary that uses "is" instead of "==" to compare
key values.

Or is that what was meant by an "identity dictionary"?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+



From skip@pobox.com  Wed Aug  7 05:04:51 2002
From: skip@pobox.com (Skip Montanaro)
Date: Tue, 6 Aug 2002 23:04:51 -0500
Subject: [Python-Dev] Do I misunderstand how codecs.EncodedFile is supposed to work?
Message-ID: <15696.40035.180993.654851@localhost.localdomain>

The following simple session suggests I misunderstood how the
codecs.EncodedFile function should work:

    >>> import codecs
    >>> f = codecs.EncodedFile(open("unicode-test.txt", "w"), "utf-8")
    >>> s = 'Caffe\x92 Lena'
    >>> u = unicode(s, "cp1252")
    >>> u
    u'Caffe\u2019 Lena'
    >>> f.write(u.encode("utf-8"))
    >>> f.write(u)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?

      File "/usr/local/lib/python2.3/codecs.py", line 453, in write
        data, bytesdecoded = self.decode(data, self.errors)
    UnicodeError: ASCII encoding error: ordinal not in range(128)

I thought the whole purpose of the EncodedFile class was to provide
transparent encoding.  Shouldn't it support transparent encoding of Unicode
objects?  That is, I told the system I want writes to be in utf-8 when I
instantiated the class.  I don't think I should have to call .encode()
directly.  I realize I can wrap the function in a class that adds the
transparency I desire, but it seems the whole point should be to make it
easy to write Unicode objects to files.

Skip


From tim_one@email.msn.com  Wed Aug  7 05:09:01 2002
From: tim_one@email.msn.com (Tim Peters)
Date: Wed, 7 Aug 2002 00:09:01 -0400
Subject: [Python-Dev] The memo of pickle
In-Reply-To: <200208070351.g773plu07973@oma.cosc.canterbury.ac.nz>
Message-ID: 

[Tim]
> The object's id() (address) is the key.  Else only hashable objects
> could be pickled.

[Greg Ewing]
> Hmmm, I see.
>
> It occurs to me that what you really want here is a special
> kind of dictionary that uses "is" instead of "==" to compare
> key values.

Possibly.  The *effect* of that could be gotten via a wrapper object, like

class KeyViaId:
    def __init__(self, obj):
        self.obj = obj
    def __hash__(self):
        return hash(id(self.obj))
    def __eq__(self, other):
        return self.obj is other.obj

but if Martin is worried about two-tuple sizes, he's not going to fall in
love with that.
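
For concreteness (repeating the class so the snippet runs standalone): the wrapper does give identity semantics — two equal but distinct lists stay separate keys.

```python
class KeyViaId:
    """Hash and compare the wrapped object by identity, not equality."""
    def __init__(self, obj):
        self.obj = obj
    def __hash__(self):
        return hash(id(self.obj))
    def __eq__(self, other):
        return self.obj is other.obj

a, b = [1, 2], [1, 2]                 # equal, but not the same object
memo = {KeyViaId(a): 0, KeyViaId(b): 1}
assert a == b and len(memo) == 2      # a plain dict would collapse them
assert memo[KeyViaId(a)] == 0
```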

> Or is that what was meant by an "identity dictionary"?

Guido asked, but if Martin answered that question I haven't seen it yet.



From greg@cosc.canterbury.ac.nz  Wed Aug  7 05:29:08 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed, 07 Aug 2002 16:29:08 +1200 (NZST)
Subject: Is-dict? (RE: [Python-Dev] The memo of pickle)
In-Reply-To: 
Message-ID: <200208070429.g774T8N08132@oma.cosc.canterbury.ac.nz>

> The *effect* of that could be gotten via a wrapper object
> ...
> but if Martin is worried about two-tuple sizes, he's not going to fall in
> love with that.

Indeed, which is why I suggested it.

I was wondering whether it would be worth putting one of
these in the standard library. I could have used one in
Plex, when I wanted to map a dictionary to something else
by identity, and I wanted to do it as fast as possible.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From martin@v.loewis.de  Wed Aug  7 07:15:56 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 07 Aug 2002 08:15:56 +0200
Subject: [Python-Dev] The memo of pickle
In-Reply-To: <200208070351.g773plu07973@oma.cosc.canterbury.ac.nz>
References: <200208070351.g773plu07973@oma.cosc.canterbury.ac.nz>
Message-ID: 

Greg Ewing  writes:

> It occurs to me that what you really want here is a special
> kind of dictionary that uses "is" instead of "==" to compare
> key values.
> 
> Or is that what was meant by an "identity dictionary"?

Yes; that would be a dictionary that uses identity instead of
equality.
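
A minimal sketch of such a type (keyed internally on id(); the wrapped key is stored too, both to recover it later and to keep it alive so its id can't be recycled):

```python
class IdentityDict:
    """Sketch: a mapping that compares keys with 'is', not '=='."""
    def __init__(self):
        self._data = {}                        # id(key) -> (key, value)
    def __setitem__(self, key, value):
        self._data[id(key)] = (key, value)     # keep key alive: id stays valid
    def __getitem__(self, key):
        return self._data[id(key)][1]
    def __contains__(self, key):
        return id(key) in self._data

d = IdentityDict()
a, b = [1], [1]                                # equal, not identical
d[a] = 'first'
assert a in d and b not in d                   # identity, not equality
assert d[a] == 'first'
```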

Regards,
Martin



From martin@v.loewis.de  Wed Aug  7 07:35:08 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 07 Aug 2002 08:35:08 +0200
Subject: [Python-Dev] The memo of pickle
In-Reply-To: <200208061949.g76Jn3n20783@odiug.zope.com>
References: 
 <200208061949.g76Jn3n20783@odiug.zope.com>
Message-ID: 

Guido van Rossum  writes:

> Is it worth it?  

If you believe that the problem is real, yes.

> Have you made a patch?  

Not yet, no. With Mark's objection, it is more difficult than I
thought.

> What use case are you thinking of?

People repeatedly complain that pickle consumes too much memory. The
most recent instance was

http://groups.google.de/groups?selm=slrnal05v1.c05.Andreas.Leitgeb%40pc7499.gud.siemens.at&output=gplain

Earlier reports are

http://groups.google.de/groups?hl=de&lr=&ie=UTF-8&selm=mailman.1026940226.16076.python-list%40python.org

http://groups.google.de/groups?q=pickle+memory+group:comp.lang.python.*&hl=de&lr=&ie=UTF-8&selm=396B069A.9EBDD68B%40muc.das-werk.de&rnum=4

> Sorry, what's an identity dict?

IdentityDictionary is the name of a Smalltalk class that uses identity
instead of equality when comparing keys:

http://minnow.cc.gatech.edu/squeak/1845

In Python, it would allow arbitrary objects as keys, and allow equal
duplicates as different keys. For pickle, this would mean that we
could save both the creation of the id() object (since the object
itself is used as a key), and the creation of the tuple (since the
value is only the position).

Regards,
Martin


From martin@v.loewis.de  Wed Aug  7 07:46:59 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 07 Aug 2002 08:46:59 +0200
Subject: [Python-Dev] Do I misunderstand how codecs.EncodedFile is supposed to work?
In-Reply-To: <15696.40035.180993.654851@localhost.localdomain>
References: <15696.40035.180993.654851@localhost.localdomain>
Message-ID: 

Skip Montanaro  writes:

> I thought the whole purpose of the EncodedFile class was to provide
> transparent encoding.  

    """ Return a wrapped version of file which provides transparent
        encoding translation.

        Strings written to the wrapped file are interpreted according
        to the given data_encoding and then written to the original
        file as string using file_encoding. The intermediate encoding
        will usually be Unicode but depends on the specified codecs.

        Strings are read from the file using file_encoding and then
        passed back to the caller as string using data_encoding.

        If file_encoding is not given, it defaults to data_encoding.
    """

So, no. It provides transparent recoding: with a file encoding, and a
data encoding.

I never found this class useful.

What you want is a StreamWriter:

f = codecs.getwriter('utf-8')(open('unicode-test', 'w'))

Of course, *this* specific case can be written much easier as

f = codecs.open('unicode-test', 'w', encoding = 'utf-8')

The getwriter case is useful if you already have a file-like object
from somewhere.


> Shouldn't it support transparent encoding of Unicode
> objects?  That is, I told the system I want writes to be in utf-8 when I
> instantiated the class.  

You told it also that input data are in utf-8, as you have omitted the
data_encoding.

> I don't think I should have to call .encode() directly.  I realize I
> can wrap the function in a class that adds the transparency I
> desire, but it seems the whole point should be to make it easy to
> write Unicode objects to files.

Not this class, no. 

Now, you may ask what else is the purpose of this class. I really
don't know - it is against everything I'm advocating, as it assumes
that you have byte strings in a certain encoding in your memory that
you want to save in a different encoding. That should never happen -
all your text data should be Unicode strings.

Regards,
Martin


From drifty@bigfoot.com  Wed Aug  7 07:47:38 2002
From: drifty@bigfoot.com (Brett Cannon)
Date: Tue, 6 Aug 2002 23:47:38 -0700 (PDT)
Subject: [Python-Dev] python-dev summaries?
Message-ID: 

I pretty much know the answer to this vague question is, "no one is doing
them at the moment/anymore", but I thought I would ask in case someone is
and I am completely oblivious to them being sent out to the list and
Google can't find any of them.

-Brett C.



From mal@lemburg.com  Wed Aug  7 08:38:59 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 07 Aug 2002 09:38:59 +0200
Subject: [Python-Dev] The memo of pickle
References: 	<200208061949.g76Jn3n20783@odiug.zope.com> 
Message-ID: <3D50CE93.3000102@lemburg.com>

Martin v. Loewis wrote:
> Guido van Rossum  writes:
> 
> 
>>Is it worth it?  
> 
> 
> If you believe that the problem is real, yes.

I think that the tuple is not the problem here, it's the
fact that so many objects are recorded in the memo to
later rebuild recursive structures.

Now, I believe that recursive structures in pickles are
not very common, so the memo is mostly useless in these
cases.

Perhaps pickle could grow an option to assume that a
data structure is non-recursive ?! In that case, no
data would be written to the memo (or only the id()
mapped to 1 to double-check).

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From mal@lemburg.com  Wed Aug  7 08:46:55 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 07 Aug 2002 09:46:55 +0200
Subject: [Python-Dev] Do I misunderstand how codecs.EncodedFile is supposed
 to work?
References: <15696.40035.180993.654851@localhost.localdomain> 
Message-ID: <3D50D06F.6040305@lemburg.com>

Martin v. Loewis wrote:
> Skip Montanaro  writes:
> 
> 
>>I thought the whole purpose of the EncodedFile class was to provide
>>transparent encoding.  
> 
> 
>     """ Return a wrapped version of file which provides transparent
>         encoding translation.
> 
>         Strings written to the wrapped file are interpreted according
>         to the given data_encoding and then written to the original
>         file as string using file_encoding. The intermediate encoding
>         will usually be Unicode but depends on the specified codecs.
> 
>         Strings are read from the file using file_encoding and then
>         passed back to the caller as string using data_encoding.
> 
>         If file_encoding is not given, it defaults to data_encoding.
>     """
> 
> So, no. It provides transparent recoding: with a file encoding, and a
> data encoding.
> 
> I never found this class useful.

It's not a class, just a helper for StreamRecoder. Its purpose
is to provide an easy way of saying "the inside world is encoding
X while the outside world uses Y":

     # Make stdout translate Latin-1 output into UTF-8 output
     sys.stdout = EncodedFile(sys.stdout, 'latin-1', 'utf-8')

     # Have stdin translate UTF-8 input into Latin-1 input
     sys.stdin = EncodedFile(sys.stdin, 'utf-8', 'latin-1')

Here the inside world uses Latin-1 while the outside world
uses UTF-8.

You could also use it to talk to a gzipped file or, provided
you have such a codec, to an encrypted file.
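
A runnable sketch of that recoding, with an in-memory io.BytesIO standing in for the "outside world" file (the EncodedFile signature is as in the codecs module):

```python
import codecs
import io

outside = io.BytesIO()                       # the "outside world" file
f = codecs.EncodedFile(outside, data_encoding='latin-1',
                       file_encoding='utf-8')
f.write('caf\xe9'.encode('latin-1'))         # inside world writes Latin-1
assert outside.getvalue() == b'caf\xc3\xa9'  # outside world sees UTF-8
```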

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From sholden@holdenweb.com  Wed Aug  7 12:20:48 2002
From: sholden@holdenweb.com (Steve Holden)
Date: Wed, 7 Aug 2002 07:20:48 -0400
Subject: [Python-Dev] CVS fails to commit
Message-ID: <023201c23e04$7b94b610$6300000a@holdenweb.com>

I find that this morning I am still prevented from committing changes to

    ~/pythoncvs/python/dist/src/Doc/lib/libposixpath.tex

Is this a problem that's only affecting a small portion of the repository,
or is it more general? To repeat yesterday's notification, the error message
I'm seeing is:

    cvs server: [04:14:04] waiting for jackjansen's lock in
/cvsroot/python/python/dist/src/Doc/lib

locked-out-ly y'rs  - steve
-----------------------------------------------------------------------
Steve Holden                                 http://www.holdenweb.com/
Python Web Programming                http://pydish.holdenweb.com/pwp/
-----------------------------------------------------------------------






From sjoerd@acm.org  Wed Aug  7 12:35:19 2002
From: sjoerd@acm.org (Sjoerd Mullender)
Date: Wed, 07 Aug 2002 13:35:19 +0200
Subject: [Python-Dev] CVS fails to commit
In-Reply-To: <023201c23e04$7b94b610$6300000a@holdenweb.com>
References: <023201c23e04$7b94b610$6300000a@holdenweb.com>
Message-ID: <200208071135.g77BZJu09411@indus.ins.cwi.nl>

It looks like Jack's problems caused a lock file to be stuck there.  I
expect this affects a small part of the repository, and also that it
needs manual intervention to correct the problem.  So please submit a
service request to SourceForge to remove the lock file(s) in
/cvsroot/python/python/dist/src/Doc/lib (and all subdirectories).

On Wed, Aug 7 2002 "Steve Holden" wrote:

> I find that this morning I am still prevented from committing changes to
> 
>     ~/pythoncvs/python/dist/src/Doc/lib/libposixpath.tex
> 
> Is this a problem that's only affecting a small portion of the repository,
> or is it more general? To repeat yesterday's notification, the error message
> I'm seeing is:
> 
>     cvs server: [04:14:04] waiting for jackjansen's lock in
> /cvsroot/python/python/dist/src/Doc/lib
> 
> locked-out-ly y'rs  - steve
> -----------------------------------------------------------------------
> Steve Holden                                 http://www.holdenweb.com/
> Python Web Programming                http://pydish.holdenweb.com/pwp/
> -----------------------------------------------------------------------
> 
> 
> 
> 
> 
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev
> 

-- Sjoerd Mullender 


From sholden@holdenweb.com  Wed Aug  7 13:02:10 2002
From: sholden@holdenweb.com (Steve Holden)
Date: Wed, 7 Aug 2002 08:02:10 -0400
Subject: [Python-Dev] CVS fails to commit
References: <023201c23e04$7b94b610$6300000a@holdenweb.com>  <200208071135.g77BZJu09411@indus.ins.cwi.nl>
Message-ID: <027b01c23e0a$4317e630$6300000a@holdenweb.com>

----- Original Message -----
From: "Sjoerd Mullender" 
> It looks like Jack's problems caused a lock file to be stuck there.  I
> expect this affects a small part of the repository, and also that it
> needs manual intervention to correct the problem.  So please submit a
> service request to SourceForge to remove the lock file(s) in
> /cvsroot/python/python/dist/src/Doc/lib (and all subdirectories).
>
> On Wed, Aug 7 2002 "Steve Holden" wrote:
>
> > I find that this morning I am still prevented from committing changes to
> >
> >     ~/pythoncvs/python/dist/src/Doc/lib/libposixpath.tex
> >
> > Is this a problem that's only affecting a small portion of the
> > repository, or is it more general? To repeat yesterday's notification,
> > the error message I'm seeing is:
> >
> >     cvs server: [04:14:04] waiting for jackjansen's lock in
> > /cvsroot/python/python/dist/src/Doc/lib
> >

Whether by accident or in response to my posting I don't know, but I could
commit within five minutes of making the support request. Kudos to
SourceForge on this one?

regards
-----------------------------------------------------------------------
Steve Holden                                 http://www.holdenweb.com/
Python Web Programming                http://pydish.holdenweb.com/pwp/
-----------------------------------------------------------------------






From guido@python.org  Wed Aug  7 13:11:03 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 07 Aug 2002 08:11:03 -0400
Subject: [Python-Dev] The memo of pickle
In-Reply-To: Your message of "Wed, 07 Aug 2002 09:38:59 +0200."
 <3D50CE93.3000102@lemburg.com>
References:  <200208061949.g76Jn3n20783@odiug.zope.com> 
 <3D50CE93.3000102@lemburg.com>
Message-ID: <200208071211.g77CB3g29542@pcp02138704pcs.reston01.va.comcast.net>

> I think that the tuple is not the problem here, it's the
> fact that so many objects are recorded in the memo to
> later rebuild recursive structures.
> 
> Now, I believe that recursive structures in pickles are
> not very common, so the memo is mostly useless in these
> cases.

Use cPickle, it's much more frugal with the memo, and also has some
options to control the memo (read the docs, I forget the details and
am in a hurry).

> Perhaps pickle could grow an option to assume that a
> data structure is non-recursive ?! In that case, no
> data would be written to the memo (or only the id()
> mapped to 1 to double-check).

The memo is also for sharing.  There's no recursion in this example,
but the sharing may be important:

a = [1,2,3]
b = [a,a,a]
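
And indeed a round trip through pickle preserves that sharing, which only the memo makes possible:

```python
import pickle

a = [1, 2, 3]
b = [a, a, a]
c = pickle.loads(pickle.dumps(b))
assert c[0] is c[1] is c[2]       # one shared copy, thanks to the memo
c[0].append(4)
assert c[2] == [1, 2, 3, 4]       # mutating one element shows through all
```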

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal@lemburg.com  Wed Aug  7 14:48:22 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 07 Aug 2002 15:48:22 +0200
Subject: [Python-Dev] The memo of pickle
References:  <200208061949.g76Jn3n20783@odiug.zope.com>               <3D50CE93.3000102@lemburg.com> <200208071211.g77CB3g29542@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <3D512526.9040808@lemburg.com>

Guido van Rossum wrote:
>>I think that the tuple is not the problem here, it's the
>>fact that so many objects are recorded in the memo to
>>later rebuild recursive structures.
>>
>>Now, I believe that recursive structures in pickles are
>>not very common, so the memo is mostly useless in these
>>cases.
> 
> 
> Use cPickle, it's much more frugal with the memo, and also has some
> options to control the memo (read the docs, I forget the details and
> am in a hurry).

Just to clarify: I don't have a problem with the memo
in pickle at all :-) Martin brought up this issue.

>>Perhaps pickle could grow an option to assume that a
>>data structure is non-recursive ?! In that case, no
>>data would be written to the memo (or only the id()
>>mapped to 1 to double-check).
> 
> The memo is also for sharing.  There's no recursion in this example,
> but the sharing may be important:
> 
> a = [1,2,3]
> b = [a,a,a]

Right. I don't think these references are too common in pickles.
Zope Corp should know much more about this, I guess, since ZODB
is all about pickling.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From ark@research.att.com  Wed Aug  7 15:17:39 2002
From: ark@research.att.com (Andrew Koenig)
Date: 07 Aug 2002 10:17:39 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: 
References: 
Message-ID: 

Tim>     re1 in re2

Tim> should be True iff the language accepted by re1 is a subset of
Tim> the language accepted by re2.  In this case, it's OK to consider
Tim> the empty language a subset of all others, since nobody will be
Tim> able to make head or tail out of the code anyway.

Note the distinction between the empty language and the empty string.
As a language is a set of strings, the empty language is one that
contains no strings, not even the empty string.  Therefore, a regular
expression that accepts the empty language is one that rejects every
string, even the empty string.
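
Concretely, in Python's re (a sketch; `(?!)` — a negative lookahead that always fails — is one way to spell a pattern accepting the empty language):

```python
import re

nothing = re.compile(r'(?!)')     # accepts the empty language
assert nothing.match('') is None  # rejects even the empty string
assert nothing.search('abc') is None

empty = re.compile(r'')           # by contrast: accepts the empty string
assert empty.match('') is not None
```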

Pedantically y'rs    --ark


From guido@python.org  Wed Aug  7 15:50:45 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 07 Aug 2002 10:50:45 -0400
Subject: [Python-Dev] The memo of pickle
In-Reply-To: Your message of "Wed, 07 Aug 2002 15:48:22 +0200."
 <3D512526.9040808@lemburg.com>
References:  <200208061949.g76Jn3n20783@odiug.zope.com>  <3D50CE93.3000102@lemburg.com> <200208071211.g77CB3g29542@pcp02138704pcs.reston01.va.comcast.net>
 <3D512526.9040808@lemburg.com>
Message-ID: <200208071450.g77EojV03604@pcp02138704pcs.reston01.va.comcast.net>

> >>Perhaps pickle could grow an option to assume that a
> >>data structure is non-recursive ?! In that case, no
> >>data would be written to the memo (or only the id()
> >>mapped to 1 to double-check).
> > 
> > The memo is also for sharing.  There's no recursion in this example,
> > but the sharing may be important:
> > 
> > a = [1,2,3]
> > b = [a,a,a]
> 
> Right. I don't think these references are too common in pickles.

I think they are.

> Zope Corp should know much more about this, I guess, since ZODB
> is all about pickleing.

Sharing object references is essential in Zope.  But only to certain
objects; sharing strings and numbers is not important, and I believe
cPickle doesn't put those in the memo, while pickle.py puts
essentially everything in the memo...

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal@lemburg.com  Wed Aug  7 15:58:12 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 07 Aug 2002 16:58:12 +0200
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Python compile.c,2.250,2.251
References: 	<3D5125A9.6010301@lemburg.com> 
Message-ID: <3D513584.5030503@lemburg.com>

Martin v. Löwis wrote:
> "M.-A. Lemburg"  writes:
>
>
>>>+ #ifndef Py_USING_UNICODE
>>>+ 	abort();
>>>+ #else
>>
>>Shouldn't this be a call to Py_FatalError() with a proper
>>error message ?
>
>
> What is the guideline for when to use abort, and when to use
> Py_FatalError?

Looking at the code for Py_FatalError(), I'd say always use
this instead of calling abort directly, except maybe for
situations where you don't want anything printed.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From dave@boost-consulting.com  Wed Aug  7 15:51:22 2002
From: dave@boost-consulting.com (David Abrahams)
Date: Wed, 7 Aug 2002 10:51:22 -0400
Subject: [Python-Dev] docstrings, help(), and __name__
Message-ID: <086901c23e21$e7bea8b0$62f0fc0c@boostconsulting.com>

I've recently been implementing docstring support for Boost.Python
extension classes (and in particular, their methods). I have a callable
type which wraps all C++ functions and member functions -- it basically
looks like a minimal subset of Python's function type, with a tp_descr_get
slot which does the same thing that funcobject.c's func_descr_get() does:

    static PyObject *
    function_descr_get(PyObject *func, PyObject *obj, PyObject *type_)
    {
        if (obj == Py_None)
            obj = NULL;
        return PyMethod_New(func, obj, type_);
    }

So I just recently added a descriptor for the "__doc__" string attribute,
and I thought I'd try help() on one of these methods:

*****************************************************************
Failure in example: help(X)
from line #2 of __main__
Exception raised:
Traceback (most recent call last):
  File "doctest.py", line 430, in _run_examples_inner
    compileflags, 1) in globs
  File "", line 1, in ?
  File "c:\tools\python-2.2.1\lib\site.py", line 279, in __call__
    return pydoc.help(*args, **kwds)
  File "c:\tools\python-2.2.1\lib\pydoc.py", line 1510, in __call__
    self.help(request)
  File "c:\tools\python-2.2.1\lib\pydoc.py", line 1546, in help
    else: doc(request, 'Help on %s:')
  File "c:\tools\python-2.2.1\lib\pydoc.py", line 1341, in doc
    pager(title % (desc + suffix) + '\n\n' + text.document(thing, name))
  File "c:\tools\python-2.2.1\lib\pydoc.py", line 268, in document
    if inspect.isclass(object): return apply(self.docclass, args)
  File "c:\tools\python-2.2.1\lib\pydoc.py", line 1093, in docclass
    lambda t: t[1] == 'method')
  File "c:\tools\python-2.2.1\lib\pydoc.py", line 1035, in spill
    name, mod, object))
  File "c:\tools\python-2.2.1\lib\pydoc.py", line 269, in document
    if inspect.isroutine(object): return apply(self.docroutine, args)
  File "c:\tools\python-2.2.1\lib\pydoc.py", line 1116, in docroutine
    realname = object.__name__
AttributeError: 'Boost.Python.function' object has no attribute '__name__'
*****************************************************************

It seems I'm breaking some protocol. It's easy enough to add a '__name__'
attribute to my function objects, but I'd like to be sure that I'm adding
everything I really /should/ add. Just how much like a regular Python
function does my function have to be in order to make the help system (and
other standard systems with such expectations) happy?

TIA,
Dave

-----------------------------------------------------------
           David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com





From mwh@python.net  Wed Aug  7 16:36:21 2002
From: mwh@python.net (Michael Hudson)
Date: 07 Aug 2002 16:36:21 +0100
Subject: [Python-Dev] docstrings, help(), and __name__
In-Reply-To: "David Abrahams"'s message of "Wed, 7 Aug 2002 10:51:22 -0400"
References: <086901c23e21$e7bea8b0$62f0fc0c@boostconsulting.com>
Message-ID: <2madnyhea2.fsf@starship.python.net>

"David Abrahams"  writes:

> [function-like object with no __name__ breaks pydoc]
> 
> It seems I'm breaking some protocol. It's easy enough to add a '__name__'
> attribute to my function objects, but I'd like to be sure that I'm adding
> everything I really /should/ add.

I am fairly certain the protocols inspect uses are not written down
anywhere.  I think they're defined entirely by the implementation.

> Just how much like a regular Python function does my function have
> to be in order to make the help system (and other standard systems
> with such expectations) happy?

"Use the source, Luke."  Not a good answer, but probably the only one.

I guess inspect thinks your object looks like a method descriptor?  It
certainly seems to think it's a "routine", whatever that means...

Cheers,
M.

-- 
  Famous remarks are very seldom quoted correctly.
                                                    -- Simeon Strunsky


From guido@python.org  Wed Aug  7 16:50:24 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 07 Aug 2002 11:50:24 -0400
Subject: [Python-Dev] docstrings, help(), and __name__
In-Reply-To: Your message of "Wed, 07 Aug 2002 10:51:22 EDT."
 <086901c23e21$e7bea8b0$62f0fc0c@boostconsulting.com>
References: <086901c23e21$e7bea8b0$62f0fc0c@boostconsulting.com>
Message-ID: <200208071550.g77FoP003782@pcp02138704pcs.reston01.va.comcast.net>

> I've recently been implementing docstring support for Boost.Python
> extension classes (and in particular, their methods). I have a callable
> type which wraps all C++ functions and member functions -- it basically
> looks like a minimal subset of Python's function type, with a tp_descr_get
> slot which does the same thing that funcobject.c's func_descr_get() does:
> 
>     static PyObject *
>     function_descr_get(PyObject *func, PyObject *obj, PyObject *type_)
>     {
>         if (obj == Py_None)
>             obj = NULL;
>         return PyMethod_New(func, obj, type_);
>     }
> 
> So I just recently added a descriptor for the "__doc__" string attribute,
> and I thought I'd try help() on one of these methods:
> 
> *****************************************************************
> Failure in example: help(X)
> from line #2 of __main__
> Exception raised:
> Traceback (most recent call last):
>   File "doctest.py", line 430, in _run_examples_inner
>     compileflags, 1) in globs
>   File "", line 1, in ?
>   File "c:\tools\python-2.2.1\lib\site.py", line 279, in __call__
>     return pydoc.help(*args, **kwds)
>   File "c:\tools\python-2.2.1\lib\pydoc.py", line 1510, in __call__
>     self.help(request)
>   File "c:\tools\python-2.2.1\lib\pydoc.py", line 1546, in help
>     else: doc(request, 'Help on %s:')
>   File "c:\tools\python-2.2.1\lib\pydoc.py", line 1341, in doc
>     pager(title % (desc + suffix) + '\n\n' + text.document(thing, name))
>   File "c:\tools\python-2.2.1\lib\pydoc.py", line 268, in document
>     if inspect.isclass(object): return apply(self.docclass, args)
>   File "c:\tools\python-2.2.1\lib\pydoc.py", line 1093, in docclass
>     lambda t: t[1] == 'method')
>   File "c:\tools\python-2.2.1\lib\pydoc.py", line 1035, in spill
>     name, mod, object))
>   File "c:\tools\python-2.2.1\lib\pydoc.py", line 269, in document
>     if inspect.isroutine(object): return apply(self.docroutine, args)
>   File "c:\tools\python-2.2.1\lib\pydoc.py", line 1116, in docroutine
>     realname = object.__name__
> AttributeError: 'Boost.Python.function' object has no attribute '__name__'
> *****************************************************************
> 
> It seems I'm breaking some protocol. It's easy enough to add a '__name__'
> attribute to my function objects, but I'd like to be sure that I'm adding
> everything I really /should/ add. Just how much like a regular Python
> function does my function have to be in order to make the help system (and
> other standard systems with such expectations) happy?

It's hard to say.  The pydoc code makes up protocols as it goes.  I
think __name__ is probably the only one you're missing in practice.

--Guido van Rossum (home page: http://www.python.org/~guido/)



From aahz@pythoncraft.com  Wed Aug  7 17:11:06 2002
From: aahz@pythoncraft.com (Aahz)
Date: Wed, 7 Aug 2002 12:11:06 -0400
Subject: [Python-Dev] jython-dev failure
Message-ID: <20020807161106.GA14912@panix.com>

I'm assuming the Jython developers are monitoring this list...

(I'm leaving for a week, so I don't have time to hunt down individual
addresses.)

----- Forwarded message from Duke  -----
> Date: Tue, 06 Aug 2002 21:08:35 -0600
> To: webmaster@python.org
> From: Duke 
> Subject: Please Forward: "A naive question about the applicability of
>   Jython..."
> 
> I tried to send the text below the line to 
> 
> as suggested under "Email Us" on the page http://www.jython.org .
> Unfortunately, it gave "599 DSMTP mail server registration lapsed.  Try 
> resending."
> Resending failed.
> Can you please help???
> Thanks!!!!
> ------------------------------------------------------------------------------------------------
> Nature of question: Can Jython do this?
> Nature of "this": read input coming to Internet Explorer, write output 
> through it.
> 
> I have been fishing around for a way to monitor data coming into IE, and 
> when
> appropriate, generate output -- basically, automatically processing the 
> input
> and responding to it.  Imagine surfing a database, and when desired, sending
> updates to it.
> 
> Can Jython do this?  Specifically, can it monitor input directed to the IE 
> window,
> and send responses?  I don't know enough about Java to understand whether it
> has that capability.  I have about 2 years experience writing Python and 
> Tkinter.
> 
> If it can great!!!  If not, do you know of any plugins that can monitor and 
> generate
> traffic inside an IE session, particularly an SSL session?
> 
> Thanks very much for your time!
> 
> Duke Winsor
> 
----- End forwarded message -----


From martin@v.loewis.de  Wed Aug  7 19:08:35 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 07 Aug 2002 20:08:35 +0200
Subject: [Python-Dev] The memo of pickle
In-Reply-To: <3D50CE93.3000102@lemburg.com>
References: 
 <200208061949.g76Jn3n20783@odiug.zope.com>
 
 <3D50CE93.3000102@lemburg.com>
Message-ID: 

"M.-A. Lemburg"  writes:

> I think that the tuple is not the problem here, it's the
> fact that so many objects are recorded in the memo to
> later rebuild recursive structures.

It's not a matter of beliefs: each dictionary entry contributes 12
bytes. Each integer key contributes 12 bytes, each integer position
contributes 12 bytes. Each tuple contributes 36 bytes.

Assuming pymalloc and the integer allocator, this makes a total of 76
bytes per recorded object. The tuples contribute over 50% to that.
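For scale, the per-object memo growth is easy to observe directly. A sketch using the pure-Python pickler (which exposes its memo as a plain dict mapping id(obj) to an (index, obj) tuple); the object counts here are just for illustration:

```python
import io
import pickle

objs = [[i] for i in range(1000)]       # 1000 distinct list objects
buf = io.BytesIO()
p = pickle._Pickler(buf, protocol=2)    # pure-Python pickler exposes .memo
p.dump(objs)

# one memo entry per memoized container: the outer list plus the
# 1000 inner lists (small ints are not memoized)
print(len(p.memo))                      # 1001
```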

> Perhaps pickle could grow an option to assume that a
> data structure is non-recursive ?! In that case, no
> data would be written to the memo (or only the id()
> mapped to 1 to double-check).

That is already possible: You can pass a fake dictionary that records
nothing.
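A sketch of that fake-dictionary trick, assuming the pure-Python pickler (whose memo attribute is an ordinary dict we can swap out); it is only safe for data with no shared or recursive references:

```python
import io
import pickle

class NullMemo(dict):
    """A fake memo that records nothing -- safe only for acyclic data."""
    def __setitem__(self, key, value):
        pass                              # drop every entry on the floor

buf = io.BytesIO()
p = pickle._Pickler(buf, protocol=2)      # pure-Python pickler: memo is replaceable
p.memo = NullMemo()
p.dump([[1, 2], [3, 4], "spam"])          # no shared or recursive references
print(pickle.loads(buf.getvalue()))       # [[1, 2], [3, 4], 'spam']
```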

Regards,
Martin



From martin@v.loewis.de  Wed Aug  7 19:09:54 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 07 Aug 2002 20:09:54 +0200
Subject: [Python-Dev] The memo of pickle
In-Reply-To: <3D512526.9040808@lemburg.com>
References: 
 <200208061949.g76Jn3n20783@odiug.zope.com>
 
 <3D50CE93.3000102@lemburg.com>
 <200208071211.g77CB3g29542@pcp02138704pcs.reston01.va.comcast.net>
 <3D512526.9040808@lemburg.com>
Message-ID: 

"M.-A. Lemburg"  writes:

> > Use cPickle, it's much more frugal with the memo, and also has some
> > options to control the memo (read the docs, I forget the details and
> > am in a hurry).
> 
> Just to clarify: I don't have a problem with the memo
> in pickle at all :-) Martin brought up this issue.

I don't have a problem with the memo, either. I have a problem with
the tuples in the memo.

Regards,
Martin



From martin@v.loewis.de  Wed Aug  7 19:11:46 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 07 Aug 2002 20:11:46 +0200
Subject: [Python-Dev] Do I misunderstand how codecs.EncodedFile is supposed to work?
In-Reply-To: <3D50D06F.6040305@lemburg.com>
References: <15696.40035.180993.654851@localhost.localdomain>
 
 <3D50D06F.6040305@lemburg.com>
Message-ID: 

"M.-A. Lemburg"  writes:

> It's not a class, just a helper for StreamRecoder. Its purpose
> is to provide an easy way of saying "the inside world is encoding
> X while the outside world uses Y":

In a well-designed application, you should not need to say
this. The inside world should use Unicode objects.

Regards,
Martin



From mal@lemburg.com  Wed Aug  7 20:51:59 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 07 Aug 2002 21:51:59 +0200
Subject: [Python-Dev] Do I misunderstand how codecs.EncodedFile is supposed
 to work?
References: <15696.40035.180993.654851@localhost.localdomain>		<3D50D06F.6040305@lemburg.com> 
Message-ID: <3D517A5F.2040707@lemburg.com>

Martin v. Loewis wrote:
> "M.-A. Lemburg"  writes:
> 
> 
>>It's not a class, just a helper for StreamRecoder. Its purpose
>>is to provide an easy way of saying "the inside world is encoding
>>X while the outside world uses Y":
> 
> 
> In a well-designed application, you should not need to say
> this. The inside world should use Unicode objects.

Agreed, but if you want to port an existing application to
the Unicode world, it sometimes helps.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From tim@zope.com  Wed Aug  7 21:02:53 2002
From: tim@zope.com (Tim Peters)
Date: Wed, 7 Aug 2002 16:02:53 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: 
Message-ID: 

> Tim>     re1 in re2
>
> Tim> should be True iff the language accepted by re1 is a subset of
> Tim> the language accepted by re2.  In this case, it's OK to consider
> Tim> the empty language a subset of all others, since nobody will be
> Tim> able to make head or tail out of the code anyway.

[ark@research.att.com]
> Note the distinction between the empty language and the empty string.
> As a language is a set of strings, the empty language is one that
> contains no strings, not even the empty string.  Therefore, a regular
> expression that accepts the empty language is one that rejects every
> string, even the empty string.

Sure, that's why I said "empty language" and not "empty string".  It
wouldn't make *any* sense for "re1 in re2" to consider a regexp that
accepted the language {""} to be "in" all other regexps.  But a regexp that
accepts the language {} (i.e., the empty language) clearly accepts a subset
of the language accepted by any regexp.
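A concrete illustration of the distinction, using patterns for the two languages in question (a sketch; `(?!)` is an always-failing lookahead):

```python
import re

# (?!) accepts the empty *language*: it rejects every string,
# including the empty string.
empty_language = re.compile(r"(?!)")
assert empty_language.match("") is None
assert empty_language.match("anything") is None

# By contrast, a pattern accepting the language {""} matches the
# empty string, and only the empty string.
empty_string_only = re.compile(r"^$")
assert empty_string_only.match("") is not None
assert empty_string_only.match("x") is None
```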

> Pedantically y'rs    --ark

Not enough to matter in this case .



From ark@research.att.com  Wed Aug  7 21:14:22 2002
From: ark@research.att.com (Andrew Koenig)
Date: 07 Aug 2002 16:14:22 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: 
References: 
Message-ID: 

>> Note the distinction between the empty language and the empty
>> string.  As a language is a set of strings, the empty language is
>> one that contains no strings, not even the empty string.
>> Therefore, a regular expression that accepts the empty language is
>> one that rejects every string, even the empty string.

Tim> Sure, that's why I said "empty language" and not "empty string".
Tim> It wouldn't make *any* sense for "re1 in re2" to consider a
Tim> regexp that accepted the language {""} to be "in" all other
Tim> regexps.  But a regexp that accepts the language {} (i.e., the
Tim> empty language) clearly accepts a subset of the language accepted
Tim> by any regexp.

Right.  (I wasn't disagreeing with you, merely pointing out a
plausible miscomprehension on the part of the reader (because
I made just that mistake the first time I read it))

>> Pedantically y'rs    --ark

Tim> Not enough to matter in this case .

Whether it matters depends on whether the reader made the same
mistake I did on first reading.

-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark


From tim@zope.com  Wed Aug  7 21:58:05 2002
From: tim@zope.com (Tim Peters)
Date: Wed, 7 Aug 2002 16:58:05 -0400
Subject: [Python-Dev] CVS fails to commit
In-Reply-To: <200208071135.g77BZJu09411@indus.ins.cwi.nl>
Message-ID: 

Sjoerd, is Jack having some systematic problem with CVS?  A stale lock of
his prevented checkins under Doc/ Saturday through Sunday afternoon too,
which also required SourceForge intervention to clear out.

> -----Original Message-----
> From: python-dev-admin@python.org [mailto:python-dev-admin@python.org]On
> Behalf Of Sjoerd Mullender
> Sent: Wednesday, August 07, 2002 7:35 AM
> To: Steve Holden
> Cc: Python-Dev
> Subject: Re: [Python-Dev] CVS fails to commit
>
>
> It looks like Jack's problems caused a lock file to be stuck there.  I
> expect this affects a small part of the repository, and also that it
> needs manual intervention to correct the problem.  So please submit a
> service request to SourceForge to remove the lock file(s) in
> /cvsroot/python/python/dist/src/Doc/lib (and all subdirectories).
>
> On Wed, Aug 7 2002 "Steve Holden" wrote:
>
> > I find that this morning I am still prevented from committing changes to
> >
> >     ~/pythoncvs/python/dist/src/Doc/lib/libposixpath.tex
> >
> > Is this a problem that's only affecting a small portion of the
> repository,
> > or is it more general? To repeat yesterday's notification, the
> error message
> > I'm seeing is:
> >
> >     cvs server: [04:14:04] waiting for jackjansen's lock in
> > /cvsroot/python/python/dist/src/Doc/lib
> >
> > locked-out-ly y'rs  - steve
> > -----------------------------------------------------------------------
> > Steve Holden                                 http://www.holdenweb.com/
> > Python Web Programming                http://pydish.holdenweb.com/pwp/
> > -----------------------------------------------------------------------
> >
> >
> >
> >
> >
> > _______________________________________________
> > Python-Dev mailing list
> > Python-Dev@python.org
> > http://mail.python.org/mailman/listinfo/python-dev
> >
>
> -- Sjoerd Mullender 
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev



From greg@cosc.canterbury.ac.nz  Wed Aug  7 22:56:07 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 08 Aug 2002 09:56:07 +1200 (NZST)
Subject: [Python-Dev] The memo of pickle
In-Reply-To: <3D50CE93.3000102@lemburg.com>
Message-ID: <200208072156.g77Lu7u14389@oma.cosc.canterbury.ac.nz>

"M.-A. Lemburg" :

> Perhaps pickle could grow an option to assume that a
> data structure is non-recursive ?

Then you'd probably want some means of detecting cycles, or you'd get
infinite recursion when you got it wrong. That would mean keeping a
stack of objects, I think -- probably less memory than keeping all of
them at once.

But I think the idea of keeping the object references in a list is
well worth trying first. 4 bytes per object instead of 36 sounds like a
good improvement to me!

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From dave@boost-consulting.com  Wed Aug  7 23:00:50 2002
From: dave@boost-consulting.com (David Abrahams)
Date: Wed, 7 Aug 2002 18:00:50 -0400
Subject: [Python-Dev] docstrings, help(), and __name__
References: <086901c23e21$e7bea8b0$62f0fc0c@boostconsulting.com>  <200208071550.g77FoP003782@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <0a3001c23e60$3927d710$62f0fc0c@boostconsulting.com>

From: "Guido van Rossum" 

> > It seems I'm breaking some protocol. It's easy enough to add a
'__name__'
> > attribute to my function objects, but I'd like to be sure that I'm
adding
> > everything I really /should/ add. Just how much like a regular Python
> > function does my function have to be in order to make the help system
(and
> > other standard systems with such expectations) happy?
>
> It's hard to say.  The pydoc code makes up protocols as it goes.  I
> think __name__ is probably the only one you're missing in practice.

That appears to be correct. Interestingly, these methods seem to be treated
differently from ordinary ones. My methods get shown like this:

   |  __init__ = __init__(...)
   |      this is the __init__ function
   |      its documentation has two lines.


The 2nd instance of __init__ there is given by the value of the __name__
attribute, while built-in methods get shown as follows:

  >>> class X(object):
  ...     def __init__(self): pass
  ...
  >>> help(X)
  Help on class X in module __main__:

  class X(__builtin__.object)
   |  Methods defined here:
   |
   |  __init__(self)

Does anyone know why the difference? Is it perhaps the missing 'im_class'
attribute in my case? These are the sorts of things I want to know about...

Thanks,
Dave

-----------------------------------------------------------
           David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com





From python@rcn.com  Wed Aug  7 23:43:46 2002
From: python@rcn.com (Raymond Hettinger)
Date: Wed, 7 Aug 2002 18:43:46 -0400
Subject: [Python-Dev] Pickling in XML format
Message-ID: <003701c23e63$e4698440$5066accf@othello>

Do you guys have any thoughts on the merits of adding dumpXML and loadXML methods to the pickle module?

The only disadvantage that comes to mind is that the file sizes are larger (though they may compress more efficiently).

The advantages center around portability and the use of existing tools:
-- The pickles would be validatable against a DTD or schema
-- Pickles would be more human-readable than the current format
-- XSLT could make translations to HTML, JavaPickle formats, more compact formats, etc.
-- XPATH could be used as a recursive search tool
-- Pickles would be editable and viewable with XML editors
-- No need for stack machine instructions to be included
-- Python object trees could potentially be loaded in other languages
-- The DTD can be used by non-Python sources to create data that is directly loadable into Python objects
-- Pickle security can be improved by using tight DTDs instead of copyreg.

I would appreciate your thoughts.


Raymond Hettinger

P.S.  Here's an example of what it would look like:

import math

class Circle:
    def __init__(self, rad):
        self.rad = rad

class Square:
    def __init__(self, side):
        self.side = side
    def __getinitargs__(self):
        return (self.side,)

class Triangle:
    def __init__(self, side1, side2, side3):
        self.sides = map(math.radians, (side1, side2, side3))
    def __getstate__(self):
        return self.sides
    def __setstate__(self, state):
        self.sides = state

>>> d = {"one":"uno", "two":"dos"}
>>> obj = [d, 42, u"abc", [1.0,2+5j], Circle(5), Square(4), Triangle(3,4,5), d, None, True, False, Circle, len]
>>> pickle.dumpsXML(obj)
[XML output elided: a dict with the entries one/uno and two/dos, the
int 42, the unicode string abc, a list holding 1.0 and 2+5j, the three
instance states (rad 5; initargs 5; the converted Triangle sides), and
reference elements for the repeated d, None, True, False, Circle, and len.]






From tim.one@comcast.net  Thu Aug  8 00:36:36 2002
From: tim.one@comcast.net (Tim Peters)
Date: Wed, 07 Aug 2002 19:36:36 -0400
Subject: [Python-Dev] The memo of pickle
In-Reply-To: 
Message-ID: 

[Martin v. Loewis]
> It's not a matter of beliefs: each dictionary entry contributes 12
> bytes. Each integer key contributes 12 bytes, each integer position
> contributes 12 bytes. Each tuple contributes 36 bytes.

I'm not in love with giant pickle memos myself, but to reduce expectations
closer to reality, note that each dict entry consumes at least 18 bytes (we
keep the load factor under 2/3, so there's at least one unused entry for
every two real entries; it's an indirect overhead, but a real one).



From skip@pobox.com  Thu Aug  8 00:43:50 2002
From: skip@pobox.com (Skip Montanaro)
Date: Wed, 7 Aug 2002 18:43:50 -0500
Subject: [Python-Dev] Do I misunderstand how codecs.EncodedFile is supposed to work?
In-Reply-To: 
References: <15696.40035.180993.654851@localhost.localdomain>
 
 <3D50D06F.6040305@lemburg.com>
 
Message-ID: <15697.45238.952815.600057@localhost.localdomain>

>>>>> "Martin" == Martin v Loewis  writes:

    Martin> "M.-A. Lemburg"  writes:
    >> It's not a class, just a helper for StreamRecoder. Its purpose
    >> is to provide an easy way of saying "the inside world is encoding
    >> X while the outside world uses Y":

    Martin> In a well-designed application, you should not need to
    Martin> say this. The inside world should use Unicode objects.

Which is precisely what I'm trying to do. ;-)  I think I have enough clues
to make things work now.  Thanks for the pointers.

Skip




From greg@cosc.canterbury.ac.nz  Thu Aug  8 00:51:06 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 08 Aug 2002 11:51:06 +1200 (NZST)
Subject: [Python-Dev] The memo of pickle
In-Reply-To: 
Message-ID: <200208072351.g77Np6Y14634@oma.cosc.canterbury.ac.nz>

> I'm not in love with giant pickle memos myself, but to reduce expectations
> closer to reality, note that each dict entry consumes at least 18 bytes (we
> keep the load factor under 2/3, so there's at least one unused entry for
> every two real entries; it's an indirect overhead, but a real one).

Is there perhaps a more memory-efficient data structure that
could be used here instead of a dict? A b-tree, perhaps,
which with a suitable bucket size could use no more than
about 8 bytes per entry -- 4 for the object reference and
4 for the integer index that it maps to.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From guido@python.org  Thu Aug  8 01:17:50 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 07 Aug 2002 20:17:50 -0400
Subject: [Python-Dev] The memo of pickle
In-Reply-To: Your message of "Thu, 08 Aug 2002 09:56:07 +1200."
 <200208072156.g77Lu7u14389@oma.cosc.canterbury.ac.nz>
References: <200208072156.g77Lu7u14389@oma.cosc.canterbury.ac.nz>
Message-ID: <200208080017.g780HoD20812@pcp02138704pcs.reston01.va.comcast.net>

> > Perhaps pickle could grow an option to assume that a
> > data structure is non-recursive ?
> 
> Then you'd probably want some means of detecting cycles, or you'd get
> infinite recursion when you got it wrong. That would mean keeping a
> stack of objects, I think -- probably less memory than keeping all of
> them at once.

cPickle has an obscure option for this.  You create a pickler object
and set the attribute "fast" to True, I believe.  It detects cycles by
using a nesting counter, I believe (read the source to learn more).
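That attribute survives in today's pickle module; a sketch of `fast` skipping the memo and refusing cycles (which exception you get depends on which pickler implementation is in use):

```python
import io
import pickle

p = pickle.Pickler(io.BytesIO())
p.fast = True                           # don't use the memo at all
p.dump([1, [2, 3], "no cycles here"])   # acyclic data pickles fine

cyc = []
cyc.append(cyc)                         # a self-referential list
p2 = pickle.Pickler(io.BytesIO())
p2.fast = True
try:
    p2.dump(cyc)
    raised = False
except (ValueError, RecursionError):    # C pickler raises ValueError,
    raised = True                       # pure-Python one blows the stack
print(raised)                           # True: fast mode refuses cycles
```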

> But I think the idea of keeping the object references in a list is
> well worth trying first. 4 bytes per object instead of 36 sounds like a
> good improvement to me!

So maybe we need to create an identitydict...
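A minimal sketch of such an identitydict (keys compared by id() rather than __eq__, with each key kept alive so its id remains unique); the class name and methods here are just illustrative:

```python
class IdentityDict:
    """Maps objects to values by identity rather than equality."""

    def __init__(self):
        self._d = {}                      # id(key) -> (key, value)

    def __setitem__(self, key, value):
        # keep a reference to key: ids are unique only among live objects
        self._d[id(key)] = (key, value)

    def __getitem__(self, key):
        return self._d[id(key)][1]

    def __contains__(self, key):
        return id(key) in self._d

    def __len__(self):
        return len(self._d)

memo = IdentityDict()
a, b = [1], [1]               # equal but distinct objects
memo[a] = 0
print(a in memo, b in memo)   # True False
```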

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Thu Aug  8 01:18:49 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 07 Aug 2002 20:18:49 -0400
Subject: [Python-Dev] docstrings, help(), and __name__
In-Reply-To: Your message of "Wed, 07 Aug 2002 18:00:50 EDT."
 <0a3001c23e60$3927d710$62f0fc0c@boostconsulting.com>
References: <086901c23e21$e7bea8b0$62f0fc0c@boostconsulting.com> <200208071550.g77FoP003782@pcp02138704pcs.reston01.va.comcast.net>
 <0a3001c23e60$3927d710$62f0fc0c@boostconsulting.com>
Message-ID: <200208080018.g780InV20824@pcp02138704pcs.reston01.va.comcast.net>

> That appears to be correct. Interestingly, these methods seem to be treated
> differently from ordinary ones. My methods get shown like this:
> 
>    |  __init__ = __init__(...)
>    |      this is the __init__ function
>    |      its documentation has two lines.
> 
> 
> Where the 2nd instance of __init__ is given by the value of the __name__
> attribute, while built-in methods get shown as follows:
> 
>   >>> class X(object):
>   ...     def __init__(self): pass
>   ...
>   >>> help(X)
>   Help on class X in module __main__:
> 
>   class X(__builtin__.object)
>    |  Methods defined here:
>    |
>    |  __init__(self)
> 
> Does anyone know why the difference? Is it perhaps the missing 'im_class'
> attribute in my case? These are the sorts of things I want to know about...

Who knows.  As I said, pydoc is a mess of underdocumented
heuristics...

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Thu Aug  8 01:21:35 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 07 Aug 2002 20:21:35 -0400
Subject: [Python-Dev] Pickling in XML format
In-Reply-To: Your message of "Wed, 07 Aug 2002 18:43:46 EDT."
 <003701c23e63$e4698440$5066accf@othello>
References: <003701c23e63$e4698440$5066accf@othello>
Message-ID: <200208080021.g780LZr20843@pcp02138704pcs.reston01.va.comcast.net>

> Do you guys have any thoughts on the merits of adding dumpXML and
> loadXML methods to the pickle module?

That doesn't belong in the pickle module.  An XML format to store
Python-specific data structures doesn't make sense.  Storing data
in XML makes total sense, but should probably be guided by some XML
standard and not by the set of data types that happen to be available
in Python.  Put it in the xml module.

Note that xmlrpclib.py already has a way to do this, for the data
types supported by XMLRPC.
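For reference, xmlrpclib lives on as xmlrpc.client in the modern standard library; its dumps()/loads() pair round-trips the XML-RPC-supported types:

```python
from xmlrpc.client import dumps, loads

params = ({"one": "uno", "two": "dos"}, 42, [1.0, "abc"])
xml = dumps(params)                  # an XML-RPC <params> payload
roundtrip, method = loads(xml)       # method is None: no methodName given
print(roundtrip == params)           # True
```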

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@comcast.net  Thu Aug  8 01:47:16 2002
From: tim.one@comcast.net (Tim Peters)
Date: Wed, 07 Aug 2002 20:47:16 -0400
Subject: [Python-Dev] The memo of pickle
In-Reply-To: <200208072351.g77Np6Y14634@oma.cosc.canterbury.ac.nz>
Message-ID: 

[Greg Ewing]
> Is there perhaps a more memory-efficient data structure that
> could be used here instead of a dict? A b-tree, perhaps,
> which with a suitable bucket size could use no more than
> about 8 byte per entry -- 4 for the object reference and
> 4 for the integer index that it maps to.

The code to support BTrees would be quite a burden.  Zope already has that,
so it wouldn't be a new burden there, except to get away from comparing keys
via __cmp__ we'd have to use IIBTrees, and those map 4-byte ints to 4-byte
ints (i.e., they wouldn't work right for this purpose on a 64-bit box --
although Yet Another Flavor of BTree could be compiled that would).

Judy tries look perfect for "this kind of thing" (fast, memory-efficient,
and would likely get significant compression benefit from the fact that the high bits
of user-space addresses tend to be the same):

    http://sf.net/projects/judy/



From greg@cosc.canterbury.ac.nz  Thu Aug  8 01:57:33 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 08 Aug 2002 12:57:33 +1200 (NZST)
Subject: [Python-Dev] The memo of pickle
In-Reply-To: 
Message-ID: <200208080057.g780vXl14785@oma.cosc.canterbury.ac.nz>

Tim Peters :

> The code to support BTrees would be quite a burden.

It wouldn't be all that complicated, surely? And you'd
only need about half of it, because you only need to
be able to add keys, never delete them.

> http://sf.net/projects/judy/

Is there any information available about this
other than the 3 lines I managed to find amongst 
all the sourceforge crud?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From tim.one@comcast.net  Thu Aug  8 03:17:58 2002
From: tim.one@comcast.net (Tim Peters)
Date: Wed, 07 Aug 2002 22:17:58 -0400
Subject: [Python-Dev] The memo of pickle
In-Reply-To: <200208080057.g780vXl14785@oma.cosc.canterbury.ac.nz>
Message-ID: 

>> The code to support BTrees would be quite a burden.

[Greg Ewing]
> It wouldn't be all that complicated, surely? And you'd
> only need about half of it, because you only need to
> be able to add keys, never delete them.

Have you done a production-quality, portable B-Tree implementation?  Could
you prevail in arguing that memory reduction is more important than speed
here?  Etc.  B-Trees entail a messy set of tradeoffs.


>> http://sf.net/projects/judy/

> Is there any information available about this
> other than the 3 lines I managed to find amongst
> all the sourceforge crud?

It was a short-lived topic on Python-Dev about two weeks ago.  Try, mmm,

    http://www.hp.com/go/judy/

for lots of info.



From tim_one@email.msn.com  Thu Aug  8 06:08:36 2002
From: tim_one@email.msn.com (Tim Peters)
Date: Thu, 8 Aug 2002 01:08:36 -0400
Subject: [Python-Dev] A different kind of heap
Message-ID: 

Just for fun, although someone may find it useful .

The new heapq module implements a classic min-heap, a binary tree where the
value at each node is <= the values of its children.

I mentioned weak heaps before (in the context of sorting), where that
condition is loosened to cover just the right child (and the root node of
the whole weak heap isn't allowed to have a left child).  This is more
efficient, but the code is substantially trickier.

A different kind of weakening is the so-called "pairing heap" (PH).  This is
more like the classic (strong) heap, except that it's a general tree, with
no constraint on how many children each node can have (0, 1, 2, ...,
thousands).  The parent value simply has to be <= the values of all its
children (if any).  Leaving aside storage efficiency, the code for this is
substantially simpler than even for classic heaps:  two PHs can be merged in
constant time, with a single compare (whichever PH has the larger root value
simply becomes another child of the other PH's root node).  The code below
uses a funky representation where a PH is just a list L, where L[0] is the
value associated with the PH's root node (which always has the smallest
value in the tree), and the rest of the list consists of 0 or more child PHs
(which are again lists of the same form).

All the usual heap operations build on this simple "pairing" operation,
called _link below.  Pushing an element x on the heap consists of viewing x
as the 0-child PH [x], and one link step completes merging it with the
existing PH.  Any collection of N values can thus be turned into a PH using
exactly N-1 compares.

A pop seems scary at first, since we may have one root node with N-1
children, and then it will take at least N-2 pairing steps to turn the
remaining forest of PHs back into a single PH.  Indeed, this happens if you
feed the numbers 1..N into an empty PH in order (each of 2 thru N becomes a
direct child of 1).  There are many ways the forest-merge step can be done; the
code below implements a common way, with the remarkable property that,
despite the possibility for an O(N) pop step, the amortized cost for N pops
is worst-case O(log N).  In the "bad example" of inserting 1 thru N in
order, it actually turns out to be amortized constant time (it doesn't
matter how big N is, there's an independent (and small) constant c such that
the N pops take no more than c*N compares).  You have to see that to believe
it, though .

PHs are an active area of current research.  They appear to have many
remarkable "adaptive" properties, but it seems difficult to prove or
disprove interesting general conjectures.  Playing around with the code and
a class that counts __cmp__ invocations, it's not hard to find cases of
partially ordered data where "pairing heap sort" does fewer compares than
our new mergesort.  OTOH, PHs do substantially worse than classic heaps on #
of compares when data is fed in randomly, and classic heaps in turn do
substantially worse on random data than our mergesort.

Still, if there are bursts of order in your data, and you can afford the
space, using a PH priority queue can be much faster than using a classic
heap.  Indeed, if you feed the numbers from 1 through N in *reverse* order,
then pop them off one at a time, it turns out that the PH queue doesn't need
to compare after any of the pops -- the N-1 compares at the start are the
whole banana.

Have fun!

def _link(x, y):
    if x[0] <= y[0]:
        x.append(y)
        return x
    else:
        y.append(x)
        return y

def _merge(x):
    n = len(x)
    if n == 1:
        return []
    pairs = [_link(x[i], x[i+1]) for i in xrange(1, n-1, 2)]
    if n & 1 == 0:
        pairs.append(x[-1])
    x = pairs[-1]
    for i in xrange(len(pairs)-2, -1, -1):
        x = _link(pairs[i], x)
    return x

class Heap(object):
    __slots__ = 'x'

    def __init__(self):
        self.x = []

    def __nonzero__(self):
        return bool(self.x)

    def push(self, value):
        if self.x:
            self.x = _link(self.x, [value])
        else:
            self.x.append(value)

    def pop(self):
        result = self.x[0]  # raises IndexError if empty
        self.x = _merge(self.x)
        return result
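A self-contained way to exercise the idea as a heapsort (the helper functions reproduced here so the snippet runs standalone on a modern Python, driving the raw PH lists directly rather than the Heap wrapper, whose __nonzero__ is 2.x-era spelling):

```python
def _link(x, y):
    # merge two pairing heaps with a single compare
    if x[0] <= y[0]:
        x.append(y)
        return x
    y.append(x)
    return y

def _merge(x):
    # pop helper: pair the children left to right, then fold right to left
    n = len(x)
    if n == 1:
        return []
    pairs = [_link(x[i], x[i + 1]) for i in range(1, n - 1, 2)]
    if n % 2 == 0:
        pairs.append(x[-1])
    x = pairs[-1]
    for i in range(len(pairs) - 2, -1, -1):
        x = _link(pairs[i], x)
    return x

heap = []
for v in [9, 4, 7, 1, 8, 2]:
    heap = _link(heap, [v]) if heap else [v]

out = []
while heap:
    out.append(heap[0])
    heap = _merge(heap)
print(out)   # [1, 2, 4, 7, 8, 9]
```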



From martin@v.loewis.de  Thu Aug  8 07:48:20 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 08 Aug 2002 08:48:20 +0200
Subject: [Python-Dev] The memo of pickle
In-Reply-To: <200208080017.g780HoD20812@pcp02138704pcs.reston01.va.comcast.net>
References: <200208072156.g77Lu7u14389@oma.cosc.canterbury.ac.nz>
 <200208080017.g780HoD20812@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

Guido van Rossum  writes:

> > But I think the idea of keeping the object references in a list is
> > well worth trying first. 4 bytes per object instead of 36 sounds like a
> > good improvement to me!
> 
> So maybe we need to create an identitydict...

In that case, the backwards compatibility problems are more serious.

Of course, we could choose the type of dictionary based on whether this
is a Pickler subclass or not (with some protocol to let the subclass
make us aware that we should use the identitydict, anyway).

This is, of course, a bigger change than I originally had in mind.

Regards,
Martin


From mal@lemburg.com  Thu Aug  8 08:36:20 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 08 Aug 2002 09:36:20 +0200
Subject: [Python-Dev] Pickling in XML format
References: <003701c23e63$e4698440$5066accf@othello>
Message-ID: <3D521F74.5000101@lemburg.com>

Raymond Hettinger wrote:
> Do you guys have any thoughts on the merits of adding dumpXML and loadXML methods to the pickle module?
> 
> The only disadvantage that comes to mind is that the file sizes are larger (though they may compress more efficiently).

FYI, there is such a module in PyXML.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From sjoerd@acm.org  Thu Aug  8 08:44:58 2002
From: sjoerd@acm.org (Sjoerd Mullender)
Date: Thu, 08 Aug 2002 09:44:58 +0200
Subject: [Python-Dev] CVS fails to commit
In-Reply-To: 
References: 
Message-ID: <200208080744.g787iwB02714@indus.ins.cwi.nl>

Jack mentioned something on python-dev to this effect.  He had
problems with MacCVS that whenever he used it to update a file in the
Doc/lib directory the CVS *server* would crash.  You'll find it in the
archives:
.

On Wed, Aug 7 2002 "Tim Peters" wrote:

> Sjoerd, is Jack having some systematic problem with CVS?  A stale lock of
> his prevented checkins under Doc/ Saturday through Sunday afternoon too,
> which also required SourceForge intervention to clear out.
> 
> > -----Original Message-----
> > From: python-dev-admin@python.org [mailto:python-dev-admin@python.org]On
> > Behalf Of Sjoerd Mullender
> > Sent: Wednesday, August 07, 2002 7:35 AM
> > To: Steve Holden
> > Cc: Python-Dev
> > Subject: Re: [Python-Dev] CVS fails to commit
> >
> >
> > It looks like Jack's problems caused a lock file to be stuck there.  I
> > expect this affects a small part of the repository, and also that it
> > needs manual intervention to correct the problem.  So please submit a
> > service request to SourceForge to remove the lock file(s) in
> > /cvsroot/python/python/dist/src/Doc/lib (and all subdirectories).
> >
> > On Wed, Aug 7 2002 "Steve Holden" wrote:
> >
> > > I find that this morning I am still prevented from committing changes to
> > >
> > >     ~/pythoncvs/python/dist/src/Doc/lib/libposixpath.tex
> > >
> > > Is this a problem that's only affecting a small portion of the
> > repository,
> > > or is it more general? To repeat yesterday's notification, the
> > error message
> > > I'm seeing is:
> > >
> > >     cvs server: [04:14:04] waiting for jackjansen's lock in
> > > /cvsroot/python/python/dist/src/Doc/lib
> > >
> > > locked-out-ly y'rs  - steve
> > > -----------------------------------------------------------------------
> > > Steve Holden                                 http://www.holdenweb.com/
> > > Python Web Programming                http://pydish.holdenweb.com/pwp/
> > > -----------------------------------------------------------------------
> > >
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > Python-Dev mailing list
> > > Python-Dev@python.org
> > > http://mail.python.org/mailman/listinfo/python-dev
> > >
> >
> > -- Sjoerd Mullender 
> >
> > _______________________________________________
> > Python-Dev mailing list
> > Python-Dev@python.org
> > http://mail.python.org/mailman/listinfo/python-dev
> 

-- Sjoerd Mullender 


From martin@v.loewis.de  Thu Aug  8 08:05:10 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 08 Aug 2002 09:05:10 +0200
Subject: [Python-Dev] Pickling in XML format
In-Reply-To: <200208080021.g780LZr20843@pcp02138704pcs.reston01.va.comcast.net>
References: <003701c23e63$e4698440$5066accf@othello>
 <200208080021.g780LZr20843@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

Guido van Rossum  writes:

> That doesn't belong in the pickle module.  

Also, it doesn't belong in the core (right now). PyXML has the
xml.marshal package, which has a "generic" XML marshaller, and one
that generates WDDX. There are a few users of WDDX, but nobody has
ever asked to provide marshalling for arbitrary Python objects.

Contributions to this package are welcome (sf.net/projects/pyxml); if
such a module has existed for a couple of PyXML releases, we can tell
whether there is enough demand for it to be in the standard library
(which I doubt).

Regards,
Martin


From loewis@informatik.hu-berlin.de  Thu Aug  8 09:28:01 2002
From: loewis@informatik.hu-berlin.de (Martin v. Löwis)
Date: Thu, 8 Aug 2002 10:28:01 +0200
Subject: [Python-Dev] _sre as part of python.dll
Message-ID: <200208080828.g788S1Qd004601@aramis.informatik.hu-berlin.de>

What is the reason for _sre.pyd being a separate DLL? On Unix, it is
incorporated into the executable by default; regular expressions are
central for Python and cannot be omitted.

Would anybody object if I change the Windows build process so that it
stops having _sre as a separate target?

Regards,
Martin



From beazley@cs.uchicago.edu  Thu Aug  8 14:46:47 2002
From: beazley@cs.uchicago.edu (David Beazley)
Date: Thu, 8 Aug 2002 08:46:47 -0500 (CDT)
Subject: [Python-Dev] Operator overloading inconsistency (bug or feature?)
Message-ID: <15698.30279.468661.109094@gargoyle.cs.uchicago.edu>

Suppose that a new-style class wants to overload "*" and it
defines two methods like this:

class Foo(object):
    def __mul__(self,other):
        print "__mul__"
    def __rmul__(self,other):
        print "__rmul__"

In Python 2.2.1, if you try this, you get the following behavior:
 
>>> f = Foo()
>>> f*1.0
__mul__
>>> 1.0*f
__rmul__
>>> f*1
__mul__
>>> 1*f
__mul__

So here is the question: Why does the last statement in this example
not invoke __rmul__?  In other words, why do "1.0*f" and "1*f" produce
different behavior?  Is this intentional?  Is this documented someplace?
Is there a workaround?  Or are we just missing something obvious?

Cheers,

Dave


From guido@python.org  Thu Aug  8 15:27:38 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 08 Aug 2002 10:27:38 -0400
Subject: [Python-Dev] _sre as part of python.dll
In-Reply-To: Your message of "Thu, 08 Aug 2002 10:28:01 +0200."
 <200208080828.g788S1Qd004601@aramis.informatik.hu-berlin.de>
References: <200208080828.g788S1Qd004601@aramis.informatik.hu-berlin.de>
Message-ID: <200208081427.g78ERc414863@odiug.zope.com>

> What is the reason for _sre.pyd being a separate DLL? On Unix, it is
> incorporated into the executable by default; regular expressions are
> central for Python and cannot be omitted.
> 
> Would anybody object if I change the Windows build process so that it
> stops having _sre as a separate target?

Let me turn this around.  What advantage do you see to linking it
statically?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Thu Aug  8 15:54:55 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 08 Aug 2002 10:54:55 -0400
Subject: [Python-Dev] Operator overloading inconsistency (bug or feature?)
In-Reply-To: Your message of "Thu, 08 Aug 2002 08:46:47 CDT."
 <15698.30279.468661.109094@gargoyle.cs.uchicago.edu>
References: <15698.30279.468661.109094@gargoyle.cs.uchicago.edu>
Message-ID: <200208081454.g78Est115049@odiug.zope.com>

> Suppose that a new-style class wants to overload "*" and it
> defines two methods like this:
> 
> class Foo(object):
>     def __mul__(self,other):
>         print "__mul__"
>     def __rmul__(self,other):
>         print "__rmul__"
> 
> Python-2.2.1, if you try this, you get the following behavior:
>  
> >>> f = Foo()
> >>> f*1.0
> __mul__
> >>> 1.0*f
> __rmul__
> >>> f*1
> __mul__
> >>> 1*f
> __mul__
> 
> So here is the question: Why does the last statement in this example
> not invoke __rmul__?  In other words, why do "1.0*f" and "1*f" produce 
> different behavior.  Is this intentional?  Is this documented someplace?
> Is there a workaround?  Or are we just missing something obvious?

Aargh.  I *think* this may have to do with the hacks for sequence
repetition.  But I'm not sure.  A debug session tracing carefully
through the code is in order.

--Guido van Rossum (home page: http://www.python.org/~guido/)



From mmatus@dinha.acms.arizona.edu  Thu Aug  8 16:17:24 2002
From: mmatus@dinha.acms.arizona.edu (Marcelo Matus)
Date: Thu, 08 Aug 2002 08:17:24 -0700
Subject: [Swig-dev] Re: [Python-Dev] Operator overloading inconsistency
 (bug or feature?)
References: <15698.30279.468661.109094@gargoyle.cs.uchicago.edu> <200208081454.g78Est115049@odiug.zope.com>
Message-ID: <3D528B84.2000405@acms.arizona.edu>

Guido van Rossum wrote:

>>Suppose that a new-style class wants to overload "*" and it
>>defines two methods like this:
>>
>>class Foo(object):
>>    def __mul__(self,other):
>>        print "__mul__"
>>    def __rmul__(self,other):
>>        print "__rmul__"
>>
>>Python-2.2.1, if you try this, you get the following behavior:
>>
>>>>>f = Foo()
>>>>>f*1.0
>>__mul__
>>>>>1.0*f
>>__rmul__
>>>>>f*1
>>__mul__
>>>>>1*f
>>__mul__
>>
>>So here is the question: Why does the last statement in this example
>>not invoke __rmul__?  In other words, why do "1.0*f" and "1*f" produce
>>different behavior?  Is this intentional?  Is this documented someplace?
>>Is there a workaround?  Or are we just missing something obvious?
>>
>
>Aargh.  I *think* this may have to do with the hacks for sequence
>repetition.  But I'm not sure.  A debug session tracing carefully
>through the code is in order.
>
>--Guido van Rossum (home page: http://www.python.org/~guido/)
>
>  
>

I guess the problem arises from here:


intobject.c(340):
========================
static PyObject *
int_mul(PyObject *v, PyObject *w)
{
    long a, b;
    long longprod;            /* a*b in native long arithmetic */
    double doubled_longprod;    /* (double)longprod */
    double doubleprod;        /* (double)a * (double)b */

    if (!PyInt_Check(v) &&
        v->ob_type->tp_as_sequence &&
        v->ob_type->tp_as_sequence->sq_repeat) {
        /* sequence * int */
        a = PyInt_AsLong(w);
        return (*v->ob_type->tp_as_sequence->sq_repeat)(v, a);
    }
    if (!PyInt_Check(w) &&
        w->ob_type->tp_as_sequence &&
        w->ob_type->tp_as_sequence->sq_repeat) {
        /* int * sequence */
        a = PyInt_AsLong(v);
        return (*w->ob_type->tp_as_sequence->sq_repeat)(w, a);  
    }
    .............


==================

and the facts that:

1.-  there is only one 'sq_repeat' method, and no additional
'sq_rrepeat' one, so n*x and x*n call the same method, sq_repeat.

2.- in typeobject.c, sq_repeat is associated with __mul__

line 2775:
      SLOT1(slot_sq_repeat, "__mul__", int, "i")

line 3497:
    SQSLOT("__mul__", sq_repeat, slot_sq_repeat, wrap_intargfunc,
           "x.__mul__(n) <==> x*n"),

3.- the 'object' class by default fills in the "tp_as_sequence" slot,
triggering the call of sq_repeat in the case

       1*f

but not in

       1.0*f
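The dispatch quoted above can be modelled in pure Python. This is a hypothetical sketch (the names int_mul and sq_repeat mirror the C code in the message, but this is an illustration of the control flow, not CPython's actual implementation):

```python
# Model of intobject.c's int_mul for "v * w": before doing integer
# multiplication, it checks whether the *other* operand looks like a
# sequence, and if so calls its sq_repeat slot -- in both orders.
def int_mul(v, w):
    if not isinstance(v, int) and hasattr(v, "sq_repeat"):
        return v.sq_repeat(w)      # sequence * int
    if not isinstance(w, int) and hasattr(w, "sq_repeat"):
        return w.sq_repeat(v)      # int * sequence: same slot, no "r" variant!
    return v * w                   # genuine int * int

class Foo:
    # In 2.2, 'object' filled tp_as_sequence and slot_sq_repeat
    # forwarded to __mul__, so model sq_repeat as calling __mul__.
    def __mul__(self, other):
        return "__mul__"
    def __rmul__(self, other):
        return "__rmul__"
    def sq_repeat(self, n):
        return self.__mul__(n)

f = Foo()
print(int_mul(f, 1))   # __mul__  (sequence * int)
print(int_mul(1, f))   # __mul__  (int * sequence hits the same slot,
                       #           so __rmul__ is never reached)
```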
  



Marcelo



From nas@python.ca  Thu Aug  8 16:46:32 2002
From: nas@python.ca (Neil Schemenauer)
Date: Thu, 8 Aug 2002 08:46:32 -0700
Subject: [Python-Dev] Operator overloading inconsistency (bug or feature?)
In-Reply-To: <200208081454.g78Est115049@odiug.zope.com>; from guido@python.org on Thu, Aug 08, 2002 at 10:54:55AM -0400
References: <15698.30279.468661.109094@gargoyle.cs.uchicago.edu> <200208081454.g78Est115049@odiug.zope.com>
Message-ID: <20020808084632.A7202@glacier.arctrix.com>

See python.org/sf/592646 for a possible fix.

  Neil


From loewis@informatik.hu-berlin.de  Thu Aug  8 18:16:59 2002
From: loewis@informatik.hu-berlin.de (Martin v. Löwis)
Date: 08 Aug 2002 19:16:59 +0200
Subject: [Python-Dev] _sre as part of python.dll
In-Reply-To: <200208081427.g78ERc414863@odiug.zope.com>
References: <200208080828.g788S1Qd004601@aramis.informatik.hu-berlin.de>
 <200208081427.g78ERc414863@odiug.zope.com>
Message-ID: 

Guido van Rossum  writes:

> Let me turn this around.  What advantage do you see to linking it
> statically?

The trigger was that it would have simplified the build for me: When
converting VC++6 projects to VC.NET, VC.NET forgets to convert the
/export: linker options, which means that you had to add them all
manually. Mark has fixed this problem differently, by removing the
need for /export:.

Integrating _sre (and _socket, select, winreg, mmap, perhaps others)
into python.dll still simplifies the build process: you don't have to
right-click that many subprojects to build them.

In addition, it should decrease startup time: Python won't need to
locate that many files anymore.

It also decreases the total size of the binary distribution slightly.

Regards,
Martin


From guido@python.org  Thu Aug  8 18:26:23 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 08 Aug 2002 13:26:23 -0400
Subject: [Python-Dev] _sre as part of python.dll
In-Reply-To: Your message of "Thu, 08 Aug 2002 19:16:59 +0200."
 
References: <200208080828.g788S1Qd004601@aramis.informatik.hu-berlin.de> <200208081427.g78ERc414863@odiug.zope.com>
 
Message-ID: <200208081726.g78HQNe16854@odiug.zope.com>

> > Let me turn this around.  What advantage do you see to linking it
> > statically?
> 
> The trigger was that it would have simplified the build for me: When
> converting VC++6 projects to VC.NET, VC.NET forgets to convert the
> /export: linker options, which means that you had to add them all
> manually. Mark has fixed this problem differently, by removing the
> need for /export:.
> 
> Integrating _sre (and _socket, select, winreg, mmap, perhaps others)
> into python.dll still simplifies the build process: you don't have to
> right-click that many subprojects to build them.

I never have to do that; the dependencies in the project file make
sure that the extensions are all built when you build the 'python'
project.

> In addition, it should decrease startup time: Python won't need to
> locate that many files anymore.
> 
> It also decreases the total size of the binary distribution slightly.

Maybe _sre is used by most apps (though I doubt even that).  But
_socket, select, winreg, mmap and the others are definitely not.  On
Unix, all extensions are built as shared libraries, except the ones
that are needed by setup.py to be able to build extensions; it looks
like only posix, errno, _sre and symtable are built statically.

I'd say that making more extensions static on Windows would increase
start time of modules that don't use those extensions.

I'm -0 on doing this for _sre (I think it's a YAGNI); I'm -1 on doing
this for other extensions.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@comcast.net  Thu Aug  8 18:33:15 2002
From: tim.one@comcast.net (Tim Peters)
Date: Thu, 08 Aug 2002 13:33:15 -0400
Subject: [Python-Dev] _sre as part of python.dll
In-Reply-To: 
Message-ID: 

[Martin v. Lowis]
> ...
> Integrating _sre (and _socket, select, winreg, mmap, perhaps others)
> into python.dll still simplifies the build process: you don't have to
> right-click that many subprojects to build them.

If you're building via right-clicking, you're making life much harder than
necessary.  You can build from the command line, or do Build -> Batch
Build -> Build in the GUI.  The latter builds all projects in one gulp,
including both Release and Debug versions (well, it actually displays a list
of all possible project targets, and lets you select which to build in batch
mode; this selection is persistent, so you only need to do it once).



From loewis@informatik.hu-berlin.de  Thu Aug  8 18:40:41 2002
From: loewis@informatik.hu-berlin.de (Martin v. Löwis)
Date: 08 Aug 2002 19:40:41 +0200
Subject: [Python-Dev] _sre as part of python.dll
In-Reply-To: <200208081726.g78HQNe16854@odiug.zope.com>
References: <200208080828.g788S1Qd004601@aramis.informatik.hu-berlin.de>
 <200208081427.g78ERc414863@odiug.zope.com>
 
 <200208081726.g78HQNe16854@odiug.zope.com>
Message-ID: 

Guido van Rossum  writes:

> I never have to do that; the dependencies in the project file make
> sure that the extensions are all built when you build the 'python'
> project.

Are you sure? If the python target is up-to-date (i.e. nothing has to
be done for python_d.exe), and I delete all generated _sre files
(i.e. sre_d.pyd, and the object files), and then ask VC++ 6 to build
the python target, nothing is done.

Indeed, I cannot find any place where it says that the python target
is related to _sre. I can only see dependencies with pythoncore.

Can you (or any other regular pcbuild.dsp user) please guess what I'm
doing wrong?

> Maybe _sre is used by most apps (though I doubt even that).  But
> _socket, select, winreg, mmap and the others are definitely not.  On
> Unix, all extensions are built as shared libraries, except the ones
> that are needed by setup.py to be able to build extensions; it looks
> like only posix, errno, _sre and symtable are built statically.

I do believe that is a mistake, as it will increase startup time of
applications that need them; applications that don't need them would
not be hurt if they were in the python binary.

> I'd say that making more extensions static on Windows would increase
> start time of modules that don't use those extensions.

I guess I have to measure these things.

Regards,
Martin


From guido@python.org  Thu Aug  8 18:49:55 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 08 Aug 2002 13:49:55 -0400
Subject: [Python-Dev] _sre as part of python.dll
In-Reply-To: Your message of "Thu, 08 Aug 2002 19:40:41 +0200."
 
References: <200208080828.g788S1Qd004601@aramis.informatik.hu-berlin.de> <200208081427.g78ERc414863@odiug.zope.com>  <200208081726.g78HQNe16854@odiug.zope.com>
 
Message-ID: <200208081749.g78Hntg17022@odiug.zope.com>

> > I never have to do that; the dependencies in the project file make
> > sure that the extensions are all built when you build the 'python'
> > project.
> 
> Are you sure? If the python target is up-to-date (i.e. nothing has to
> be done for python_d.exe), and I delete all generated _sre files
> (i.e. sre_d.pyd, and the object files), and then ask VC++ 6 to build
> the python target, nothing is done.
> 
> Indeed, I cannot find any place where it says that the python target
> is related to _sre. I can only see dependencies with pythoncore.
> 
> Can you (or any other regular pcbuild.dsp user) please guess what I'm
> doing wrong?

I have no idea.  It's all magic for me.  But I never delete targets
manually.

> > Maybe _sre is used by most apps (though I doubt even that).  But
> > _socket, select, winreg, mmap and the others are definitely not.  On
> > Unix, all extensions are built as shared libraries, except the ones
> > that are needed by setup.py to be able to build extensions; it looks
> > like only posix, errno, _sre and symtable are built statically.
> 
> I do believe that is a mistake, as it will increase startup time of
> applications that need them; applications that don't need them would
> not be hurt if they were in the python binary.

But is the startup time of apps that use a lot of stuff the most
important thing?  I'd say that the startup time of apps that *don't*
use a lot of stuff is more important.  I'm not sure that making the
binary bigger doesn't slow it down.

> > I'd say that making more extensions static on Windows would increase
> > start time of modules that don't use those extensions.
> 
> I guess I have to measure these things.

Yes, please.  We switched to building almost all extensions as shared
libs when we switched away from Modules/Setup to setup.py.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Thu Aug  8 19:24:57 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 08 Aug 2002 14:24:57 -0400
Subject: [Swig-dev] Re: [Python-Dev] Operator overloading inconsistency (bug or feature?)
In-Reply-To: Your message of "Thu, 08 Aug 2002 08:17:24 PDT."
 <3D528B84.2000405@acms.arizona.edu>
References: <15698.30279.468661.109094@gargoyle.cs.uchicago.edu> <200208081454.g78Est115049@odiug.zope.com>
 <3D528B84.2000405@acms.arizona.edu>
Message-ID: <200208081824.g78IOvu18755@odiug.zope.com>

[me]
> >Aargh.  I *think* this may have to do with the hacks for sequence
> >repetition.  But I'm not sure.  A debug session tracing carefully
> >through the code is in order.

[Marcelo Matus]
> I guess the problem arise from here:
[...]

Good sleuthing, Marcelo!  Neil S. came up with a fix that I believe is
correct, and I recommend that we check it in for 2.3 as well as on the
2.2 maintenance branch.  Thanks again, Neil!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim@zope.com  Thu Aug  8 20:56:55 2002
From: tim@zope.com (Tim Peters)
Date: Thu, 8 Aug 2002 15:56:55 -0400
Subject: [Python-Dev] _sre as part of python.dll
In-Reply-To: 
Message-ID: 

[Guido]
> I never have to do that; the dependencies in the project file make
> sure that the extensions are all built when you build the 'python'
> project.

[MvL]
> Are you sure? If the python target is up-to-date (i.e. nothing has to
> be done for python_d.exe), and I delete all generated _sre files
> (i.e. sre_d.pyd, and the object files), and then ask VC++ 6 to build
> the python target, nothing is done.

Right, every project other than pythoncore and w9xpopen depends on the
pythoncore project, but that's all.  Guido doesn't normally change any code
in any other subprojects, so he doesn't notice this viscerally.  If you want
to be completely safe at all times, do Build -> Batch Build.  One step and
easy.  It won't recompile more than needed, although if the Python DLL
changes, it will pee away a little time relinking things against the new
core .lib file.



From python-dev@zesty.ca  Thu Aug  8 21:15:51 2002
From: python-dev@zesty.ca (Ka-Ping Yee)
Date: Thu, 8 Aug 2002 13:15:51 -0700 (PDT)
Subject: [Python-Dev] Re: docstrings, help(), and __name__
In-Reply-To: <200208071550.g77FoP003782@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

On Wed, 7 Aug 2002, Guido van Rossum wrote:
> > It seems I'm breaking some protocol. It's easy enough to add a '__name__'
> > attribute to my function objects, but I'd like to be sure that I'm adding
> > everything I really /should/ add. Just how much like a regular Python
> > function does my function have to be in order to make the help system (and
> > other standard systems with such expectations) happy?
>
> It's hard to say.  The pydoc code makes up protocols as it goes.  I
> think __name__ is probably the only one you're missing in practice.

pydoc does not "make up protocols as it goes".  It does its best to
utilize the protocols exposed by the Python core.  The attribute
protocols on Python built-in objects vary from type to type, and
pydoc tries to accommodate them.  Part of the purpose of pydoc and
inspect was to document and provide a more uniform interface to some
of these protocols.

All the built-in objects that are declared with a name have a __name__
attribute, so you'll want to provide that.  Beyond that, it depends
on the type of object you want to emulate; the various protocols are
documented in the 'inspect' module.  For example, see

    pydoc inspect.isfunction

for details on function objects.
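A small sketch of the advice above: give a function-like object the attributes the help system looks for. The class here is made up for illustration; the attribute names (__name__, __doc__) are the real protocol:

```python
import inspect

class Curried:
    """A hypothetical callable that wants to look like a function."""
    def __init__(self, func, name, doc):
        self._func = func
        self.__name__ = name   # the attribute pydoc was missing
        self.__doc__ = doc     # what help() renders as the docstring
    def __call__(self, *args):
        return self._func(*args)

add = Curried(lambda a, b: a + b, "add", "Return a + b.")
print(add.__name__)          # add
print(inspect.getdoc(add))   # Return a + b.
print(add(2, 3))             # 5
```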


-- ?!ng



From python-dev@zesty.ca  Thu Aug  8 21:24:56 2002
From: python-dev@zesty.ca (Ka-Ping Yee)
Date: Thu, 8 Aug 2002 13:24:56 -0700 (PDT)
Subject: [Python-Dev] Re: string.find() again (was Re: timsort for jython)
In-Reply-To: <200208051911.g75JBGJ00739@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

On Mon, 5 Aug 2002, Guido van Rossum wrote:
> At least this still holds (unless x is an iterator or otherwise
> mutated by access :-):
>
>   for v in x:
>      assert v in x

And -- wham! -- the dangers of an invisibly destructive "in" operator
land in front of us once again like an enormous stomping foot falling
out of the sky.

Still not convinced?

Oh, well.  There exists a solution, in case you're curious:
http://mail.python.org/pipermail/python-dev/2002-July/026899.html


-- ?!ng



From guido@python.org  Thu Aug  8 21:36:52 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 08 Aug 2002 16:36:52 -0400
Subject: [Python-Dev] Re: string.find() again (was Re: timsort for jython)
In-Reply-To: Your message of "Thu, 08 Aug 2002 13:24:56 PDT."
 
References: 
Message-ID: <200208082036.g78Kaqa24469@odiug.zope.com>

> Still not convinced?

No.  Your goal, making for-in "safe to use", is not important to me.

Read http://mail.python.org/pipermail/python-dev/2002-August/027450.html

--Guido van Rossum (home page: http://www.python.org/~guido/)


From python-dev@zesty.ca  Thu Aug  8 21:37:54 2002
From: python-dev@zesty.ca (Ka-Ping Yee)
Date: Thu, 8 Aug 2002 13:37:54 -0700 (PDT)
Subject: [Python-Dev] Re: string.find() again (was Re: timsort for jython)
In-Reply-To: <200208061709.g76H9FD19499@odiug.zope.com>
Message-ID: 

On Tue, 6 Aug 2002, Guido van Rossum wrote:
> I think we've argued about '' in 'abc' long enough.  Tim has failed to
> convince me, so '' in 'abc' returns True.  Barry has checked it all
> in.

I would like to urge putting the brakes on this one and proceeding more
cautiously.  (I've been away for the past couple of days and missed the
discussion on this issue.)

My personal opinion sides with Tim -- i think an exception is definitely
the right choice.  (I still haven't seen convincing examples where True
is a more useful result than an exception, and the fact that there is
doubt suggests that it is an exceptional case.)

But regardless of that opinion, we should recognize that causing
'' in 'abc' to stop raising an exception is a big change -- a more
gentle introduction, with at least some sort of warning, would be better.
Silent errors are bad.


-- ?!ng



From martin@v.loewis.de  Thu Aug  8 21:42:31 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 08 Aug 2002 22:42:31 +0200
Subject: [Python-Dev] _sre as part of python.dll
In-Reply-To: <200208081749.g78Hntg17022@odiug.zope.com>
References: <200208080828.g788S1Qd004601@aramis.informatik.hu-berlin.de>
 <200208081427.g78ERc414863@odiug.zope.com>
 
 <200208081726.g78HQNe16854@odiug.zope.com>
 
 <200208081749.g78Hntg17022@odiug.zope.com>
Message-ID: 

Guido van Rossum  writes:

> But is the startup time of apps that use a lot of stuff the most
> important thing?  I'd say that the startup time of apps that *don't*
> use a lot of stuff is more important.  I'm not sure that making the
> binary bigger doesn't slow it down.

I'm pretty sure that it doesn't. On Unix, the system performs a
copy-on-write mmap of the executable. No disk access is done until
page faults trigger a disk read. I believe Windows uses a similar
mechanism. The size of the executable is irrelevant (if you have no
relocations); only the part of the executable that is used matters.

On the other hand, on my Linux installation, importing a module costs
35 system calls if the module is not found, and no PYTHONPATH is set;
every directory in PYTHONPATH adds four additional system calls.
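The per-directory cost comes from the candidate filenames import must probe. A hedged sketch (the suffix list below is illustrative, not the exact table from import.c; each candidate costs at least one stat()/open() attempt):

```python
import os

# Illustrative suffixes an early-2000s import would try per directory;
# the real list is platform-dependent and lives in C.
SUFFIXES = [".py", ".pyc", "module.so", ".so"]

def candidates(name, path):
    """List every filename import would probe for 'name' along 'path'."""
    tries = []
    for d in path:
        tries.append(os.path.join(d, name))          # package directory?
        for suf in SUFFIXES:
            tries.append(os.path.join(d, name + suf))
    return tries

path = ["/usr/lib/python2.2", "/usr/lib/python2.2/lib-dynload"]
tries = candidates("spam", path)
print(len(tries))  # 10: (1 + len(SUFFIXES)) probes per directory
```

A module that is never found pays this full cost for every directory, which is why each extra PYTHONPATH entry adds a fixed batch of system calls.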

> Yes, please.  We switched to building almost all extensions as shared
> libs when we switched away from Modules/Setup to setup.py.

For modules that require configuration, this was a good thing - now
setup.py will autoconfigure them. For modules that require no
additional libraries, I hope that this decision will be reverted some
day.

Regards,
Martin



From guido@python.org  Thu Aug  8 21:49:12 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 08 Aug 2002 16:49:12 -0400
Subject: [Python-Dev] _sre as part of python.dll
In-Reply-To: Your message of "Thu, 08 Aug 2002 22:42:31 +0200."
 
References: <200208080828.g788S1Qd004601@aramis.informatik.hu-berlin.de> <200208081427.g78ERc414863@odiug.zope.com>  <200208081726.g78HQNe16854@odiug.zope.com>  <200208081749.g78Hntg17022@odiug.zope.com>
 
Message-ID: <200208082049.g78KnDk24562@odiug.zope.com>

> > But is the startup time of apps that use a lot of stuff the most
> > important thing?  I'd say that the startup time of apps that *don't*
> > use a lot of stuff is more important.  I'm not sure that making the
> > binary bigger doesn't slow it down.
> 
> I'm pretty sure that it doesn't. On Unix, the system performs a
> copy-on-write mmap of the executable. No disk access is done until
> page faults trigger a disk read. I believe Windows uses a similar
> mechanism. The size of the executable is irrelevant (if you have no
> relocations); only the part of the executable that is used matters.
> 
> On the other hand, on my Linux installation, importing a module costs
> 35 system calls if the module is not found, and no PYTHONPATH is set;
> every directory in PYTHONPATH adds four additional system calls.
> 
> > Yes, please.  We switched to building almost all extensions as shared
> > libs when we switched away from Modules/Setup to setup.py.
> 
> For modules that require configuration, this was a good thing - now
> setup.py will autoconfigure them. For modules that require no
> additional libraries, I hope that this decision will be reverted some
> day.

If other people feel the same way, I won't stop progress here.  But I
find startup time a rather uninteresting detail, and everything else
being the same I would personally prefer to keep the status quo: not
because it's better, but because it's the status quo.  Why churn?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From Jack.Jansen@oratrix.com  Thu Aug  8 21:56:54 2002
From: Jack.Jansen@oratrix.com (Jack Jansen)
Date: Thu, 8 Aug 2002 22:56:54 +0200
Subject: [Python-Dev] _sre as part of python.dll
In-Reply-To: 
Message-ID: <5ECE36B0-AB11-11D6-9B51-003065517236@oratrix.com>

On donderdag, augustus 8, 2002, at 07:16, Martin v. Löwis wrote:

> Guido van Rossum  writes:
>
>> Let me turn this around.  What advantage do you see to linking it
>> statically?
>
> The trigger was that it would have simplified the build for me: When
> converting VC++6 projects to VC.NET, VC.NET forgets to convert the
> /export: linker options, which means that you had to add them all
> manually. Mark has fixed this problem differently, by removing the
> need for /export:.
>
> Integrating _sre (and _socket, select, winreg, mmap, perhaps others)
> into python.dll still simplifies the build process: you don't have to
> right-click that many subprojects to build them.
>
> In addition, it should decrease startup time: Python won't need to
> locate that many files anymore.
>
> It also decreases the total size of the binary distribution slightly.

Note that I went exactly the other way for MacPython over the
last year. It used to be so that all "common" modules were
included in PythonCore.slb, and I used separate project build
files only for Mac-only modules and one or two special cases
(Tk, expat).

I bit the bullet half a year ago and made PythonCore.slb lean
and mean, but still used my own private project build file
generator for all extension projects.

I bit the bullet again (actually, I bit one of the two remaining
half-bullets, I've kept the Mac-specific modules as they are)
last month, and MacPython now uses the main setup.py for a large
collection of the cross-platform extension modules. This turned
out to be only one or two evenings of work.

This has immediately resulted in a decrease in my workload:
whereas previously whenever someone decided to add the kaboozle
module I had to add project files for this, etc etc etc, all
that is now often taken care of by distutils and setup.py.
--
- Jack Jansen
http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution --
Emma Goldman -



From python-dev@zesty.ca  Thu Aug  8 22:04:51 2002
From: python-dev@zesty.ca (Ka-Ping Yee)
Date: Thu, 8 Aug 2002 14:04:51 -0700 (PDT)
Subject: [Python-Dev] Re: string.find() again (was Re: timsort for jython)
In-Reply-To: <200208061459.g76ExS109232@odiug.zope.com>
Message-ID: 

On Tue, 6 Aug 2002, Guido van Rossum wrote:
> > Perhaps it makes sense to allow "'thon' in 'python'" to return True,
> > but still have "[1,2] in [0,1,2,3]" return False if we loosen the
> > steadfast requirement that strings and lists be as much alike as
> > possible.
>
> That was never a requirement.  Strings and lists are merely similar
> insofar as they have very similar needs for a slicing and subscripting
> notation, and to a lesser extent for concatenation, repetition and
> comparison.

Perhaps what Skip meant was that strings and lists are both like
sequences.  At the moment, the meaning of "in" has two general
definitions: one for sequence-like objects and one for mapping-like
objects.  The former is something along the lines of "e is in s if
there exists an i such that s[i] == e".
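That sequence definition can be sketched directly (an illustration of the rule quoted above, not the interpreter's actual implementation):

```python
def seq_contains(s, e):
    # Sequence definition of "in": e is in s if there
    # exists an index i such that s[i] == e.
    for i in range(len(s)):
        if s[i] == e:
            return True
    return False

assert seq_contains([0, 1, 2, 3], 2)
assert seq_contains("python", "t")         # an "element" is one character
assert not seq_contains("python", "thon")  # substrings need the new rule
```

Under this definition 'thon' in 'python' would be False; the change under discussion gives strings their own substring-based rule instead.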

The question from a teaching perspective is: "Are strings a kind of
sequence?"


-- ?!ng



From mal@lemburg.com  Thu Aug  8 22:05:57 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 08 Aug 2002 23:05:57 +0200
Subject: [Python-Dev] _sre as part of python.dll
References: <200208080828.g788S1Qd004601@aramis.informatik.hu-berlin.de>	<200208081427.g78ERc414863@odiug.zope.com>		<200208081726.g78HQNe16854@odiug.zope.com>		<200208081749.g78Hntg17022@odiug.zope.com> 
Message-ID: <3D52DD35.5020306@lemburg.com>

Martin v. Loewis wrote:
> On the other hand, on my Linux installation, importing a module costs
> 35 system calls if the module is not found, and no PYTHONPATH is set;
> every directory in PYTHONPATH adds four additional system calls.

Why not address this problem instead ?

Note that mxCGIPython can help a lot here: it freezes most of the
Python std lib into the executable, making imports go really fast
(and that's needed if you're doing a lot of CGI scripting).

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From nas@python.ca  Thu Aug  8 22:51:29 2002
From: nas@python.ca (Neil Schemenauer)
Date: Thu, 8 Aug 2002 14:51:29 -0700
Subject: [Python-Dev] _sre as part of python.dll
In-Reply-To: <200208082049.g78KnDk24562@odiug.zope.com>; from guido@python.org on Thu, Aug 08, 2002 at 04:49:12PM -0400
References: <200208080828.g788S1Qd004601@aramis.informatik.hu-berlin.de> <200208081427.g78ERc414863@odiug.zope.com>  <200208081726.g78HQNe16854@odiug.zope.com>  <200208081749.g78Hntg17022@odiug.zope.com>  <200208082049.g78KnDk24562@odiug.zope.com>
Message-ID: <20020808145129.A8635@glacier.arctrix.com>

Guido van Rossum wrote:
> If other people feel the same way, I won't stop progress here.  But I
> find startup time a rather uninteresting detail,

A lot of people care about startup time.  I would like to see a few more
modules included statically.

  Neil


From tim@zope.com  Thu Aug  8 22:51:05 2002
From: tim@zope.com (Tim Peters)
Date: Thu, 8 Aug 2002 17:51:05 -0400
Subject: [Python-Dev] _sre as part of python.dll
In-Reply-To: <20020808145129.A8635@glacier.arctrix.com>
Message-ID: 

[Neil Schemenauer]
> A lot of people care about startup time.  I would like to see a few more
> modules included statically.

If the real goal is to reduce startup time, we should analyze where startup
time is being spent; random thrashing "in that general direction" won't
satisfy in the end.



From martin@v.loewis.de  Thu Aug  8 23:10:25 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 09 Aug 2002 00:10:25 +0200
Subject: [Python-Dev] _sre as part of python.dll
In-Reply-To: <3D52DD35.5020306@lemburg.com>
References: <200208080828.g788S1Qd004601@aramis.informatik.hu-berlin.de>
 <200208081427.g78ERc414863@odiug.zope.com>
 
 <200208081726.g78HQNe16854@odiug.zope.com>
 
 <200208081749.g78Hntg17022@odiug.zope.com>
 
 <3D52DD35.5020306@lemburg.com>
Message-ID: 

"M.-A. Lemburg"  writes:

> > On the other hand, on my Linux installation, importing a module costs
> > 35 system calls if the module is not found, and no PYTHONPATH is set;
> > every directory in PYTHONPATH adds four additional system calls.
> 
> Why not address this problem instead ?

I'm trying to: every module incorporated in config.c won't be searched
in PYTHONPATH.

> Note that mxCGIPython can help a lot here: it freeze most of the
> Python std lib into the executable making imports go really fast
> (and that's needed if you're doing a lot of CGI scripting).

Indeed, freezing also helps - but is probably only suitable for
special-purpose applications. I think people would be surprised if
they are told that editing the source of a library module won't have
any effect.

Regards,
Martin



From martin@v.loewis.de  Thu Aug  8 23:16:42 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 09 Aug 2002 00:16:42 +0200
Subject: [Python-Dev] _sre as part of python.dll
In-Reply-To: <5ECE36B0-AB11-11D6-9B51-003065517236@oratrix.com>
References: <5ECE36B0-AB11-11D6-9B51-003065517236@oratrix.com>
Message-ID: 

Jack Jansen  writes:

> This has immediately resulted in a decrease in my workload: whereas
> previously whenever someone decided to add the kaboozle module I had
> to add project files for this, etc etc etc, all that is now often
> taken care of by distutils and setup.py.

Reducing the workload is a good thing, and so is sharing of build
processes across many systems; I'm not proposing to give that up.

At the moment, I'm really asking about Windows only; I'll ask about
adding things back into Setup.dist when I can show what advantages
that has. That does not mean that those things would be removed from
setup.py - that is smart enough to build only things that haven't
already been build.

Regards,
Martin



From dave@boost-consulting.com  Thu Aug  8 23:58:36 2002
From: dave@boost-consulting.com (David Abrahams)
Date: Thu, 8 Aug 2002 18:58:36 -0400
Subject: [Python-Dev] Re: docstrings, help(), and __name__
References: 
Message-ID: <120201c23f2f$23258730$62f0fc0c@boostconsulting.com>

From: "Ka-Ping Yee" 

> The attribute
> protocols on Python built-in objects vary from type to type, and
> pydoc tries to accommodate them.  Part of the purpose of pydoc and
> inspect was to document and provide a more uniform interface to some
> of these protocols.
>
> All the built-in objects that are declared with a name have a __name__
> attribute, so you'll want to provide that.  Beyond that, it depends
> on the type of object you want to emulate; the various protocols are
> documented in the 'inspect' module.  For example, see
>
>     pydoc inspect.isfunction

Do you mean help(inspect.isfunction), or am I clueless about the
environment which accepts the above command?

> for details on function objects.

It appears that ismethod is the one that's relevant to me, since the doc
system gets my functions through my descriptor, which is wrapping them with
PyMethod_New.

So far I'm getting away with not adding an im_class attribute to my
function objects, but it does result in that odd "__init__ = __init__"
output (unless I've misdiagnosed). My function objects will certainly never
have func_code, as help(inspect.isfunction) implies they should, and I'm a
little reluctant to load up functions with a lot more attributes just so
they can be like Python's functions... though I'm certain the penalty would
be lost in the noise.

The main question is this: which attributes do I absolutely /need/ in order
to avoid raising an exception or giving nonsensical output from help()?

Thanks again,
Dave





From python@rcn.com  Fri Aug  9 00:29:56 2002
From: python@rcn.com (Raymond Hettinger)
Date: Thu, 8 Aug 2002 19:29:56 -0400
Subject: [Python-Dev] Re: string.find() again (was Re: timsort for jython)
References: 
Message-ID: <006e01c23f33$8245f520$b6b53bd0@othello>

GvR:
> > I think we've argued about '' in 'abc' long enough.  Tim has failed to
> > convince me, so '' in 'abc' returns True.  Barry has checked it all
> > in.

Ka-Ping:
> My personal opinion sides with Tim -- i think an exception is definitely
> the right choice.  (I still haven't seen convincing examples where True
> is a more useful result than an exception, and the fact that there is
> doubt suggests that it is an exceptional case.)

I think Barry and GvR are on the right track.

My gut feeling is that it is best to stay with the mathematical view that
the empty set is a subset of every other set.  It doesn't seem to have hurt
the world of regular expressions, where re.match('', 'abc') returns a match
object.  Likewise, the truth of "abc" ~ "" is not on the wart list for AWK.
Excel and Lotus both return non-zero for FIND("","abc").

Though errors should not pass silently, we are talking about an error
that is possibly very far upstream from the membership check:

   potentialsub = complicatedfunction(*manyvars) #semantic error here
   
   if potentialsub in astring:  # why raise an exception way down here
      handle_inclusion()
   else:
      handle_exclusion()

'in' should not be responsible for suggesting that complicatedfunction()
doesn't know what it is doing.  If there is an error, it isn't the membership
check; rather, it is a semantic problem with the function.  Accordingly, 
the postcondition for the function belongs at the tail of the function and 
not as a precondition for the use of the result.  Otherwise, the exception 
and its cause are too far apart (as in the example above).


Raymond Hettinger



From barry@python.org  Fri Aug  9 00:36:13 2002
From: barry@python.org (Barry A. Warsaw)
Date: Thu, 8 Aug 2002 19:36:13 -0400
Subject: [Python-Dev] Re: string.find() again (was Re: timsort for jython)
References: 
 <006e01c23f33$8245f520$b6b53bd0@othello>
Message-ID: <15699.109.213049.369509@anthem.wooz.org>

>>>>> "RH" == Raymond Hettinger  writes:

    RH> I think Barry and GvR are on the right track.

Heh, I actually agree with Tim that it should raise an exception, but
I can also see the value in the other point of view.  This is one of
those things that we'd just have to learn to live with whichever
behavior is chosen, and Guido's made up his mind.

-Barry


From python-dev@zesty.ca  Fri Aug  9 01:22:57 2002
From: python-dev@zesty.ca (Ka-Ping Yee)
Date: Thu, 8 Aug 2002 17:22:57 -0700 (PDT)
Subject: [Python-Dev] Re: string.find() again (was Re: timsort for jython)
In-Reply-To: <15699.109.213049.369509@anthem.wooz.org>
Message-ID: 

On Thu, 8 Aug 2002, Barry A. Warsaw wrote:
>     RH> I think Barry and GvR are on the right track.
>
> Heh, I actually agree with Tim that it should raise an exception, but
> I can also see the value in the other point of view.  This is one of
> those things that we'd just have to learn to live with whichever
> behavior is chosen, and Guido's made up his mind.

That's fine.  But what i'm trying to say is there's a migration issue:
this decision is a significant change from current behaviour, and it
worries me that we would let this change pass silently without any
grace period.


-- ?!ng



From tim.one@comcast.net  Fri Aug  9 01:25:42 2002
From: tim.one@comcast.net (Tim Peters)
Date: Thu, 08 Aug 2002 20:25:42 -0400
Subject: [Python-Dev] Re: string.find() again (was Re: timsort for jython)
In-Reply-To: <15699.109.213049.369509@anthem.wooz.org>
Message-ID: 

[Barry]
> Heh, I actually agree with Tim that it should raise an exception, but
> I can also see the value in the other point of view.

Hah!  As I've long suspected, your mind is easily clouded.  I see no value
in any point of view, and that's why the universe will be all mine.

> This is one of those things that we'd just have to learn to live with
> whichever behavior is chosen, and Guido's made up his mind.

That too.  I didn't figure the world would actually end if Guido decided not
to promulgate conflicting definitions of what substring meant, and, so far,
I don't think it has.  And it's important to me that the world not end,
since, as demonstrated earlier, someday it will be all mine.

bill-gates-only-thinks-it's-his-ly y'rs  - tim



From mmatus@dinha.acms.arizona.edu  Fri Aug  9 01:40:53 2002
From: mmatus@dinha.acms.arizona.edu (Marcelo Matus)
Date: Thu, 08 Aug 2002 17:40:53 -0700
Subject: [Swig-dev] Re: [Python-Dev] Operator overloading inconsistency
 (bug or feature?)
References: <15698.30279.468661.109094@gargoyle.cs.uchicago.edu> <200208081454.g78Est115049@odiug.zope.com> <20020808084632.A7202@glacier.arctrix.com>
Message-ID: <3D530F95.6080906@acms.arizona.edu>

Yes, it works here, thanks very much for your prompt answer

Marcelo

Neil Schemenauer wrote:

>See python.org/sf/592646 for a possible fix.
>
>  Neil
>_______________________________________________
>Swig-dev mailing list  -  Swig-dev@cs.uchicago.edu
>http://mailman.cs.uchicago.edu/mailman/listinfo/swig-dev
>  
>





From guido@python.org  Fri Aug  9 01:50:24 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 08 Aug 2002 20:50:24 -0400
Subject: [Python-Dev] Re: string.find() again (was Re: timsort for jython)
In-Reply-To: Your message of "Thu, 08 Aug 2002 17:22:57 PDT."
 
References: 
Message-ID: <200208090050.g790oOq00436@pcp02138704pcs.reston01.va.comcast.net>

> That's fine.  But what i'm trying to say is there's a migration issue:
> this decision is a significant change from current behaviour, and it
> worries me that we would let this change pass silently without any
> grace period.

And others have said the exact same thing already.

But there is no backwards compatibility issue.  Correct programs
currently never ask for '' in 'abc' because that's guaranteed to raise
a TypeError.  Backwards compatibility guarantees have always had to
use the qualification "except for programs that rely on XYZ raising an
exception" so you can't argue that reasonable code could expect the
TypeError either.

The only issue is whether certain programming mistakes come to light a
little later now that '' in 'abc' no longer raises TypeError.  I'm
willing to accept that in order to make teaching the feature easier:
s1 in s2 means the same thing as s2.find(s1)>=0.
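The stated equivalence can be checked directly in an interpreter that has the new behavior:

```python
# The new rule: substring membership mirrors str.find().
for s1, s2 in [('', 'abc'), ('ab', 'abc'), ('x', 'abc')]:
    assert (s1 in s2) == (s2.find(s1) >= 0)

assert '' in 'abc'          # no longer a TypeError
assert 'abc'.find('') == 0  # the empty string is found at position 0
```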

--Guido van Rossum (home page: http://www.python.org/~guido/)


From barry@python.org  Fri Aug  9 01:54:51 2002
From: barry@python.org (Barry A. Warsaw)
Date: Thu, 8 Aug 2002 20:54:51 -0400
Subject: [Python-Dev] Re: string.find() again (was Re: timsort for jython)
References: <15699.109.213049.369509@anthem.wooz.org>
 
Message-ID: <15699.4827.122951.659337@anthem.wooz.org>

>>>>> "KY" == Ka-Ping Yee  writes:

    KY> That's fine.  But what i'm trying to say is there's a
    KY> migration issue: this decision is a significant change from
    KY> current behaviour, and it worries me that we would let this
    KY> change pass silently without any grace period.

from __future__ import str_in_str

?
-Barry


From barry@python.org  Fri Aug  9 01:58:40 2002
From: barry@python.org (Barry A. Warsaw)
Date: Thu, 8 Aug 2002 20:58:40 -0400
Subject: [Python-Dev] Re: string.find() again (was Re: timsort for jython)
References: <15699.109.213049.369509@anthem.wooz.org>
 
Message-ID: <15699.5056.825327.392953@anthem.wooz.org>

>>>>> "TP" == Tim Peters  writes:

    >> Heh, I actually agree with Tim that it should raise an
    >> exception, but I can also see the value in the other point of
    >> view.

    TP> Hah!  As I've long suspected, your mind is easily clouded.  I
    TP> see no value in any point of view, and that's why the universe
    TP> will be all mine.

Yes, but nothing will be all mine, and since nothing is in everything,
all your strings are belong to us.

    >> This is one of those things that we'd just have to learn to
    >> live with whichever behavior is chosen, and Guido's made up his
    >> mind.

    TP> That too.  I didn't figure the world would actually end if
    TP> Guido decided not to promulgate conflicting definitions of
    TP> what substring meant, and, so far, I don't think it has.  And
    TP> it's important to me that the world not end, since, as
    TP> demonstrated earlier, someday it will be all mine.

    TP> bill-gates-only-thinks-it's-his-ly y'rs - tim

If you had visited the alternative universe where Guido decided to
raise the exception, you would have noticed that the universe did
indeed end.  But it's too late now.  Of course, /they/ think our
universe ended too, so it all comes out in the wash.

go-eat-ly y'rs,
-Barry


From greg@cosc.canterbury.ac.nz  Fri Aug  9 02:42:10 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 09 Aug 2002 13:42:10 +1200 (NZST)
Subject: [Python-Dev] Re: string.find() again (was Re: timsort for jython)
In-Reply-To: 
Message-ID: <200208090142.g791gA025976@oma.cosc.canterbury.ac.nz>

Ka-Ping Yee :

> The question from a teaching perspective is: "Are strings a kind of
> sequence?"

Seems to me they're a kind of... er... um... string.
They're like nothing else on Earth, really.

They do seem to me more like sequences than mappings,
however, if we really have to pick one.

The question is, do we have to pick one, or should
we just regard them as a third kind with its own
special rules?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From guido@python.org  Fri Aug  9 04:55:30 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 08 Aug 2002 23:55:30 -0400
Subject: [Python-Dev] The memo of pickle
In-Reply-To: Your message of "Thu, 08 Aug 2002 08:48:20 +0200."
 
References: <200208072156.g77Lu7u14389@oma.cosc.canterbury.ac.nz> <200208080017.g780HoD20812@pcp02138704pcs.reston01.va.comcast.net>
 
Message-ID: <200208090355.g793tVk05549@pcp02138704pcs.reston01.va.comcast.net>

Martin quoted a complaint about cPickle performance:

http://groups.google.de/groups?hl=en&lr=&ie=UTF-8&selm=mailman.1026940226.16076.python-list%40python.org

But if you read the full thread, it's clear that this complaint came
about because the author wasn't using binary pickle mode.  In binary
mode his times became acceptable.  I've run the test and I haven't
seen abnormal memory behavior -- the process grows to 26 Mb just to
create the test data, and then adds about 1 Mb during pickling.  The
loading almost doubles the process size, because another copy of the
test data is read (the test data isn't thrown away).

The slowdown of text-mode pickle is due to the extremely expensive way
of unpickling pickled strings in text-mode: it invokes eval() (well,
PyRun_String()) to parse the string literal!  (After checking that
there's really only a string literal there to prevent trojan horses.)

So I'm not sure that the memo size is worth pursuing.  I didn't look
at the other complaints referenced by Martin, but I bet they're more
of the same.

What might be worth looking into:

(1) Make binary pickling the default (in cPickle as well as pickle).
    This would definitely give the most bang for the buck.

(2) Replace the PyRun_String() call in cPickle with something faster.
    Maybe the algorithm from parsestr() from compile.c can be exposed;
    although the error reporting must be done differently then.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Fri Aug  9 05:04:45 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 09 Aug 2002 00:04:45 -0400
Subject: [Python-Dev] The memo of pickle
In-Reply-To: Your message of "Thu, 08 Aug 2002 23:55:30 EDT."
Message-ID: <200208090404.g7944jL05589@pcp02138704pcs.reston01.va.comcast.net>

> The slowdown of text-mode pickle is due to the extremely expensive way
> of unpickling pickled strings in text-mode: it invokes eval() (well,
> PyRun_String()) to parse the string literal!  (After checking that
> there's really only a string literal there to prevent trojan horses.)

After re-reading the quoted thread, there was another phenomenon
remarked upon there: the slow text-mode pickle used less memory.  I
noticed this too when I ran the test program.  The explanation is that
the strings in the test program were "key0", "key1", ... "key24" and
"value0" ... "value24", over and over (each test dict has the same
keys and values).  Because these literals look like identifiers, they
are interned, so the unpickled data structure shares the string
references -- while the original test data has 10,000 copies of each
string!

If we really want this as a feature, a call to
PyString_InternFromString() could be made under certain conditions in
load_short_binstring() (e.g. when the length is at most 10 and
all_name_chars() from compile.c returns true).

I'm not sure that this is a desirable feature though.
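The sharing effect being described can be demonstrated from Python code (using sys.intern as a stand-in for the C-level PyString_InternFromString call mentioned above):

```python
import sys

# Two equal strings built at runtime are normally distinct objects...
a = "".join(["key", "0"])
b = "".join(["key", "0"])
assert a == b and a is not b

# ...while interning makes equal identifier-like strings share one
# object, which is the memory saving observed for the unpickled data.
assert sys.intern(a) is sys.intern(b)
```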

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@comcast.net  Fri Aug  9 05:08:25 2002
From: tim.one@comcast.net (Tim Peters)
Date: Fri, 09 Aug 2002 00:08:25 -0400
Subject: [Python-Dev] The memo of pickle
In-Reply-To: <200208090355.g793tVk05549@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

[Guido]
> ...
> (2) Replace the PyRun_String() call in cPickle with something faster.
>     Maybe the algorithm from parsestr() from compile.c can be exposed;
>     although the error reporting must be done differently then.

Note that Martin has had a patch pending for this since, umm, January:

    http://www.python.org/sf/505705

Maybe he should review it himself .


From tim.one@comcast.net  Fri Aug  9 05:13:15 2002
From: tim.one@comcast.net (Tim Peters)
Date: Fri, 09 Aug 2002 00:13:15 -0400
Subject: [Python-Dev] The memo of pickle
In-Reply-To: <200208090404.g7944jL05589@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

[Guido]
> ...
> Because these literals look like identifiers, they are interned, so the
> unpickled data structure shares the string references -- while the
> original test data has 10,000 copies of each string!
>
> If we really want this as a feature, a call to
> PyString_InternFromString() could be made under certain conditions in
> load_short_binstring() (e.g. when the length is at most 10 and
> all_name_chars() from compile.c returns true).
>
> I'm not sure that this is a desirable feature though.

I hope Oren resumes his crusade to make interned strings follow the same
refcount rules as everything else, and then we wouldn't have this fear of
interning.  BTW, nobody yet has reported any code where "indirect interning"
pays -- or even triggers once in a non-eating-its-own-tail way.



From inyeol.lee@siimage.com  Fri Aug  9 07:08:35 2002
From: inyeol.lee@siimage.com (Inyeol Lee)
Date: Thu, 8 Aug 2002 23:08:35 -0700
Subject: [Python-Dev] Re: string.find() again (was Re: timsort for jython)
Message-ID: <20020809060834.GC27637@siliconimage.com>

Guido> I think we've argued about '' in 'abc' long enough.  Tim has failed to
Guido> convince me, so '' in 'abc' returns True.  Barry has checked it all
Guido> in.

I'm Inyeol Lee, happy python user. I checked other string methods and
re functions.

1. most of them treat an empty string as matching between normal
   characters and at the start/end of the string;

   'abc'.count('') -> 4
   'abc'.endswith('') -> 1
   'abc'.find('') -> 0
   'abc'.index('') -> 0
   'abc'.rfind('') -> 3
   'abc'.rindex('') -> 3
   'abc'.startswith('') -> 1

   re.search('', 'abc').span() -> (0, 0)
   re.match('', 'abc').span() -> (0, 0)
   re.findall('', 'abc') -> ['', '', '', '']
   re.sub('', '_', 'abc') -> '_a_b_c_'
   re.subn('', '_', 'abc') -> ('_a_b_c_', 4)

2. some of them generate exception;

   '' in 'abc'
   'abc'.replace('', '_')
   'abc'.split('')

3. one of them ignores empty match;

   re.split('', 'abc') -> ['abc']

(couldn't test re.finditer but it seems to be the same as re.findall.)


Since '' in 'abc' now returns True, how about changing 'abc'.replace('', '_')
to return '_a_b_c_', too?  It is consistent with re.sub()/subn(), and the
cost of the change is similar to the '' in 'abc' case.

Inyeol Lee


From loewis@informatik.hu-berlin.de  Fri Aug  9 09:08:19 2002
From: loewis@informatik.hu-berlin.de (Martin v. =?iso-8859-15?q?L=F6wis?=)
Date: 09 Aug 2002 10:08:19 +0200
Subject: [Python-Dev] Patch 592529: Split-out ntmodule.c
Message-ID: 

I'm collecting opinions on whether the module nt should live in its
own source code file; it currently lives in posixmodule.c.

http://python.org/sf/592529 has a patch that implements that feature.
Tim is -0, Guido is +0.5, more votes are needed.

If you are familiar with the code, it would be good if you could
comment on the following questions:

- should os2module.c get its own source code file as well?

- are the #ifdefs in the resulting ntmodule.c still needed?
  I believe they are, as the various compilers appear to support
  different sets of functions in their C libraries. Of course,
  most of these could be eliminated if the C is avoided in favour
  of the Win32 API. Alternatively, can anybody with access to any
  of these compilers (BorlandC, Watcom, IBM) please comment on
  which functions provided by MSVC are missing in those compilers?

Regards,
Martin


From duncan@rcp.co.uk  Fri Aug  9 09:52:08 2002
From: duncan@rcp.co.uk (Duncan Booth)
Date: Fri, 9 Aug 2002 09:52:08 +0100
Subject: [Python-Dev] _sre as part of python.dll
References: <200208080828.g788S1Qd004601@aramis.informatik.hu-berlin.de> <200208081427.g78ERc414863@odiug.zope.com>  <200208081726.g78HQNe16854@odiug.zope.com>
Message-ID: <08520842188339@aluminium.rcp.co.uk>

On 08 Aug 2002, Guido van Rossum  wrote:

>> In addition, it should decrease startup time: Python won't need to
>> locate that many files anymore.
>> 
>> It also decreases the total size of the binary distribution slightly.
> 
> Maybe _sre is used by most apps (though I doubt even that).  But
> _socket, select, winreg, mmap and the others are definitely not.  On
> Unix, all extensions are built as shared libraries, except the ones
> that are needed by setup.py to be able to build extensions; it looks
> like only posix, errno, _sre and symtable are built statically.
> 
> I'd say that making more extensions static on Windows would increase
> start time of modules that don't use those extensions.

_sre is used by any application that imports 'os'. That (IMHO) is almost 
every non-trivial Python program.

Of course, we shouldn't be guessing about startup times. Someone should 
actually try building two versions and comparing them.
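A crude way to compare two builds, rather than guessing, is to time a bare interpreter startup (this sketch uses the modern subprocess/time modules, which did not exist in this form in 2002; the numbers it prints are illustrative, not from the thread):

```python
import subprocess
import sys
import time

def startup_time(runs=3):
    # Time "python -c pass" a few times and keep the best run,
    # which approximates the fixed cost of interpreter startup.
    best = float("inf")
    for _ in range(runs):
        t0 = time.perf_counter()
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        best = min(best, time.perf_counter() - t0)
    return best

t = startup_time()
assert t > 0
print("bare startup: %.1f ms" % (t * 1000))
```

Running the same script against a static build and a DLL build would give the comparison Duncan asks for.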

-- 
Duncan Booth                                             duncan@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?


From mal@lemburg.com  Fri Aug  9 10:35:19 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 09 Aug 2002 11:35:19 +0200
Subject: [Python-Dev] _sre as part of python.dll
References: <200208080828.g788S1Qd004601@aramis.informatik.hu-berlin.de>	<200208081427.g78ERc414863@odiug.zope.com>		<200208081726.g78HQNe16854@odiug.zope.com>		<200208081749.g78Hntg17022@odiug.zope.com>		<3D52DD35.5020306@lemburg.com> 
Message-ID: <3D538CD7.8000302@lemburg.com>

Martin v. Loewis wrote:
> "M.-A. Lemburg"  writes:
> 
> 
>>>On the other hand, on my Linux installation, importing a module costs
>>>35 system calls if the module is not found, and no PYTHONPATH is set;
>>>every directory in PYTHONPATH adds four additional system calls.
>>
>>Why not address this problem instead ?
> 
> 
> I'm trying to: every module incorporated in config.c won't be searched
> in PYTHONPATH.
> 
>>Note that mxCGIPython can help a lot here: it freezes most of the
>>Python std lib into the executable, making imports go really fast
>>(and that's needed if you're doing a lot of CGI scripting).
> 
> 
> Indeed, freezing also helps - but is probably only suitable for
> special-purpose applications. I think people would be surprised if
> they are told that editing the source of a library module won't have
> any effect.

They shouldn't edit those anyway :-) Whatever happened to the
ZIP archive import that James C. Ahlstrom was working on (I think it
was him)?

If startup time for the std lib is considered a problem, then
people could be directed to a ZIP archive incorporating the
complete pure Python std lib.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From sjoerd@acm.org  Fri Aug  9 12:16:51 2002
From: sjoerd@acm.org (Sjoerd Mullender)
Date: Fri, 09 Aug 2002 13:16:51 +0200
Subject: [Python-Dev] _sre as part of python.dll
In-Reply-To: <08520842188339@aluminium.rcp.co.uk>
References: <200208080828.g788S1Qd004601@aramis.informatik.hu-berlin.de> <200208081427.g78ERc414863@odiug.zope.com>  <200208081726.g78HQNe16854@odiug.zope.com>
 <08520842188339@aluminium.rcp.co.uk>
Message-ID: <200208091116.g79BGpu17519@indus.ins.cwi.nl>

On Fri, Aug 9 2002 Duncan Booth wrote:

> On 08 Aug 2002, Guido van Rossum  wrote:
> 
> >> In addition, it should decrease startup time: Python won't need to
> >> locate that many files anymore.
> >> 
> >> It also decreases the total size of the binary distribution slightly.
> > 
> > Maybe _sre is used by most apps (though I doubt even that).  But
> > _socket, select, winreg, mmap and the others are definitely not.  On
> > Unix, all extensions are built as shared libraries, except the ones
> > that are needed by setup.py to be able to build extensions; it looks
> > like only posix, errno, _sre and symtable are built statically.
> > 
> > I'd say that making more extensions static on Windows would increase
> > start time of modules that don't use those extensions.
> 
> _sre is used by any application that imports 'os'. That (IMHO) is almost 
> every non-trivial Python program.

Not on my system it isn't!

It's true that _sre does get imported whenever I start Python, but
that is not because it gets imported by os.  There is an import of re
in posixpath (imported by os), but that is inside the function
expandvars which is not called during import.

In my case site.py imports distutils.util because Python decides it is
called from the build directory.

-- Sjoerd Mullender 


From gmcm@hypernet.com  Fri Aug  9 12:51:30 2002
From: gmcm@hypernet.com (Gordon McMillan)
Date: Fri, 9 Aug 2002 07:51:30 -0400
Subject: [Python-Dev] Re: string.find() again (was Re: timsort for jython)
In-Reply-To: 
References: <15699.109.213049.369509@anthem.wooz.org>
Message-ID: <3D537482.571.52A3D6DF@localhost>

On 8 Aug 2002 at 17:22, Ka-Ping Yee wrote:

> That's fine.  But what i'm trying to say is there's
> a migration issue: this decision is a significant
> change from current behaviour, and it worries me
> that we would let this change pass silently without
> any grace period. 

But it's no more significant than 'ab' in 'abc' not
raising an exception. If you're relying on "x in str"
to validate the "char"ness of x, you're screwed
either way.

-- Gordon
http://www.mcmillan-inc.com/



From gmcm@hypernet.com  Fri Aug  9 12:51:30 2002
From: gmcm@hypernet.com (Gordon McMillan)
Date: Fri, 9 Aug 2002 07:51:30 -0400
Subject: [Python-Dev] _sre as part of python.dll
In-Reply-To: 
References: <20020808145129.A8635@glacier.arctrix.com>
Message-ID: <3D537482.23186.52A3D72F@localhost>

On 8 Aug 2002 at 17:51, Tim Peters wrote:

> If the real goal is to reduce startup time, we
> should analyze where startup time is being spent;
> random thrashing "in that general direction" won't
> satisfy in the end. 

In the 1.5.2 timeframe, most *startup* time was
spent figuring out where to root sys.path (looking
for the sentinel, deciding if this is a developer
build, etc.). In crude experiments on my Linux
box, I got rid of a few hundred system calls
just by removing most of the intelligence from
the getpath stuff. 

Then there are the things you can do with import
(archives, careful crafting of sys.path), but that's
harder to do, especially in a way that will satisfy
most people / most scripts.

So the lowest hanging fruit, I think, is to find some
way of telling Python "don't be clever - just start
here", and have it fallback to current behavior in
the absence of that info.

-- Gordon
http://www.mcmillan-inc.com/



From jeremy@alum.mit.edu  Fri Aug  9 13:28:01 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Fri, 9 Aug 2002 08:28:01 -0400
Subject: [Python-Dev] The memo of pickle
In-Reply-To: <200208090355.g793tVk05549@pcp02138704pcs.reston01.va.comcast.net>
References: <200208072156.g77Lu7u14389@oma.cosc.canterbury.ac.nz>
 <200208080017.g780HoD20812@pcp02138704pcs.reston01.va.comcast.net>
 
 <200208090355.g793tVk05549@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <15699.46417.102444.810752@slothrop.zope.com>

One of the things I mentioned on the c.l.py thread is the
Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS calls around every
fwrite() call.  I looked into it, using Penrose's test case, and found
that the locking alone added 25% overhead.  I expect the layer or two
of C function calls above fwrite() add overhead.  I also expect that
calling fwrite() repeatedly for very small strings is inefficient.

If I were to suggest a cPickle project, it would be an efficient
internal buffering scheme.
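[A minimal sketch of the kind of internal buffering Jeremy suggests; the
class name and the flush threshold are hypothetical, not cPickle's actual
design. The point is that many tiny writes are batched so the per-call
overhead (lock juggling, C call layers) is paid once per batch:]

```python
import io

class BufferedWriter:
    """Accumulate many small writes and flush them in one call."""
    def __init__(self, f, limit=8192):
        self.f = f          # underlying file-like object
        self.limit = limit  # flush threshold, in characters
        self.parts = []
        self.size = 0

    def write(self, s):
        self.parts.append(s)
        self.size += len(s)
        if self.size >= self.limit:
            self.flush()

    def flush(self):
        if self.parts:
            # one underlying write instead of len(self.parts) writes
            self.f.write(''.join(self.parts))
            self.parts = []
            self.size = 0

out = io.StringIO()
w = BufferedWriter(out, limit=8)
for piece in ['I', '1\n.', 'abc', 'xyz']:  # many pickle-opcode-sized writes
    w.write(piece)
w.flush()
print(out.getvalue())
```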

Jeremy




From guido@python.org  Fri Aug  9 15:03:40 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 09 Aug 2002 10:03:40 -0400
Subject: [Python-Dev] The memo of pickle
In-Reply-To: Your message of "Fri, 09 Aug 2002 08:28:01 EDT."
 <15699.46417.102444.810752@slothrop.zope.com>
References: <200208072156.g77Lu7u14389@oma.cosc.canterbury.ac.nz> <200208080017.g780HoD20812@pcp02138704pcs.reston01.va.comcast.net>  <200208090355.g793tVk05549@pcp02138704pcs.reston01.va.comcast.net>
 <15699.46417.102444.810752@slothrop.zope.com>
Message-ID: <200208091403.g79E3er06780@pcp02138704pcs.reston01.va.comcast.net>

> One of the things I mentioned on the c.l.py thread is the
> Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS calls around every
> fwrite() call.  I looked into it, using Penrose's test case, and found
> that the locking alone added 25% overhead.  I expect the layer or two
> of C function calls above fwrite() add overhead.  I also expect that
> calling fwrite() repeatedly for very small strings is inefficient.
> 
> If I were to suggest a cPickle project, it would be an efficient
> internal buffering scheme.

Who's got time?  It's fast enough for Zope. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Fri Aug  9 15:08:40 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 09 Aug 2002 10:08:40 -0400
Subject: [Python-Dev] _sre as part of python.dll
In-Reply-To: Your message of "Fri, 09 Aug 2002 11:35:19 +0200."
 <3D538CD7.8000302@lemburg.com>
References: <200208080828.g788S1Qd004601@aramis.informatik.hu-berlin.de> <200208081427.g78ERc414863@odiug.zope.com>  <200208081726.g78HQNe16854@odiug.zope.com>  <200208081749.g78Hntg17022@odiug.zope.com>  <3D52DD35.5020306@lemburg.com> 
 <3D538CD7.8000302@lemburg.com>
Message-ID: <200208091408.g79E8eg06822@pcp02138704pcs.reston01.va.comcast.net>

> What ever happened to the
> ZIP archive import that James C. Ahlstrom was working (I think it
> was him) ?

python.org/sf/492105 is open for review.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Fri Aug  9 15:13:12 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 09 Aug 2002 10:13:12 -0400
Subject: [Python-Dev] Re: string.find() again (was Re: timsort for jython)
In-Reply-To: Your message of "Thu, 08 Aug 2002 23:08:35 PDT."
 <20020809060834.GC27637@siliconimage.com>
References: <20020809060834.GC27637@siliconimage.com>
Message-ID: <200208091413.g79EDCx06840@pcp02138704pcs.reston01.va.comcast.net>

> Since '' in 'abc' now returns True, How about changing
> 'abc'.replace('') to generate '_a_b_c_', too? It is consistent with
> re.sub()/subn() and the cost for change is similar to '' in 'abc'
> case.

Do you have a use case?  Or are you just striving for consistency?  It
would be more consistent but I'm not sure what the point is.  I can
think of situations where '' in 'abc' would be needed, but not so for
'abc'.replace('', '_').

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Fri Aug  9 15:14:06 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 09 Aug 2002 10:14:06 -0400
Subject: [Python-Dev] The memo of pickle
In-Reply-To: Your message of "Fri, 09 Aug 2002 00:13:15 EDT."
 
References: 
Message-ID: <200208091414.g79EE6p06855@pcp02138704pcs.reston01.va.comcast.net>

> I hope Oren resumes his crusade to make interned strings follow the
> same refcount rules as everything else, and then we wouldn't have
> this fear of interning.  BTW, nobody yet has reported any code where
> "indirect interning" pays -- or even triggers once in a
> non-eating-its-own-tail way.

Maybe we should just drop indirect interning then.  It can save 31
bits per string object, right?  How to collect those savings?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From barry@zope.com  Fri Aug  9 15:44:36 2002
From: barry@zope.com (Barry A. Warsaw)
Date: Fri, 9 Aug 2002 10:44:36 -0400
Subject: [Python-Dev] _sre as part of python.dll
References: <20020808145129.A8635@glacier.arctrix.com>
 <3D537482.23186.52A3D72F@localhost>
Message-ID: <15699.54612.583037.274416@anthem.wooz.org>

>>>>> "Gordo" == Gordon McMillan  writes:

    Gordo> In the 1.5.2 timeframe, most *startup* time was
    Gordo> spent figuring out where to root sys.path (looking
    Gordo> for the sentinel, deciding if this is a developer
    Gordo> build, etc.). In crude experiments on my Linux
    Gordo> box, I got rid of a few hundred system calls
    Gordo> just by removing most of the intelligence from
    Gordo> the getpath stuff. 

I remember doing some similar testing probably around the Python 2.0
timeframe and found a huge speed up by avoiding the import of site.py
for largely the same reasons (avoiding tons of stat calls).  It's not
always practical to avoid loading site.py, but if you can, you can get
a big startup win.

    Gordo> So the lowest hanging fruit, I think, is to find some
    Gordo> way of telling Python "don't be clever - just start
    Gordo> here", and have it fallback to current behavior in
    Gordo> the absence of that info.

That's what $PYTHONHOME is supposed to do.  It's been a while since I
dug around in getpath.c, but setting $PYTHONHOME should set prefix and
exec_prefix unconditionally, even in the build directory.

(The comments in the file are a bit misleading.  Step 1 could be
read as implying that $PYTHONHOME isn't consulted when looking for
build directory landmarks, but that's not the case: even for a build
dir search, $PYTHONHOME is trusted unconditionally.)

-Barry


From jmiller@stsci.edu  Fri Aug  9 15:46:57 2002
From: jmiller@stsci.edu (Todd Miller)
Date: Fri, 09 Aug 2002 10:46:57 -0400
Subject: [Python-Dev] C basetype mapping protocol difference between 2.2.1 and 2.3
Message-ID: <3D53D5E1.6050805@stsci.edu>

I am trying to accelerate Numarray by "dropping the bottom out" and 
re-writing the simplest, most used portions in a  C basetype.  Looking 
at a new NumArray instance in Python-2.2.1 under GDB, I see:

(gdb) p *self->ob_type->tp_as_mapping
$2 = {mp_length = 0x4006eb7c <_ndarray_length>, mp_subscript = 0x80669b8
<slot_mp_subscript>, mp_ass_subscript = 0x80669e0 <slot_mp_ass_subscript>}

Looking at the same code compiled for Python-2.3, _ndarray "owns" all of 
the mapping protocol slots, which is what I really want to happen:    

(gdb) p *o->ob_type->tp_as_mapping
$1 = {mp_length = 0x400c1a68 <_ndarray_length>, mp_subscript = 
0x400c1a80 <_ndarray_subscript>, mp_ass_subscript = 0x400c1188 
<_ndarray_ass_subscript>}

Did anything change between Python-2.2.1 and Python-2.3 that would 
account for this?

Todd

-- 
Todd Miller 			jmiller@stsci.edu
STSCI / SSG




From ark@research.att.com  Fri Aug  9 15:52:48 2002
From: ark@research.att.com (Andrew Koenig)
Date: 09 Aug 2002 10:52:48 -0400
Subject: [Python-Dev] Re: string.find() again (was Re: timsort for jython)
In-Reply-To: <200208091413.g79EDCx06840@pcp02138704pcs.reston01.va.comcast.net>
References: <20020809060834.GC27637@siliconimage.com>
 <200208091413.g79EDCx06840@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

Guido> Do you have a use case?  Or are you just striving for consistency?  It
Guido> would be more consistent but I'm not sure what the point is.  I can
Guido> think of situations where '' in 'abc' would be needed, but not so for
Guido> 'abc'.replace('', '_').

It's the first way that comes to mind of  s p r e a d i n g   o u t   the
characters in a string for use in, say, the title of a report.

-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark


From guido@python.org  Fri Aug  9 16:01:30 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 09 Aug 2002 11:01:30 -0400
Subject: [Python-Dev] _sre as part of python.dll
In-Reply-To: Your message of "Fri, 09 Aug 2002 10:44:36 EDT."
 <15699.54612.583037.274416@anthem.wooz.org>
References: <20020808145129.A8635@glacier.arctrix.com> <3D537482.23186.52A3D72F@localhost>
 <15699.54612.583037.274416@anthem.wooz.org>
Message-ID: <200208091501.g79F1Ui07165@pcp02138704pcs.reston01.va.comcast.net>

> I remember doing some similar testing probably around the Python 2.0
> timeframe and found a huge speed up by avoiding the import of site.py
> for largely the same reasons (avoiding tons of stat calls).  It's not
> always practical to avoid loading site.py, but if you can, you can get
> a big startup win.

It's also easy: "python -S" avoids loading site.py.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From oren-py-d@hishome.net  Fri Aug  9 16:12:08 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Fri, 9 Aug 2002 11:12:08 -0400
Subject: [Python-Dev] The memo of pickle
In-Reply-To: <200208091414.g79EE6p06855@pcp02138704pcs.reston01.va.comcast.net>
References:  <200208091414.g79EE6p06855@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20020809151208.GA97637@hishome.net>

On Fri, Aug 09, 2002 at 10:14:06AM -0400, Guido van Rossum wrote:
> > I hope Oren resumes his crusade to make interned strings follow the
> > same refcount rules as everything else, and then we wouldn't have
> > this fear of interning.  BTW, nobody yet has reported any code where
> > "indirect interning" pays -- or even triggers once in a
> > non-eating-its-own-tail way.
> 
> Maybe we should just drop indirect interning then.  It can save 31
> bits per string object, right?  How to collect those savings?

I was just going back to that patch.  The current savings are 24 bits (so 
now you see why I considered making 'interned' a type - to get that bit 
in without paying for a whole byte :-).

Before the nitpickers point it out: yes, the average savings are likely to 
be less than 24 bits because of allocator overhead and nonuniform 
distribution of string lengths.

	Oren



From duncan@rcp.co.uk  Fri Aug  9 16:26:19 2002
From: duncan@rcp.co.uk (Duncan Booth)
Date: Fri, 9 Aug 2002 16:26:19 +0100
Subject: [Python-Dev] Re: string.find() again (was Re: timsort for jython)
References: <20020809060834.GC27637@siliconimage.com> <200208091413.g79EDCx06840@pcp02138704pcs.reston01.va.comcast.net> 
Message-ID: <15261956291218@aluminium.rcp.co.uk>

On 09 Aug 2002, Andrew Koenig  wrote:

>> Do you have a use case?  Or are you just striving for consistency? 
>> It would be more consistent but I'm not sure what the point is.  I
>> can think of situations where '' in 'abc' would be needed, but not so
>> for 'abc'.replace('', '_'). 
> 
> It's the first way that comes to mind of  s p r e a d i n g   o u t  
> the characters in a string for use in, say, the title of a report.

The first way that comes to my mind is:

>>> ' '.join("spreading out")
's p r e a d i n g   o u t'

-- 
Duncan Booth                                             duncan@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?


From guido@python.org  Fri Aug  9 16:11:50 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 09 Aug 2002 11:11:50 -0400
Subject: [Python-Dev] C basetype mapping protocol difference between 2.2.1 and 2.3
In-Reply-To: Your message of "Fri, 09 Aug 2002 10:46:57 EDT."
 <3D53D5E1.6050805@stsci.edu>
References: <3D53D5E1.6050805@stsci.edu>
Message-ID: <200208091511.g79FBor07407@pcp02138704pcs.reston01.va.comcast.net>

> (gdb) p *self->ob_type->tp_as_mapping
> $2 = {mp_length = 0x4006eb7c <_ndarray_length>, mp_subscript = 0x80669b8
> <slot_mp_subscript>, mp_ass_subscript = 0x80669e0 <slot_mp_ass_subscript>}
> 
> Looking at the same code compiled for Python-2.3, _ndarray "owns" all of 
> the mapping protocol slots, which is what I really want to happen:    
> 
> (gdb) p *o->ob_type->tp_as_mapping
> $1 = {mp_length = 0x400c1a68 <_ndarray_length>, mp_subscript = 
> 0x400c1a80 <_ndarray_subscript>, mp_ass_subscript = 0x400c1188 
> <_ndarray_ass_subscript>}
> 
> Did anything change between Python-2.2.1 and Python-2.3 that would 
> account for this?

Yes, I did several massive refactorings of a lot of very subtle code
in typeobject.c.  Note that this is only a performance improvement,
not a semantic change: slot_mp_subscript will look for and call the
__getitem__ descriptor in the type dict, which will be a Python
wrapper around _ndarray_subscript.  The new code notices that this is
so and leaves _ndarray_subscript in the slot.

I wish it was easy to backport this to 2.2.2, but it's not. :-(

--Guido van Rossum (home page: http://www.python.org/~guido/)


From barry@zope.com  Fri Aug  9 16:37:11 2002
From: barry@zope.com (Barry A. Warsaw)
Date: Fri, 9 Aug 2002 11:37:11 -0400
Subject: [Python-Dev] _sre as part of python.dll
References: <20020808145129.A8635@glacier.arctrix.com>
 <3D537482.23186.52A3D72F@localhost>
 <15699.54612.583037.274416@anthem.wooz.org>
 <200208091501.g79F1Ui07165@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <15699.57767.828316.107103@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum  writes:

    >> I remember doing some similar testing probably around the
    >> Python 2.0 timeframe and found a huge speed up by avoiding the
    >> import of site.py for largely the same reasons (avoiding tons
    >> of stat calls).  It's not always practical to avoid loading
    >> site.py, but if you can, you can get a big startup win.

    GvR> It's also easy: "python -S" avoids loading site.py.

Yes.  The one gotcha is that site-packages is put on sys.path via
site.py so using -S means you lose that directory.  You can, of
course, reinstall it explicitly by something like:

import os, sys
sitedir = os.path.join(sys.prefix, 'lib', 'python'+sys.version[:3],
                       'site-packages')
sys.path.append(sitedir)

-Barry


From guido@python.org  Fri Aug  9 16:39:30 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 09 Aug 2002 11:39:30 -0400
Subject: [Python-Dev] Re: string.find() again (was Re: timsort for jython)
In-Reply-To: Your message of "Fri, 09 Aug 2002 16:26:19 BST."
 <15261956291218@aluminium.rcp.co.uk>
References: <20020809060834.GC27637@siliconimage.com> <200208091413.g79EDCx06840@pcp02138704pcs.reston01.va.comcast.net> 
 <15261956291218@aluminium.rcp.co.uk>
Message-ID: <200208091539.g79FdUE08828@pcp02138704pcs.reston01.va.comcast.net>

If someone really wants 'abc'.replace('', '-') to return '-a-b-c-',
please submit patches for both 8-bit and Unicode strings to
SourceForge and assign to me.  I looked into this and it's
non-trivial: the implementation used for 8-bit strings goes into an
infinite loop when the pattern is empty, and the Unicode
implementation tacks '----' onto the end.  Please supply doc and
unittest patches too.  At least re does the right thing already:

  >>> import re
  >>> re.sub('', '-', 'abc')
  '-a-b-c-'
  >>> 
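[For reference, the requested behavior can be sketched in pure Python; the
helper name is hypothetical and this is of course not the C patch Guido is
asking for, just a specification of the semantics by analogy with re.sub:]

```python
import re

def replace_empty(s, repl):
    # An empty pattern matches before every character and once at the
    # very end, matching re.sub('', repl, s).
    return repl + ''.join(c + repl for c in s)

print(replace_empty('abc', '-'))
```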

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Fri Aug  9 16:45:03 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 09 Aug 2002 11:45:03 -0400
Subject: [Python-Dev] C basetype mapping protocol difference between 2.2.1 and 2.3
In-Reply-To: Your message of "Fri, 09 Aug 2002 11:36:23 EDT."
 <3D53E177.3000502@stsci.edu>
References: <3D53D5E1.6050805@stsci.edu> <200208091511.g79FBor07407@pcp02138704pcs.reston01.va.comcast.net>
 <3D53E177.3000502@stsci.edu>
Message-ID: <200208091545.g79Fj3K09170@pcp02138704pcs.reston01.va.comcast.net>

> Doesn't the current wrapper narrow the acceptable definitions for 
> _ndarray_subscript?  The reason I noticed this is that my 2.2.1 code 
> raises an exception:
> 
>  >>> import numarray
>  >>> a=numarray.arange(10)
>  >>> a
> Traceback (most recent call last):
> File "", line 1, in ?
> File "/home/jmiller/lib/python2.2/site-packages/numarray/numarray.py", 
> line 622, in __repr__
> MAX_LINE_WIDTH, PRECISION, SUPPRESS_SMALL, ', ', 1)
> File "/home/jmiller/lib/python2.2/site-packages/numarray/arrayprint.py", 
> line 156, in array2string
> separator, array_output)
> File "/home/jmiller/lib/python2.2/site-packages/numarray/arrayprint.py", 
> line 112, in _array2string
> max_str_len = max(len(str(max_reduce(data))),
> File "/home/jmiller/lib/python2.2/site-packages/numarray/ufunc.py", line 
> 759, in reduce
> r = self.areduce(inarr, dim, outarr)
> File "/home/jmiller/lib/python2.2/site-packages/numarray/ufunc.py", line 
> 745, in areduce
> _outarr1 = self._cumulative("reduce", _inarr, _outarr0)
> File "/home/jmiller/lib/python2.2/site-packages/numarray/ufunc.py", line 
> 653, in _cumulative
> toutarr = self._reduce_out(inarr, outarr, outtype)
> File "/home/jmiller/lib/python2.2/site-packages/numarray/ufunc.py", line 
> 591, in _reduce_out
> toutarr = inarr[...,0].copy().astype(outtype)
> TypeError: an integer is required

I guess that means it's going through the *sequence* getitem, not the
*mapping* getitem.  Have you tried leaving the sequence getitem slot
NULL, and doing everything through your mapping getitem slot?  That
should work in 2.2.
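[A Python-level analogue of that advice, with a toy class rather than
numarray's actual code: route all subscripting, including tuple and
Ellipsis keys like inarr[..., 0], through a single __getitem__ instead of
splitting it between sequence-style and mapping-style handlers:]

```python
class Arr:
    """Toy array: one __getitem__ handles ints, slices, and tuples."""
    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        if isinstance(key, tuple):
            # multi-dimensional indexing, e.g. a[..., 0], arrives
            # as the tuple (Ellipsis, 0)
            return ('multi', key)
        # plain int/slice indexing goes through the same entry point
        return self.data[key]

a = Arr([1, 2, 3])
print(a[0], a[..., 0])
```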

--Guido van Rossum (home page: http://www.python.org/~guido/)



From dave@boost-consulting.com  Fri Aug  9 16:56:45 2002
From: dave@boost-consulting.com (David Abrahams)
Date: Fri, 9 Aug 2002 11:56:45 -0400
Subject: [Python-Dev] Exception-handling model
Message-ID: <027301c23fbd$b3c44b30$62f0fc0c@boostconsulting.com>

I have always been confused about Python's exception-handling model. I hope
someone can clear up a few questions:

http://www.python.org/dev/doc/devel/ref/exceptions.html#l2h-225 says:

    "When an exception is raised, an object (maybe None) is passed as the
exception's value; this object does not affect the selection of an
exception handler, but is passed to the selected exception handler as
additional information. For class exceptions, this object must be an
instance of the exception class being raised."

But unless I misunderstand the source, Luke, Python itself raises
exceptions all over the place with PyErr_SetString(), which uses a class as
the exception type and a string as the exception object. Other uses of
PyErr_SetObject() that I've found /never/ seem to use an instance of the
exception class as the exception object.

If I got that right, what's the meaning of the documentation I quoted?
What rules must one actually follow when raising an exception?

TIA,
Dave

-----------------------------------------------------------
           David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com





From guido@python.org  Fri Aug  9 17:32:39 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 09 Aug 2002 12:32:39 -0400
Subject: [Python-Dev] Exception-handling model
In-Reply-To: Your message of "Fri, 09 Aug 2002 11:56:45 EDT."
 <027301c23fbd$b3c44b30$62f0fc0c@boostconsulting.com>
References: <027301c23fbd$b3c44b30$62f0fc0c@boostconsulting.com>
Message-ID: <200208091632.g79GWdM11773@pcp02138704pcs.reston01.va.comcast.net>

> I have always been confused about Python's exception-handling model. I hope
> someone can clear up a few questions:
> 
> http://www.python.org/dev/doc/devel/ref/exceptions.html#l2h-225 says:
> 
>     "When an exception is raised, an object (maybe None) is passed as the
> exception's value; this object does not affect the selection of an
> exception handler, but is passed to the selected exception handler as
> additional information. For class exceptions, this object must be an
> instance of the exception class being raised."
> 
> But unless I misunderstand the source, Luke, Python itself raises
> exceptions all over the place with PyErr_SetString(), which uses a class as
> the exception type and a string as the exception object. Other uses of
> PyErr_SetObject() that I've found /never/ seem to use an instance of the
> exception class as the exception object.
> 
> If I got that right, what's the meaning of the documentation I quoted?
> What rules must one actually follow when raising an exception?

This may be a case where reading the source is actually confusing. :-)

When the exception type is a class and the exception value is not an
instance of that class, eventually the class is instantiated with the
value as argument (if the value is a tuple, it is used as an argument
list).

But there's an efficiency hack that tries to put off the class
instantiation as long as possible.  It is possible for C code to
"catch" the exception and clear it without the instantiation
happening, and then the instantiation costs are saved.  Because C code
rather frequently checks and clears exceptions, this can be a big win.
Thus, in C, if you fetch the exception value using PyErr_Fetch(), you may
see a value that's not an instance of the class.  But if you catch it
in Python with an except clause, it will be instantiated before your
except clause is entered.  This is done by PyErr_NormalizeException();
its API docs provide a summary of what I just explained.
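[The observable consequence of that normalization, sketched from the
Python side (modern except-as syntax; 2.2 spelled it "except ValueError, e"):
whatever shape the (type, value) pair had at the C level, the handler
always receives an instance of the exception class.]

```python
caught = None
try:
    # At the C level this could have been stored un-normalized as the
    # pair (ValueError, "bad value") by PyErr_SetString().
    raise ValueError("bad value")
except ValueError as e:
    # By the time the except clause runs, the value has been
    # normalized to an instance of the class.
    caught = e
print(type(caught).__name__, caught.args)
```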

--Guido van Rossum (home page: http://www.python.org/~guido/)


From dave@boost-consulting.com  Fri Aug  9 17:57:12 2002
From: dave@boost-consulting.com (David Abrahams)
Date: Fri, 9 Aug 2002 12:57:12 -0400
Subject: [Python-Dev] Exception-handling model
References: <027301c23fbd$b3c44b30$62f0fc0c@boostconsulting.com>  <200208091632.g79GWdM11773@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <030801c23fc6$4a9d1c00$62f0fc0c@boostconsulting.com>

Thanks, Guido and Jeremy! I thought something like that might be the case,
but then couldn't find the code which did the work... I never expected the
instantiation to be deferred in that way so I never looked further than the
code which actually does the raising.

It would probably be useful to put pointers to the PyErr_NormalizeException
behavior right at the top of the API docs for exception handling, since
making sense of basic facilities like PyErr_SetString() depends on it.

-----------------------------------------------------------
           David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com





From tim.one@comcast.net  Fri Aug  9 18:50:40 2002
From: tim.one@comcast.net (Tim Peters)
Date: Fri, 09 Aug 2002 13:50:40 -0400
Subject: [Python-Dev] The memo of pickle
In-Reply-To: <200208091414.g79EE6p06855@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

[Guido]
> Maybe we should just drop indirect interning then.  It can save 31
> bits per string object, right?  How to collect those savings?

Make the flag a byte instead of a pointer and it will save 3 or 7 bytes
(depending on native pointer size) "on average".  Note, assuming a 32-bit
box:  since pymalloc 8-byte aligns, the smallest footprint a string object
can have now is 24 bytes, 20 of which are consumed by bookkeeping overheads
(type pointer, refcount, ob_size, ob_shash, ob_sinterned).  Strings through
length 3 fit in this size (one byte is needed for the trailing \0 we always
put in ob_sval[]).  Saving 3 bytes wouldn't actually change the memory
burden of the smallest string object, but would allow all strings of lengths
4, 5 and 6 to consume 8 fewer bytes than at present (assuming compilers are
happy not to pad between a char member and char[] member).  That's probably
a significant savings for many string-slinging apps (count the number of
words of lengths 4, 5 and 6 in this msg (even  benefits )).
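[Tim's arithmetic can be checked with a tiny model. The 20-byte and
17-byte header figures and the 8-byte rounding are taken straight from his
message; this models pymalloc's size classes, not actual CPython internals:]

```python
def footprint(overhead, length):
    # header bytes + string bytes + trailing '\0', rounded up to
    # pymalloc's 8-byte allocation granularity
    raw = overhead + length + 1
    return (raw + 7) // 8 * 8

# Current layout: 20 bytes of bookkeeping (type pointer, refcount,
# ob_size, ob_shash, ob_sinterned pointer).
# Proposed layout: interned flag shrunk from a pointer to a byte -> 17.
for n in range(8):
    print(n, footprint(20, n), footprint(17, n))
```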



From guido@python.org  Fri Aug  9 19:03:10 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 09 Aug 2002 14:03:10 -0400
Subject: [Python-Dev] The memo of pickle
In-Reply-To: Your message of "Fri, 09 Aug 2002 13:50:40 EDT."
 
References: 
Message-ID: <200208091803.g79I3AN15763@pcp02138704pcs.reston01.va.comcast.net>

> [Guido]
> > Maybe we should just drop indirect interning then.  It can save 31
> > bits per string object, right?  How to collect those savings?

[Tim]
> Make the flag a byte instead of a pointer and it will save 3 or 7
> bytes (depending on native pointer size) "on average".  Note,
> assuming a 32-bit box: since pymalloc 8-byte aligns, the smallest
> footprint a string object can have now is 24 bytes, 20 of which are
> consumed by bookkeeping overheads (type pointer, refcount, ob_size,
> ob_shash, ob_sinterned).  Strings through length 3 fit in this size
> (one byte is needed for the trailing \0 we always put in ob_sval[]).
> Saving 3 bytes wouldn't actually change the memory burden of the
> smallest string object, but would allow all strings of lengths 4, 5
> and 6 to consume 8 fewer bytes than at present (assuming compilers
> are happy not to pad between a char member and char[] member).
> That's probably a significant savings for many string-slinging apps
> (count the number of words of lengths 4, 5 and 6 in this msg (even
>  benefits )).

This means a change in the string object lay-out, which breaks binary
compatibility (the PyString_AS_STRING macro depends on this).

I don't mind biting this bullet, but it means we have to increment the
API version, and perhaps the warning about API version mismatches
should become an error if an extension built against an API version from
before this change is detected.

Oren, how's that patch coming along? :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From jmiller@stsci.edu  Fri Aug  9 17:13:34 2002
From: jmiller@stsci.edu (Todd Miller)
Date: Fri, 09 Aug 2002 12:13:34 -0400
Subject: [Python-Dev] C basetype mapping protocol difference between 2.2.1
 and 2.3
References: <3D53D5E1.6050805@stsci.edu> <200208091511.g79FBor07407@pcp02138704pcs.reston01.va.comcast.net>              <3D53E177.3000502@stsci.edu> <200208091545.g79Fj3K09170@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <3D53EA2E.1010801@stsci.edu>

Guido van Rossum wrote:

>>Doesn't the current wrapper narrow the acceptable definitions for 
>>_ndarray_subscript?  The reason I noticed this is that my 2.2.1 code 
>>raises an exception:
>>
>> >>> import numarray
>> >>> a=numarray.arange(10)
>> >>> a
>>Traceback (most recent call last):
>>File "<stdin>", line 1, in ?
>>File "/home/jmiller/lib/python2.2/site-packages/numarray/numarray.py", 
>>line 622, in __repr__
>>MAX_LINE_WIDTH, PRECISION, SUPPRESS_SMALL, ', ', 1)
>>File "/home/jmiller/lib/python2.2/site-packages/numarray/arrayprint.py", 
>>line 156, in array2string
>>separator, array_output)
>>File "/home/jmiller/lib/python2.2/site-packages/numarray/arrayprint.py", 
>>line 112, in _array2string
>>max_str_len = max(len(str(max_reduce(data))),
>>File "/home/jmiller/lib/python2.2/site-packages/numarray/ufunc.py", line 
>>759, in reduce
>>r = self.areduce(inarr, dim, outarr)
>>File "/home/jmiller/lib/python2.2/site-packages/numarray/ufunc.py", line 
>>745, in areduce
>>_outarr1 = self._cumulative("reduce", _inarr, _outarr0)
>>File "/home/jmiller/lib/python2.2/site-packages/numarray/ufunc.py", line 
>>653, in _cumulative
>>toutarr = self._reduce_out(inarr, outarr, outtype)
>>File "/home/jmiller/lib/python2.2/site-packages/numarray/ufunc.py", line 
>>591, in _reduce_out
>>toutarr = inarr[...,0].copy().astype(outtype)
>>TypeError: an integer is required
>>
>
>I guess that means it's going through the *sequence* getitem, not the
>
Yes.

>
>*mapping* getitem.  Have you tried leaving the sequence getitem slot
>NULL, and doing everything through your mapping getitem slot?  
>
No.

>That
>should work in 2.2.
>
It does now.

>
>
>--Guido van Rossum (home page: http://www.python.org/~guido/)
>
Thanks!
Todd

-- 
Todd Miller 			jmiller@stsci.edu
STSCI / SSG





From pobrien@orbtech.com  Fri Aug  9 21:20:47 2002
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Fri, 9 Aug 2002 15:20:47 -0500
Subject: [Python-Dev] timsort for jython
In-Reply-To: 
Message-ID: 

[Tim Peters]
>
> OTOH, I do expect that once code relies on stability, we'll have about as
> much chance of taking that away as getting rid of list.append().

There you go again! Your flip comment has got me thinking about the "one
best idiom" for list appending. So I'll ask the question. Is there a reason
to want to get rid of list.append()? How does one decide between
list.append() and augmented assignment (+=), such as:

>>> l = []
>>> l.append('something')
>>> l.append('else')
>>> l += ['to']
>>> l += ['consider']
>>> l
['something', 'else', 'to', 'consider']
>>>

And wasn't someone documenting current idioms in light of recent Python
features? Did that ever get posted anywhere?

--
Patrick K. O'Brien
Orbtech
-----------------------------------------------
"Your source for Python programming expertise."
-----------------------------------------------
Web:  http://www.orbtech.com/web/pobrien/
Blog: http://www.orbtech.com/blog/pobrien/
Wiki: http://www.orbtech.com/wiki/PatrickOBrien
-----------------------------------------------



From tim.one@comcast.net  Fri Aug  9 21:33:55 2002
From: tim.one@comcast.net (Tim Peters)
Date: Fri, 09 Aug 2002 16:33:55 -0400
Subject: [Python-Dev] timsort for jython
In-Reply-To: 
Message-ID: 

[Tim]
> OTOH, I do expect that once code relies on stability, we'll
> have about as much chance of taking that away as getting rid of
> list.append().

[Patrick K. O'Brien]
> There you go again! Your flip comment has got me thinking about the "one
> best idiom" for list appending.

It was a serious enough comment, judged against the universe of all comments
I make.  I expect that if a hypothetical 3x-faster non-stable sort
algorithm got discovered for 2.6, we wouldn't be able to call it list.sort()
then.  It's very hard to take any perceived goodness away, ever.

> So I'll ask the question. Is there a reason to want to get rid of
> list.append()?

I believe-- and sincerely hope --that I'm the only one who ever suggested
that.

> How does one decide between list.append() and augmented assignment (+=),
> such as:
>
> >>> l = []
> >>> l.append('something')
> >>> l.append('else')
> >>> l += ['to']
> >>> l += ['consider']
> >>> l
> ['something', 'else', 'to', 'consider']
> >>>

Clarity:  l.append() is obvious; I'd never append a single item via +=.
More, I'd probably do

    push = L.append

outside a loop and call push('something') inside the loop.  "+=" as a
synonym for list.extend() is only interesting if you're writing polymorphic
code that wants to exploit the ability to define __iadd__.  For sane people,
that's approximately never.
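Spelled out, the hoisted-bound-method idiom looks like this (an illustrative
sketch, not code from the thread):

```python
def collect(items):
    """Copy items into a new list using a pre-bound append."""
    L = []
    push = L.append          # resolve the bound method once, outside the loop
    for item in items:
        push(item)           # no attribute lookup per iteration
    return L
```

The payoff is skipping the repeated `L.append` attribute lookup inside a hot
loop; it reads about the same, so it's only worth doing where profiling says
the loop matters.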

> And wasn't someone documenting current idioms in light of recent Python
> features? Did that ever get posted anywhere?

Rings a bell, but beats me.



From guido@python.org  Fri Aug  9 21:40:02 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 09 Aug 2002 16:40:02 -0400
Subject: [Python-Dev] PEP 282 Implementation
In-Reply-To: Your message of "Mon, 08 Jul 2002 02:16:32 BST."
 <00e001c2261d$19bfc320$652b6992@alpha>
References: <00e001c2261d$19bfc320$652b6992@alpha>
Message-ID: <200208092040.g79Ke3S31416@pcp02138704pcs.reston01.va.comcast.net>

A month (!) ago, Vinay Sajip wrote:

> I've uploaded my logging module, the proposed implementation for PEP 282,
> for committer review, to the SourceForge patch manager:
> 
> http://sourceforge.net/tracker/index.php?func=detail&aid=578494&group_id=5470&atid=305470
> 
> I've assigned it to Mark Hammond as (a) he had posted some comments
> to Trent Mick's original PEP posting, and (b) Barry Warsaw advised
> not assigning to PythonLabs people on account of their current
> workload.

Well, Mark was apparently too busy too.  I've assigned this to myself
and am making progress with the review.

> The file logging.py is (apart from some test scripts) all that's
> supposed to go into Python 2.3. The file logging-0.4.6.tar.gz
> contains the module, an updated version of the PEP (which I mailed
> to Barry Warsaw on 26th June), numerous test/example scripts, TeX
> documentation etc. You can also refer to
> 
> http://www.red-dove.com/python_logging.html
> 
> Here's hoping for a speedy review :-)

Here's some feedback.

In general the code looks good.  Only one style nit: I prefer
docstrings that have a one-line summary, then a blank line, and then a
longer description.

There's a lot of code there!  Should it perhaps be broken up into
different modules?  Perhaps it should become a logging *package* with
submodules that define the various filters and handlers.

Some detailed questions:

- Why does the FileHandler open the file with mode "a+" (and later
  with "w+")?  The "+" makes the file readable, but I see no reason to
  read it.  Am I missing something?

- setRollover(): the explanation isn't 100% clear.  I *think* that you
  always write to "app.log", and when that's full, you rename it to
  app.log.1, and app.log.1 gets renamed to app.log.2, and so on, and
  then you start writing to a new app.log, right?

- class SocketHandler: why set yourself up for buffer overflow by
  using only 2 bytes for the packet size?  You can use the struct
  module to encode/decode this, BTW.  I also wonder what the
  application for this is, BTW.

  - method send(): in Python 2.2 and later, you can use the sendall()
    socket method which takes care of this loop for you.

- class DatagramHandler, method send(): I don't think UDP handles
  fragmented packets very well -- if you have to break the packet up,
  there's no guarantee that the receiver will see the parts in order
  (or even all of them).

- fileConfig(): Is there documentation for the configuration file?
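A length-prefixed send along the lines suggested for SocketHandler might be
sketched like this (illustrative names, not the actual patch; a 4-byte prefix
avoids the 2-byte overflow concern):

```python
import struct

def send_length_prefixed(sock, payload):
    # Pack the length as a 4-byte big-endian unsigned int, so a 2-byte
    # size field can't overflow, then let sendall() (available since
    # Python 2.2) loop until every byte has been written.
    header = struct.pack(">L", len(payload))
    sock.sendall(header + payload)
```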

That's it for now.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From pobrien@orbtech.com  Fri Aug  9 21:45:27 2002
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Fri, 9 Aug 2002 15:45:27 -0500
Subject: [Python-Dev] timsort for jython
In-Reply-To: 
Message-ID: 

[Tim Peters]
> 
> > So I'll ask the question. Is there a reason to want to get rid of
> > list.append()?
> 
> I believe-- and sincerely hope --that I'm the only one who ever suggested
> that.

Okay, good. I just needed a reality check. Thanks.

--
Patrick K. O'Brien
Orbtech
-----------------------------------------------
"Your source for Python programming expertise."
-----------------------------------------------
Web:  http://www.orbtech.com/web/pobrien/ 
Blog: http://www.orbtech.com/blog/pobrien/ 
Wiki: http://www.orbtech.com/wiki/PatrickOBrien 
-----------------------------------------------



From inyeol.lee@siimage.com  Fri Aug  9 21:51:54 2002
From: inyeol.lee@siimage.com (Inyeol Lee)
Date: Fri, 9 Aug 2002 13:51:54 -0700
Subject: [Python-Dev] Re: string.find() again (was Re: timsort for jython)
In-Reply-To: <200208091413.g79EDCx06840@pcp02138704pcs.reston01.va.comcast.net>
References: <20020809060834.GC27637@siliconimage.com> <200208091413.g79EDCx06840@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20020809205154.GD27637@siliconimage.com>

To underline strings for viewers like less.

>>> underlined = normal.replace('', '_\b')

This also can be done with re.sub(), but I think it is natural to use
string methods to handle non-RE strings.

This cannot be done with '_\b'.join(), since it doesn't prepend '_\b'.
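Concretely, the reason join() falls short (a quick illustration):

```python
normal = "abc"
# str.join only inserts the separator *between* characters, so the
# first character never gets an '_\b' prefix and stays un-underlined:
assert "_\b".join(normal) == "a_\bb_\bc"
```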

- Inyeol Lee


On Fri, Aug 09, 2002 at 10:13:12AM -0400, Guido van Rossum wrote:
> > Since '' in 'abc' now returns True, How about changing
> > 'abc'.replace('') to generate '_a_b_c_', too? It is consistent with
> > re.sub()/subn() and the cost for change is similar to '' in 'abc'
> > case.
> 
> Do you have a use case?  Or are you just striving for consistency?  It
> would be more consistent but I'm not sure what the point is.  I can
> think of situations where '' in 'abc' would be needed, but not so for
> 'abc'.replace('', '_').
> 
> --Guido van Rossum (home page: http://www.python.org/~guido/)
> 


From python-dev@zesty.ca  Fri Aug  9 22:31:30 2002
From: python-dev@zesty.ca (Ka-Ping Yee)
Date: Fri, 9 Aug 2002 14:31:30 -0700 (PDT)
Subject: [Python-Dev] Re: string.find() again (was Re: timsort for jython)
In-Reply-To: <20020809205154.GD27637@siliconimage.com>
Message-ID: 

On Fri, 9 Aug 2002, Inyeol Lee wrote:
> To underline strings for viewers like less.
>
> >>> underlined = normal.replace('', '_\b')

That doesn't quite work, since it puts an extra underbar at the end.
But it can be done fairly easily without using replace():

    underlined = ''.join(['_\b' + c for c in normal])


-- ?!ng



From ark@research.att.com  Fri Aug  9 22:47:20 2002
From: ark@research.att.com (Andrew Koenig)
Date: 09 Aug 2002 17:47:20 -0400
Subject: [Python-Dev] Re: string.find() again (was Re: timsort for jython)
In-Reply-To: 
References: 
Message-ID: 

Ping> On Fri, 9 Aug 2002, Inyeol Lee wrote:

>> To underline strings for viewers like less.

>> >>> underlined = normal.replace('', '_\b')

Ping> That doesn't quite work, since it puts an extra underbar at the end.
Ping> But it can be done fairly easily without using replace():

Ping>     underlined = ''.join(['_\b' + c for c in normal])

With a sufficiently rich family of functions, you can avoid any one of
them if you want to do so badly enough.  Even so, that doesn't make
proposed uses of that function illegitimate.

-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark


From inyeol.lee@siimage.com  Fri Aug  9 23:20:57 2002
From: inyeol.lee@siimage.com (Inyeol Lee)
Date: Fri, 9 Aug 2002 15:20:57 -0700
Subject: [Python-Dev] Re: string.find() again (was Re: timsort for jython)
In-Reply-To: 
References: <20020809205154.GD27637@siliconimage.com> 
Message-ID: <20020809222057.GA5554@siliconimage.com>

On Fri, Aug 09, 2002 at 02:31:30PM -0700, Ka-Ping Yee wrote:
> On Fri, 9 Aug 2002, Inyeol Lee wrote:
> > To underline strings for viewers like less.
> >
> > >>> underlined = normal.replace('', '_\b')
> 
> That doesn't quite work, since it puts an extra underbar at the end.

underlined = normal.replace('', '_\b', len(normal))

Hmm... my position is getting weaker...
When I first posted this, I just thought about consistency, not about
use cases. These underline samples were created in a hurry :-)

-- Inyeol Lee


From python@rcn.com  Sat Aug 10 00:28:48 2002
From: python@rcn.com (Raymond Hettinger)
Date: Fri, 9 Aug 2002 19:28:48 -0400
Subject: [Python-Dev] timsort for jython
References: 
Message-ID: <004001c23ffc$845a15c0$b9e97ad1@othello>

From: "Patrick K. O'Brien" 
> And wasn't someone documenting current idioms in light of recent Python
> features? Did that ever get posted anywhere?

Are you referring to the modernization and migration guide,
http://www.python.org/peps/pep-0290.html ?  It documents
transition procedures for new features but doesn't make
current idioms a central focus.


Raymond






From pobrien@orbtech.com  Sat Aug 10 00:49:41 2002
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Fri, 9 Aug 2002 18:49:41 -0500
Subject: [Python-Dev] timsort for jython
In-Reply-To: <004001c23ffc$845a15c0$b9e97ad1@othello>
Message-ID: 

[Raymond Hettinger]
> 
> From: "Patrick K. O'Brien" 
> > And wasn't someone documenting current idioms in light of recent Python
> > features? Did that ever get posted anywhere?
> 
> Are you referring to the modernization and migration guide,
> http://www.python.org/peps/pep-0290.html ?  It documents
> transition procedures for new features but doesn't make
> current idioms a central focus.

Yep. That was it. I forgot that it became a PEP. Thanks for the link.

--
Patrick K. O'Brien
Orbtech
-----------------------------------------------
"Your source for Python programming expertise."
-----------------------------------------------
Web:  http://www.orbtech.com/web/pobrien/ 
Blog: http://www.orbtech.com/blog/pobrien/ 
Wiki: http://www.orbtech.com/wiki/PatrickOBrien 
-----------------------------------------------



From tim@zope.com  Sat Aug 10 07:56:10 2002
From: tim@zope.com (Tim Peters)
Date: Sat, 10 Aug 2002 02:56:10 -0400
Subject: [Python-Dev] RE: companies data for sorting comparisons
In-Reply-To: 
Message-ID: 

Update:  With the last batch of checkins, all sorts on Kevin's company
database are faster (a little to a killer lot) under 2.3a0 than under 2.2.1.

A reminder of what this looks like:

> A record looks like this after running his script to turn them
> into Python dicts:
>
>   {'Address': '395 Page Mill Road\nPalo Alto, CA 94306',
>    'Company': 'Agilent Technologies Inc.',
>    'Exchange': 'NYSE',
>    'NumberOfEmployees': '41,000',
>    'Phone': '(650) 752-5000',
>    'Profile': 'http://biz.yahoo.com/p/a/a.html',
>    'Symbol': 'A',
>    'Web': 'http://www.agilent.com'}
>
> It appears to me that the XML file is maintained by hand, in order
> of ticker symbol.  But people make mistakes when alphabetizing
> by hand, and there are 37 indices i such that
>
>     data[i]['Symbol'] > data[i+1]['Symbol']
>
> So it's "almost sorted" by that measure ...
> The proper order of Yahoo profile URLs is also strongly correlated
> with ticker symbol, while both the company name and web address
> look weakly correlated
> [and Address, NumberOfEmployees, and Phone are essentially
>  randomly ordered]

Here are the latest (and I expect the last) timings, in milliseconds per
sort, on the list of (key, index, record) tuples

    values = [(x.get(fieldname), i, x) for i, x in enumerate(data)]

[I wrote a little generator to simulate 2.3's enumerate() in 2.2.1]
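Such a 2.2-era simulation of enumerate() might look like this (my sketch;
Tim's actual helper wasn't posted):

```python
from __future__ import generators  # required under Python 2.2

def enumerate(seq):
    """Yield (index, item) pairs, mimicking 2.3's builtin enumerate()."""
    i = 0
    for item in seq:
        yield i, item
        i = i + 1
```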

There are 6635 companies in the database, but not all fields are present in
all records; .get() plugs in a key of None for those cases, and the index is
to prevent equal-key cases from falling through to breaking the tie via expensive
dict comparison (each record x is a dict!):

Sorting on field 'Address'
    2.2.1:  41.57
    2.3a0:  40.96

Sorting on field 'Company'
    2.2.1:  40.14
    2.3a0:  29.79

Sorting on field 'Exchange'
    2.2.1:  53.83
    2.3a0:  24.79

Sorting on field 'NumberOfEmployees'
    2.2.1:  47.89
    2.3a0:  45.74

Sorting on field 'Phone'
    2.2.1:  48.09
    2.3a0:  47.15

Sorting on field 'Profile'
    2.2.1:  58.41
    2.3a0:   8.77

Sorting on field 'Symbol'
    2.2.1:  40.78
    2.3a0:   6.30

Sorting on field 'Web'
    2.2.1:  46.79
    2.3a0:  35.64

This may have been sorted more times by now than any other database on Earth.



From oren-py-d@hishome.net  Sat Aug 10 11:15:25 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Sat, 10 Aug 2002 06:15:25 -0400
Subject: [Python-Dev] interning
In-Reply-To: <200208091803.g79I3AN15763@pcp02138704pcs.reston01.va.comcast.net>
References:  <200208091803.g79I3AN15763@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20020810101525.GA28642@hishome.net>

On Fri, Aug 09, 2002 at 02:03:10PM -0400, Guido van Rossum wrote:
> This means a change in the string object lay-out, which breaks binary
> compatibility (the PyString_AS_STRING macro depends on this).

I think that making interned strings mortal is important enough by
itself even without the size reduction.  If binary compatibility is
important enough it's possible to maintain it.

> I don't mind biting this bullet, but it means we have to increment the
> API version, and perhaps the warning about API version mismatches
> should become an error if an extension with an API version from before
> this change is detected.
> 
> Oren, how's that patch coming along? :-)

I've just submitted a new patch for http://python.org/sf/576101

It passes regrtest but causes test_gc to leak 20 objects. 13 from 
test_finalizer_newclass and 7 from test_del_newclass. These leaks
go away if test_saveall is skipped. I've tried earlier versions of 
this patch (which were ok at the time) and they now create this 
leak too.

Some change since the last time I worked on interning must have 
caused this. Either this change reveals a bug in my patch or my patch 
reveals a subtle bug in the GC.

I don't know why it interacts with GC logic because strings are 
non-gc objects. I've tried to untrack the interned dictionary because
it plays dirty tricks with refcounts but it doesn't change the 
symptom.

	Oren


From guido@python.org  Sat Aug 10 14:57:53 2002
From: guido@python.org (Guido van Rossum)
Date: Sat, 10 Aug 2002 09:57:53 -0400
Subject: [Python-Dev] interning
In-Reply-To: Your message of "Sat, 10 Aug 2002 06:15:25 EDT."
 <20020810101525.GA28642@hishome.net>
References:  <200208091803.g79I3AN15763@pcp02138704pcs.reston01.va.comcast.net>
 <20020810101525.GA28642@hishome.net>
Message-ID: <200208101357.g7ADvrm04098@pcp02138704pcs.reston01.va.comcast.net>

> I've just submitted a new patch for http://python.org/sf/576101

I'll review it when I've got time.

> It passes regrtest but causes test_gc to leak 20 objects. 13 from 
> test_finalizer_newclass and 7 from test_del_newclass. These leaks
> go away if test_saveall is skipped. I've tried earlier versions of 
> this patch (which were ok at the time) and they now create this 
> leak too.
> 
> Some change since the last time I worked on interning must have 
> caused this. Either this change reveals a bug in my patch or my patch 
> reveals a subtle bug in the GC.
> 
> I don't know why it interacts with GC logic because strings are 
> non-gc objects. I've tried to untrack the interned dictionary because
> it plays dirty tricks with refcounts but it doesn't change the 
> symptom.

I've seen this too!  But only when I run the full test suite, not when
I run test_gc in isolation.  I made a number of small changes to the GC
code, I'll have to roll them back one at a time to see which one
caused this -- and then look for a solution. :-(

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Sat Aug 10 17:17:17 2002
From: guido@python.org (Guido van Rossum)
Date: Sat, 10 Aug 2002 12:17:17 -0400
Subject: [Python-Dev] interning
In-Reply-To: Your message of "Sat, 10 Aug 2002 09:57:53 EDT."
 <200208101357.g7ADvrm04098@pcp02138704pcs.reston01.va.comcast.net>
References:  <200208091803.g79I3AN15763@pcp02138704pcs.reston01.va.comcast.net> <20020810101525.GA28642@hishome.net>
 <200208101357.g7ADvrm04098@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <200208101617.g7AGHHY19927@pcp02138704pcs.reston01.va.comcast.net>

> > It passes regrtest but causes test_gc to leak 20 objects. 13 from 
> > test_finalizer_newclass and 7 from test_del_newclass. These leaks
> > go away if test_saveall is skipped. I've tried earlier versions of 
> > this patch (which were ok at the time) and they now create this 
> > leak too.
> > 
> > Some change since the last time I worked on interning must have 
> > caused this. Either this change reveals a bug in my patch or my patch 
> > reveals a subtle bug in the GC.
> > 
> > I don't know why it interacts with GC logic because strings are 
> > non-gc objects. I've tried to untrack the interned dictionary because
> > it plays dirty tricks with refcounts but it doesn't change the 
> > symptom.
> 
> I've seen this too!  But only when I run the full test suite, not when
> I run test_gc in isolation.  I made a number of small changes to the GC
> code, I'll have to roll them back one at a time to see which one
> caused this -- and then look for a solution. :-(

Duh.  This warning is only printed when regrtest.py is given the -l
option.  Oren's first paragraph quoted above is exactly right.

But none of the changes to C files made in the last month made any
difference...  The difference is test_gc.py itself!  With a checkout
from a month ago, if I change the classes in test_finalizer() and
test_del() to be new-style classes, I get the same warnings.

Maybe Tim understands the problem now?  (Summary: why do I get the
Warning below.)

$ ./python ../Lib/test/regrtest.py -l test_gc
test_gc
Warning: test created 20 uncollectable object(s).
1 test OK.
$ 

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tismer@tismer.com  Sat Aug 10 19:38:51 2002
From: tismer@tismer.com (Christian Tismer)
Date: Sat, 10 Aug 2002 20:38:51 +0200
Subject: [Python-Dev] _sre as part of python.dll
References: <200208080828.g788S1Qd004601@aramis.informatik.hu-berlin.de> <200208081427.g78ERc414863@odiug.zope.com>  <200208081726.g78HQNe16854@odiug.zope.com> <08520842188339@aluminium.rcp.co.uk>
Message-ID: <3D555DBB.5040204@tismer.com>

Duncan Booth wrote:
...

> _sre is used by any application that imports 'os'. That (IMHO) is almost 
> every non-trivial Python program.

Sure? Then try this in a Windows shell:

"""
D:\>\python22\python
hey this is sitepython
Python 2.2.1 (#34, Apr  9 2002, 19:34:33) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
 >>> import sys
 >>> for i in sys.modules: print i
...
stat
__future__
copy_reg
os
signal
site
__builtin__
UserDict
sys
sitecustomize
ntpath
__main__
exceptions
types
nt
os.path
 >>>
"""

As you can see, os is already imported by the startup code.
(Which I didn't know!)
Furthermore, os didn't cause an import of _sre.

ciao - chris


-- 
Christian Tismer             :^)   
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/




From nas@python.ca  Sat Aug 10 19:47:42 2002
From: nas@python.ca (Neil Schemenauer)
Date: Sat, 10 Aug 2002 11:47:42 -0700
Subject: [Python-Dev] interning
In-Reply-To: <200208101617.g7AGHHY19927@pcp02138704pcs.reston01.va.comcast.net>; from guido@python.org on Sat, Aug 10, 2002 at 12:17:17PM -0400
References:  <200208091803.g79I3AN15763@pcp02138704pcs.reston01.va.comcast.net> <20020810101525.GA28642@hishome.net> <200208101357.g7ADvrm04098@pcp02138704pcs.reston01.va.comcast.net> <200208101617.g7AGHHY19927@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20020810114742.A14447@glacier.arctrix.com>

Guido van Rossum wrote:
> $ ./python ../Lib/test/regrtest.py -l test_gc
> test_gc
> Warning: test created 20 uncollectable object(s).
> 1 test OK.

Something weird is going on.  This patch fixes test_finalizer_newclass: 

--- Lib/test/test_gc.py 9 Aug 2002 17:38:16 -0000       1.19
+++ Lib/test/test_gc.py 10 Aug 2002 18:33:47 -0000
@@ -147,6 +147,8 @@
     else:
         raise TestFailed, "didn't find obj in garbage (finalizer)"
     gc.garbage.remove(obj)
+    del A, B, obj
+    gc.collect() # finds 13 objects!


I guess there is a reference cycle there that wasn't there before.
Could it have something to do with tp_del?

  Neil


From tim.one@comcast.net  Sat Aug 10 22:22:36 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sat, 10 Aug 2002 17:22:36 -0400
Subject: [Python-Dev] interning
In-Reply-To: <200208101617.g7AGHHY19927@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

[Guido]
> ...
> But none of the changes to C files made in the last month made any
> difference...  The difference is test_gc.py itself!  With a checkout
> from a month ago, if I change the classes in test_finalizer() and
> test_del() to be new-style classes, I get the same warnings.
>
> Maybe Tim understands the problem now?  (Summary: why do I get the
> Warning below.)
>
> $ ./python ../Lib/test/regrtest.py -l test_gc
> test_gc
> Warning: test created 20 uncollectable object(s).
> 1 test OK.
> $

I think this was shallow, and checked in a change (to test_saveall()) that I
believe fixes it.  Update and try again?



From martin@v.loewis.de  Sat Aug 10 22:26:56 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 10 Aug 2002 23:26:56 +0200
Subject: [Python-Dev] _sre as part of python.dll
In-Reply-To: 
References: 
Message-ID: 

Tim Peters  writes:

> If you're building via right-clicking, you're making life much
> harder than necessary.  You can build from the command line, or do
> Build -> Batch Build -> Build in the GUI.

Thanks, that is very useful to know.

Regards,
Martin


From guido@python.org  Sun Aug 11 00:58:21 2002
From: guido@python.org (Guido van Rossum)
Date: Sat, 10 Aug 2002 19:58:21 -0400
Subject: [Python-Dev] interning
In-Reply-To: Your message of "Sat, 10 Aug 2002 11:47:42 PDT."
 <20020810114742.A14447@glacier.arctrix.com>
References:  <200208091803.g79I3AN15763@pcp02138704pcs.reston01.va.comcast.net> <20020810101525.GA28642@hishome.net> <200208101357.g7ADvrm04098@pcp02138704pcs.reston01.va.comcast.net> <200208101617.g7AGHHY19927@pcp02138704pcs.reston01.va.comcast.net>
 <20020810114742.A14447@glacier.arctrix.com>
Message-ID: <200208102358.g7ANwMu20902@pcp02138704pcs.reston01.va.comcast.net>

> > $ ./python ../Lib/test/regrtest.py -l test_gc
> > test_gc
> > Warning: test created 20 uncollectable object(s).
> > 1 test OK.

[Neil S]
> Something weird is going on.  This patch fixes test_finalizer_newclass: 
> 
> --- Lib/test/test_gc.py 9 Aug 2002 17:38:16 -0000       1.19
> +++ Lib/test/test_gc.py 10 Aug 2002 18:33:47 -0000
> @@ -147,6 +147,8 @@
>      else:
>          raise TestFailed, "didn't find obj in garbage (finalizer)"
>      gc.garbage.remove(obj)
> +    del A, B, obj
> +    gc.collect() # finds 13 objects!
> 
> 
> I guess there is a reference cycle there that wasn't there before.
> Could it have something to do with tp_del?

I don't think so -- a month-old Python had the same warnings when
I added these tests that use new-style classes.

It's much simpler than this: new-style classes have cyclical
references to themselves that must be collected.  It so happened that
the saveall test was fooled by these.  Tim checked in a fix that
prevents this.
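The self-referential cycle Guido describes is easy to demonstrate (a sketch
runnable on a current CPython; this is not the test_gc code itself):

```python
import gc

class A(object):
    pass

# A new-style class is part of a reference cycle: its __mro__ tuple
# contains the class itself, so once the last ordinary reference is
# dropped, only the cyclic garbage collector can reclaim it.
assert A in A.__mro__

del A
assert gc.collect() > 0   # the class's cycle is found and collected
```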

--Guido van Rossum (home page: http://www.python.org/~guido/)


From bsder@mail.allcaps.org  Sun Aug 11 03:49:26 2002
From: bsder@mail.allcaps.org (Andrew P. Lentvorski)
Date: Sat, 10 Aug 2002 19:49:26 -0700 (PDT)
Subject: [Python-Dev] Python rounding and/or rint
Message-ID: <20020810193401.B11140-100000@mail.allcaps.org>

Now that C9X is an official standard, can we either:

1) add rint() back into the math module (removed since 1.6.1?) or
2) update round() so that it complies with the default rounding mode

I bumped into a bug today because round() doesn't obey the same rounding
semantics as the FP operations do.

While there are lots of arguments about whether or not to add other C9X
functions, I'd like to try and avoid that tarpit.

The primary argument against rint was the lack of being able to write
portable code.  In this instance, the *lack* of rint (or its use in
round()) prevents writing portable code as I have no means to match the
rounding semantics of my FP ops from within Python.

-a



From tim.one@comcast.net  Sun Aug 11 04:23:51 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sat, 10 Aug 2002 23:23:51 -0400
Subject: [Python-Dev] Python rounding and/or rint
In-Reply-To: <20020810193401.B11140-100000@mail.allcaps.org>
Message-ID: 

[Andrew P. Lentvorski]
> Now that C9X is an official standard,

I'm afraid that's irrelevant in practice until "almost all" platform C
packages conform to the new standard.

> can we either:
>
> 1) add rint() back into the math module (removed since 1.6.1?) or

This sounds confused.  According to the CVS logs, rint() was briefly in the
codebase, first released in 1.6 beta 1 (rev 2.43 & 2.44 of mathmodule.c),
but retracted before 1.6 final was released (revs 2.45.2.1 and 2.53).

> 2) update round() so that it complies with the default rounding mode

round() has always forced "add a half and chop" (which can be done portably,
relying on C89's rounding-mode-independent floor() and ceil()); changing
that would be incompatible.
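The "add a half and chop" rule can be modeled portably in a few lines (a
sketch of the semantics, not CPython's actual C implementation):

```python
import math

def round_half_away_from_zero(x):
    # Round halfway cases away from zero, independent of the FPU's
    # current rounding mode, using only C89-style floor() and ceil().
    if x >= 0:
        return math.floor(x + 0.5)
    else:
        return math.ceil(x - 0.5)
```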

> I bumped into a bug today because round() doesn't obey the same
> rounding semantics as the FP operations do.

round() wasn't intended to.

> While there are lots of arguments about whether or not to add other C9X
> functions, I'd like to try and avoid that tarpit.

It's a non-starter before C9X triumphs, if ever.  If it does, there won't be
debate -- we'll gladly expose all the spiffy new numeric functions then.  It
would be great to have them!

> The primary argument against rint was the lack of being able to write
> portable code.  In this instance, the *lack* of rint (or its use in
> round()) prevents writing portable code as I have no means to match the
> rounding semantics of my FP ops from within Python.

Sorry, but neither does Python.  I suggest you write an extension module
with your favorite C9X gimmicks (this isn't hard), and offer it for use on
C9X platforms.  Eventually there may be enough of those that we could fold
the new functions into the core.  Before then, you may find more people in
the NumPy community willing and able to wrestle with reams of platform
#ifdefs for numeric gimmicks.



From mal@lemburg.com  Sun Aug 11 12:21:25 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sun, 11 Aug 2002 13:21:25 +0200
Subject: [Python-Dev] Python rounding and/or rint
References: 
Message-ID: <3D5648B5.3000905@lemburg.com>

Tim Peters wrote:
> [Andrew P. Lentvorski]
> 
>>The primary argument against rint was the lack of being able to write
>>portable code.  In this instance, the *lack* of rint (or its use in
>>round()) prevents writing portable code as I have no means to match the
>>rounding semantics of my FP ops from within Python.
> 
> Sorry, but neither does Python.  I suggest you write an extension module
> with your favorite C9X gimmicks (this isn't hard), and offer it for use on
> C9X platforms.  Eventually there may be enough of those that we could fold
> the new functions into the core.  Before then, you may find more people in
> the NumPy community willing and able to wrestle with reams of platform
> #ifdefs for numeric gimmicks.

Another approach would be to use GNU MP's cousin MPFR, which has
a few well-defined rounding modes built in, apart from various
other goodies that make dealing with floating-point numbers
platform-independent. Another interesting extension to GNU MP is
MPFI, which implements interval arithmetic -- also nice to have
if you're deep into dealing with rounding errors.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From skip@manatee.mojam.com  Sun Aug 11 13:00:17 2002
From: skip@manatee.mojam.com (Skip Montanaro)
Date: Sun, 11 Aug 2002 07:00:17 -0500
Subject: [Python-Dev] Weekly Python Bug/Patch Summary
Message-ID: <200208111200.g7BC0HR2007119@manatee.mojam.com>

Bug/Patch Summary
-----------------

266 open / 2739 total bugs (-8)
118 open / 1644 total patches (-13)

New Bugs
--------

list(xrange(1e9))  -->  seg fault (2002-05-14)
	http://python.org/sf/556025
Sig11 in cPickle (stack overflow) (2002-07-01)
	http://python.org/sf/576084
os.tmpfile() can fail on win32 (2002-08-05)
	http://python.org/sf/591104
Mixin broken for new-style classes (2002-08-05)
	http://python.org/sf/591135
makesetup fails: long Setup.local lines (2002-08-05)
	http://python.org/sf/591287
httplib throws a TypeError when the target host disconnects (2002-08-05)
	http://python.org/sf/591349
Get rid of etype struct (2002-08-06)
	http://python.org/sf/591586
Hint for speeding up cPickle (2002-08-07)
	http://python.org/sf/592112
installation errors (2002-08-07)
	http://python.org/sf/592161
Webchecker error on http://www.naleo.org (2002-08-07)
	http://python.org/sf/592441
comments taken as values in ConfigParser (2002-08-08)
	http://python.org/sf/592527
Bug with deepcopy and new style objects (2002-08-08)
	http://python.org/sf/592567
HTTPS does not handle pipelined requests (2002-08-08)
	http://python.org/sf/592703
os.chmod is underdocumented :-) (2002-08-08)
	http://python.org/sf/592859
Can't assign to __name__ or __bases__ of new class (2002-08-09)
	http://python.org/sf/593154
u'%c' % large value: broken result (2002-08-10)
	http://python.org/sf/593581

New Patches
-----------

Fix "file:" URL to have right no. of /'s (2002-08-06)
	http://python.org/sf/591713
Split-out ntmodule.c (2002-08-08)
	http://python.org/sf/592529
socketmodule.[ch] downgrade (2002-08-09)
	http://python.org/sf/593069
bugfixes and cleanup for _strptime.py (2002-08-10)
	http://python.org/sf/593560
Static names (2002-08-11)
	http://python.org/sf/593627

Closed Bugs
-----------

Problems with Tcl/Tk and non-ASCII text entry (2000-10-31)
	http://python.org/sf/219960
Tutorial does not describe nested scope (2002-01-07)
	http://python.org/sf/500704
Get rid of make frameworkinstall (2002-01-18)
	http://python.org/sf/505423
Bgen should generate 7-bit-clean code (2002-06-08)
	http://python.org/sf/566302
tarball to untar into a single dir (2002-06-11)
	http://python.org/sf/567576
"python -u" not binary on cygwin (2002-06-17)
	http://python.org/sf/570044
Chained __slots__ dealloc segfault (2002-06-26)
	http://python.org/sf/574207
Tex Macro Error (2002-06-27)
	http://python.org/sf/574939
Parts of 2.2.1 core use old gc API (2002-06-30)
	http://python.org/sf/575715
os.path.walk behavior on symlinks (2002-07-03)
	http://python.org/sf/576975
LibRef 2.2.1, replace zero with False (2002-07-11)
	http://python.org/sf/579991
mimetools module privacy leak (2002-07-12)
	http://python.org/sf/580495
MacOSX python.app build problems (2002-07-12)
	http://python.org/sf/580550
''.split() docstring clarification (2002-07-15)
	http://python.org/sf/582071
no doc for os.fsync and os.fdatasync (2002-07-21)
	http://python.org/sf/584695
Two corrects for weakref docs (2002-07-25)
	http://python.org/sf/586583
references to email package (2002-07-26)
	http://python.org/sf/586937
ur'\u' not handled properly (2002-07-26)
	http://python.org/sf/587087
socket.py wrapper needs a class (2002-07-31)
	http://python.org/sf/589262
shared libpython & dependant libraries (2002-07-31)
	http://python.org/sf/589422
"".split() ignores maxsplit arg (2002-08-01)
	http://python.org/sf/589965
preconvert AppleSingle resource files (2002-08-02)
	http://python.org/sf/590456

Closed Patches
--------------

Removal of SET_LINENO (experimental) (2000-07-30)
	http://python.org/sf/401022
let mailbox.Maildir tag messages as read (2001-09-29)
	http://python.org/sf/466352
GETCONST/GETNAME/GETNAMEV speedup (2002-01-21)
	http://python.org/sf/506436
PEP 263 Implementation (2002-03-07)
	http://python.org/sf/526840
PEP 263 Implementation (2002-03-24)
	http://python.org/sf/534304
ae* modules: handle type inheritance (2002-04-02)
	http://python.org/sf/538395
Deprecate bsddb (2002-05-06)
	http://python.org/sf/553108
timeout socket implementation (2002-05-12)
	http://python.org/sf/555085
GetFInfo update (2002-06-11)
	http://python.org/sf/567296
Add param to email.Utils.decode() (2002-06-12)
	http://python.org/sf/568348
PyTRASHCAN slots deallocation (2002-06-28)
	http://python.org/sf/575073
Build MachoPython with 2level namespace (2002-07-10)
	http://python.org/sf/579841
xreadlines caching, file iterator (2002-07-11)
	http://python.org/sf/580331
Alternative PyTRASHCAN subtype_dealloc (2002-07-15)
	http://python.org/sf/581742
make file object an iterator (2002-07-17)
	http://python.org/sf/583235
yield allowed in try/finally (2002-07-21)
	http://python.org/sf/584626
Cygwin _hotshot patch (2002-07-30)
	http://python.org/sf/588561
os._execvpe security fix (2002-08-02)
	http://python.org/sf/590294


From magnus@hetland.org  Sun Aug 11 14:44:46 2002
From: magnus@hetland.org (Magnus Lie Hetland)
Date: Sun, 11 Aug 2002 15:44:46 +0200
Subject: [Python-Dev] Priority queue (binary heap) python code
In-Reply-To: <20020624213318.A5740@arizona.localdomain>; from kevin@koconnor.net on Mon, Jun 24, 2002 at 09:33:18PM -0400
References: <20020624213318.A5740@arizona.localdomain>
Message-ID: <20020811154446.A8103@idi.ntnu.no>

Kevin O'Connor :
>
> I often find myself needing priority queues in python, and I've finally
> broken down and written a simple implementation.
[snip]

I see that heapq is now in the libraries -- great!

Just one thought: If I want to use this library in an algorithm such
as, say, Dijkstra's single-source shortest path algorithm, I would
need an additional operation, the decrease-key operation (as far as I
can see, heapreplace only works with index 0 -- why is that?)

E.g.:

def heapdecrease(heap, index, item):
    """
    Replace an item at a given index with a smaller one.

    May be used to implement the standard priority queue method
    heap-decrease-key, useful, for instance, in several graph
    algorithms.
    """
    assert item <= heap[index]
    heap[index] = item
    _siftdown(heap, 0, index)

Something like this might be useful to include in the library... Or,
perhaps, the _siftup and _siftdown methods don't have to be private?

In addition, I guess one would have to implement a sequence class that
maintained a map from values to heap indices to be able to use
heapdecrease in any useful way (otherwise, how would you know which
index to use?). That, however, I guess is not something that would be
'at home' in the heapq module. (Perhaps that is argument enough to
avoid including heapdecrease as well? Oh, well...)
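For completeness: the usual way around the missing decrease-key in
Dijkstra is to push a duplicate entry with the better priority and skip
stale entries as they surface -- a sketch using heapq as it stands (the
graph representation here is just an assumption for illustration):

```python
import heapq

def dijkstra(graph, source):
    # graph: dict mapping node -> list of (neighbour, edge_weight) pairs
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, d):
            continue            # stale entry; a better path was found already
        for neighbour, weight in graph[node]:
            new = d + weight
            if new < dist.get(neighbour, new + 1):
                dist[neighbour] = new
                # Push a duplicate instead of decreasing the old key.
                heapq.heappush(heap, (new, neighbour))
    return dist
```

The heap may hold several entries per node, but each is popped at most
once, so the asymptotics stay at O(E log E).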

-- 
Magnus Lie Hetland                                  The Anygui Project
http://hetland.org                                  http://anygui.org


From tim.one@comcast.net  Sun Aug 11 20:07:43 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sun, 11 Aug 2002 15:07:43 -0400
Subject: [Python-Dev] Priority queue (binary heap) python code
In-Reply-To: <20020811154446.A8103@idi.ntnu.no>
Message-ID: 

[Magnus Lie Hetland]
> I see that heapq is now in the libraries -- great!
>
> Just one thought: If I want to use this library in an algorithm such
> as, say, Dijkstra's single-source shortest path algorithm, I would
> need an additional operation, the decrease-key operation

You'd need more than just that.

> (as far as I can see, heapreplace only works with index 0 -- why is
> that?)

heapreplace() is emphatically not a decrease-key operation.  It's equivalent
to a pop-min *followed by* a push, which combination is often called a
"hold" operation.  The value added may very well be larger than the value
popped, and, e.g., the example of an efficient N-Best queue in the test file
relies on this.  Hold is an extremely common operation in some kinds of
event simulations too, where it's also most common to push a value larger
than the one popped (e.g., when the queue is ordered by scheduled time, and
the item pushed is a follow-up event to the one getting popped).
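In other words, ignoring speed, heapreplace(heap, item) behaves like
this pop-then-push sketch:

```python
import heapq

def hold(heap, item):
    # Equivalent in effect to heapq.heapreplace(heap, item):
    # pop the current minimum, then push the (possibly larger) new item.
    smallest = heapq.heappop(heap)
    heapq.heappush(heap, item)
    return smallest
```

heapreplace() does the same work in one sift, which is why it exists as
a primitive -- but neither form is a decrease-key.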

> E.g.:
>
> def heapdecrease(heap, index, item):
>     """
>     Replace an item at a given index with a smaller one.
>
>     May be used to implement the standard priority queue method
>     heap-decrease-key, useful, for instance, in several graph
>     algorithms.
>     """
>     assert item <= heap[index]

That's the opposite of what heapreplace() usually sees.

>     heap[index] = item
>     _siftdown(heap, 0, index)
>
> Something like this might be useful to include in the library...

I really don't see how -- you generally have no way to know the correct
index short of O(N) search.  This representation of priority queues is
well-known to be sub-optimal for applications requiring frequent
decrease-key (Fibonacci heaps were designed for it, though, and pairing
heaps are reported to run faster than Fibonacci heaps in practice despite
that one of the PH inventors eventually proved that decrease-key can destroy
PH's otherwise good worst-case behavior).

> Or, perhaps, the _siftup and _siftdown methods don't have to be
> private?

You really have to know what you're doing to use them correctly, and it's
dubious that _siftup calls _siftdown now (it's most convenient *given* all
the uses made of them right now, but, e.g., if a "delete at arbitrary index"
function were to be added, _siftdown and _siftup could stand to be
refactored -- exposing them would inhibit future improvements).

> In addition, I guess one would have to implement a sequence class that
> maintained a map from values to heap indices to be able to use
> heapdecrease in any useful way (otherwise, how would you know which
> index to use?).

Bingo.  All the internal heap manipulations would have to know about this
mapping too, in order to keep the indices up to date as it moved items up
and down in the queue.  If what you want is frequent decrease-key, you don't
want this implementation of priority queues at all.
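To make that bookkeeping concrete, here is a rough sketch (names and API
invented for illustration, not anything in heapq) of a heap that keeps a
value-to-index map so decrease-key becomes O(log N) -- note how every
swap has to touch the map:

```python
class IndexedHeap:
    """Min-heap that tracks each item's position, enabling decrease-key.

    Every swap must update the position map -- exactly the bookkeeping
    the plain list-based heapq representation avoids.
    """
    def __init__(self):
        self.heap = []      # list of (key, item) pairs
        self.pos = {}       # item -> index in self.heap

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i][1]] = i
        self.pos[self.heap[j][1]] = j

    def _siftdown(self, i):
        # Move entry i toward the root while it beats its parent.
        while i > 0:
            parent = (i - 1) // 2
            if self.heap[i][0] < self.heap[parent][0]:
                self._swap(i, parent)
                i = parent
            else:
                break

    def push(self, key, item):
        self.heap.append((key, item))
        self.pos[item] = len(self.heap) - 1
        self._siftdown(len(self.heap) - 1)

    def decrease_key(self, item, new_key):
        i = self.pos[item]      # O(1) lookup instead of O(N) search
        assert new_key <= self.heap[i][0]
        self.heap[i] = (new_key, item)
        self._siftdown(i)
```

(Pop is omitted for brevity; it would need the mirror-image sift plus
the same map maintenance.)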



From Jack.Jansen@oratrix.com  Sun Aug 11 22:33:59 2002
From: Jack.Jansen@oratrix.com (Jack Jansen)
Date: Sun, 11 Aug 2002 23:33:59 +0200
Subject: [Python-Dev] Deprecation warning on integer shifts and such
Message-ID: <0C474D03-AD72-11D6-9656-003065517236@oratrix.com>

As of recently I'm getting deprecation warnings on lots of 
constructs of the form "0xff << 24", telling me that in Python 
2.4 this will return a long.

As these things are bitpatterns (they're all generated from .h 
files for system call interfaces and such) that the user will 
pass to methods that wrap underlying API calls I don't want them 
to be longs. How do I force them to remain ints?
--
- Jack Jansen                http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -



From mal@lemburg.com  Sun Aug 11 23:19:06 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 12 Aug 2002 00:19:06 +0200
Subject: [Python-Dev] Deprecation warning on integer shifts and such
References: <0C474D03-AD72-11D6-9656-003065517236@oratrix.com>
Message-ID: <3D56E2DA.90400@lemburg.com>

Jack Jansen wrote:
> As of recently I'm getting deprecation warnings on lots of constructs of 
> the form "0xff << 24", telling me that in Python 2.4 this will return a 
> long.

Interesting. I wonder why the implementation warns about 0xff << 24...
0xff000000 fits nicely into a 32-bit integer. I don't see why the
"changing sign" is relevant here or even why it is mentioned in the
warning since the PEP doesn't say anything about it.

Changing these semantics would cause compatibility problems for
applications doing low-level bit manipulations or ones which use
the Python integer type to store unsigned integer values, e.g.
for use as bitmapped flags.

> As these things are bitpatterns (they're all generated from .h files for 
> system call interfaces and such) that the user will pass to methods that 
> wrap underlying API calls I don't want them to be longs. How do I force 
> them to remain ints?

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From greg@cosc.canterbury.ac.nz  Mon Aug 12 00:30:24 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 12 Aug 2002 11:30:24 +1200 (NZST)
Subject: [Python-Dev] The memo of pickle
In-Reply-To: <200208090355.g793tVk05549@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <200208112330.g7BNUOW08360@oma.cosc.canterbury.ac.nz>

Guido:

> (1) Make binary pickling the default (in cPickle as well as pickle).

That would break a lot of programs that use pickle
without opening the file in binary mode.

> (2) Replace the PyRun_String() call in cPickle with something faster.
>     Maybe the algorithm from parsestr() from compile.c can be
> exposed;

I like that idea -- it could be useful for other things,
too. I could use something like that in Pyrex, for example.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From greg@cosc.canterbury.ac.nz  Mon Aug 12 01:18:27 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 12 Aug 2002 12:18:27 +1200 (NZST)
Subject: [Python-Dev] The memo of pickle
In-Reply-To: <200208091414.g79EE6p06855@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <200208120018.g7C0IRE09170@oma.cosc.canterbury.ac.nz>

Guido:

> Maybe we should just drop indirect interning then.  It can save 31
> bits per string object, right?  How to collect those savings?

Store the value of strings <= 3 chars long in there. :-)

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From guido@python.org  Mon Aug 12 01:25:27 2002
From: guido@python.org (Guido van Rossum)
Date: Sun, 11 Aug 2002 20:25:27 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: Your message of "Mon, 12 Aug 2002 00:19:06 +0200."
 <3D56E2DA.90400@lemburg.com>
References: <0C474D03-AD72-11D6-9656-003065517236@oratrix.com>
 <3D56E2DA.90400@lemburg.com>
Message-ID: <200208120025.g7C0PRg02588@pcp02138704pcs.reston01.va.comcast.net>

> Jack Jansen wrote:
> > As of recently I'm getting deprecation warnings on lots of constructs of 
> > the form "0xff << 24", telling me that in Python 2.4 this will return a 
> > long.
> 
> Interesting. I wonder why the implementation warns about 0xff << 24...
> 0xff000000 fits nicely into a 32-bit integer. I don't see why the
> "changing sign" is relevant here or even why it is mentioned in the
> warning since the PEP doesn't say anything about it.

PEP 237 *tries* to mention it:

    - Currently, x<<n can lose bits for short ints.  This will be
      changed to return a long int containing all the shifted-out
      bits, if returning a short int would lose bits.

Maybe you don't see changing sign as "losing bits"?  I do!  Maybe I
have to clarify this in the PEP.

PEP 237 is about erasing all differences between int and long.  When
seen as a long, 0xff000000 has the value 4278190080.  But currently it
is an int and has the value -16777216.  As a bit pattern that doesn't
make much of a difference, but as a numeric value it makes a huge
difference (2**32 to be exact :-).  So in Python 2.4, 0xff<<24, as
well as the constant 0xff000000, will have the value 4278190080.

Note that larger constants are already longs in 2.2: e.g. 0x100000000
equals 4294967296 (which happens to be representable only as a long).
It's the oct and hex constants in range(2**31, 2**32) that currently
behave anomalously, returning negative numbers despite looking like
positive numbers (to everyone except people whose minds have been
exposed to 32-bit bit-fiddling too long :-).

> Changing these semantics would cause compatibility problems for
> applications doing low-level bit manipulations or ones which use
> the Python integer type to store unsigned integer values, e.g.
> for use as bitmapped flags.

That's why I'm adding the warnings to 2.3.  Note that the bit pattern
in the lower 32 bits will remain the same; it's just the
interpretation of the sign that will change.

> > As these things are bitpatterns (they're all generated from .h
> > files for system call interfaces and such) that the user will pass
> > to methods that wrap underlying API calls I don't want them to be
> > longs. How do I force them to remain ints?

Why do you want them to remain ints?  Does a long whose lower 32 bits
have the right bit pattern not work?

If you really want the int value, you have to do a little arithmetic.
Here's something that's independent of the Python version and won't
issue a warning:

def toint32(x):
    x = x & 0xffffffffL # Force it to be a long in range(0, 2**32)
    if x & 0x80000000L: # If sign bit set
        x -= 0x100000000L # flip sign
    return int(x)

You can also write it as a single expression:

def toint32(x):
    return int((x & 0xffffffffL) - ((x & 0x80000000L) << 1))
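Restated without the 2.x L suffixes (so it also runs where ints and
longs are already unified -- this is my restatement, not part of the
original mail), with a quick check of the values discussed above:

```python
def toint32(x):
    # Same logic as above, minus the 'L' suffixes.
    x = x & 0xffffffff          # force into range(0, 2**32)
    if x & 0x80000000:          # sign bit set?
        x -= 0x100000000        # flip to the negative interpretation
    return x

assert toint32(0xff << 24) == -16777216     # the old 32-bit int value
assert toint32(4278190080) == -16777216     # same bit pattern, either way
```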

In the long run you'll thank me for this.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From greg@cosc.canterbury.ac.nz  Mon Aug 12 01:58:59 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 12 Aug 2002 12:58:59 +1200 (NZST)
Subject: [Python-Dev] timsort for jython
In-Reply-To: 
Message-ID: <200208120058.g7C0wxu09838@oma.cosc.canterbury.ac.nz>

> Is there a reason to want to get rid of list.append()?

No, because...

> How does one decide between list.append() and augmented assignment
> (+=)

That's easy -- if I'm only appending one item, I use
append(), because it avoids creating a one-element list
and then throwing it away.

Be-kind-to-your-environment-and-minimise-waste-ly,

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From guido@python.org  Mon Aug 12 02:51:56 2002
From: guido@python.org (Guido van Rossum)
Date: Sun, 11 Aug 2002 21:51:56 -0400
Subject: [Python-Dev] The memo of pickle
In-Reply-To: Your message of "Mon, 12 Aug 2002 11:30:24 +1200."
 <200208112330.g7BNUOW08360@oma.cosc.canterbury.ac.nz>
References: <200208112330.g7BNUOW08360@oma.cosc.canterbury.ac.nz>
Message-ID: <200208120151.g7C1pv103230@pcp02138704pcs.reston01.va.comcast.net>

> > (1) Make binary pickling the default (in cPickle as well as pickle).
> 
> That would break a lot of programs that use pickle
> without opening the file in binary mode.

Really?  That's unfortunate.  The example thread on Google shows that
binary pickling isn't as widely known as it should be.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@comcast.net  Mon Aug 12 03:21:51 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sun, 11 Aug 2002 22:21:51 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: <200208120025.g7C0PRg02588@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

[Guido]
> ...
> PEP 237 is about erasing all differences between int and long.  When
> seen as a long, 0xff000000 has the value 4278190080.  But currently it
> is an int and has the value -16777216.

Note that while it's an int under all current Python installations, the
*value* differs:  on 64-bit boxes other than Win64, 0xff000000 already has
value 4278190080.  This creates porting problems too.

> As a bit pattern that doesn't make much of a difference, but as a
> numeric value it makes a huge difference (2**32 to be exact :-).  So
> in Python 2.4, 0xff<<24, as well as the constant 0xff000000, will have
> the value 4278190080.

Some users won't even notice.  They may notice that
0xff00000000000000 "changes value", though.

> ...
> In the long run you'll thank me for this.

I'll start today:  thank you.



From greg@cosc.canterbury.ac.nz  Mon Aug 12 03:45:59 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 12 Aug 2002 14:45:59 +1200 (NZST)
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: <0C474D03-AD72-11D6-9656-003065517236@oratrix.com>
Message-ID: <200208120245.g7C2jxR11922@oma.cosc.canterbury.ac.nz>

Jack Jansen :

> As these things are bitpatterns (they're all generated from .h files
> for system call interfaces and such) that the user will pass to
> methods that wrap underlying API calls I don't want them to be
> longs.

Presumably, by the time these actually become longs, the
relevant Python/C API calls for converting Python ints to
C ints will accept longs that are within range, so it
shouldn't be an issue.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From greg@cosc.canterbury.ac.nz  Mon Aug 12 03:50:09 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 12 Aug 2002 14:50:09 +1200 (NZST)
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: <3D56E2DA.90400@lemburg.com>
Message-ID: <200208120250.g7C2o9811996@oma.cosc.canterbury.ac.nz>

"M.-A. Lemburg" :

> I wonder why the implementation warns about 0xff << 24...  0xff000000
> fits nicely into a 32-bit integer. I don't see why the "changing sign"
> is relevant here

When the change happens, the result will be a positive number instead
of a negative one. While this isn't relevant for what you're doing, it
might be relevant in some other applications, so I suppose it was
thought prudent to warn about it.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From guido@python.org  Mon Aug 12 03:58:17 2002
From: guido@python.org (Guido van Rossum)
Date: Sun, 11 Aug 2002 22:58:17 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: Your message of "Mon, 12 Aug 2002 14:45:59 +1200."
 <200208120245.g7C2jxR11922@oma.cosc.canterbury.ac.nz>
References: <200208120245.g7C2jxR11922@oma.cosc.canterbury.ac.nz>
Message-ID: <200208120258.g7C2wHR10188@pcp02138704pcs.reston01.va.comcast.net>

> Presumably, by the time these actually become longs, the
> relevant Python/C API calls for converting Python ints to
> C ints will accept longs that are within range, so it
> shouldn't be an issue.

PyInt_AsLong() and the 'i' and 'l' format chars for PyArg_Parse*()
already do so -- and have done so for a long time.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From greg@cosc.canterbury.ac.nz  Mon Aug 12 04:05:42 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 12 Aug 2002 15:05:42 +1200 (NZST)
Subject: [Python-Dev] The memo of pickle
In-Reply-To: <200208120151.g7C1pv103230@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <200208120305.g7C35gW12257@oma.cosc.canterbury.ac.nz>

Guido:
> Me:
> > That would break a lot of programs that use pickle
> > without opening the file in binary mode.
> 
> Really?  That's unfortunate.

Unfortunate, yes, and true, as far as I can see. It bit me recently --
I decided to change something to use binary pickling, and forgot to
change the way I was opening the file.

If you must do this, I suppose you could start issuing warnings
if pickling is done without specifying a mode, and then change
the default later.

If there's a way of making non-binary unpickling dramatically
faster, though -- even if only with cPickle -- that would be a 
*big* win, and shouldn't cause any compatibility problems.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From guido@python.org  Mon Aug 12 04:12:16 2002
From: guido@python.org (Guido van Rossum)
Date: Sun, 11 Aug 2002 23:12:16 -0400
Subject: [Python-Dev] The memo of pickle
In-Reply-To: Your message of "Mon, 12 Aug 2002 15:05:42 +1200."
 <200208120305.g7C35gW12257@oma.cosc.canterbury.ac.nz>
References: <200208120305.g7C35gW12257@oma.cosc.canterbury.ac.nz>
Message-ID: <200208120312.g7C3CGN10284@pcp02138704pcs.reston01.va.comcast.net>

[Greg E]
> > > That would break a lot of programs that use pickle
> > > without opening the file in binary mode.

[Guido]
> > Really?  That's unfortunate.

[Greg E]
> Unfortunate, yes, and true, as far as I can see. It bit me recently --
> I decided to change something to use binary pickling, and forgot to
> change the way I was opening the file.
> 
> If you must do this, I suppose you could start issuing warnings
> if pickling is done without specifying a mode, and then change
> the default later.

I thought of that.  But probably not worth the upheaval.

> If there's a way of making non-binary unpickling dramatically
> faster, though -- even if only with cPickle -- that would be a 
> *big* win, and shouldn't cause any compatibility problems.

http://python.org/sf/505705 is close to acceptance, and reduced one
particularly slow unpickling example 6-fold in speed.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@comcast.net  Mon Aug 12 04:16:47 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sun, 11 Aug 2002 23:16:47 -0400
Subject: [Python-Dev] The memo of pickle
In-Reply-To: <200208120305.g7C35gW12257@oma.cosc.canterbury.ac.nz>
Message-ID: 

[Greg Ewing]
> That would break a lot of programs that use pickle
> without opening the file in binary mode.

[Guido]
> Really?  That's unfortunate.

[Greg]
> Unfortunate, yes, and true, as far as I can see. It bit me recently --
> I decided to change something to use binary pickling, and forgot to
> change the way I was opening the file.

Greg, do you use Windows?  If not, I suspect you're mis-remembering what you
did -- "binary mode" versus "text mode" doesn't make any difference on
Linux.



From greg@cosc.canterbury.ac.nz  Mon Aug 12 04:17:55 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 12 Aug 2002 15:17:55 +1200 (NZST)
Subject: [Python-Dev] The memo of pickle
In-Reply-To: 
Message-ID: <200208120317.g7C3Htu12448@oma.cosc.canterbury.ac.nz>

> Greg, do you use Windows?  If not, I suspect you're mis-remembering
> what you did -- "binary mode" versus "text mode" doesn't make any
> difference on Linux.

No, but I use a Mac (with Classic OS) where it certainly
does make a difference!

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From tim.one@comcast.net  Mon Aug 12 04:24:21 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sun, 11 Aug 2002 23:24:21 -0400
Subject: [Python-Dev] The memo of pickle
In-Reply-To: <200208120317.g7C3Htu12448@oma.cosc.canterbury.ac.nz>
Message-ID: 

[Greg Ewing, on text mode vs binary mode]
> No, but I use a Mac (with Classic OS) where it certainly
> does make a difference!

Cool!  I was just wondering the other day whether there are any Mac users
left apart from Jack and Guido's brother.  It's a landslide.



From andymac@bullseye.apana.org.au  Sun Aug 11 00:28:37 2002
From: andymac@bullseye.apana.org.au (Andrew MacIntyre)
Date: Sun, 11 Aug 2002 10:28:37 +1100 (edt)
Subject: [Python-Dev] Patch 592529: Split-out ntmodule.c
In-Reply-To: 
Message-ID: 

On 9 Aug 2002, Martin v. Löwis wrote:

> If you are familiar with the code, it would be good if you could
> comment on the following questions:
>
> - should os2module.c get its own source code file as well?
>
> - are the #ifdefs in the resulting ntmodule.c still needed?
>   I believe they are, as the various compilers appear to support
>   different sets of functions in their C libraries. Of course,
>   most of these could be eliminated if the C is avoided in favour
>   of the Win32 API. Alternatively, can anybody with access to any
>   of these compilers (BorlandC, Watcom, IBM) please comment on
>   which functions provided by MSVC are missing in those compilers?

I don't have a problem with the OS/2 stuff being split out as well.
However I think there is some merit to Tim's point (added as a comment to
the patch) about trying to contain the natural divergence of API support
if the code is completely split out.

I haven't looked at your patch (just the SF patch manager
entry, sorry), but if a split is pursued, I would find an approach
similar to the thread_* and dynload_* bits (in Python/) somewhat more in
keeping with the above reservation.

This approach would have a master module file (e.g. platformmodule.c) which
contains the init function and the PyMethodDef array (with methods
controlled by HAVE_method #ifdefs as appropriate), and includes
the platform specific implementation files.

I have had thoughts about doing this before, but the scale of the task and
the fact that I don't have a Windows dev box for testing put me off.

--
Andrew I MacIntyre                     "These thoughts are mine alone..."
E-mail: andymac@bullseye.apana.org.au  | Snail: PO Box 370
        andymac@pcug.org.au            |        Belconnen  ACT  2616
Web:    http://www.andymac.org/        |        Australia



From martin@v.loewis.de  Mon Aug 12 06:41:04 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 12 Aug 2002 07:41:04 +0200
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: <3D56E2DA.90400@lemburg.com>
References: <0C474D03-AD72-11D6-9656-003065517236@oratrix.com>
 <3D56E2DA.90400@lemburg.com>
Message-ID: 

"M.-A. Lemburg"  writes:

> Changing these semantics would cause compatibility problems for
> applications doing low-level bit manipulations or ones which use
> the Python integer type to store unsigned integer values, e.g.
> for use as bitmapped flags.

In case this isn't clear yet: There likely will be no compatibility
problems. The bit manipulations will likely see the same results.

Of course, it would be good if users bring forward examples of how the
change affects their code. I.e. whenever you see such a warning,
please study the code and report whether the upcoming change will or
will not break your code. Such reports would allow us to improve the
documentation.

Regards,
Martin


From oren-py-d@hishome.net  Mon Aug 12 07:37:32 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Mon, 12 Aug 2002 02:37:32 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: <3D56E2DA.90400@lemburg.com>
References: <0C474D03-AD72-11D6-9656-003065517236@oratrix.com> <3D56E2DA.90400@lemburg.com>
Message-ID: <20020812063732.GA95771@hishome.net>

On Mon, Aug 12, 2002 at 12:19:06AM +0200, M.-A. Lemburg wrote:
> Changing these semantics would cause compatibility problems for
> applications doing low-level bit manipulations or ones which use
> the Python integer type to store unsigned integer values, e.g.
> for use as bitmapped flags.

I'm very much in favor of this change but a deprecation warning is not 
enough - some suitable replacement should be provided to cryptographers 
and other bit fiddlers.

Proposal:

A standard module implementing the types [u]int[8|16|32|64]. These types
would behave just like C integers - wrap around on overflow, etc and have 
a guaranteed size regardless of platform. They can even have methods for
bit rotation.
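Nothing like that exists in the stdlib today; a rough sketch of what one
such type might look like (the name, API and behavior here are all
invented for illustration, not a concrete proposal):

```python
class uint32(object):
    """Toy fixed-width unsigned int: wraps on overflow like C,
    and supports bit rotation."""
    MASK = 0xffffffff

    def __init__(self, value):
        self.value = value & self.MASK      # wrap into [0, 2**32)

    def __add__(self, other):
        return uint32(self.value + int(other))

    def __mul__(self, other):
        return uint32(self.value * int(other))

    def __lshift__(self, n):
        return uint32(self.value << n)      # shifted-out bits discarded

    def rotl(self, n):
        n %= 32
        return uint32((self.value << n) | (self.value >> (32 - n)))

    def __int__(self):
        return self.value
```

A full version would of course fill in the rest of the numeric
protocol; the point is just that wrap-around, fixed-width semantics are
easy to pin down as a type rather than as a property of the literal.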

	Oren


From mal@lemburg.com  Mon Aug 12 09:09:35 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 12 Aug 2002 10:09:35 +0200
Subject: [Python-Dev] Deprecation warning on integer shifts and such
References: <0C474D03-AD72-11D6-9656-003065517236@oratrix.com>              <3D56E2DA.90400@lemburg.com> <200208120025.g7C0PRg02588@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <3D576D3F.80906@lemburg.com>

Guido van Rossum wrote:
>>Jack Jansen wrote:
>>
>>>As of recently I'm getting deprecation warnings on lots of constructs of 
>>>the form "0xff << 24", telling me that in Python 2.4 this will return a 
>>>long.
>>
>>Interesting. I wonder why the implementation warns about 0xff << 24...
>>0xff000000 fits nicely into a 32-bit integer. I don't see why the
>>"changing sign" is relevant here or even why it is mentioned in the
>>warning since the PEP doesn't say anything about it.
> 
> 
> PEP 237 *tries* to mention it:
> 
>     - Currently, x<<n can lose bits for short ints.  This will be
>       changed to return a long int containing all the shifted-out
>       bits, if returning a short int would lose bits.
> 
> Maybe you don't see changing sign as "losing bits"?  I do!  Maybe I
> have to clarify this in the PEP.

I was talking about the sign bit which lies within the 32 bits for
32-bit integers, so no bits are lost. I am not talking about things
like 0xff << 28 where bits are actually moved beyond the 32 bits to the
left and lost that way (or preserved if you move to a 64-bit platform,
but that's another story ;-).

> PEP 237 is about erasing all differences between int and long.  When
> seen as a long, 0xff000000 has the value 4278190080.  But currently it
> is an int and has the value -16777216.  As a bit pattern that doesn't
> make much of a difference, but as a numeric value it makes a huge
> difference (2**32 to be exact :-).  So in Python 2.4, 0xff<<24, as
> well as the constant 0xff000000, will have the value 4278190080.
> 
> Note that larger constants are already longs in 2.2: e.g. 0x100000000
> equals 4294967296 (which happens to be representable only as a long).
> It's the oct and hex constants in range(2**31, 2**32) that currently
> behave anomalously, returning negative numbers despite looking like
> positive numbers (to everyone except people whose minds have been
> exposed to 32-bit bit-fiddling too long :-).
> 
> 
>>Changing these semantics would cause compatibility problems for
>>applications doing low-level bit manipulations or ones which use
>>the Python integer type to store unsigned integer values, e.g.
>>for use as bitmapped flags.
> 
> 
> That's why I'm adding the warnings to 2.3.  Note that the bit pattern
> in the lower 32 bits will remain the same; it's just the
> interpretation of the sign that will change.

That's exactly what I'd like too :-) With the only difference
that you seem to see the sign bit as not included in the 32 bits.

>>>As these things are bitpatterns (they're all generated from .h
>>>files for system call interfaces and such) that the user will pass
>>>to methods that wrap underlying API calls I don't want them to be
>>>longs. How do I force them to remain ints?
>>
> 
> Why do you want them to remain ints?  Does a long whose lower 32 bits
> have the right bit pattern not work?

No, because you usually pass these objects directly to some
Python C function (directly as parameter or indirectly as item
in a list or tuple) which often enough insists on getting a true
integer object.

> If you really want the int value, you have to do a little arithmetic.
> Here's something that's independent of the Python version and won't
> issue a warning:
> 
> def toint32(x):
>     x = x & 0xffffffffL # Force it to be a long in range(0, 2**32)
>     if x & 0x80000000L: # If sign bit set
>         x -= 0x100000000L # flip sign
>     return int(x)
> 
> You can also write it as a single expression:
> 
> def toint32(x):
>     return int((x & 0xffffffffL) - ((x & 0x80000000L) << 1))
> 
> In the long run you'll thank me for this.

No argument about this. It's just that I see a lot of programs
breaking because of the 0x1 << 31 returning a long. That needn't
be the case. People using this will know what they are doing and
use a long when possible anyway. However, tweaking C extensions to
also accept longs instead of integers requires hacking those
extensions which I'd like to avoid if possible. I already had
one of these instances with file.tell() returning a long and
that caused a lot of trouble then.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From duncan@rcp.co.uk  Mon Aug 12 11:02:11 2002
From: duncan@rcp.co.uk (Duncan Booth)
Date: Mon, 12 Aug 2002 11:02:11 +0100
Subject: [Python-Dev] _sre as part of python.dll
In-Reply-To: <3D555DBB.5040204@tismer.com>
Message-ID: <3D5795B3.6178.22DC5F47@localhost>

On 10 Aug 2002 at 20:38, Christian Tismer wrote:

> Duncan Booth wrote:
> ...
> 
> > _sre is used by any application that imports 'os'. That (IMHO) is almost 
> > every non-trivial Python program.
> 
> Sure? Then try this in a Windows shell:

Sjoerd Mullender already pointed out I got this wrong. Unfortunately, for reasons 
that currently escape me, my response disappeared into a black hole and didn't 
appear on the mailing list.

I jumped to the wrong conclusion because running py2exe on a program that 
imports os always includes _sre.dll in the files for distribution. This is because the 
os module does indeed import _sre, but only when the function that uses it is 
actually called. So any program that imports os includes _sre in the automatically 
generated list of dependencies, but it may or may not actually import it.
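The deferred-import pattern Duncan describes can be sketched like this (a minimal illustration, not the actual os module code):

```python
import sys

def needs_re(pattern):
    # 're' is a static dependency of this module (tools like py2exe
    # will list it), but it is only loaded when this function runs
    import re
    return re.match(pattern, "abc") is not None

assert needs_re("a.c")
assert "re" in sys.modules  # present once the call has happened
```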
-- 
Duncan Booth                                             duncan@dales.rmplc.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?
http://dales.rmplc.co.uk/Duncan



From walter@livinglogic.de  Mon Aug 12 11:47:33 2002
From: walter@livinglogic.de (Walter Dörwald)
Date: Mon, 12 Aug 2002 12:47:33 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
References: <3D1057E8.9090200@livinglogic.de> <3D4E336E.8070700@lemburg.com> <200208051348.g75DmOv13530@pcp02138704pcs.reston01.va.comcast.net>              <3D4E97A7.7000904@lemburg.com> <200208051527.g75FR1814634@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <3D579245.2080306@livinglogic.de>

I'm back from vacation. Comments on the thread and a list
of open issues are below.

Guido van Rossum wrote:
 > M.-A. Lemburg wrote:
 > > Walter has written a pretty good test suite for the patch
 > > and I have a good feeling about it. I'd like Walter to check
 > > it into CVS and then see whether the alpha tests bring up any
 > > quirks. The patch only touches the codecs and adds some new
 > > exceptions. There are no other changes involved.
 > >
 > > I think that together with PEP 263 (source code encoding) this
 > > is a great step forward in Python's i18n capabilities.
 > >
 > > BTW, the test script contains some examples of how to put the
 > > error callbacks to use:
 > >
 > > 
http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=27815&aid=432401
 >
 > Sounds like a plan then.

Does this mean we can check in the patch?

Documentation is still missing and encoding specific
decoding tests should be added to the test script.

Has anybody except me and Marc-André tried the patch?
On anything other than Linux/Intel? With UCS2 and UCS4?

Martin v. Loewis wrote:
 > If you look at the large blocks of new code, you find that it is in
 >
 > - charmap_encoding_error, which insists on implementing known error
 >   handling algorithms inline,

This is done for performance reasons.

 > - the default error handlers, of which atleast
 >   PyCodec_XMLCharRefReplaceErrors should be pure-Python

The PyCodec_XMLCharRefReplaceErrors functionality is
independent of the rest, so moving this to Python
won't reduce complexity that much. And it will
slow down "xmlcharrefreplace" handling for those
codecs that don't implement it inline.

 > - PyCodec_BackslashReplaceErrors, likewise,
 >
 > - the UnicodeError exception methods (which could be omitted, IMO).

Those methods were implemented so that we can easily
move to new style exceptions. The exception attributes
can then be members of the C struct and the accessor functions
can be simple macros.

I guess some of the methods could be removed by moving
duplicate ones to the base class UnicodeError, but
this would break backwards compatibility.

Oren Tirosh wrote:
 > Some of my reservations about PEP 293:
 >
 > It overloads the meaning of the error handling argument in an unintuitive
 > way.  It gets to the point where it's much more than just error 
handling -
 > it's actually extending the functionality of the codec.
 >
 > Why implement yet another name-based registry?  There must be a 
simpler way
 > to do it.

The registry is name-based because this is required by the current C API.
Passing the error handler directly as a function object would be
simpler, but this can't be done, as it would require vast changes
to the C API (an old version of the patch did that.) And this way
we gain the benefit of implementing well-known error handling
names inline.

It is "yet another" registry exactly because encoding and error handling
are completely orthogonal (at least for encoding). If you add a
new error handler all codecs can use it (as long as they are aware
of the new error handling way) and if you define a new codec it will
work with all existing error handlers.
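That orthogonality is visible in the callback API itself: a handler registered once under a name works with every codec that supports callbacks. A minimal sketch (the handler name "hexreplace" is made up for illustration):

```python
import codecs

def hex_replace(exc):
    # only encode errors are handled here; anything else is re-raised
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    repl = "".join("\\x%04x" % ord(c)
                   for c in exc.object[exc.start:exc.end])
    return repl, exc.end  # (replacement, position to resume encoding at)

codecs.register_error("hexreplace", hex_replace)

# the same handler serves different codecs
assert "a\u20acb".encode("ascii", "hexreplace") == b"a\\x20acb"
assert "a\u20acb".encode("latin-1", "hexreplace") == b"a\\x20acb"
```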

 > Generating an exception for each character that isn't handled by simple
 > lookup probably adds quite a lot of overhead.

1. All encoders try to collect runs of unencodable characters to
minimize the number of calls to the callback.

2. The PEP explicitly states that the codec is allowed to
reuse the exception object. All codecs do this, so the
exception object will only be created once (at most;
when no error occurs, no exception object will be created)
The exception object is just a quick way to pass information
between the codec and the error handler and it could become
even faster as soon as we get new style exceptions.

 > What are the use cases?  Maybe a simple extension to charmap would be 
enough
 > for all the practical cases?

Not all codecs are charmap based.



Open issues:

1. For each error handler two Python function objects are created:
One in the registry and a different one in the codecs module. This
means that e.g.
codecs.lookup_error("replace") != codecs.replace_errors

We can fix that by making the name of the Python function object
globally visible or by changing the codecs init function to do a lookup 
and use the result or simply by removing codecs.replace_errors

2. Currently charmap encoding uses a safe way of reallocating
string storage, which tests the available space on each output. This
slows charmap encoding down a bit. It should probably be changed
back to the old way: test available space only for output strings
longer than one character.

3. Error reporting logic in the exception attribute setters/getters
may be non-standard. What is the standard way to report errors for
C functions that don't return object pointers?
==0 for error and !=0 for success
or
==0 for success and !=0 for error
PyArg_ParseTuple returns true on success, PyObject_SetAttr returns true
on failure, which one is the exception and which one the rule?

4. Assigning to an attribute of an exception object does not
change the appropriate entry in the args attribute. Is this
worth changing?

5. UTF-7 decoding does not yet take full advantage of the machinery:
When an unterminated shift sequence is encountered (e.g. "+xxx")
the faulty byte sequence has already been emitted.

Bye,
    Walter Dörwald



From loewis@informatik.hu-berlin.de  Mon Aug 12 12:48:06 2002
From: loewis@informatik.hu-berlin.de (Martin v. Löwis)
Date: Mon, 12 Aug 2002 13:48:06 +0200 (CEST)
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
Message-ID: <200208121148.g7CBm6KD027268@paros.informatik.hu-berlin.de>

http://www.python.org/peps/pep-0277.html

The PEP describes a Windows-only change to Unicode in file names: On
Windows NT/2k/XP, Python would allow arbitrary Unicode strings as file
names and pass them to the OS, instead of converting them to CP_ACP
first. This applies to open() and all os functions that accept
filenames.

In addition, os.listdir() would return Unicode filenames if the argument
is Unicode.

Please comment on the PEP. There is an updated patch on
http://python.org/sf/594001; please comment on the patch as well.

Regards,
Martin



From skip@pobox.com  Mon Aug 12 13:29:01 2002
From: skip@pobox.com (Skip Montanaro)
Date: Mon, 12 Aug 2002 07:29:01 -0500
Subject: [Python-Dev] timsort for jython
In-Reply-To: 
References: 
 
Message-ID: <15703.43533.581619.884543@localhost.localdomain>

    Patrick> Your flip comment has got me thinking about the "one best
    Patrick> idiom" for list appending. So I'll ask the question. Is there a
    Patrick> reason to want to get rid of list.append()? 

Certainly not for performance.  append is substantially faster than +=, at
least in part because of the list creation, especially if you cache the
method lookup. 

Skip

import time

def timefunc(s, args, *sargs, **kwds):
    t = time.time()
    apply(s, args+sargs, kwds)
    return time.time()-t

def appendit(l, o, n):
    append = l.append
    for i in xrange(n):
        append(o)

def extendit(l, o, n):
    extend = l.extend
    for i in xrange(n):
        extend([o])

def augassignit(l, o, n):
    for i in xrange(n):
        l += [o]

print "append small int:",
x = 0.0
for i in 1,2,3:
    x += timefunc(appendit, ([], 1, 100000))
print "%.3f" % (x/3)
print "aug assign small int:",
x = 0.0
for i in 1,2,3:
    x += timefunc(augassignit, ([], 1, 100000))
print "%.3f" % (x/3)
print "extend small int:",
x = 0.0
for i in 1,2,3:
    x += timefunc(extendit, ([], 1, 100000))
print "%.3f" % (x/3)
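Skip's harness is Python 2 (apply, xrange, print statements). The same comparison in present-day Python using the timeit module; absolute numbers vary by machine, so none are quoted here:

```python
import timeit

n = 100000
t_append = timeit.timeit("append(1)",
                         setup="l = []; append = l.append", number=n)
t_extend = timeit.timeit("extend([1])",
                         setup="l = []; extend = l.extend", number=n)
t_aug = timeit.timeit("l += [1]", setup="l = []", number=n)
print("append %.3fs  extend %.3fs  aug-assign %.3fs"
      % (t_append, t_extend, t_aug))
```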


From mal@lemburg.com  Mon Aug 12 14:10:38 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 12 Aug 2002 15:10:38 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
References: <3D1057E8.9090200@livinglogic.de> <3D4E336E.8070700@lemburg.com> <200208051348.g75DmOv13530@pcp02138704pcs.reston01.va.comcast.net>              <3D4E97A7.7000904@lemburg.com> <200208051527.g75FR1814634@pcp02138704pcs.reston01.va.comcast.net> <3D579245.2080306@livinglogic.de>
Message-ID: <3D57B3CE.8050407@lemburg.com>

Walter Dörwald wrote:
> I'm back from vacation. Comments on the thread and a list
> of open issues are below.

I'm going on vacation for two weeks, so you'll have to take
it along from here.

Have fun,
-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From guido@python.org  Mon Aug 12 14:18:52 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 12 Aug 2002 09:18:52 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: Your message of "Mon, 12 Aug 2002 02:37:32 EDT."
 <20020812063732.GA95771@hishome.net>
References: <0C474D03-AD72-11D6-9656-003065517236@oratrix.com> <3D56E2DA.90400@lemburg.com>
 <20020812063732.GA95771@hishome.net>
Message-ID: <200208121318.g7CDIrr11573@pcp02138704pcs.reston01.va.comcast.net>

> I'm very much in favor of this change but a deprecation warning is not 
> enough - some suitable replacement should be provided to cryptographers 
> and other bit fiddlers.

You can do all the bit fiddling you want using longs already.  If you
want the result truncated to n bits, simply apply a mask after each
operation, e.g. (for 32-bit results) x = (x << 14) & 0xffffffff.

> Proposal:
> 
> A standard module implementing the types [u]int[8|16|32|64]. These types
> would behave just like C integers - wrap around on overflow, etc and have 
> a guaranteed size regardless of platform. They can even have methods for
> bit rotation.

If you propose this as a Python module, I'm +/- 0; I don't have the
need, and I feel you can do all of this already, but I can see that
there may be one or two things that beginners at bit-fiddling might
find useful (like how to do sign extension or sign folding without an
if statement).
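One such bit-fiddling trick, sign extension without an if statement (a generic sketch, not taken from any proposed module):

```python
def sign_extend(x, bits):
    # branchless: XOR with the sign bit, then subtract it back out
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    x &= mask
    return (x ^ sign) - sign

assert sign_extend(0xff, 8) == -1
assert sign_extend(0x7f, 8) == 127
assert sign_extend(0xff000000, 32) == -16777216
```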

If you were proposing a C module, an emphatic YAGNI accompanies a -1.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Mon Aug 12 14:23:58 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 12 Aug 2002 09:23:58 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: Your message of "Mon, 12 Aug 2002 10:09:35 +0200."
 <3D576D3F.80906@lemburg.com>
References: <0C474D03-AD72-11D6-9656-003065517236@oratrix.com> <3D56E2DA.90400@lemburg.com> <200208120025.g7C0PRg02588@pcp02138704pcs.reston01.va.comcast.net>
 <3D576D3F.80906@lemburg.com>
Message-ID: <200208121323.g7CDNw711590@pcp02138704pcs.reston01.va.comcast.net>

> > That's why I'm adding the warnings to 2.3.  Note that the bit pattern
> > in the lower 32 bits will remain the same; it's just the
> > interpretation of the sign that will change.
> 
> That's exactly what I'd like too :-) With the only difference
> that you seem to see the sign bit as not included in the 32 bits.

I was using sloppy language by lumping "sign change" under "lost bits".
What I really meant was "returning a value that's different from what
the same operation on a long would return".  I've added something
about sign changes to the PEP.

> > Why do you want them to remain ints?  Does a long whose lower 32 bits
> > have the right bit pattern not work?
> 
> No, because you usually pass these objects directly to some
> Python C function (directly as parameter or indirectly as item
> in a list or tuple) which often enough insists on getting a true
> integer object.

There's no excuse for that any more.  The 'i' and 'l' format chars of
PyArg_Parse* and PyInt_AsLong() both work for longs as well as for
ints.

> No argument about this. It's just that I see a lot of programs
> breaking because of the 0x1 << 31 returning a long.

I think you're overly pessimistic.  But that's why I'm putting the
warning in for 2.3 -- the semantics are the same as for 2.2, they
won't change until 2.4 (or later if this turns out to be a bigger
issue).

> That needn't
> be the case. People using this will know what they are doing and
> use a long when possible anyway. However, tweaking C extensions to
> also accept longs instead of integers requires hacking those
> extensions which I'd like to avoid if possible. I already had
> one of these instances with file.tell() returning a long and
> that caused a lot of trouble then.

Sorry, no go.  There's no way I can defend returning a different value
for x<<y than what the same operation on a long would return.
References: <200208121148.g7CBm6KD027268@paros.informatik.hu-berlin.de>
Message-ID: <200208121338.g7CDc2l11722@pcp02138704pcs.reston01.va.comcast.net>

> http://www.python.org/peps/pep-0277.html
> 
> The PEP describes a Windows-only change to Unicode in file names: On
> Windows NT/2k/XP, Python would allow arbitrary Unicode strings as file
> names and pass them to the OS, instead of converting them to CP_ACP
> first. This applies to open() and all os functions that accept
> filenames.
> 
> In addition, os.listdir() would return Unicode filenames if the argument
> is Unicode.
> 
> Please comment on the PEP. There is an updated patch on
> http://python.org/sf/594001; please comment on the patch as well.

I've added some comments to the patch.

I'm +0 on the PEP; I'd like to defer to people who actually use
Windows like Mark Hammond and Tim Peters.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From barry@python.org  Mon Aug 12 14:53:12 2002
From: barry@python.org (Barry A. Warsaw)
Date: Mon, 12 Aug 2002 09:53:12 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
References: <0C474D03-AD72-11D6-9656-003065517236@oratrix.com>
 <3D56E2DA.90400@lemburg.com>
 <20020812063732.GA95771@hishome.net>
 <200208121318.g7CDIrr11573@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <15703.48584.232054.347274@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum  writes:

    GvR> If you propose this as a Python module, I'm +/- 0; I don't
    GvR> have the need, and I feel you can do all of this already, but
    GvR> I can see that there may be one or two things that beginners
    GvR> at bit-fiddling might find useful (like how to do sign
    GvR> extension or sign folding without an if statement).

A HOWTO might also suffice.

-Barry


From Jack.Jansen@cwi.nl  Mon Aug 12 16:21:00 2002
From: Jack.Jansen@cwi.nl (Jack Jansen)
Date: Mon, 12 Aug 2002 17:21:00 +0200
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: <20020812063732.GA95771@hishome.net>
Message-ID: <1BB36AF0-AE07-11D6-99CF-0030655234CE@cwi.nl>

On Monday, August 12, 2002, at 08:37 , Oren Tirosh wrote:
> Proposal:
>
> A standard module implementing the types [u]int[8|16|32|64]. These types
> would behave just like C integers - wrap around on overflow, etc and 
> have
> a guaranteed size regardless of platform. They can even have methods for
> bit rotation.

This, plus some syntactic sugar so I could easily specify values of 
these types in my source code, would do the trick.
Preferrably in such a way that I can use the C code verbatim: at the 
moment I don't have to understand what the C code does, whether it uses 
other constants, strings, expressions, etc.
--
- Jack Jansen                
http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma 
Goldman -



From martin@v.loewis.de  Mon Aug 12 16:29:14 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 12 Aug 2002 17:29:14 +0200
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: <200208121323.g7CDNw711590@pcp02138704pcs.reston01.va.comcast.net>
References: <0C474D03-AD72-11D6-9656-003065517236@oratrix.com>
 <3D56E2DA.90400@lemburg.com>
 <200208120025.g7C0PRg02588@pcp02138704pcs.reston01.va.comcast.net>
 <3D576D3F.80906@lemburg.com>
 <200208121323.g7CDNw711590@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

Guido van Rossum  writes:

> There's no excuse for that any more.  The 'i' and 'l' format chars of
> PyArg_Parse* and PyInt_AsLong() both work for longs as well as for
> ints.

There is a change, of course: Passing 0xff<<24 to a function that uses
the "i" converter will produce an OverflowError, whereas it previously
would pass in the negative numbers.

For cases of "I want 32 bits in an int", you'll have to accept both
signed and unsigned 32 bits - something that is currently not
supported in ParseTuple.

Regards,
Martin


From guido@python.org  Mon Aug 12 16:37:03 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 12 Aug 2002 11:37:03 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: Your message of "Mon, 12 Aug 2002 17:29:14 +0200."
 
References: <0C474D03-AD72-11D6-9656-003065517236@oratrix.com> <3D56E2DA.90400@lemburg.com> <200208120025.g7C0PRg02588@pcp02138704pcs.reston01.va.comcast.net> <3D576D3F.80906@lemburg.com> <200208121323.g7CDNw711590@pcp02138704pcs.reston01.va.comcast.net>
 
Message-ID: <200208121537.g7CFb3m16783@pcp02138704pcs.reston01.va.comcast.net>

> Guido van Rossum  writes:
> 
> > There's no excuse for that any more.  The 'i' and 'l' format chars of
> > PyArg_Parse* and PyInt_AsLong() both work for longs as well as for
> > ints.

Martin:
> There is a change, of course: Passing 0xff<<24 to a function that uses
> the "i" converter will produce an OverflowError, whereas it previously
> would pass in the negative numbers.

And unfortunately the same will happen for the "l" converter
(PyInt_AsLong() does a signed range check).

> For cases of "I want 32 bits in an int", you'll have to accept both
> signed and unsigned 32 bits - something that is currently not
> supported in ParseTuple.

Oops.  Darn.  You're right.  Sigh.  That's painful.  We have to add a
new format code (or more) to accept signed 32-bit ints but also longs
in range(2**32).  This should be added in 2.3 so extensions can start
using it now, and user code can start passing longs in range(2**32)
now.  I propose 'k' (for masK).  We should backport this to 2.2.2 as
well.  Plus a variant on PyInt_AsLong() with the same semantics, maybe
named PyInt_AsMask().

Any takers?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From martin@v.loewis.de  Mon Aug 12 16:41:46 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 12 Aug 2002 17:41:46 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
In-Reply-To: <3D579245.2080306@livinglogic.de>
References: <3D1057E8.9090200@livinglogic.de> <3D4E336E.8070700@lemburg.com>
 <200208051348.g75DmOv13530@pcp02138704pcs.reston01.va.comcast.net>
 <3D4E97A7.7000904@lemburg.com>
 <200208051527.g75FR1814634@pcp02138704pcs.reston01.va.comcast.net>
 <3D579245.2080306@livinglogic.de>
Message-ID: 

Walter Dörwald writes:

>  > - charmap_encoding_error, which insists on implementing known error
>  >   handling algorithms inline,
>
> This is done for performance reasons.

Is that really worth it? Such errors are rare, and when they occur,
they usually cause an exception as the result of the "strict" error
handling.

I'd strongly encourage you to avoid duplication of code, and use
Python wherever possible.

> The PyCodec_XMLCharRefReplaceErrors functionality is
> independent of the rest, so moving this to Python
> won't reduce complexity that much. And it will
> slow down "xmlcharrefreplace" handling for those
> codecs that don't implement it inline.

Sure it will. But how much does that matter in the overall context of
generating HTML/XML?

>  > - the UnicodeError exception methods (which could be omitted, IMO).
>
> Those methods were implemented so that we can easily
> move to new style exceptions.

What are new-style exceptions?

> The exception attributes can then be members of the C struct and the
> accessor functions can be simple macros.

Again, I sense premature optimization.

> 1. For each error handler two Python function objects are created:
> One in the registry and a different one in the codecs module. This
> means that e.g.
> codecs.lookup_error("replace") != codecs.replace_errors

Why would this be a problem?

> We can fix that by making the name ob the Python function object
> globally visible or by changing the codecs init function to do a
> lookup and use the result or simply by removing codecs.replace_errors

I recommend to fix this by implementing the registry in Python.

> 4. Assigning to an attribute of an exception object does not
> change the appropriate entry in the args attribute. Is this
> worth changing?

No. Exception objects should be treated as immutable (even if they
aren't). If somebody complains, we can fix it; until then, it suffices
if this is documented.

> 5. UTF-7 decoding does not yet take full advantage of the machinery:
> When an unterminated shift sequence is encountered (e.g. "+xxx")
> the faulty byte sequence has already been emitted.

It would be ok if it works as well as it did in 2.2. UTF-7 is rarely
used; if it is used, it is machine-generated, so there shouldn't be
any errors.

Regards,
Martin


From tim.one@comcast.net  Mon Aug 12 16:49:06 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 12 Aug 2002 11:49:06 -0400
Subject: [Python-Dev] 32-bit values (was RE: [Python-checkins] python/dist/src/Lib/test
 test_zlib.py,1.18,1.19)
In-Reply-To: 
Message-ID: 

[Guido]
> Modified Files:
> 	test_zlib.py
> Log Message:
> Portable way of producing unsigned 32-bit hex output to print the
> CRCs.
>
>
> Index: test_zlib.py
> ===================================================================
> RCS file: /cvsroot/python/python/dist/src/Lib/test/test_zlib.py,v
> retrieving revision 1.18
> retrieving revision 1.19
> diff -C2 -d -r1.18 -r1.19
> *** test_zlib.py	23 Jul 2002 19:04:09 -0000	1.18
> --- test_zlib.py	12 Aug 2002 15:26:05 -0000	1.19
> ***************
> *** 13,18 ****
>
>   # test the checksums (hex so the test doesn't break on 64-bit machines)
> ! print hex(zlib.crc32('penguin')), hex(zlib.crc32('penguin', 1))
> ! print hex(zlib.adler32('penguin')), hex(zlib.adler32('penguin', 1))
>
>   # make sure we generate some expected errors
> --- 13,20 ----
>
>   # test the checksums (hex so the test doesn't break on 64-bit machines)
> ! def fix(x):
> !     return "0x%x" % (x & 0xffffffffL)
> ! print fix(zlib.crc32('penguin')), fix(zlib.crc32('penguin', 1))
> ! print fix(zlib.adler32('penguin')), fix(zlib.adler32('penguin', 1))
>
>   # make sure we generate some expected errors

This raises a question:  what should crc32 and adler32 return?  They return
32-bit values, and that's part of external definitions so we can't change
that, but how we *view* "the sign bit" is up to us.  binascii.crc32()
always-- even on 64-bit boxes --returns a value in range(-2**31, 2**31).  I
know that because I forced it to not long ago.  I don't know what the other
guys return (zlib.crc32(), zlib.adler32(), ...?).

It would sure be nice if they returned values in range(0, 2**32) instead.  A
difficulty with changing this stuff is that checksums seem frequently to be
read and written via the struct module, with format code "l", and e.g.

>>> struct.pack("!l", 1L << 31)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OverflowError: long int too large to convert to int
>>>



From mal@lemburg.com  Mon Aug 12 16:50:57 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 12 Aug 2002 17:50:57 +0200
Subject: [Python-Dev] Deprecation warning on integer shifts and such
References: <0C474D03-AD72-11D6-9656-003065517236@oratrix.com> <3D56E2DA.90400@lemburg.com> <200208120025.g7C0PRg02588@pcp02138704pcs.reston01.va.comcast.net> <3D576D3F.80906@lemburg.com> <200208121323.g7CDNw711590@pcp02138704pcs.reston01.va.comcast.net>               <200208121537.g7CFb3m16783@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <3D57D961.2020600@lemburg.com>

Guido van Rossum wrote:
>>Guido van Rossum  writes:
>>
>>
>>>There's no excuse for that any more.  The 'i' and 'l' format chars of
>>>PyArg_Parse* and PyInt_AsLong() both work for longs as well as for
>>>ints.

Sigh.

> Martin:
> 
>>There is a change, of course: Passing 0xff<<24 to a function that uses
>>the "i" converter will produce an OverflowError, whereas it previously
>>would pass in the negative numbers.
> 
> 
> And unfortunately the same will happen for the "l" converter
> (PyInt_AsLong() does a signed range check).
> 
> 
>>For cases of "I want 32 bits in an int", you'll have to accept both
>>signed and unsigned 32 bits - something that is currently not
>>supported in ParseTuple.
> 
> 
> Oops.  Darn.  You're right.  Sigh.  That's painful.  We have to add a
> new format code (or more) to accept signed 32-bit ints but also longs
> in range(2**32). 

Rather than inventing something new to be compatible to the existing
old status quo, I'd rather like to see new format codes for unsigned
integers and/or longs and have the existing ones support the new
status quo.

 > This should be added in 2.3 so extensions can start
> using it now, and user code can start passing longs in range(2**32)
> now.  I propose 'k' (for masK).  We should backport this to 2.2.2 as
> well.  Plus a variant on PyInt_AsLong() with the same semantics, maybe
> named PyInt_AsMask().
> 
> Any takers?

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From guido@python.org  Mon Aug 12 16:54:09 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 12 Aug 2002 11:54:09 -0400
Subject: [Python-Dev] 32-bit values (was RE: [Python-checkins] python/dist/src/Lib/test test_zlib.py,1.18,1.19)
In-Reply-To: Your message of "Mon, 12 Aug 2002 11:49:06 EDT."
 
References: 
Message-ID: <200208121554.g7CFs9l17354@pcp02138704pcs.reston01.va.comcast.net>

> This raises a question:  what should crc32 and adler32 return?  They return
> 32-bit values, and that's part of external definitions so we can't change
> that, but how we *view* "the sign bit" is up to us.  binascii.crc32()
> always-- even on 64-bit boxes --returns a value in range(-2**31, 2**31).  I
> know that because I forced it to not long ago.  I don't know what the other
> guys return (zlib.crc32(), zlib.adler32(), ...?).
> 
> It would sure be nice if they returned values in range(0, 2**32) instead.  A
> difficulty with changing this stuff is that checksums seem frequently to be
> read and written via the struct module, with format code "l", and e.g.
> 
> >>> struct.pack("!l", 1L << 31)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> OverflowError: long int too large to convert to int
> >>>

Such programs will have to be changed to use format code "L" instead.

Or perhaps "l" should be allowed to accept longs in
range(-2**31, 2**32) ?
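What the two format codes do today with a value that has the high bit set (behavior as of recent Python; the exact exception raised has shifted between versions):

```python
import struct

# unsigned "L" accepts the full 32-bit range
packed = struct.pack("!L", 1 << 31)
assert packed == b"\x80\x00\x00\x00"

# signed "l" rejects it
try:
    struct.pack("!l", 1 << 31)
except struct.error:
    pass
else:
    raise AssertionError("expected struct.error")

# the same four bytes unpack to either view
assert struct.unpack("!L", packed)[0] == 2147483648
assert struct.unpack("!l", packed)[0] == -2147483648
```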

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Mon Aug 12 16:57:09 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 12 Aug 2002 11:57:09 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: Your message of "Mon, 12 Aug 2002 17:50:57 +0200."
 <3D57D961.2020600@lemburg.com>
References: <0C474D03-AD72-11D6-9656-003065517236@oratrix.com> <3D56E2DA.90400@lemburg.com> <200208120025.g7C0PRg02588@pcp02138704pcs.reston01.va.comcast.net> <3D576D3F.80906@lemburg.com> <200208121323.g7CDNw711590@pcp02138704pcs.reston01.va.comcast.net>  <200208121537.g7CFb3m16783@pcp02138704pcs.reston01.va.comcast.net>
 <3D57D961.2020600@lemburg.com>
Message-ID: <200208121557.g7CFv9617378@pcp02138704pcs.reston01.va.comcast.net>

> > Oops.  Darn.  You're right.  Sigh.  That's painful.  We have to add a
> > new format code (or more) to accept signed 32-bit ints but also longs
> > in range(32). 
> 
> Rather than inventing something new to be compatible to the existing
> old status quo, I'd rather like to see new format codes for unsigned
> integers and/or longs and have the existing ones support the new
> status quo.

That's okay too.  The function could be PyInt_AsUnsignedLong().  It
could convert negative 32-bit ints to unsigned as a backward
compatibility measure (with warning?) that will eventually disappear.

The format code could be 'I' for unsigned int, but I don't know what
to use for unsigned long.  Or perhaps still use 'k'/'K' for masK?
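
A rough Python sketch of the masking conversion such a function might perform
(the name and the range check are assumptions, not the eventual C API):

```python
def as_unsigned_32(value):
    # Accept both the signed and unsigned spellings of a 32-bit value,
    # i.e. anything in range(-2**31, 2**32), and return the unsigned form.
    if not -(1 << 31) <= value < (1 << 32):
        raise OverflowError("value does not fit in 32 bits")
    return value & 0xFFFFFFFF

assert as_unsigned_32(-1) == 0xFFFFFFFF
assert as_unsigned_32(2**31) == 2**31
```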

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal@lemburg.com  Mon Aug 12 17:03:14 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 12 Aug 2002 18:03:14 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
References: <3D1057E8.9090200@livinglogic.de> <3D4E336E.8070700@lemburg.com>	<200208051348.g75DmOv13530@pcp02138704pcs.reston01.va.comcast.net>	<3D4E97A7.7000904@lemburg.com>	<200208051527.g75FR1814634@pcp02138704pcs.reston01.va.comcast.net>	<3D579245.2080306@livinglogic.de> 
Message-ID: <3D57DC42.6070300@lemburg.com>

Martin v. Loewis wrote:
> Walter Dörwald writes:
>
>
>> > - charmap_encoding_error, which insists on implementing known error
>> >   handling algorithms inline,
>>
>>This is done for performance reasons.
>
> Is that really worth it? Such errors are rare, and when they occur,
> they usually cause an exception as the result of the "strict" error
> handling.
>
> I'd strongly encourage you to avoid duplication of code, and use
> Python whereever possible.

See below: this is not always possible, for much the same reason
that exceptions are implemented in C as well.

>>The PyCodec_XMLCharRefReplaceErrors functionality is
>>independent of the rest, so moving this to Python
>>won't reduce complexity that much. And it will
>>slow down "xmlcharrefreplace" handling for those
>>codecs that don't implement it inline.
>
> Sure it will. But how much does that matter in the overall context of
> generating HTML/XML?
>
>
>> > - the UnicodeError exception methods (which could be omitted, IMO).
>>
>>Those methods were implemented so that we can easily
>>move to new style exceptions.
>
>
> What are new-style exceptions?

Exceptions that are built as subclassable types.

>>The exception attributes can then be members of the C struct and the
>>accessor functions can be simple macros.
>
>
> Again, I sense premature optimization.

There's nothing premature here. By moving exception handling to
C level, you get *much* better performance than at Python level.
Remember that applications like e.g. escaping chars in an XML
document can cause lots of these exceptions to be generated.

>>1. For each error handler two Python function objects are created:
>>One in the registry and a different one in the codecs module. This
>>means that e.g.
>>codecs.lookup_error("replace") != codecs.replace_errors
>
>
> Why would this be a problem?
>
>
>>We can fix that by making the name of the Python function object
>>globally visible or by changing the codecs init function to do a
>>lookup and use the result or simply by removing codecs.replace_errors
>
>
> I recommend to fix this by implementing the registry in Python.

This doesn't work as I've already explained before. The predefined
error handling modes of builtin codecs must work with relying on
the Python import mechanism.

>>4. Assigning to an attribute of an exception object does not
>>change the appropriate entry in the args attribute. Is this
>>worth changing?
>
>
> No. Exception objects should be treated as immutable (even if they
> aren't). If somebody complains, we can fix it; until then, it suffices
> if this is documented.

What ? That exceptions are immutable ? I think it's a big win that
exceptions are in fact mutable -- they are great for transporting
extra information up the chain...

try:
     ...
except Exception, obj:
     obj.been_there = 1
     raise
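
The same idiom, restated in present-day syntax as a runnable sketch (the
function names are illustrative):

```python
def inner():
    raise ValueError("bad input")

def middle():
    try:
        inner()
    except ValueError as obj:
        obj.been_there = 1   # attach extra context while unwinding
        raise                # re-raise the same (now annotated) object

try:
    middle()
except ValueError as obj:
    seen = getattr(obj, "been_there", 0)

assert seen == 1
```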

>>5. UTF-7 decoding does not yet take full advantage of the machinery:
>>When an unterminated shift sequence is encountered (e.g. "+xxx")
>>the faulty byte sequence has already been emitted.
>
>
> It would be ok if it works as good as it did in 2.2. UTF-7 is rarely
> used; if it is used, it is machine-generated, so there shouldn't be
> any errors.

Right.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From tim.one@comcast.net  Mon Aug 12 17:10:19 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 12 Aug 2002 12:10:19 -0400
Subject: [Python-Dev] 32-bit values (was RE: [Python-checkins]
 python/dist/src/Lib/test test_zlib.py,1.18,1.19)
In-Reply-To: <200208121554.g7CFs9l17354@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

[Tim]
> This raises a question:  what should crc32 and adler32 return?
> ...
> binascii.crc32() always-- even on 64-bit boxes --returns a value in
> range(-2**31, 2**31).
> ...
> I don't know what the other guys return (zlib.crc32(),
> zlib.adler32(), ...?).
>
> It would sure be nice if they returned values in range(0,
> 2**32) instead.  A difficulty with changing this stuff is that
> checksums seem frequently to be read and written via the struct
> module, with format code "l", and e.g.
>
> >>> struct.pack("!l", 1L << 31)
> Traceback (most recent call last):
>   File "", line 1, in ?
> OverflowError: long int too large to convert to int

[Guido]
> Such programs will have to be changed to use format code "L" instead.

I'm not following this.  At least binascii.crc32() always produces a 32-bit
signed int now, so there's no *need* to use "L" now.  Are you saying that
binascii.crc32() should be changed to return a non-negative value always?
Also the other xyz.abc32() functions?

> Or perhaps "l" should be allowed to accept longs in
> range(-2**31, 2**32) ?

Well, unpacking a packed value wouldn't always return the value you started
with then (pack 2**31 via "l", then unpack it via "l" and you get
back -2**31), so it's not very attractive.  If you dump a checksum via pack,
then unpack it later, you really want to get back the same value, not just
"the same bits after some fiddling".
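
The round-trip loss is easy to demonstrate: the same four bytes read back
under the signed code come out negative:

```python
import struct

# Pack 2**31 as an unsigned 4-byte field, then reread it as signed "l":
packed = struct.pack("!L", 2**31)
fiddled = struct.unpack("!l", packed)[0]

assert packed == b"\x80\x00\x00\x00"
assert fiddled == -2**31   # same bits, different value
```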



From martin@v.loewis.de  Mon Aug 12 17:24:47 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 12 Aug 2002 18:24:47 +0200
Subject: [Python-Dev] 32-bit values (was RE: [Python-checkins] python/dist/src/Lib/test test_zlib.py,1.18,1.19)
In-Reply-To: <200208121554.g7CFs9l17354@pcp02138704pcs.reston01.va.comcast.net>
References: 
 <200208121554.g7CFs9l17354@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

Guido van Rossum  writes:

> Or perhaps "l" should be allowed to accept longs in range(-2**31,
> 2**32) ?

For the struct and array modules, that sounds reasonable.

Regards,
Martin


From guido@python.org  Mon Aug 12 17:26:37 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 12 Aug 2002 12:26:37 -0400
Subject: [Python-Dev] 32-bit values (was RE: [Python-checkins] python/dist/src/Lib/test test_zlib.py,1.18,1.19)
In-Reply-To: Your message of "Mon, 12 Aug 2002 12:10:19 EDT."
 
References: 
Message-ID: <200208121626.g7CGQb117525@pcp02138704pcs.reston01.va.comcast.net>

> [Tim]
> > This raises a question:  what should crc32 and adler32 return?
> > ...
> > binascii.crc32() always-- even on 64-bit boxes --returns a value in
> > range(-2**31, 2**31).
> > ...
> > I don't know what the other guys return (zlib.crc32(),
> > zlib.adler32(), ...?).
> >
> > It would sure be nice if they returned values in range(0,
> > 2**32) instead.  A difficulty with changing this stuff is that
> > checksums seem frequently to be read and written via the struct
> > module, with format code "l", and e.g.
> >
> > >>> struct.pack("!l", 1L << 31)
> > Traceback (most recent call last):
> >   File "", line 1, in ?
> > OverflowError: long int too large to convert to int

> [Guido]
> > Such programs will have to be changed to use format code "L" instead.

[Tim]
> I'm not following this.  At least binascii.crc32() always produces a 32-bit
> signed int now, so there's no *need* to use "L" now.  Are you saying that
> binascii.crc32() should be changed to return a non-negative value always?
> Also the other xyz.abc32() functions?

Um, I thought *you* were proposing that!  What else did you mean by
"It would sure be nice if they returned values in range(0, 2**32)
instead" ?

> > Or perhaps "l" should be allowed to accept longs in
> > range(-2**31, 2**32) ?
> 
> Well, unpacking a packed value wouldn't always return the value you
> started with then (pack 2**31 via "l", then unpack it via "l" and
> you get back -2**31), so it's not very attractive.  If you dump a
> checksum via pack, then unpack it later, you really want to get back
> the same value, not just "the same bits after some fiddling".

Yeah, you can't win. :-(

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Mon Aug 12 17:27:32 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 12 Aug 2002 12:27:32 -0400
Subject: [Python-Dev] 32-bit values (was RE: [Python-checkins] python/dist/src/Lib/test test_zlib.py,1.18,1.19)
In-Reply-To: Your message of "Mon, 12 Aug 2002 18:24:47 +0200."
 
References:  <200208121554.g7CFs9l17354@pcp02138704pcs.reston01.va.comcast.net>
 
Message-ID: <200208121627.g7CGRWQ17542@pcp02138704pcs.reston01.va.comcast.net>

> Guido van Rossum  writes:
> 
> > Or perhaps "l" should be allowed to accept longs in range(-2**31,
> > 2**32) ?
> 
> For the struct and array modules, that sounds reasonable.

Though Tim brought up that then you won't always get back what you put
in (if you put in a value > sys.maxint, it comes back negative).

Is that a problem or not?  I tend to think that's not how this is most
often used.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@comcast.net  Mon Aug 12 17:45:24 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 12 Aug 2002 12:45:24 -0400
Subject: [Python-Dev] 32-bit values (was RE: [Python-checkins]
 python/dist/src/Lib/test test_zlib.py,1.18,1.19)
In-Reply-To: <200208121626.g7CGQb117525@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

[Guido]
> Such programs will have to be changed to use format code "L" instead.

[Tim]
> I'm not following this.  At least binascii.crc32() always
> produces a 32-bit signed int now, so there's no *need* to use "L" now.
> Are you saying that binascii.crc32() should be changed to return a
> non-negative value always?  Also the other xyz.abc32() functions?

[Guido]
> Um, I thought *you* were proposing that!  What else did you mean by
> "It would sure be nice if they returned values in range(0, 2**32)
> instead" ?

I did suggest it, yes.  Had you said "Such programs *would* have to be
changed ...", my response would have been different.  But you said "will",
which reads like you already decided such a change will be made.  Now it
sounds like it's undecided (OK by me either way, I'm just trying to locate
our current position on the map ).



From guido@python.org  Mon Aug 12 17:51:20 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 12 Aug 2002 12:51:20 -0400
Subject: [Python-Dev] 32-bit values (was RE: [Python-checkins] python/dist/src/Lib/test test_zlib.py,1.18,1.19)
In-Reply-To: Your message of "Mon, 12 Aug 2002 12:45:24 EDT."
 
References: 
Message-ID: <200208121651.g7CGpKM17796@pcp02138704pcs.reston01.va.comcast.net>

> [Guido]
> > Such programs will have to be changed to use format code "L" instead.
> 
> [Tim]
> > I'm not following this.  At least binascii.crc32() always
> > produces a 32-bit signed int now, so there's no *need* to use "L" now.
> > Are you saying that binascii.crc32() should be changed to return a
> > non-negative value always?  Also the other xyz.abc32() functions?
> 
> [Guido]
> > Um, I thought *you* were proposing that!  What else did you mean by
> > "It would sure be nice if they returned values in range(0, 2**32)
> > instead" ?
> 
> I did suggest it, yes.  Had you said "Such programs *would* have to be
> changed ...", my response would have been different.  But you said "will",
> which reads like you already decided such a change will be made.  Now it
> sounds like it's undecided (OK by me either way, I'm just trying to locate
> our current position on the map ).

I responded to your specific example, which probably wasn't how you
intended to use it.

I really don't know what's the best return range for 32-bit checksums
given all the constraints.  I'd leave this alone until we have decided
what to do with the other issues (like what to do with extensions that
use signed 32-bit values to represent masks now).

--Guido van Rossum (home page: http://www.python.org/~guido/)


From martin@v.loewis.de  Mon Aug 12 17:23:00 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 12 Aug 2002 18:23:00 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
In-Reply-To: <3D57DC42.6070300@lemburg.com>
References: <3D1057E8.9090200@livinglogic.de> <3D4E336E.8070700@lemburg.com>
 <200208051348.g75DmOv13530@pcp02138704pcs.reston01.va.comcast.net>
 <3D4E97A7.7000904@lemburg.com>
 <200208051527.g75FR1814634@pcp02138704pcs.reston01.va.comcast.net>
 <3D579245.2080306@livinglogic.de>
 
 <3D57DC42.6070300@lemburg.com>
Message-ID: 

"M.-A. Lemburg"  writes:

> > What are new-style exceptions?
> 
> Exceptions that are built as subclassable types.

Exceptions first of all inherit from Exception. When/if Exception
stops being a class, we'll have to deal with more issues than the PEP
293 exceptions.

> There's nothing premature here. By moving exception handling to
> C level, you get *much* better performance than at Python level.

Can you give a specific example: What Python code, how much better
performance?

> This doesn't work as I've already explained before. The predefined
> error handling modes of builtin codecs must work with relying on
> the Python import mechanism.

You mean "without"? Where did you explain this before? And why is
that? Guido argues that more of the central interpreter machinery must
be moved to Python - I can't see why codecs should be an exception
here.

> What ? That exceptions are immutable ? I think it's a big win that
> exceptions are in fact mutable -- they are great for transporting
> extra information up the chain...

I see. So this is an open issue.

Regards,
Martin


From trentm@ActiveState.com  Mon Aug 12 18:37:29 2002
From: trentm@ActiveState.com (Trent Mick)
Date: Mon, 12 Aug 2002 10:37:29 -0700
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: <200208121338.g7CDc2l11722@pcp02138704pcs.reston01.va.comcast.net>; from guido@python.org on Mon, Aug 12, 2002 at 09:38:02AM -0400
References: <200208121148.g7CBm6KD027268@paros.informatik.hu-berlin.de> <200208121338.g7CDc2l11722@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20020812103729.I386@ActiveState.com>

[Guido van Rossum wrote]
> 
> I've added some comments to the patch.
> 
> I'm +0 on the PEP; I'd like to defer to people who actually use
> Windows like Mark Hammond and Tim Peters.

We (ActiveState) are committed to getting the functionality in -- which
means that David and myself can help with coding and testing.  Depending
on the schedule, I can help with e.g. filling out the test suite.  We
can also setup some test machines to test things that are probably too
odd to fit in the test suite.

Martin, if there is anything we can do to help with for this patch,
please let us know.

Trent



From mal@lemburg.com  Mon Aug 12 18:43:20 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 12 Aug 2002 19:43:20 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
References: <3D1057E8.9090200@livinglogic.de> <3D4E336E.8070700@lemburg.com>	<200208051348.g75DmOv13530@pcp02138704pcs.reston01.va.comcast.net>	<3D4E97A7.7000904@lemburg.com>	<200208051527.g75FR1814634@pcp02138704pcs.reston01.va.comcast.net>	<3D579245.2080306@livinglogic.de>		<3D57DC42.6070300@lemburg.com> 
Message-ID: <3D57F3B8.7090507@lemburg.com>

Martin v. Loewis wrote:
> "M.-A. Lemburg"  writes:
> 
> 
>>>What are new-style exceptions?
>>
>>Exceptions that are built as subclassable types.
> 
> 
> Exceptions first of all inherit from Exception. When/if Exception
> stops being a class, we'll have to deal with more issues than the PEP
> 293 exceptions.

Right. It would be nice to have classes or at least exceptions
turn into new-style types as well. Then you'd have access to
slots and all the other goodies which make a great difference
in accessing performance at C level.

>>There's nothing premature here. By moving exception handling to
>>C level, you get *much* better performance than at Python level.
> 
> 
> Can you give a specific example: What Python code, how much better
> performance?

Walter has the details here.

>>This doesn't work as I've already explained before. The predefined
>>error handling modes of builtin codecs must work with relying on
>>the Python import mechanism.
> 
> 
> You mean "without"? 

Right. s/with/without/.

> Where did you explain this before? 

Hmm, I remember having posted the reasoning I gave here
in another response on this thread, but I can't find it
at the moment.

> And why is
> that? Guido argues that more of the central interpreter machinery must
> be moved to Python - I can't see why codecs should be an exception
> here.

The problem is the same as what we had with the exceptions.py
module early on in the 1.6 alphas: if this module isn't found
all kinds of things start failing. The same would happen when
you start to use builtin codecs whose error handlers are implemented
in external .py files, e.g. unicode(data, 'utf-8', 'replace')
could then fail because of an ImportError.

For the charmap codec it's mostly about performance. I don't
have objections for other codecs which rely on external
resources.

>>What ? That exceptions are immutable ? I think it's a big win that
>>exceptions are in fact mutable -- they are great for transporting
>>extra information up the chain...
> 
> I see. So this is an open issue.

I wouldn't call it an issue. It's a feature :-) (and one that makes
Python's exception mechanism very powerful)

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From faassen@vet.uu.nl  Mon Aug 12 18:57:18 2002
From: faassen@vet.uu.nl (Martijn Faassen)
Date: Mon, 12 Aug 2002 19:57:18 +0200
Subject: [Python-Dev] Re: [PythonLabs] PEP 2
In-Reply-To: <200208121540.g7CFeOJ16812@pcp02138704pcs.reston01.va.comcast.net>
References:  <200208121540.g7CFeOJ16812@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20020812175718.GB20846@vet.uu.nl>

[Barry]

 [snip] 
 I was going to point David at PEP 2 as the guidelines for getting modules 
 added to the standard library, but I don't think PEP 2 really describes 
 current practice.

 PEP 2 says:

    When developers wish to have a contribution accepted into the
    standard library, they will first form a group of maintainers
    (normally initially consisting of themselves).

    Then, this group shall produce a PEP called a library PEP. A
    library PEP is a special form of standards track PEP.  The library
    PEP gives an overview of the proposed contribution, along with the
    proposed contribution as the reference implementation.  This PEP
    should also contain a motivation on why this contribution should
    be part of the standard library.

 I think only in rare situations do we need a PEP for a library
 module.  If you agree then I think we should rewrite PEP 2 to describe
 current practice.

 [Barry's description of current practice later in this post]

[Guido]

 Sounds like a good idea.

 It really depends a lot on circumstances.  PEP 282 was written to
 propose a logging module, and then a logging module was proposed that
 will soon go into the std lib (I hope).  But Optik will end up being
 adapted without a PEP.

[back to me]

PEP 2 indeed does not describe current practice; it tries to introduce new
procedures for the development of the standard library.

PEP 2 was written as informed by this post by Tim Peters:

http://groups.google.com/groups?q=g:thl2977775070d&dq=&hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=mailman.993958922.4025.python-list%40python.org

which in turn is referring to this post by Guido:

http://groups.google.com/groups?q=g:thl717774049d&dq=&hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=cpvglejfl8.fsf%40cj20424-a.reston1.va.home.com

Tim said the following:

> All the core
> developers have done major work on the libraries, so that's not a hangup.
> What is a hangup is that people also want stuff the current group of core
> developers has no competence (and/or sometimes interest) to write.  Like SSL
> support on Windows, or IPv6 support, etc.  Expert-level work in a field
> requires experts in that field to contribute.  We also need a plan to keep
> their stuff running after they go away again, the lack of which is one
> strong reason Guido resists adding stuff to the library.

And he suggested I look at the then empty PEP 2.

I took the point of these postings to be that many core library modules
cannot be developed and maintained by the core developers, as they lack
the knowledge and expertise to do so. Therefore the community needs
to develop and, importantly, also maintain them. In particular,
having explicit and active maintainers was held to be an especially
important precondition for library inclusion by the core developers.

Since I thought development of the standard library was important, I
tried to make sure that these requirements would be fulfilled by
people wanting to add a new module.

Barry describes these steps as the way new modules get added now:

- develop the library as an independent project, outside the Python project

- make the library available to the Python community, usually in the
  form of a distutils package

- get feedback and experience from the Python community at large

- if the module becomes popular, is widely backed, and/or fills a   
  niche in the standard library, propose it for inclusion in the core
  distro via discussion and consensus building on python-dev
   
And quoting from PEP 2:
  
    The library PEP differs from a normal standard track PEP in that
    the reference
    implementation should in this case always already have been
    written before the PEP is to be reviewed for inclusion by the
    integrators and to be commented upon by the community; the
    reference implementation _is_ the proposed contribution.

and Barry quotes PEP 2 himself:

    When developers wish to have a contribution accepted into the
    standard library, they will first form a group of maintainers
    (normally initially consisting of themselves).

This is not incompatible at all with the above procedure, if you 
read it carefully. :) The idea of PEP 2 is that there is *already* 
a (potential) contribution, and people want it accepted into the
standard library. The PEP does not describe where this contribution
is coming from; it may indeed be a popular module or it may fill
a niche or whatnot, or it may simply have a stunning design and
implementation extremely useful to everybody. I can make this more
explicit in the PEP.

What PEP 2 tries to supply is a procedure to follow if people
have already decided they would like to try to get a module or set of
modules accepted into the standard library. They can decide this before
or after they write the module; the PEP doesn't care -- as long as the
module is there when they submit the library PEP. At least they know
there'll be Integrators that will review things, and they know they had
better come up with some maintainers before submitting the PEP.

I can add a phrase to make initial community participation more important,
though the PEP process itself already indicates to ask for community input
when submitting a PEP.

Right now the Integrators are described to be PythonLabs, but this could be a
separate group dedicated to maintaining the library later on if desired.

Anyway, what I was aiming for with the PEP was not to codify existing
procedure but to improve on it, informed by what I thought were
the requirements. I thought it was a problem that the standard library is 
apparently not the central focus of the core developers, while the standard
library *is* a very important part of what makes Python appealing. If
the community is to develop it further, I thought the procedures to do
so needed a bit more framework than there is now.

Perhaps I misunderstood something and perhaps PEP 2 won't help anyway,
but that's the background behind it.

I cc-ed my reply to python-dev as this may need a bit more input. There was
only a little feedback (from Aahz, I recall, though shamefully I think I
forgot to integrate one of his suggestions on backporting bugfixes) when I
first posted it both on comp.lang.python and on python-dev, but perhaps the
reasoning behind it was unclear.

PEP 2 is here:

http://www.python.org/peps/pep-0002.html

Regards,

Martijn



From walter@livinglogic.de  Mon Aug 12 19:39:25 2002
From: walter@livinglogic.de (=?ISO-8859-15?Q?Walter_D=F6rwald?=)
Date: Mon, 12 Aug 2002 20:39:25 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
References: <3D1057E8.9090200@livinglogic.de> <3D4E336E.8070700@lemburg.com>	<200208051348.g75DmOv13530@pcp02138704pcs.reston01.va.comcast.net>	<3D4E97A7.7000904@lemburg.com>	<200208051527.g75FR1814634@pcp02138704pcs.reston01.va.comcast.net>	<3D579245.2080306@livinglogic.de> 
Message-ID: <3D5800DD.1050108@livinglogic.de>

This is a multi-part message in MIME format.
--------------090104050809030407000604
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 8bit

Martin v. Loewis wrote:

> Walter Dörwald  writes:
> 
> 
>> > - charmap_encoding_error, which insists on implementing known error
>> >   handling algorithms inline,
>>
>>This is done for performance reasons.
> 
> 
> Is that really worth it? Such errors are rare, and when they occur,
> they usually cause an exception as the result of the "strict" error
> handling.

Of course it's irrelevant how fast the exception is raised, but it could
be important for the handlers that really do a replacement.

> I'd strongly encourage you to avoid duplication of code, and use
> Python whereever possible.
> 
> 
>>The PyCodec_XMLCharRefReplaceErrors functionality is
>>independent of the rest, so moving this to Python
>>won't reduce complexity that much. And it will
>>slow down "xmlcharrefreplace" handling for those
>>codecs that don't implement it inline.
> 
> 
> Sure it will. But how much does that matter in the overall context of
> generating HTML/XML?

See the attached test script. It encodes 100 versions of the german
text on http://www.gutenberg2000.de/grimm/maerchen/tapfere.htm

Output is as follows:
1790000 chars, 2.330% unenc
ignore: 0.022 (factor=1.000)
xmlcharrefreplace: 0.044 (factor=1.962)
xml2: 0.267 (factor=12.003)
xml3: 0.723 (factor=32.506)
workaround: 5.151 (factor=231.702)
i.e. a 1.7MB string with 2.3% unencodable characters was
encoded.

Using the inline xmlcharrefreplace instead of ignore is
half as fast. Using a callback instead of the inline
implementation is a factor of 12 slower than ignore.
Using the Python implementation of the callback is a
factor of 32 slower and using the pre-PEP workaround
is a factor of 231 slower.

Replacing every unencodable character with u"\u4242" and
using "iso-8859-15" gives:
ignore: 0.351 (factor=1.000)
xmlcharrefreplace: 0.390 (factor=1.113)
xml2: 0.653 (factor=1.862)
xml3: 1.137 (factor=3.244)
workaround: 12.310 (factor=35.117)

 > [...]
>>The exception attributes can then be members of the C struct and the
>>accessor functions can be simple macros.
> 
> 
> Again, I sense premature optimization.

No it's more like anticipating change.

>>1. For each error handler two Python function objects are created:
>>One in the registry and a different one in the codecs module. This
>>means that e.g.
>>codecs.lookup_error("replace") != codecs.replace_errors
> 
> 
> Why would this be a problem?

It's just unintuitive.

>>We can fix that by making the name of the Python function object
>>globally visible or by changing the codecs init function to do a
>>lookup and use the result or simply by removing codecs.replace_errors
> 
> 
> I recommend to fix this by implementing the registry in Python.

Even simpler would be to move the initialization of the module
variables from Modules/_codecsmodule.c to Lib/codecs.py. There is
no need for them to be available in _codecs. All that is required
for this change is to add

    strict_errors = lookup_error("strict")
    ignore_errors = lookup_error("ignore")
    replace_errors = lookup_error("replace")
    xmlcharrefreplace_errors = lookup_error("xmlcharrefreplace")
    backslashreplace_errors = lookup_error("backslashreplace")

to codecs.py.

The registry should be available via two simple C APIs, just
like the encoding registry.

>>4. Assigning to an attribute of an exception object does not
>>change the appropriate entry in the args attribute. Is this
>>worth changing?
> 
> 
> No. Exception objects should be treated as immutable (even if they
> aren't).

The codecs in the PEP *do* modify attributes of the exception
object.

> If somebody complains, we can fix it; until then, it suffices
> if this is documented.

It can't really be fixed for codecs implemented in Python. For codecs
that use the C functions we could add the functionality that e.g.
PyUnicodeEncodeError_SetReason(exc) sets exc.reason and exc.args[3],
but AFAICT it can't be done easily in Python, where attribute assignment
goes directly to the instance dict.

If those exception classes were new style classes it would be simple, 
because the attributes would be properties and args would probably
be generated lazily.
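
As a sketch of what that could look like (the class name and attribute layout
are illustrative, not the actual UnicodeEncodeError implementation): with a
new-style class, reason can be a property whose setter keeps args in sync:

```python
class EncodeError(Exception):
    # Illustrative only: a reason attribute implemented as a property
    # so that assigning to it also updates args[3].
    def __init__(self, encoding, object, start, reason):
        Exception.__init__(self, encoding, object, start, reason)
        self._reason = reason

    @property
    def reason(self):
        return self._reason

    @reason.setter
    def reason(self, value):
        self._reason = value
        self.args = self.args[:3] + (value,)

exc = EncodeError("ascii", "\xff", 0, "ordinal not in range(128)")
exc.reason = "unencodable character"
assert exc.args[3] == "unencodable character"
```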

>>5. UTF-7 decoding does not yet take full advantage of the machinery:
>>When an unterminated shift sequence is encountered (e.g. "+xxx")
>>the faulty byte sequence has already been emitted.
> 
> 
> It would be ok if it works as good as it did in 2.2. UTF-7 is rarely
> used; if it is used, it is machine-generated, so there shouldn't be
> any errors.

It does:
 >>> "+xxx".decode("utf-7", "replace")
u'\uc71c\ufffd'

although the result should probably have been u'\ufffd'.

Bye,
    Walter Dörwald

--------------090104050809030407000604
Content-Type: text/plain;
 name="test2.py"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="test2.py"

import codecs, time

def xml3(exc):
	if isinstance(exc, UnicodeEncodeError):
		return (u"".join([ u"&#%d;" % ord(c) for c in exc.object[exc.start:exc.end]]), exc.end)
	else:
		raise TypeError("don't know how to handle %r" % exc)

count = 0

def check(exc):
	global count
	count += exc.end-exc.start
	return (u"", exc.end)

codecs.register_error("xmlcheck", check)
codecs.register_error("xml2", codecs.xmlcharrefreplace_errors)
codecs.register_error("xml3", xml3)

l = 100
s = unicode(open("tapferschneider.txt").read(), "latin-1")
s *= l

s.encode("ascii", "xmlcheck")

print "%d chars, %.03f%% unenc" % (len(s), 100.*(float(count)/len(s)))

handlers = ["ignore", "xmlcharrefreplace", "xml2", "xml3"]
times = [0]*(len(handlers)+1)
res = [0]*(len(handlers)+1)
for (i, h) in enumerate(handlers):
	t1 = time.time()
	res[i] = s.encode("ascii", h)
	t2 = time.time()
	times[i] = t2-t1
	print "%s: %.03f (factor=%.03f)" % (handlers[i], times[i], times[i]/times[0])

i = len(handlers)
t1 = time.time()
v = []
for c in s:
	try:
		v.append(c.encode("ascii"))
	except UnicodeError:
		v.append("&#%d;" % ord(c))
res[i] = "".join(v)
t2 = time.time()
times[i] = t2-t1
print "workaround: %.03f (factor=%.03f)" % (times[i], times[i]/times[0])


--------------090104050809030407000604--



From walter@livinglogic.de  Mon Aug 12 19:47:54 2002
From: walter@livinglogic.de (=?ISO-8859-15?Q?Walter_D=F6rwald?=)
Date: Mon, 12 Aug 2002 20:47:54 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
References: <3D1057E8.9090200@livinglogic.de> <3D4E336E.8070700@lemburg.com>	<200208051348.g75DmOv13530@pcp02138704pcs.reston01.va.comcast.net>	<3D4E97A7.7000904@lemburg.com>	<200208051527.g75FR1814634@pcp02138704pcs.reston01.va.comcast.net>	<3D579245.2080306@livinglogic.de>  <3D57DC42.6070300@lemburg.com>
Message-ID: <3D5802DA.8030604@livinglogic.de>

M.-A. Lemburg wrote:

> Martin v. Loewis wrote:
 > [...]
>>> codecs.lookup_error("replace") != codecs.replace_errors
>>
>> [...]
>> I recommend to fix this by implementing the registry in Python.
> 
> This doesn't work as I've already explained before. The predefined
> error handling modes of builtin codecs must work with relying on
> the Python import mechanism.

s/with/without/ ?

At least "strict" should be implemented inline, because reading
broken .pyc files which contain (utf-8 encoded) unicode constants
would probably lead to all kinds of interesting problems.

[...]

Bye,
    Walter Dörwald



From sholden@holdenweb.com  Mon Aug 12 20:05:01 2002
From: sholden@holdenweb.com (Steve Holden)
Date: Mon, 12 Aug 2002 15:05:01 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
References: <0C474D03-AD72-11D6-9656-003065517236@oratrix.com> <3D56E2DA.90400@lemburg.com> <200208120025.g7C0PRg02588@pcp02138704pcs.reston01.va.comcast.net> <3D576D3F.80906@lemburg.com> <200208121323.g7CDNw711590@pcp02138704pcs.reston01.va.comcast.net>  <200208121537.g7CFb3m16783@pcp02138704pcs.reston01.va.comcast.net>              <3D57D961.2020600@lemburg.com>  <200208121557.g7CFv9617378@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <086f01c24233$290dc9d0$6300000a@holdenweb.com>

----- Original Message -----
From: "Guido van Rossum" 
To: "M.-A. Lemburg" 
Cc: "Martin v. Loewis" ; "Jack Jansen"
; 
Sent: Monday, August 12, 2002 11:57 AM
Subject: Re: [Python-Dev] Deprecation warning on integer shifts and such


> > > Oops.  Darn.  You're right.  Sigh.  That's painful.  We have to add a
> > > new format code (or more) to accept signed 32-bit ints but also longs
> > > in range(32).
> >
> > Rather than inventing something new to be compatible to the existing
> > old status quo, I'd rather like to see new format codes for unsigned
> > integers and/or longs and have the existing ones support the new
> > status quo.
>
> That's okay too.  The function could be PyInt_AsUnsignedLong().  It
> could convert negative 32-bit ints to unsigned as a backward
> compatibility measure (with warning?) that will eventually disappear.
>
> The format code could be 'I' for unsigned int, but I don't know what
> to use for unsigned long.  Or perhaps still use 'k'/'K' for masK?
>

Does 32 here actually mean 32, or does it mean length of int -- I'm
presuming there are or will be platforms with 64-bit PyInts?

regards
-----------------------------------------------------------------------
Steve Holden                                 http://www.holdenweb.com/
Python Web Programming                http://pydish.holdenweb.com/pwp/
-----------------------------------------------------------------------






From guido@python.org  Mon Aug 12 20:13:38 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 12 Aug 2002 15:13:38 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: Your message of "Mon, 12 Aug 2002 15:05:01 EDT."
 <086f01c24233$290dc9d0$6300000a@holdenweb.com>
References: <0C474D03-AD72-11D6-9656-003065517236@oratrix.com> <3D56E2DA.90400@lemburg.com> <200208120025.g7C0PRg02588@pcp02138704pcs.reston01.va.comcast.net> <3D576D3F.80906@lemburg.com> <200208121323.g7CDNw711590@pcp02138704pcs.reston01.va.comcast.net>  <200208121537.g7CFb3m16783@pcp02138704pcs.reston01.va.comcast.net> <3D57D961.2020600@lemburg.com> <200208121557.g7CFv9617378@pcp02138704pcs.reston01.va.comcast.net>
 <086f01c24233$290dc9d0$6300000a@holdenweb.com>
Message-ID: <200208121913.g7CJDdb20707@pcp02138704pcs.reston01.va.comcast.net>

> > > > Oops.  Darn.  You're right.  Sigh.  That's painful.  We have to add a
> > > > new format code (or more) to accept signed 32-bit ints but also longs
> > > > in range(32).
> > >
> > > Rather than inventing something new to be compatible to the existing
> > > old status quo, I'd rather like to see new format codes for unsigned
> > > integers and/or longs and have the existing ones support the new
> > > status quo.
> >
> > That's okay too.  The function could be PyInt_AsUnsignedLong().  It
> > could convert negative 32-bit ints to unsigned as a backward
> > compatibility measure (with warning?) that will eventually disappear.
> >
> > The format code could be 'I' for unsigned int, but I don't know what
> > to use for unsigned long.  Or perhaps still use 'k'/'K' for masK?
> 
> Does 32 here actually mean 32, or does it mean length of int -- I'm
> presuming there are or will be platforms with 64-bit PyInts?

Good question.  Much code that uses these features assumes 32 bits.
OTOH the same problem occurs for real on 64-bit systems at the 64-bit
boundary.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From Jack.Jansen@oratrix.com  Mon Aug 12 20:56:23 2002
From: Jack.Jansen@oratrix.com (Jack Jansen)
Date: Mon, 12 Aug 2002 21:56:23 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: <200208121148.g7CBm6KD027268@paros.informatik.hu-berlin.de>
Message-ID: <93F350A8-AE2D-11D6-B7D6-003065517236@oratrix.com>

On Monday, August 12, 2002, at 01:48, Martin v. Löwis wrote:

> http://www.python.org/peps/pep-0277.html
>
> The PEP describes a Windows-only change to Unicode in file names: On
> Windows NT/2k/XP, Python would allow arbitrary Unicode strings as file
> names and pass them to the OS, instead of converting them to CP_ACP
> first. This applies to open() and all os functions that accept
> filenames.
>
> In addition, os.list() would return Unicode filenames if the argument
> is Unicode.

This is the bit I still don't like (at least, if I'm not
mistaken I commented on it a while ago too). A routine could be
doing an os.list() expecting strings, but suddenly someone
passes it a unicode directoryname and the return value would
change.

I would much prefer an optional encoding argument whereby you
give the encoding in which you want the return value. Default
would be the local filesystem encoding. If you pass unicode you
will get direct unicode on XP/2K, and a converted string on
other platforms (but always unicode).

Oh yes, the same reasoning would hold for readlink(), getcwd()
and any other call that returns filenames.
--
- Jack Jansen
http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution --
Emma Goldman -



From guido@python.org  Mon Aug 12 21:07:46 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 12 Aug 2002 16:07:46 -0400
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: Your message of "Mon, 12 Aug 2002 21:56:23 +0200."
 <93F350A8-AE2D-11D6-B7D6-003065517236@oratrix.com>
References: <93F350A8-AE2D-11D6-B7D6-003065517236@oratrix.com>
Message-ID: <200208122007.g7CK7li21777@pcp02138704pcs.reston01.va.comcast.net>

> > http://www.python.org/peps/pep-0277.html
> >
> > The PEP describes a Windows-only change to Unicode in file names: On
> > Windows NT/2k/XP, Python would allow arbitrary Unicode strings as file
> > names and pass them to the OS, instead of converting them to CP_ACP
> > first. This applies to open() and all os functions that accept
> > filenames.
> >
> > In addition, os.list() would return Unicode filenames if the argument
> > is Unicode.
> 
> This is the bit I still don't like (at least, if I'm not 
> mistaken I commented on it a while ago too). A routine could be 
> doing an os.list() expecting strings, but suddenly someone 
> passes it a unicode directoryname and the return value would 
> change.

Hm, that would be the responsibility of whoever passes it Unicode.
Most code works just fine when presented with Unicode where 8-bit
strings are expected.  It's only code that assumes the 8-bit strings
are Latin-1 (or something else besides ASCII) that gets in trouble.

But shouldn't it return Unicode whenever there are filenames in the
directory that can't be represented as ASCII?

That's what Tkinter does: Tk gives back UTF-8, which degenerates to
ASCII if there are only ASCII chars; if any high bits are detected,
Tkinter decodes the UTF-8, turning the return string into Unicode.
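The degenerate-to-ASCII pattern described here can be sketched in a few lines (a hypothetical helper written with explicit byte strings; `maybe_decode` is not an actual Tkinter function):

```python
def maybe_decode(raw):
    # Mirror the Tk pattern: leave a pure-ASCII byte string alone,
    # but as soon as any high bit appears, treat the bytes as UTF-8
    # and decode them into a Unicode string.
    if any(b & 0x80 for b in raw):
        return raw.decode("utf-8")
    return raw

maybe_decode(b"spam")          # stays a plain byte string
maybe_decode(b"caf\xc3\xa9")   # becomes the Unicode string u"caf\xe9"
```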

> I would much prefer an optional encoding argument whereby you 
> give the encoding in which you want the return value. Default 
> would be the local filesystem encoding. If you pass unicode you 
> will get direct unicode on XP/2K, and a converted string on 
> other platforms (but always unicode).

Hm, I don't know if I'd like os.listdir() to have an encoding
argument.  Sounds like the wrong solution somehow.

> Oh yes, the same reasoning would hold for readlink(), getcwd() 
> and any other call that returns filenames.

Ditto.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From Jack.Jansen@oratrix.com  Mon Aug 12 21:18:26 2002
From: Jack.Jansen@oratrix.com (Jack Jansen)
Date: Mon, 12 Aug 2002 22:18:26 +0200
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: <200208121537.g7CFb3m16783@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

On Monday, August 12, 2002, at 05:37, Guido van Rossum wrote:
> Oops.  Darn.  You're right.  Sigh.  That's painful.  We have to add a
> new format code (or more) to accept signed 32-bit ints but also longs
> in range(32).  This should be added in 2.3 so extensions can start
> using it now, and user code can start passing longs in range(2**32)
> now.  I propose 'k' (for masK).  We should backport this to 2.2.2 as
> well.  Plus a variant on PyInt_AsLong() with the same semantics, maybe
> named PyInt_AsMask().

Ow, pleeeeeeeeeeeeeeeeeeeeeeeeaaaaaase........

Just before 2.1 was released (or was it 2.0?) on a whim someone 
"fixed" the short integer handling to bother about signs, in a 
backward-incompatible way, despite the fact that about 95% of 
the short PyArg_Parse formats in the core were mine, and I asked 
for some form of backward compatibility. I spent about 2 weeks 
going over a few thousand API calls to fix this mess at a time I 
had more than enough other work on my hands.

Can we please make this change in a backwards-compatible way, 
i.e. leave the i and l formats alone and use something new for 
"range-checked-int" and "range-checked-long"?

I already fear that I have to come up with some sort of a fix 
for the range-check warning (more than 6000 lines worth of 
constant definitions that can currently be copied verbatim from 
C header files to Python will have to be parsed, and computed, 
and all these things can contain references to other constants, 
strings and who knows what more, see Mac/Lib/Carbon/*.py), I 
really could do without more work on my plate...
--
- Jack Jansen                
http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- 
Emma Goldman -



From guido@python.org  Mon Aug 12 21:49:54 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 12 Aug 2002 16:49:54 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: Your message of "Mon, 12 Aug 2002 22:18:26 +0200."
 
References: 
Message-ID: <200208122049.g7CKnsD23148@pcp02138704pcs.reston01.va.comcast.net>

> On Monday, August 12, 2002, at 05:37, Guido van Rossum wrote:
> > Oops.  Darn.  You're right.  Sigh.  That's painful.  We have to add a
> > new format code (or more) to accept signed 32-bit ints but also longs
> > in range(32).  This should be added in 2.3 so extensions can start
> > using it now, and user code can start passing longs in range(2**32)
> > now.  I propose 'k' (for masK).  We should backport this to 2.2.2 as
> > well.  Plus a variant on PyInt_AsLong() with the same semantics, maybe
> > named PyInt_AsMask().
> 
> Ow, pleeeeeeeeeeeeeeeeeeeeeeeeaaaaaase........
> 
> Just before 2.1 was released (or was it 2.0?) on a whim someone 
> "fixed" the short integer handling to bother about signs, in a 
> backward-incompatible way, despite the fact that about 95% of 
> the short PyArg_Parse formats in the core were mine, and I asked 
> for some form of backward compatibility. I spent about 2 weeks 
> going over a few thousand API calls to fix this mess at a time I 
> had more than enough other work on my hands.

Oops.  That wasn't intended of course.

> Can we please make this change in a backwards-compatible way, 
> i.e. leave the i and l formats alone and use something new for 
> "range-checked-int" and "range-checked-long"?

Um, they *already* do range checking.  'i' requires that the value
(whether it comes from a Python int or a Python long) is in the range
[INT_MIN, INT_MAX].  'l' doesn't do range checking on Python ints
(because they are defined to fit in a C long), but for a Python long
it requires that the value fits in the range of a signed C long,
i.e. [-sys.maxint-1, sys.maxint].

The problem is that hex constants and shifted values may be
represented by signed Python ints, abusing the sign bit as a mask bit,
*or* by Python longs that represent corresponding values by
nonnegative values in the range [0, 2*sys.maxint+1].  Since
PyArg_Parse* doesn't know whether the value will be used as a mask or
as a real signed int, we must allow negative Python longs (in the
range [-sys.maxint-1, -1]) as well.
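The two readings of the same bit pattern can be made concrete (a sketch; modern Python ints are unbounded, so the 32-bit width is spelled out explicitly):

```python
BITS = 32
mask_val = 0xff000000     # the nonnegative "long" reading: 4278190080

# Sign-extend to get the reading a signed 32-bit int would give,
# with the top bit abused as a sign bit.
if mask_val & (1 << (BITS - 1)):
    signed_val = mask_val - (1 << BITS)
else:
    signed_val = mask_val

assert signed_val == -16777216
# Masking back to 32 bits shows both denote the same bit pattern.
assert (signed_val & 0xffffffff) == mask_val
```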

That means that the range checking (for long values) has to be
defective from the point of view of a function that doesn't want a
mask value but really expects a signed C int or long: Python code can
pass in a Python long value that's too large, but because it's still
in the 32-bit range the Python code won't be told about the overflow
error, and the C code will happily be given a large negative value.

If we really believe that there's more code (in the world, not just in
the core CVS tree) that uses 'i' or 'l' for masks than that uses it
for signed values, we could fix 'i' and 'l' this way, and add new codes
for code that really wants signed values.  Still, all that code would
have to be fixed somehow and we would have to track it down.

And then we'd still be stuck with PyInt_AsLong() -- should it use the
same rule?  I hope not.

> I already fear that I have to come up with some sort of a fix 
> for the range-check warning (more than 6000 lines worth of 
> constant definitions that can currently be copied verbatim from 
> C header files to Python will have to be parsed, and computed, 
> and all these things can contain references to other constants, 
> strings and who knows what more, see Mac/Lib/Carbon/*.py), I 
> really could do without more work on my plate...

Before you start to panic, can you please try to import all those
modules and see how many cause warnings?  I only found one,
Controls.py line 11, but there are many files that I can't run because
they require extension modules I don't have.  I do note that none of
them generate warnings in the parser.  I also found a SyntaxError that
contradicts your assertion about "can currently be copied verbatim
from C header files".

--Guido van Rossum (home page: http://www.python.org/~guido/)


From jepler@unpythonic.net  Mon Aug 12 21:51:08 2002
From: jepler@unpythonic.net (Jeff Epler)
Date: Mon, 12 Aug 2002 15:51:08 -0500
Subject: [Python-Dev] Performance (non)optimization: 31-bit ints in pointers
Message-ID: <20020812205107.GA15411@unpythonic.net>

Many Lisp interpreters use 'tagged types' to, among other things, let
small ints reside directly in the machine registers.

Python might wish to take advantage of this by having pointers to odd
addresses stand for integers according to the following relationship:
    p = (i<<1) | 1
    i = (p>>1)
(due to alignment requirements on all common machines, all valid
pointers-to-struct have 0 in their low bit)  This means that all integers
which fit in 31 bits can be stored without actually allocating or deallocating
anything.
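The encoding can be sketched with plain integer arithmetic (Python standing in for the C pointer casts; `tag`/`untag` are illustrative names, not part of any interpreter):

```python
def tag(i):
    # Shift left and set the low bit: the result is always odd, so it
    # can never be mistaken for an aligned struct pointer.
    return (i << 1) | 1

def untag(p):
    # Arithmetic right shift recovers the integer, sign included.
    return p >> 1

def is_tagged_int(p):
    # Aligned object pointers have a 0 low bit; tagged ints have 1.
    return (p & 1) == 1

# Round-trip every representative 31-bit value, including negatives.
for i in (0, 1, -1, 2**30 - 1, -2**30):
    assert is_tagged_int(tag(i)) and untag(tag(i)) == i
```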

I modified a Python interpreter to the point where it could run simple
programs.  The changes are unfortunately very invasive, because they
make any C code which simply executes
    o->ob_type
or otherwise dereferences a PyObject* invalid when presented with a
small int.  This would obviously affect a huge amount of existing code in
extensions, and is probably enough to stop this from being implemented
before Python 3000.

This also introduces another conditional branch in many pieces of code, such
as any call to PyObject_TypeCheck().

Performance results are mixed.  A small program designed to test the
speed of all-integer arithmetic comes out faster by 14% (3.38 vs 2.90
"user" time on my machine) but pystone comes out 5% slower (14124 vs 13358
"pystones/second").

I don't know if anybody's barked up this tree before, but I think
these results show that it's almost certainly not worth the effort to
incorporate this "performance" hack in Python.  I'll keep my tree around
for awhile, in case anybody else wants to see it, but beware that it
still has serious issues even in the core:
    >>> 0+0j
    Traceback (most recent call last):
      File "", line 1, in ?
    TypeError: unsupported operand types for +: 'int' and 'complex'
    >>> (0).__class__
    Segmentation fault

Jeff
jepler@unpythonic.net
PS The program that shows the advantage of this optimization is as follows:

j = 0
for k in range(10):
    for i in range(100000) + range(1<<30, (1<<30) + 100000):
        j = j ^ i
print j


From guido@python.org  Mon Aug 12 22:07:31 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 12 Aug 2002 17:07:31 -0400
Subject: [Python-Dev] Performance (non)optimization: 31-bit ints in pointers
In-Reply-To: Your message of "Mon, 12 Aug 2002 15:51:08 CDT."
 <20020812205107.GA15411@unpythonic.net>
References: <20020812205107.GA15411@unpythonic.net>
Message-ID: <200208122107.g7CL7aE23242@pcp02138704pcs.reston01.va.comcast.net>

> Many Lisp interpreters use 'tagged types' to, among other things, let
> small ints reside directly in the machine registers.
> 
> Python might wish to take advantage of this by having pointers to odd
> addresses stand for integers according to the following relationship:
>     p = (i<<1) | 1
>     i = (p>>1)
> (due to alignment requirements on all common machines, all valid
> pointers-to-struct have 0 in their low bit)  This means that all integers
> which fit in 31 bits can be stored without actually allocating or deallocating
> anything.
> 
> I modified a Python interpreter to the point where it could run simple
> programs.  The changes are unfortunately very invasive, because they
> make any C code which simply executes
>     o->ob_type
> or otherwise dereferences a PyObject* invalid when presented with a
> small int.  This would obviously affect a huge amount of existing code in
> extensions, and is probably enough to stop this from being implemented
> before Python 3000.
> 
> This also introduces another conditional branch in many pieces of code, such
> as any call to PyObject_TypeCheck().
> 
> Performance results are mixed.  A small program designed to test the
> speed of all-integer arithmetic comes out faster by 14% (3.38 vs 2.90
> "user" time on my machine) but pystone comes out 5% slower (14124 vs 13358
> "pystones/second").
> 
> I don't know if anybody's barked up this tree before, but I think
> these results show that it's almost certainly not worth the effort to
> incorporate this "performance" hack in Python.  I'll keep my tree around
> for awhile, in case anybody else wants to see it, but beware that it
> still has serious issues even in the core:
>     >>> 0+0j
>     Traceback (most recent call last):
>       File "", line 1, in ?
>     TypeError: unsupported operand types for +: 'int' and 'complex'
>     >>> (0).__class__
>     Segmentation fault

We used *exactly* this approach in ABC.  I decided not to go with it
in Python, for two reasons that are essentially what you write up
here: (1) the changes are very pervasive (in ABC, we kept finding
places where we had pointer-manipulating code that had to be fixed to
deal with the small ints), and (2) it wasn't at all clear if it was a
performance win in the end (all the extra tests and special cases
may cost as much as you gain).

In general, ABC tried to use many tricks from the books (e.g. it used
asymptotically optimal B-tree algorithms to represent dicts, lists and
strings, to guarantee performance of slicing and dicing operations for
strings of absurd lengths).  In Python I decided to stay away from
cleverness except when extensive performance analysis showed there was
a real need to speed something up.  That got us super-fast dicts, for
example, and .pyc files to cache the work of the (slow, but
trick-free) parser.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From Jack.Jansen@oratrix.com  Mon Aug 12 22:24:40 2002
From: Jack.Jansen@oratrix.com (Jack Jansen)
Date: Mon, 12 Aug 2002 23:24:40 +0200
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: <200208122049.g7CKnsD23148@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

On Monday, August 12, 2002, at 10:49, Guido van Rossum wrote:
>> Can we please make this change in a backwards-compatible way,
>> i.e. leave the i and l formats alone and use something new for
>> "range-checked-int" and "range-checked-long"?
>
> Um, they *already* do range checking.  'i' requires that the value
> (whether it comes from a Python int or a Python long) is in the range
> [INT_MIN, INT_MAX].  'l' doesn't do range checking on Python ints
> (because they are defined to fit in a C long), but for a Python long
> it requires that the value fits in the range of a signed C long,
> i.e. [-sys.maxint-1, sys.maxint].

Yes, but due to the way the parser works everything works fine
for me. In the constant definition file it says "foo = 0xff000000".
The parser turns this into a negative integer. Fine with me, the
bits are the same, and PyArg_Parse doesn't complain.

> If we really believe that there's more code (in the world, not just in
> the core CVS tree) that uses 'i' or 'l' for masks than that uses it
> for signed values, we could fix 'i' and 'l' this way, and add new codes
> for code that really wants signed values.  Still, all that code would
> have to be fixed somehow and we would have to track it down.

I think what it boils down to is what Python's model of the 
world is: C or mathematics. It used to be C, which is probably 
the one reason Python caught on initially (whereas ABC with its 
mathematical model didn't, really). I can see the reason behind 
moving towards a more consistent world view, where integers are 
integers, be they 32 bits or more, where strings are strings, be 
they unicode or ascii, and I even agree with it, up to a point.

The drawback is that it will make it more difficult to interface 
Python to the real world, where integers have a size, characters 
are 8 bits, binary data is "char *" too, unicode has funny APIs, 
etc. And I happen to feel responsible for a lot of this real 
world interfacing code:-)

> Before you start to panic, can you please try to import all those
> modules

I just did so, see my other mail. You're right, the problem is 
theoretically big, but pretty manageable in practice.
--
- Jack Jansen                
http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- 
Emma Goldman -



From Jack.Jansen@oratrix.com  Mon Aug 12 22:06:29 2002
From: Jack.Jansen@oratrix.com (Jack Jansen)
Date: Mon, 12 Aug 2002 23:06:29 +0200
Subject: [Python-Dev] Correction: Deprecation warning on integer shifts and such
In-Reply-To: 
Message-ID: <5F21B008-AE37-11D6-B7D6-003065517236@oratrix.com>

On Monday, August 12, 2002, at 10:18, Jack Jansen wrote:
>
> I already fear that I have to come up with some sort of a fix 
> for the range-check warning (more than 6000 lines worth of 
> constant definitions that can currently be copied verbatim from 
> C header files to Python will have to be parsed, and computed, 
> and all these things can contain references to other constants, 
> strings and who knows what more, see Mac/Lib/Carbon/*.py), I 
> really could do without more work on my plate...

I'll retract this statement after a bit of research: it turns 
out there are only very few of those 6000 constants that 
actually run afoul of the warning, so I can fix those by hand.

That is, if there's actually a warning for every bad constant, 
not just once per module...
--
- Jack Jansen                
http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- 
Emma Goldman -



From martin@v.loewis.de  Mon Aug 12 22:29:04 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 12 Aug 2002 23:29:04 +0200
Subject: [Python-Dev] 32-bit values (was RE: [Python-checkins] python/dist/src/Lib/test test_zlib.py,1.18,1.19)
In-Reply-To: <200208121627.g7CGRWQ17542@pcp02138704pcs.reston01.va.comcast.net>
References: 
 <200208121554.g7CFs9l17354@pcp02138704pcs.reston01.va.comcast.net>
 
 <200208121627.g7CGRWQ17542@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

Guido van Rossum  writes:

> Though Tim brought up that then you won't always get back what you put
> in (if you put in a value > sys.maxint, it comes back negative).
> 
> Is that a problem or not?  I tend to think that's not how this is most
> often used.

I withdraw my earlier recommendation: It *is* desirable that
struct.pack/unpack gives back the same value.
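A quick sketch of the round-trip property under discussion, using the unsigned format code 'I' and little-endian packing for definiteness:

```python
import struct

v = 0xff000000                 # above the signed 31-bit boundary
packed = struct.pack("<I", v)  # four bytes, unsigned reading

# With the unsigned code, pack/unpack gives back exactly what went in.
assert struct.unpack("<I", packed)[0] == v

# Reading the same bytes with the signed code 'i' instead yields the
# negative alias of the same bit pattern.
assert struct.unpack("<i", packed)[0] == v - (1 << 32)
```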

Regards,
Martin



From guido@python.org  Mon Aug 12 22:29:12 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 12 Aug 2002 17:29:12 -0400
Subject: [Python-Dev] Correction: Deprecation warning on integer shifts and such
In-Reply-To: Your message of "Mon, 12 Aug 2002 23:06:29 +0200."
 <5F21B008-AE37-11D6-B7D6-003065517236@oratrix.com>
References: <5F21B008-AE37-11D6-B7D6-003065517236@oratrix.com>
Message-ID: <200208122129.g7CLTCx23366@pcp02138704pcs.reston01.va.comcast.net>

> I'll retract this statement after a bit of research: it turns 
> out there are only very few of those 6000 constants that 
> actually run afoul of the warning, so I can fix those by hand.
> 
> That is, if there's actually a warning for every bad constant, 
> not just once per module...

If you want to get the warnings for each line, use "python -Wall".

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Mon Aug 12 22:31:26 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 12 Aug 2002 17:31:26 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: Your message of "Mon, 12 Aug 2002 23:24:40 +0200."
 
References: 
Message-ID: <200208122131.g7CLVQT23404@pcp02138704pcs.reston01.va.comcast.net>

> I think what it boils down to is what Python's model of the 
> world is: C or mathematics. It used to be C, which is probably 
> the one reason Python caught on initially (whereas ABC with its 
> mathematical model didn't, really). I can see the reason behind 
> moving towards a more consistent world view, where integers are 
> integers, be they 32 bits or more, where strings are strings, be 
> they unicode or ascii, and I even agree with it, up to a point.
> 
> The drawback is that it will make it more difficult to interface 
> Python to the real world, where integers have a size, characters 
> are 8 bits, binary data is "char *" too, unicode has funny APIs, 
> etc. And I happen to feel responsible for a lot of this real 
> world interfacing code:-)

The issue is not that the new approach makes it more difficult to
interface to the real world.  The issue is that you have to change how
you interface to the real world.  Writing something from scratch that
uses the new approach won't take any more work.  It's the backwards
compatibility that bites you.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From martin@v.loewis.de  Mon Aug 12 22:41:38 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 12 Aug 2002 23:41:38 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
In-Reply-To: <3D57F3B8.7090507@lemburg.com>
References: <3D1057E8.9090200@livinglogic.de> <3D4E336E.8070700@lemburg.com>
 <200208051348.g75DmOv13530@pcp02138704pcs.reston01.va.comcast.net>
 <3D4E97A7.7000904@lemburg.com>
 <200208051527.g75FR1814634@pcp02138704pcs.reston01.va.comcast.net>
 <3D579245.2080306@livinglogic.de>
 
 <3D57DC42.6070300@lemburg.com>
 
 <3D57F3B8.7090507@lemburg.com>
Message-ID: 

"M.-A. Lemburg"  writes:

> The problem is the same as what we had with the exceptions.py
> module early on in the 1.6 alphas: if this module isn't found
> all kinds of things start failing. The same would happen when
> you start to use builtin codecs which have external error handler
> implementation as .py files, e.g. unicode('utf-8', 'replace')
> could then fail because of an ImportError.

What kinds of things would start failing? If you get an interactive
prompt (i.e. Python still manages to start up), or you get a traceback
indicating the problem in non-interactive mode, I don't see this as a
problem - *of course* Python will stop working if you remove essential
files.

This is like saying you expect the interpreter to continue to work
after you remove python23.dll.

So, if your worry is that things would not work if you remove a Python
file - don't worry. Python already relies on Python files being
present in various places.

> For the charmap codec it's mostly about performance. I don't
> have objections for other codecs which rely on external
> resources.

Please remember that we are still talking about error handling here, and that
the normal case will be "strict", which usually results in aborting
the computation.

So I don't see the performance issue even for the charmap codec.

Regards,
Martin


From martin@v.loewis.de  Mon Aug 12 23:12:31 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 13 Aug 2002 00:12:31 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
In-Reply-To: <3D5800DD.1050108@livinglogic.de>
References: <3D1057E8.9090200@livinglogic.de> <3D4E336E.8070700@lemburg.com>
 <200208051348.g75DmOv13530@pcp02138704pcs.reston01.va.comcast.net>
 <3D4E97A7.7000904@lemburg.com>
 <200208051527.g75FR1814634@pcp02138704pcs.reston01.va.comcast.net>
 <3D579245.2080306@livinglogic.de>
 
 <3D5800DD.1050108@livinglogic.de>
Message-ID: 

Walter Dörwald  writes:

> Output is as follows:
> 1790000 chars, 2.330% unenc
> ignore: 0.022 (factor=1.000)
> xmlcharrefreplace: 0.044 (factor=1.962)
> xml2: 0.267 (factor=12.003)
> xml3: 0.723 (factor=32.506)
> workaround: 5.151 (factor=231.702)
> i.e. a 1.7MB string with 2.3% unencodable characters was
> encoded.

Those numbers are impressive. Can you please add

def xml4(exc):
  if isinstance(exc, UnicodeEncodeError):
    if exc.end-exc.start == 1:
      return (u"&#"+str(ord(exc.object[exc.start]))+u";", exc.end)
    else:
      r = []
      for c in exc.object[exc.start:exc.end]:
        r.extend([u"&#", str(ord(c)), u";"])
      return (u"".join(r), exc.end)
  else:
    raise TypeError("don't know how to handle %r" % exc)

and report how that performs (assuming I made no error)?

> Using a callback instead of the inline implementation is a factor of
> 12 slower than ignore.

For the purpose of comparing C and Python, this isn't relevant, is it?
Only the C version of xmlcharrefreplace and a Python version should be
compared.

> It can't really be fixed for codecs implemented in Python. For codecs
> that use the C functions we could add the functionality that e.g.
> PyUnicodeEncodeError_SetReason(exc) sets exc.reason and exc.args[3],
> but AFAICT it can't be done easily for Python where attribute assignment
> directly goes to the instance dict.

You could add methods into the class set_reason etc, which error
handler authors would have to use.

Again, these methods could be added through Python code, so no C code
would be necessary to implement them.

You could even implement a __setattr__ method in Python - although you'd
have to install it from C while initializing the class.
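
A present-day sketch of that suggestion (the subclass and method name
are hypothetical; in 2002 the standard exceptions were ordinary Python
classes, so the method could have been attached to UnicodeEncodeError
itself):

```python
class MutableUnicodeEncodeError(UnicodeEncodeError):
    """Hypothetical subclass carrying the suggested mutator method."""

    def set_reason(self, reason):
        # Keep .reason and .args consistent; args is the tuple
        # (encoding, object, start, end, reason), so it must be rebuilt.
        self.reason = reason
        self.args = self.args[:4] + (reason,)

exc = MutableUnicodeEncodeError("ascii", "\xe9", 0, 1, "ordinal not in range")
exc.set_reason("unsupported character")
```

The point of the mutator is exactly what Martin describes: handler
authors call set_reason() instead of assigning attributes directly, so
the two views of the reason cannot drift apart.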

Regards,
Martin


From martin@v.loewis.de  Mon Aug 12 23:27:51 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 13 Aug 2002 00:27:51 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: <20020812103729.I386@ActiveState.com>
References: <200208121148.g7CBm6KD027268@paros.informatik.hu-berlin.de>
 <200208121338.g7CDc2l11722@pcp02138704pcs.reston01.va.comcast.net>
 <20020812103729.I386@ActiveState.com>
Message-ID: 

Trent Mick  writes:

> We (ActiveState) are committed to getting the functionality in -- which
> means that David and myself can help with coding and testing.  Depending
> on the schedule, I can help with e.g. filling out the test suite.  We
> can also setup some test machines to test things that are probably too
> odd to fit in the test suite.
> 
> Martin, if there is anything we can do to help with for this patch,
> please let us know.

At the moment, a review would be most appreciated. Please consider
Guido's comments as being taken care of, except for the test suite.

Neil's original patch (winunichanges.zip) has a number of test cases,
for various file names, and for open, os.{stat, rename, mkdir, chdir,
_getfullpathname}. You could start from that, or make your own test
cases - "full coverage" (in terms of tested functions) appears to be
desirable, and it should not print to stdout, but assert things
instead.

In addition, I have no W9x system. I assume Neil is right in his
analysis that the *W functions are not useful/not available on W9x?
If so, somebody should test that the resulting Python binary still
runs on W9x, falling back to the FileSystemDefaultEncoding.

Regards,
Martin


From martin@v.loewis.de  Mon Aug 12 23:34:54 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 13 Aug 2002 00:34:54 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: <93F350A8-AE2D-11D6-B7D6-003065517236@oratrix.com>
References: <93F350A8-AE2D-11D6-B7D6-003065517236@oratrix.com>
Message-ID: 

Jack Jansen  writes:

> This is the bit I still don't like (at least, if I'm not mistaken I
> commented on it a while ago too). A routine could be doing an
> os.listdir() expecting strings, but suddenly someone passes it a
> unicode directory name and the return value would change.

Sure, but within reasonable limitations, "nothing bad" would happen:
those file names most likely use only ASCII, so the default encoding
treats them nicely wherever they appear.

> I would much prefer an optional encoding argument whereby you give the
> encoding in which you want the return value. Default would be the
> local filesystem encoding. If you pass unicode you will get direct
> unicode on XP/2K, and a converted string on other platforms (but
> always unicode).

I would not like that. First of all, it isn't any more portable than
PEP 277: on Unix, to implement that feature, you'll have to know the
encoding of filenames on disk first - which alone is tricky.

Furthermore, it is easy to implement that on top of PEP 277: just
write a wrapper that encodes the result.
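
Such a wrapper might look like the following sketch (listdir_encoded
is a hypothetical name, not a proposed API):

```python
import os

def listdir_encoded(path, encoding="utf-8"):
    # Hypothetical wrapper over a PEP 277-style listdir: any entries that
    # come back as Unicode are encoded into caller-chosen byte strings.
    entries = []
    for name in os.listdir(path):
        if not isinstance(name, bytes):
            name = name.encode(encoding)
        entries.append(name)
    return entries
```

Callers that insist on byte strings use the wrapper; everyone else gets
the PEP 277 behaviour unchanged.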

> Oh yes, the same reasoning would hold for readlink(), getcwd() and
> any other call that returns filenames.

These are more tricky, indeed. Fortunately, they are not in the domain
of PEP 277: readlink is not supported on Windows, and getcwd not
considered in the PEP. If that is an issue, I'd add a "return_unicode"
flag to getcwd.

Allowing the application to specify an encoding at the file system API
is not really helpful, as the encoding at the file system API is
usually mandated by the application.

Regards,
Martin


From martin@v.loewis.de  Mon Aug 12 23:13:16 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 13 Aug 2002 00:13:16 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
In-Reply-To: <3D5802DA.8030604@livinglogic.de>
References: <3D1057E8.9090200@livinglogic.de> <3D4E336E.8070700@lemburg.com>
 <200208051348.g75DmOv13530@pcp02138704pcs.reston01.va.comcast.net>
 <3D4E97A7.7000904@lemburg.com>
 <200208051527.g75FR1814634@pcp02138704pcs.reston01.va.comcast.net>
 <3D579245.2080306@livinglogic.de>
 
 <3D57DC42.6070300@lemburg.com> <3D5802DA.8030604@livinglogic.de>
Message-ID: 

Walter Dörwald writes:

> At least "strict" should be implemented inline, because reading
> broken .pyc files which contain (utf-8 encoded) unicode constants
> would probably lead to all kinds of interesting problems.

If you have a broken .pyc file, you have more problems than that,
though...

Regards,
Martin



From martin@v.loewis.de  Mon Aug 12 23:50:51 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 13 Aug 2002 00:50:51 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: <200208122007.g7CK7li21777@pcp02138704pcs.reston01.va.comcast.net>
References: <93F350A8-AE2D-11D6-B7D6-003065517236@oratrix.com>
 <200208122007.g7CK7li21777@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

Guido van Rossum  writes:

> But shouldn't it return Unicode whenever there are filenames in the
> directory that can't represented as ASCII?

Unfortunately, on Windows, there is no way to find out: If you use the
ANSI function (which not only covers ASCII, but the full user's code
page), and you have a file name not representable in this code page,
the system returns a file name that contains question marks.

Of course, you could always use the Win32 Wide API (unicode) function,
and convert the pure-ASCII strings into byte strings. That gives a
number of options:
- always return Unicode for Unicode directory argument,
- return Unicode only for non-ASCII, and only for Unicode argument,
- return Unicode only for non-ASCII, regardless of Unicode argument,
- return Unicode only for non-MBCS (again depending or not depending
  on whether the argument is Unicode).

In the third case, if you have a non-representable file name, you
currently get a string like "??????.txt", whereas you then get
u"\uabcd\uefgh...txt". What might be worse: If the file name is
representable in "mbcs", yet outside ASCII, you currently get a "good"
byte string, and you get a Unicode string under option three.

So the MBCS options sound better. Unfortunately, testing whether a
string encodes as MBCS might be expensive.

> Hm, I don't know if I'd like os.listdir() to have an encoding
> argument.  Sounds like the wrong solution somehow.

I don't like that, either.

> > Oh yes, the same reasoning would hold for readlink(), getcwd() 
> > and any other call that returns filenames.
> 
> Ditto.

For readlink, if you trust FileSystemDefaultEncoding, you could return
a Unicode object if you find non-ASCII in the link contents.

For getcwd, you again have the issue of reliably detecting non-ASCII
if you use the ANSI function; if you use the Wide function, you again
have the choice of returning Unicode only if non-ASCII, or only if
non-MBCS.

Regards,
Martin


From martin@v.loewis.de  Mon Aug 12 23:59:41 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 13 Aug 2002 00:59:41 +0200
Subject: [Python-Dev] Re: [PythonLabs] PEP 2
In-Reply-To: <20020812175718.GB20846@vet.uu.nl>
References: 
 <200208121540.g7CFeOJ16812@pcp02138704pcs.reston01.va.comcast.net>
 <20020812175718.GB20846@vet.uu.nl>
Message-ID: 

Martijn Faassen  writes:

> [Barry]
> 
>  I was going to point David at PEP 2 as the guidelines for getting modules 
>  added to the standard library, but I don't think PEP 2 really describes 
>  current practice.

[Martijn]
> What PEP 2 tries to supply is a procedure to follow if people
> have already decided they would like to try to get a module or set of
> modules accepted into the standard library. They can decide this before
> or after they write the module; the PEP doesn't care -- as long as the
> module is there when they submit the library PEP. At least they know
> there'll be Integrators that will review things, and they know they had
> better come up with some maintainers before submitting the PEP.

I always read the PEP in precisely that way, and I think it is just
fine as it stands.

*Of course*, the BDFL can decide to incorporate any new modules any
time he wants. The PEP is to give people a guideline if they want to
get a module "in" that the BDFL doesn't outright want: they need to
offer supporting it, and they need to document it, provide test cases,
etc - then there is a good chance that the BDFL won't object.

This also gives the BDFL the explicit power to remove the module when
problems surface with it and the original authors ran away - it
essentially ties contributors to their contribution, which I see as a
good thing.

Regards,
Martin



From martin@v.loewis.de  Tue Aug 13 00:04:48 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 13 Aug 2002 01:04:48 +0200
Subject: [Python-Dev] Performance (non)optimization: 31-bit ints in pointers
In-Reply-To: <200208122107.g7CL7aE23242@pcp02138704pcs.reston01.va.comcast.net>
References: <20020812205107.GA15411@unpythonic.net>
 <200208122107.g7CL7aE23242@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

Guido van Rossum  writes:

> In Python I decided to stay away from cleverness except when
> extensive performance analysis showed there was a real need to speed
> something up.  That got us super-fast dicts, for example, and .pyc
> files to cache the work of the (slow, but trick-free) parser.

For small ints, it also got you the small int cache, which has nearly
the same storage requirements as a pointer-as-int, and is probably as
expensive (you can drop the tests for odd addresses, but need to add
increfs and decrefs for ints).
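
The effect of that cache is easy to observe; note this is a CPython
implementation detail (the cached range is roughly -5..256), not a
language guarantee:

```python
# CPython keeps one shared object per small int, so arithmetic that
# produces a value in the cached range hands back the same object and
# only the reference counts change.
a = 1000 // 4        # 250, inside the cached range
b = 200 + 50         # also 250
assert a is b        # same cached object in CPython

big_a = int("10000000000")   # built at runtime, outside the cache
big_b = int("10000000000")
assert big_a == big_b        # equal, but object identity is not guaranteed
```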

Regards,
Martin



From gmcm@hypernet.com  Mon Aug 12 23:21:20 2002
From: gmcm@hypernet.com (Gordon McMillan)
Date: Mon, 12 Aug 2002 18:21:20 -0400
Subject: [Python-Dev] Performance (non)optimization: 31-bit ints in pointers
In-Reply-To: <200208122107.g7CL7aE23242@pcp02138704pcs.reston01.va.comcast.net>
References: Your message of "Mon, 12 Aug 2002 15:51:08 CDT." <20020812205107.GA15411@unpythonic.net>
Message-ID: <3D57FCA0.30691.64578AF5@localhost>

On 12 Aug 2002 at 17:07, Guido van Rossum wrote:

> ...  In Python I decided to
> stay away from cleverness except when extensive
> performance analysis showed there was a real need to
> speed something up. 

For which some of us are very grateful, even
without Tim nagging us.

-- Gordon
http://www.mcmillan-inc.com/



From martin@v.loewis.de  Tue Aug 13 00:57:17 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 13 Aug 2002 01:57:17 +0200
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: 
References: 
Message-ID: 

Jack Jansen  writes:

> Yes, but due to the way the parser works everything works fine for
> me. In the constant definition file it says "foo = 0xff000000". The
> parser turns this into a negative integer. Fine with me, the bits
> are the same, and PyArg_Parse doesn't complain.

Please notice that this will stop working some day: 0xff000000 will be
a positive number, and the "i" parser will raise an OverflowError.

By that time, you might be using the "k" parser, which will accept
0xff000000 both as a negative and a positive number, and fill the int
with 0xff000000.

Before that happens, you might want to anticipate that problem, and
propose an implementation that means minimum changes for you - it then
will likely mean minimum changes for everybody else, as well. Perhaps
"k" isn't such a good solution, perhaps "I" is better, or perhaps "i"
should weaken its range checking, and emit a DeprecationWarning when
an unsigned number is passed.
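
The reinterpretation in question can be spelled out with the struct
module (a sketch of the semantics only, not of the parser code):

```python
import struct

# Jack's headers contain constants like 0xFF000000; the old "i" parser
# in effect accepted them as the signed 32-bit value with the same bits.
unsigned = 0xFF000000
signed = struct.unpack("=i", struct.pack("=I", unsigned))[0]
assert signed == -16777216              # same bit pattern, signed reading
assert signed & 0xFFFFFFFF == unsigned  # and back again
```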

Regards,
Martin


From guido@python.org  Tue Aug 13 01:15:40 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 12 Aug 2002 20:15:40 -0400
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: Your message of "Tue, 13 Aug 2002 00:50:51 +0200."
 
References: <93F350A8-AE2D-11D6-B7D6-003065517236@oratrix.com> <200208122007.g7CK7li21777@pcp02138704pcs.reston01.va.comcast.net>
 
Message-ID: <200208130015.g7D0Fel26486@pcp02138704pcs.reston01.va.comcast.net>

> Unfortunately, on Windows, there is no way to find out: If you use the
> ANSI function (which not only covers ASCII, but the full user's code
> page), and you have a file name not representable in this code page,
> the system returns a file name that contains question marks.
> 
> Of course, you could always use the Win32 Wide API (unicode) function,
> and convert the pure-ASCII strings into byte strings. That gives a
> number of options:
> - always return Unicode for Unicode directory argument,
> - return Unicode only for non-ASCII, and only for Unicode argument,
> - return Unicode only for non-ASCII, regardless of Unicode argument,
> - return Unicode only for non-MBCS (again depending or not depending
>   on whether the argument is Unicode).
> 
> In the third case, if you have a non-representable file name, you
> currently get a string like "??????.txt", whereas you then get
> u"\uabcd\uefgh...txt". What might be worse: If the file name is
> representable in "mbcs", yet outside ASCII, you currently get a "good"
> byte string, and you get a Unicode string under option three.

Why is getting Unicode worse than getting MBCS?  #3 looks right to me...

> So the MBCS options sound better. Unfortunately, testing whether a
> string encodes as MBCS might be expensive.

I still don't fully understand MBCS.  I know there's a variable
assignment of codes to the upper half of the 8-bit space, based on a
user setting.  But is that always a simple mapping to 128 non-ASCII
characters, or are there multi-byte codes that expand the total
character set to more than 256?

> For readlink, if you trust FileSystemDefaultEncoding, you could return
> a Unicode object if you find non-ASCII in the link contents.

What is FileSystemDefaultEncoding and when can you trust it?

> For getcwd, you again have the issue of reliably detecting non-ASCII
> if you use the ANSI function; if you use the Wide function, you again
> have the choice of returning Unicode only if non-ASCII, or only if
> non-MBCS.

Wide + Unicode (if non-ASCII) sounds good to me.  The fewer places an
app has to deal with MBCS the better, it seems to me.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Tue Aug 13 01:27:16 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 12 Aug 2002 20:27:16 -0400
Subject: [Python-Dev] Performance (non)optimization: 31-bit ints in pointers
In-Reply-To: Your message of "Tue, 13 Aug 2002 01:04:48 +0200."
 
References: <20020812205107.GA15411@unpythonic.net> <200208122107.g7CL7aE23242@pcp02138704pcs.reston01.va.comcast.net>
 
Message-ID: <200208130027.g7D0RGg26585@pcp02138704pcs.reston01.va.comcast.net>

> > In Python I decided to stay away from cleverness except when
> > extensive performance analysis showed there was a real need to speed
> > something up.  That got us super-fast dicts, for example, and .pyc
> > files to cache the work of the (slow, but trick-free) parser.
> 
> For small ints, it also got you the small int cache, which has nearly
> the same storage requirements as a pointer-as-int, and is probably as
> expensive (you can drop the tests for odd addresses, but need to add
> increfs and decrefs for ints).

But the increfs and decrefs for ints are goodness, because they
simplify the code.  You can incref/decref any object without having to
know its type.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Tue Aug 13 01:34:00 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 12 Aug 2002 20:34:00 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: Your message of "Tue, 13 Aug 2002 01:57:17 +0200."
 
References: 
 
Message-ID: <200208130034.g7D0Y0L26649@pcp02138704pcs.reston01.va.comcast.net>

> Jack Jansen  writes:
> 
> > Yes, but due to the way the parser works everything works fine for
> > me. In the constant definition file it says "foo = 0xff000000". The
> > parser turns this into a negative integer. Fine with me, the bits
> > are the same, and PyArg_Parse doesn't complain.

[Martin]
> Please notice that this will stop working some day: 0xff000000 will be
> a positive number, and the "i" parser will raise an OverflowError.
> 
> By that time, you might be using the "k" parser, which will accept
> 0xff000000 both as a negative and a positive number, and fill the int
> with 0xff000000.
> 
> Before that happens, you might want to anticipate that problem, and
> propose an implementation that means minimum changes for you - it then
> will likely mean minimum changes for everybody else, as well. Perhaps
> "k" isn't such a good solution, perhaps "I" is better, or perhaps "i"
> should weaken its range checking, and emit a DeprecationWarning when
> an unsigned number is passed.

Why is it so hard to get people to think about what they need?  (I
mean beyond "I don't want anything to change" or vague things like
that.)  I am looking for an API that will make developers like Jack as
well as other extension developers happy, but it feels like pulling
teeth.  Have I not explained the issues and boundary conditions
clearly enough?  (About the only non-negotiable thing is that at some
point there shall be no difference in how ints and longs with the same
mathematical value are treated; and the fact that 0xffffffff shall be
a positive number whose value is 4294967295.)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@comcast.net  Tue Aug 13 01:44:46 2002
From: tim.one@comcast.net (Tim Peters)
Date: Mon, 12 Aug 2002 20:44:46 -0400
Subject: [Python-Dev] Performance (non)optimization: 31-bit ints in pointers
In-Reply-To: <20020812205107.GA15411@unpythonic.net>
Message-ID: 

[Jeff Epler]
> Many Lisp interpreters use 'tagged types' to, among other things, let
> small ints reside directly in the machine registers.

And many Lisp interpreters derive from ones written for once-trendy Lisp
hardware, which had special support for tag bits.  Simulating this in
software is a PITA.

> (due to alignment requirements on all common machines, all valid
> pointers-to-struct have 0 in their low bit)

Not so on word-addressed machines, though, or on machines using low-order
pointer bits for their own notion of tag bits.

patiently-awaiting-seymour-cray's-resurrection-ly y'rs  - tim



From dan@sidhe.org  Tue Aug 13 04:59:11 2002
From: dan@sidhe.org (Dan Sugalski)
Date: Mon, 12 Aug 2002 23:59:11 -0400
Subject: [Python-Dev] Bugs in the python grammar?
Message-ID: 

We've been digging through the python grammar, looking to build up a 
parser for it, and have come across what look to be bugs:

In http://www.python.org/doc/current/ref/grammar.txt :

a_expr ::=
              m_expr | aexpr "+" m_expr
               aexpr "-" m_expr
               aexpr "=" m_expr

should be:

               | aexpr "=" m_expr

lambda_form ::=
	"lambda" [parameter_list]: expression

'[]:' doesn't make much sense. Do you mean:

	"lambda" [parameter_list]":" expression

parameter_list ::=
              (defparameter ",")*
                 ("*" identifier [, "**" identifier]
                 | "**" identifier
                   | defparameter [","])


		("*" identifier [, "**" identifier]

should be:
		("*" identifier ["," "**" identifier]



Are these known issues? Or have we mis-analyzed things somewhere?
-- 
                                         Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
dan@sidhe.org                         have teddy bears and even
                                       teddy bears get drunk


From oren-py-d@hishome.net  Tue Aug 13 07:15:06 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Tue, 13 Aug 2002 02:15:06 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: 
References:  
Message-ID: <20020813061506.GA49563@hishome.net>

On Tue, Aug 13, 2002 at 01:57:17AM +0200, Martin v. Loewis wrote:
> Jack Jansen  writes:
> 
> > Yes, but due to the way the parser works everything works fine for
> > me. In the constant definition file it says "foo = 0xff000000". The
> > parser turns this into a negative integer. Fine with me, the bits
> > are the same, and PyArg_Parse doesn't complain.
> 
> Please notice that this will stop working some day: 0xff000000 will be
> a positive number, and the "i" parser will raise an OverflowError.

The problem is that many programmers have 0xFFFFFFFF pretty much hard-wired
into their brains as -1. How about treating plain hexadecimal literals as
signed 32 bits regardless of platform integer size?  I think that this will
produce the smallest number of incompatibilities for existing code and
maintain compatibility with C header files on 32 bit platforms. In this case 
0xff000000 will always be interpreted as -16777216 and the 'i' parser will 
happily convert it to either 0xFF000000 or 0xFFFFFFFFFF000000, depending on
the native platform word size - which is probably what the programmer 
meant.

I don't think that interpreting 0xFF000000 as 4278190080 will really help
anyone. This includes users of 64 bit platforms - I don't think many of them
actually think of 0xFF000000 as 4278190080. To them it probably means "Danger,
Will Robinson! Unportable code!". So what's the point of having Python 
interpret it as 4278190080?  If what I really meant was 4278190080 I can
represent it portably as 0xFF000000L and in this case the 'i' parser will 
complain on 32 bit platforms - with a good reason.
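
Oren's proposed reading of plain hex literals can be sketched as
follows (as_signed32 is a hypothetical helper name, used here only to
make the rule concrete):

```python
def as_signed32(n):
    # Interpret a literal's value as a signed 32-bit quantity: values
    # with the top bit set come out negative, as in C on a 32-bit box.
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n
```

Under this rule as_signed32(0xFF000000) is -16777216 while
as_signed32(0x7FFFFFFF) stays positive, matching the C-header intuition
described above.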

To support header files that can be included from Python and C and produce
unambiguous results on both 32 and 64 bit platforms, it is possible to add
support for the C suffixes UL/LU and ULL/LLU.  

	Oren



From martin@v.loewis.de  Tue Aug 13 07:51:28 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 13 Aug 2002 08:51:28 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: <200208130015.g7D0Fel26486@pcp02138704pcs.reston01.va.comcast.net>
References: <93F350A8-AE2D-11D6-B7D6-003065517236@oratrix.com>
 <200208122007.g7CK7li21777@pcp02138704pcs.reston01.va.comcast.net>
 
 <200208130015.g7D0Fel26486@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

Guido van Rossum  writes:

> Why is getting Unicode worse than getting MBCS?  #3 looks right to me...

If people do

out = open("names.txt", "w")
for f in os.listdir("."):
  print >>out, f

then this will print all filenames in mbcs. Under your proposed
change, it will raise a UnicodeError.

> I still don't fully understand MBCS.  I know there's a variable
> assignment of codes to the upper half of the 8-bit space, based on a
> user setting.  But is that always a simply mapping to 128 non-ASCII
> characters, or are there multi-byte codes that expand the total
> character set to more than 256?

Yes, the "mbcs" might be truly multibyte. Microsoft calls it the "ANSI
code page", CP_ACP, which varies with the localization. They currently
use:

code page region                 encoding style
1250      Central Europe         8-bit
1251      Cyrillic               8-bit
1252      Western Europe         8-bit
1253      Greek                  8-bit
1254      Turkish                8-bit
1255      Hebrew                 8-bit
1256      Arabic                 8-bit
1257      Baltic                 8-bit
1258      Vietnamese             8-bit

874       Thai                   multi-byte
932       Japan                  Shift-JIS, multi-byte
936       Simplified Chinese     GB2312, multi-byte
949       Korea                  multi-byte
950       Traditional Chinese    BIG5, multi-byte

The multi-byte codes fall in two categories: those that use bytes <128
for multi-byte codes (e.g. 950) and those that don't (e.g. 932); the
latter ones restrict themselves to bytes >=128 for multi-byte
characters (I believe this is what the Shift in Shift-JIS tries to
indicate).
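
A concrete illustration using the cp932 codec (the byte values are
standard Shift-JIS):

```python
# "mbcs" is only an alias for whichever ANSI code page is active; the
# code pages themselves are ordinary codecs. cp932 (Shift-JIS) is one
# of the truly multi-byte ones.
encoded = "\u65e5\u672c".encode("cp932")    # the two characters of "Nihon"
assert encoded == b"\x93\xfa\x96\x7b"       # two bytes per character
assert encoded[0] >= 128 and encoded[2] >= 128   # lead bytes above ASCII
```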

> > For readlink, if you trust FileSystemDefaultEncoding, you could return
> > a Unicode object if you find non-ASCII in the link contents.
> 
> What is FileSystemDefaultEncoding and when can you trust it?

It's a global variable (really called Py_FileSystemDefaultEncoding),
introduced by Mark Hammond, and should be set to the encoding that the
operating system uses to encode file names, on the file system API.

On Windows, this is reliably CP_ACP/"mbcs". On Unix, it is the
locale's encoding by convention, which is set only if
setlocale(LC_CTYPE,"") was called. Some Unix users may not follow the
convention, or may have file names which cannot be represented in
their locale's encoding.

> Wide + Unicode (if non-ASCII) sounds good to me.  The fewer places an
> app has to deal with MBCS the better, it seems to me.

Ok, I'll update the PEP.

You may have been under the impression that MBCS is only relevant in
Far East, so let me stress this point: It applies to all windows
versions, e.g. a user of a French installation who has a file named
C:\Docs\Boulot\SéminaireLORIA-jan2002\DemoCORBA (bug #509117)
currently gets a byte string when listing C:\Docs\Boulot, but will
get a Unicode string under the modified PEP 277.

Regards,
Martin



From martin@v.loewis.de  Tue Aug 13 08:07:49 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 13 Aug 2002 09:07:49 +0200
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: <20020813061506.GA49563@hishome.net>
References: 
 
 <20020813061506.GA49563@hishome.net>
Message-ID: 

Oren Tirosh  writes:

> The problem is that many programmers have 0xFFFFFFFF pretty much hard-wired
> into their brains as -1. How about treating plain hexadecimal literals as
> signed 32 bits regardless of platform integer size?  

The idea is that, for any sequence S of digits, S and SL should mean
the same thing. So 0xFFFFFFFF should mean the same thing as
0xFFFFFFFFL.

Programmers may have that hard-wired, but this is not a problem; a
problem only arises if their code breaks. In many cases, it won't.

> I think that this will produce the smallest number of
> incompatibilities for existing code and maintain compatibility with
> C header files on 32 bit platforms. In this case 0xff000000 will
> always be interpreted as -16777216 and the 'i' parser will happily
> convert it to wither 0xFF000000 or 0xFFFFFFFFFF000000, depending on
> the native platform word size - which is probably what the
> programmer meant.

This means you suggest that PEP 237 is not implemented, or at least
frozen at the current stage.

> So what's the point of having Python interpret it as 4278190080?

It allows ints and longs to be unified; see PEP 237.

> If what I really meant was 4278190080 I can represent it portably as
> 0xFF000000L and in this case the 'i' parser will complain on 32 bit
> platforms - with a good reason.

Yes, but the L suffix will go away one day.

Regards,
Martin


From guido@python.org  Tue Aug 13 10:11:01 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 13 Aug 2002 05:11:01 -0400
Subject: [Python-Dev] Bugs in the python grammar?
In-Reply-To: Your message of "Mon, 12 Aug 2002 23:59:11 EDT."
 
References: 
Message-ID: <200208130911.g7D9B1b27636@pcp02138704pcs.reston01.va.comcast.net>

> We've been digging through the python grammar, looking to build up a 
> parser for it, and have come across what look to be bugs:
> 
> In http://www.python.org/doc/current/ref/grammar.txt :

I don't know where that file comes from; it's not the official
grammar.  Fred will fix the typos you found.

This one is correct (we use it to generate our parser):

http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Grammar/Grammar?rev=1.48&content-type=text/vnd.viewcvs-markup

Or download Python and look at Grammar/Grammar .

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Tue Aug 13 10:41:54 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 13 Aug 2002 05:41:54 -0400
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: Your message of "Tue, 13 Aug 2002 08:51:28 +0200."
 
References: <93F350A8-AE2D-11D6-B7D6-003065517236@oratrix.com> <200208122007.g7CK7li21777@pcp02138704pcs.reston01.va.comcast.net>  <200208130015.g7D0Fel26486@pcp02138704pcs.reston01.va.comcast.net>
 
Message-ID: <200208130941.g7D9ftP27703@pcp02138704pcs.reston01.va.comcast.net>

> > Why is getting Unicode worse than getting MBCS?  #3 looks right to me...
> 
> If people do
> 
> out = open("names.txt","w")
> for f in os.listdir("."):
>   print >>out, f
> 
> then this will print all filenames in mbcs. Under your proposed
> changed, it will raise a UnicodeError.

OK, you've convinced me.  I guess the best compromise then is 8-bit
in, MBCS out, and Unicode in, Unicode out.
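
This "argument type determines result type" compromise is, as it
happens, the rule os.listdir follows in today's Python, where it can be
checked directly (text path in, text names out; bytes path in, bytes
names out):

```python
import os
import tempfile

d = tempfile.mkdtemp()
open(os.path.join(d, "hello.txt"), "w").close()

# The argument type determines the result type.
assert all(isinstance(n, str) for n in os.listdir(d))
assert all(isinstance(n, bytes) for n in os.listdir(os.fsencode(d)))
```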

> > I still don't fully understand MBCS.  I know there's a variable
> > assignment of codes to the upper half of the 8-bit space, based on a
> user setting.  But is that always a simple mapping to 128 non-ASCII
> > characters, or are there multi-byte codes that expand the total
> > character set to more than 256?
> 
> Yes, the "mbcs" might be truly multibyte. Microsoft calls it the "ANSI
> code page", CP_ACP, which varies with the localization. They currently
> use:
> 
> code page region                 encoding style
> 1250      Central Europe         8-bit
> 1251      Cyrillic               8-bit
> 1252      Western Europe         8-bit
> 1253      Greek                  8-bit
> 1254      Turkish                8-bit
> 1255      Hebrew                 8-bit
> 1256      Arabic                 8-bit
> 1257      Baltic                 8-bit
> 1258      Vietnamese             8-bit
> 
> 874       Thai                   multi-byte
> 932       Japan                  Shift-JIS, multi-byte
> 936       Simplified Chinese     GB2312, multi-byte
> 949       Korea                  multi-byte
> 950       Traditional Chinese    BIG5, multi-byte
> 
> The multi-byte codes fall in two categories: those that use bytes <128
> for multi-byte codes (e.g. 950) and those that don't (e.g. 932); the
> latter ones restrict themselves to bytes >=128 for multi-byte
> characters (I believe this is what the Shift in Shift-JIS tries to
> indicate).

Aha!  So MBCS is not an encoding: it's an indirection for a variety of
encodings.  (Is there a way to find out what the encoding is?)

> > > For readlink, if you trust FileSystemDefaultEncoding, you could return
> > > a Unicode object if you find non-ASCII in the link contents.
> > 
> > What is FileSystemDefaultEncoding and when can you trust it?
> 
> It's a global variable (really called Py_FileSystemDefaultEncoding),
> introduced by Mark Hammond, and should be set to the encoding that the
> operating system uses to encode file names, on the file system API.
> 
> On Windows, this is reliably CP_ACP/"mbcs".

Do you mean that the condition on

#if defined(HAVE_LANGINFO_H) && defined(CODESET)

is reliably false on Windows?  Otherwise _locale.setlocale() could set
it.

> On Unix, it is the locale's encoding by convention, which is set
> only if setlocale(LC_CTYPE,"") was called. Some Unix users may not
> follow the convention, or may have file names which cannot be
> represented in their locale's encoding.

So as long as they use 8-bit it's not our problem, right.  Another
reason to avoid producing Unicode without a clue that the app expects
Unicode (alas).  (BTW I find a Unicode argument to os.listdir() a
sufficient clue.  IOW os.listdir(u".") should return Unicode.)

> > Wide + Unicode (if non-ASCII) sounds good to me.  The fewer places an
> > app has to deal with MBCS the better, it seems to me.
> 
> Ok, I'll update the PEP.

To what?  (It would be bad if I convinced you at the same time you
convinced me of the opposite. :-)

> You may have been under the impression that MBCS is only relevant in
> Far East, so let me stress this point: It applies to all windows
> versions, e.g. a user of a French installation who has a file named
> C:\Docs\Boulot\SéminaireLORIA-jan2002\DemoCORBA (bug #509117)
> currently gets a byte string when listing C:\Docs\Boulot, but will
> get a Unicode string under the modified PEP 277.

No, I was aware of that part.  I guess they should get MBCS on
os.listdir('C:\\Docs\\Boulot') but Unicode on
os.listdir(u'C:\\Docs\\Boulot').

--Guido van Rossum (home page: http://www.python.org/~guido/)


From Jack.Jansen@oratrix.com  Tue Aug 13 11:02:59 2002
From: Jack.Jansen@oratrix.com (Jack Jansen)
Date: Tue, 13 Aug 2002 12:02:59 +0200
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: <200208130034.g7D0Y0L26649@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

Martin:

> Before that happens, you might want to anticipate that problem, and
> propose an implementation that means minimum changes for you - it then
> will likely mean minimum changes for everybody else, as well. Perhaps
> "k" isn't such a good solution, perhaps "I" is better, or perhaps "i"
> should weaken its range checking, and emit a DeprecationWarning when
> an unsigned number is passed.

The least amount of work for me would be caused by keeping "i" semantics 
as they are, of course.

If we switch to "k" for integers in the range -2**31..2**31-1 that 
would not be too much work, as a lot of the code is generated (I would 
take the quick and dirty approach of using k for all my integers). Only 
the hand-written code would have to be massaged by hand.

If we have only pure signed and pure unsigned converters it would mean 
an extraordinary amount of work, but luckily it seems that that is not 
going to happen.

On Tuesday, August 13, 2002, at 02:34 , Guido van Rossum wrote:
> Why is it so hard to get people to think about what they need?  (I
> mean beyond "I don't want anything to change" or vague things like
> that.  I am looking for an API that will make developers like Jack as
> well as other extension developers happy, but it feels like pulling
> teeth.

It feels that way because pulling teeth is probably exactly the right 
analogy: what you're doing is probably a good idea in the long run, but 
right now it hurts...


--
- Jack Jansen                
http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma 
Goldman -



From guido@python.org  Tue Aug 13 11:20:31 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 13 Aug 2002 06:20:31 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: Your message of "Tue, 13 Aug 2002 12:02:59 +0200."
 
References: 
Message-ID: <200208131020.g7DAKVk29479@pcp02138704pcs.reston01.va.comcast.net>

> The least amount of work for me would be caused by keeping "i" semantics 
> as they are, of course.

So you're using 'i', not 'l'?  Any particular reason?

> If we switch to "k" for integers in the range -2**31..2**31-1 that 
> would not be too much work, as a lot of the code is generated (I would 
> take the quick and dirty approach of using k for all my integers). Only 
> the hand-written code would have to be massaged by hand.

Glad, that's my preferred choice too.  But note that in Python 2.4 and
beyond, 'k' will only accept positive inputs, so you'll really have to
find a way to mark your signed integer arguments up differently.

In 2.3 (and 2.2.2), I propose the following semantics for 'k': if the
argument is a Python int, a signed value within the range
[INT_MIN,INT_MAX] is required; if it is a Python long, a nonnegative
value in the range [0, 2*INT_MAX+1] is required.  These are the same
semantics that are currently used by struct.pack() for 'L', I found
out; I like these.
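
A minimal Python sketch of those proposed 'k' acceptance rules
(hypothetical helper name; the historical int/long split is modelled
here with a flag, since later Python has only one integer type):

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1

def k_convert(value, was_long=False):
    """Sketch of the proposed 2.3/2.2.2 'k' semantics (hypothetical helper).

    A Python int must lie in [INT_MIN, INT_MAX]; a Python long must be
    nonnegative and lie in [0, 2*INT_MAX+1].  Either way the caller
    receives the raw 32-bit pattern.
    """
    if was_long:
        if not 0 <= value <= 2 * INT_MAX + 1:
            raise OverflowError("long out of range for 'k'")
        return value
    if not INT_MIN <= value <= INT_MAX:
        raise OverflowError("int out of range for 'k'")
    return value & 0xFFFFFFFF  # two's-complement view of negative ints
```

Under these rules k_convert(-1) and k_convert(2**32 - 1, was_long=True)
both yield 0xffffffff, mirroring the struct.pack() 'L' behaviour
described above.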

We'll have to niggle about the C type corresponding to 'k'.  Should it
be 'int' or 'long'?  It may not matter for you, since you expect to be
running on 32-bit hardware forever; but it matters for other potential
users of 'k'.  We could also have both 'k' and 'K', where 'k' stores
into a C int and 'K' into a C long.

I also propose to have a C API PyInt_AsUnsignedLong, which will
implement the semantics of 'K'.  Like 'i', 'k' will have to do an
explicit range test.

> If we have only pure signed and pure unsigned converters it would mean 
> an extraordinary amount of work, but luckily it seems that that is not 
> going to happen.

Not until 2.4, that is -- then 'k' (and 'K') will change to pure
unsigned.  But your hex constants and results of left shifts will
*also* be pure unsigned then; the only problem would be with Python
code that uses ~0 or -1 as a shorthand for 0xffffffff (which it ain't
on 64-bit machines today).
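
The width-dependence of that shorthand is easy to demonstrate; a sketch
(Python's integers are unbounded, so the assumed word size shows up
only in the mask you apply):

```python
# ~0 is just -1; which bit pattern that denotes depends on word size.
assert ~0 == -1
assert ~0 & 0xFFFFFFFF == 0xFFFFFFFF                    # 32-bit view
assert ~0 & 0xFFFFFFFFFFFFFFFF == 0xFFFFFFFFFFFFFFFF    # 64-bit view
# So ~0 stands in for 0xffffffff only when 32-bit words are assumed:
assert ~0 & 0xFFFFFFFFFFFFFFFF != 0xFFFFFFFF
```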

> It feels that way because pulling teeth is probably exactly the right 
> analogy: what you're doing is probably a good idea in the long run, but 
> right now it hurts...

OK, that's a good extension of the analogy.

Glad we're moving forward on this.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal@lemburg.com  Tue Aug 13 11:55:52 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 13 Aug 2002 12:55:52 +0200
Subject: [Python-Dev] Deprecation warning on integer shifts and such
References: 
Message-ID: <3D58E5B8.4040905@lemburg.com>

Jack Jansen wrote:
> On Tuesday, August 13, 2002, at 02:34 , Guido van Rossum wrote:
> 
>> Why is it so hard to get people to think about what they need?  (I
>> mean beyond "I don't want anything to change" or vague things like
>> that.  I am looking for an API that will make developers like Jack as
>> well as other extension developers happy, but it feels like pulling
>> teeth.
> 
> 
> It feels that way because pulling teeth is probably exactly the right 
> analogy: what you're doing is probably a good idea in the long run, but 
> right now it hurts...

Here's a slightly different idea:

   Bit shifting is hardly ever done on signed data and since
   Python does not provide an unsigned integer object, most
   developers stick to the integer object and interpret its value
   as unsigned object (much like many people interpret strings
   as having a Latin-1 value). If people use integers that way,
   they know what they are doing and they are usually not interested
   in the sign of the value at all, as long as the bits stay the
   same when they pass the value to various Python APIs (including
   C APIs).

   Conclusion: Offer developers a better way to deal with unsigned
   data, e.g. an unsigned 32-bit integer type as subtype of int and
   let the bit manipulation operators return this unsigned type.

   For backward compat. make sure that common parser markers continue
   to work as they do now and add new ones for unsigned values for
   future use. PyInt_AS_LONG(unsignedInteger) would return the
   value of unsignedInteger cast to a signed one and extensions
   would be happy.

   Only if a value gets shifted beyond the first 32 bits,
   convert it to a long.

That should solve most backward compat problems for bit
shifters while still unifying ints and longs.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From duncan@rcp.co.uk  Tue Aug 13 12:24:34 2002
From: duncan@rcp.co.uk (Duncan Booth)
Date: Tue, 13 Aug 2002 12:24:34 +0100
Subject: [Python-Dev] Bugs in the python grammar?
References:  <200208130911.g7D9B1b27636@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

Guido van Rossum  wrote in
news:200208130911.g7D9B1b27636@pcp02138704pcs.reston01.va.comcast.net: 
> This one is correct (we use it to generate our parser):
> 
> http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/G
> rammar/Grammar?rev=1.48&content-type=text/vnd.viewcvs-markup 
> 
> Or download Python and look at Grammar/Grammar .
> 
Does the program for converting the grammar to a railroad diagram still 
exist anywhere? I've searched Google, but I can't find any trace of it 
anywhere.

-- 
Duncan Booth                                             duncan@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?


From walter@livinglogic.de  Tue Aug 13 12:31:15 2002
From: walter@livinglogic.de (=?ISO-8859-15?Q?Walter_D=F6rwald?=)
Date: Tue, 13 Aug 2002 13:31:15 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
References: <3D1057E8.9090200@livinglogic.de> <3D4E336E.8070700@lemburg.com>	<200208051348.g75DmOv13530@pcp02138704pcs.reston01.va.comcast.net>	<3D4E97A7.7000904@lemburg.com>	<200208051527.g75FR1814634@pcp02138704pcs.reston01.va.comcast.net>	<3D579245.2080306@livinglogic.de>		<3D5800DD.1050108@livinglogic.de> 
Message-ID: <3D58EE03.9070808@livinglogic.de>

Martin v. Loewis wrote:

> Walter Dörwald  writes:
> 
> 
>>Output is as follows:
>>1790000 chars, 2.330% unenc
>>ignore: 0.022 (factor=1.000)
>>xmlcharrefreplace: 0.044 (factor=1.962)
>>xml2: 0.267 (factor=12.003)
>>xml3: 0.723 (factor=32.506)
>>workaround: 5.151 (factor=231.702)
>>i.e. a 1.7MB string with 2.3% unencodable characters was
>>encoded.
> 
> 
> Those numbers are impressive. Can you please add
> 
> def xml4(exc):
>   if isinstance(exc, UnicodeEncodeError):
>     if exc.end-exc.start == 1:
>       return u"&#"+str(ord(exc.object[exc.start]))+u";"
>     else:
>       r = []
>       for c in exc.object[exc.start:exc.end]:
>         r.extend([u"&#", str(ord(c)), u";"])
>       return u"".join(r)
>   else:
>     raise TypeError("don't know how to handle %r" % exc)
> 
> and report how that performs (assuming I made no error)?

You must return a tuple (replacement, new input position);
otherwise the code is correct. I tried it and added two new
versions:

def xml5(exc):
     if isinstance(exc, UnicodeEncodeError):
         return (u"&#%d;" % ord(exc.object[exc.start]), exc.start+1)
     else:
         raise TypeError("don't know how to handle %r" % exc)

def xml6(exc):
     if isinstance(exc, UnicodeEncodeError):
         return (u"&#" + str(ord(exc.object[exc.start])) + u";", 
exc.start+1)
     else:
         raise TypeError("don't know how to handle %r" % exc)

Here are the results:

1790000 chars, 2.330% unenc
ignore: 0.022 (factor=1.000)
xmlcharrefreplace: 0.042 (factor=1.935)
xml2: 0.264 (factor=12.084)
xml3: 0.733 (factor=33.529)
xml4: 0.504 (factor=23.057)
xml5: 0.474 (factor=21.649)
xml6: 0.481 (factor=22.010)
workaround: 5.138 (factor=234.862)
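
For reference, this is how such a handler plugs into the PEP 293
machinery via codecs.register_error (shown in modern syntax; the
handler body is the xml5 variant from above):

```python
import codecs

def xml5(exc):
    # Replace one unencodable character with an XML character reference
    # and resume encoding immediately after it.
    if isinstance(exc, UnicodeEncodeError):
        return (u"&#%d;" % ord(exc.object[exc.start]), exc.start + 1)
    raise TypeError("don't know how to handle %r" % exc)

codecs.register_error("xml5", xml5)

print(u"abc\u20ac".encode("ascii", "xml5"))  # b'abc&#8364;'
```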

>>Using a callback instead of the inline implementation is a factor of
>>12 slower than ignore.
> 
> 
> For the purpose of comparing C and Python, this isn't relevant, is it?
> Only the C version of xmlcharrefreplace and a Python version should be
> compared.

I was just too lazy to code this. ;)

Python is a factor of 2.7 slower than the C callback
(or 1.9 for your version).

>>It can't really be fixed for codecs implemented in Python. For codecs
>>that use the C functions we could add the functionality that e.g.
>>PyUnicodeEncodeError_SetReason(exc) sets exc.reason and exc.args[3],
>>but AFAICT it can't be done easily for Python where attribute assignment
>>directly goes to the instance dict.
> 
> 
> You could add methods into the class set_reason etc, which error
> handler authors would have to use.
> 
> Again, these methods could be added through Python code, so no C code
> would be necessary to implement them.
> 
> You could even implement a setattr method in Python - although you'd
> have to search this from C while initializing the class.

For me this sounds much more complicated than the current C functions, 
especially when using them from C, as most codecs probably will.

Bye,
    Walter Dörwald



From walter@livinglogic.de  Tue Aug 13 12:38:53 2002
From: walter@livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Tue, 13 Aug 2002 13:38:53 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
References: <3D1057E8.9090200@livinglogic.de> <3D4E336E.8070700@lemburg.com>	<200208051348.g75DmOv13530@pcp02138704pcs.reston01.va.comcast.net>	<3D4E97A7.7000904@lemburg.com>	<200208051527.g75FR1814634@pcp02138704pcs.reston01.va.comcast.net>	<3D579245.2080306@livinglogic.de>		<3D57DC42.6070300@lemburg.com>		<3D57F3B8.7090507@lemburg.com> 
Message-ID: <3D58EFCD.2090201@livinglogic.de>

Martin v. Loewis wrote:

> [...]
>>For the charmap codec it's mostly about performance. I don't
>>have objections for other codecs which rely on external
>>resources.
> 
> 
> Please remember that we are still about error handling here, and that
> the normal case will be "strict", which usually results in aborting
> the computation.
> 
> So I don't see the performance issue even for the charmap codec.

I guess this code might be used inside a webserver that outputs XML
results and that honors the Accept-Charset header from the client,
so it must do encoding on the fly.

So I want the code to be as fast as possible.

Bye,
    Walter Dörwald



From Jack.Jansen@cwi.nl  Tue Aug 13 12:32:33 2002
From: Jack.Jansen@cwi.nl (Jack Jansen)
Date: Tue, 13 Aug 2002 13:32:33 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: 
Message-ID: <5BF4D214-AEB0-11D6-9BA3-0030655234CE@cwi.nl>

I was going to suggest that if we return mixed sets of unicode/string
values from listdir() we could also do the same thing for platforms
where FileSystemDefaultEncoding is utf-8, such as MacOSX.

But as usual with unicode, when I actually try this it doesn't work, and
I don't understand why not. Why is unicode always something that seems
so simple and logical until you actually try it??!?!?

Here's a transcript of my Python session. The terminal has been set to
render in latin-1. The directory contains one file, "frör"
(fr-o-umlaut-r).
sap!jack- python
Python 2.3a0 (#32, Aug 12 2002, 15:31:25)
[GCC 2.95.2 19991024 (release)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
 >>> import os
 >>> os.listdir('.')
['fro\xcc\x88r']
 >>> utf8name = os.listdir('.')[0]
 >>> unicodename = utf8name.decode('utf-8')
 >>> unicodename
u'fro\u0308r'
 >>> print unicodename.encode('latin-1')
Traceback (most recent call last):
   File "", line 1, in ?
UnicodeError: Latin-1 encoding error: ordinal not in range(256)
 >>>

Sigh. \u0308 is not in the range(256), but the whole point of
encode('latin-1') is to make it so, isn't it? And o-umlaut definitely
has a latin-1 encoding. I tried the same with macroman instead of
latin-1 (just to make sure this wasn't a bug in the latin-1 encoder),
but still no go.

What am I doing wrong?
--
- Jack Jansen                
http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma
Goldman -



From jepler@unpythonic.net  Tue Aug 13 12:50:59 2002
From: jepler@unpythonic.net (jepler@unpythonic.net)
Date: Tue, 13 Aug 2002 06:50:59 -0500
Subject: [Python-Dev] Performance (non)optimization: 31-bit ints in pointers
In-Reply-To: 
References: <20020812205107.GA15411@unpythonic.net> 
Message-ID: <20020813065059.A1048@unpythonic.net>

On Mon, Aug 12, 2002 at 08:44:46PM -0400, Tim Peters wrote:
> [Jeff Epler]
> > (due to alignment requirements on all common machines, all valid
> > pointers-to-struct have 0 in their low bit)
> 
> Not so on word-addressed machines, though, or on machines using low-order
> pointer bits for their own notion of tag bits.

Of course, "all common machines" simply means "x86 machines".

Jeff


From Jack.Jansen@oratrix.com  Tue Aug 13 12:46:52 2002
From: Jack.Jansen@oratrix.com (Jack Jansen)
Date: Tue, 13 Aug 2002 13:46:52 +0200
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: <200208131020.g7DAKVk29479@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <5BD15FEC-AEB2-11D6-9BA3-0030655234CE@oratrix.com>

On Tuesday, August 13, 2002, at 12:20 , Guido van Rossum wrote:

>> The least amount of work for me would be caused by keeping "i" 
>> semantics
>> as they are, of course.
>
> So you're using 'i', not 'l'?  Any particular reason?

No, sorry. It's l everywhere.

>> If we switch to "k" for integers in the range -2**31..2**31-1 that
>> would not be too much work, as a lot of the code is generated (I would
>> take the quick and dirty approach of using k for all my integers). Only
>> the hand-written code would have to be massaged by hand.
>
> Glad, that's my preferred choice too.  But note that in Python 2.4 and
> beyond, 'k' will only accept positive inputs, so you'll really have to
> find a way to mark your signed integer arguments up differently.

Huh??! Now you've confused me. If "k" means "32 bit mask", why would it 
be changed in 2.4 not to accept negative values? "-1" is a perfectly 
normal way to specify "0xffffffff" in C usage...

> In 2.3 (and 2.2.2), I propose the following semantics for 'k': if the
> argument is a Python int, a signed value within the range
> [INT_MIN,INT_MAX] is required; if it is a Python long, a nonnegative
> value in the range [0, 2*INT_MAX+1] is required.  These are the same
> semantics that are currently used by struct.pack() for 'L', I found
> out; I like these.

I don't see the point, really. Why not allow [INT_MIN, 2*INT_MAX+1]? If 
the "k" specifier is especially meant for bit patterns why not have 
semantics of "anything goes, unless we are absolutely sure it isn't 
going to fit"?

> We'll have to niggle about the C type corresponding to 'k'.  Should it
> be 'int' or 'long'?  It may not matter for you, since you expect to be
> running on 32-bit hardware forever; but it matters for other potential
> users of 'k'.  We could also have both 'k' and 'K', where 'k' stores
> into a C int and 'K' into a C long.

How about k1 for a byte, k2 for a short, k4 for a long and k8 for a long 
long?
>
> I also propose to have a C API PyInt_AsUnsignedLong, which will
> implement the semantics of 'K'.  Like 'i', 'k' will have to do an
> explicit range test.

In my proposal these would then probably become PyInt_As1Byte, 
PyInt_As2Bytes, PyInt_As4Bytes and PyInt_As8Bytes.
--
- Jack Jansen                
http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma 
Goldman -



From jason@tishler.net  Tue Aug 13 13:08:51 2002
From: jason@tishler.net (Jason Tishler)
Date: Tue, 13 Aug 2002 08:08:51 -0400
Subject: [Python-Dev] Bugs #544740: test_commands test fails under Cygwin
Message-ID: <20020813120851.GC2548@tishler.net>

Neil Norwitz suggested that I discuss the following on python-dev:

    http://sourceforge.net/tracker/?func=detail&aid=544740&group_id=5470&atid=105470

The problem is that test_commands does not handle spaces in either user
or group names.  Although this is probably only an issue under Cygwin,
this could affect other Unixes too (yup, I'm clutching at straws).
Anyway, suggestions on how to fix this will be greatly appreciated.

Thanks,
Jason


From barry@python.org  Tue Aug 13 13:08:38 2002
From: barry@python.org (Barry A. Warsaw)
Date: Tue, 13 Aug 2002 08:08:38 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
References: 
 <200208131020.g7DAKVk29479@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <15704.63174.621804.194154@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum  writes:

    GvR> In 2.3 (and 2.2.2), I propose the following semantics for
    GvR> 'k': if the argument is a Python int, a signed value within
    GvR> the range [INT_MIN,INT_MAX] is required; if it is a Python
    GvR> long, a nonnegative value in the range [0, 2*INT_MAX+1] is
    GvR> required.  These are the same semantics that are currently
    GvR> used by struct.pack() for 'L', I found out; I like these.

It's too bad struct.pack() and PyArg_ParseTuple() can't share the same
format character for the same semantics.  Py3k.

-Barry


From walter@livinglogic.de  Tue Aug 13 13:13:27 2002
From: walter@livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Tue, 13 Aug 2002 14:13:27 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
References: <5BF4D214-AEB0-11D6-9BA3-0030655234CE@cwi.nl>
Message-ID: <3D58F7E7.3040103@livinglogic.de>

Jack Jansen wrote:

> [...]
> Here's a transcript of my Python session. The terminal has been set to 
> render in latin-1. The directory contains one file, "frör" (fr-o-umlaut-r).
> sap!jack- python
> Python 2.3a0 (#32, Aug 12 2002, 15:31:25)
> [GCC 2.95.2 19991024 (release)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
>  >>> import os
>  >>> os.listdir('.')
> ['fro\xcc\x88r']
>  >>> utf8name = os.listdir('.')[0]
>  >>> unicodename = utf8name.decode('utf-8')
>  >>> unicodename
> u'fro\u0308r'

U+0308 is not 'LATIN SMALL LETTER O WITH DIAERESIS' but
'COMBINING DIAERESIS', i.e. the ö got decomposed into
o + 'COMBINING DIAERESIS'.
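
In other words, the filename came back in decomposed form. A later
addition to the standard library, unicodedata.normalize(), can
recompose it to NFC (a sketch; it did not yet exist when this thread
was written):

```python
import unicodedata

decomposed = u"fro\u0308r"   # o + COMBINING DIAERESIS, as listdir returned it
composed = unicodedata.normalize("NFC", decomposed)
assert composed == u"fr\u00f6r"      # LATIN SMALL LETTER O WITH DIAERESIS
assert composed.encode("latin-1") == b"fr\xf6r"  # now encodable
```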

> [...]


Bye,
    Walter Dörwald



From fredrik@pythonware.com  Tue Aug 13 13:17:54 2002
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Tue, 13 Aug 2002 14:17:54 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
References: <5BF4D214-AEB0-11D6-9BA3-0030655234CE@cwi.nl>
Message-ID: <025901c242c3$796b5b50$0900a8c0@spiff>

jack wrote:

> Sigh. \u0308 is not in the range(256), but the whole point of
> encode('latin-1') is to make it so, isn't it?

Define "make it so"?

The encoders convert unicode code points to corresponding code
points in the given 8-bit encoding.  One character in, one character
out (unless the target encoding is a multibyte encoding, like utf-8).

This works perfectly well if producers follow the "early uniform
normalization" rule (everything else is madness).  For some reason,
your listdir implementation doesn't.

Instead of returning LATIN SMALL LETTER O WITH DIAERESIS (\u00f6),
it returns multiple unicode characters.  I'd say it's broken.

As far as I know, there's no standard unicode normalizer in Python.





From guido@python.org  Tue Aug 13 13:57:59 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 13 Aug 2002 08:57:59 -0400
Subject: [Python-Dev] Bugs in the python grammar?
In-Reply-To: Your message of "Tue, 13 Aug 2002 12:24:34 BST."
 
References:  <200208130911.g7D9B1b27636@pcp02138704pcs.reston01.va.comcast.net>
 
Message-ID: <200208131257.g7DCvxS29767@pcp02138704pcs.reston01.va.comcast.net>

> > Or download Python and look at Grammar/Grammar .
> > 
> Does the program for converting the grammar to a railroad diagram still 
> exist anywhere? I've searched Google, but I can't find any trace of it 
> anywhere.

No, I don't have it and I don't think the original author has it
either.  That was 11-12 years ago...

--Guido van Rossum (home page: http://www.python.org/~guido/)


From yozh@mx1.ru  Tue Aug 13 14:03:21 2002
From: yozh@mx1.ru (Stepan Koltsov)
Date: Tue, 13 Aug 2002 17:03:21 +0400
Subject: [Python-Dev] q about __dict__
Message-ID: <20020813130321.GA12404@banana.mx1.ru>

Hi, Guido, other python developers and other subscribers :-)

Can anybody please explain to me: is the following a bug?

Code
=== begin ===
class Dict(dict):
	def __setitem__(*x):
		raise Exception, "Please, do not touch me!"

class A:
	def __init__(self, *x):
		self.__dict__ = Dict()

A().x = 12
===  end  ===
doesn't raise an Exception, i.e. my __setitem__ method is not called.

Patching that is simple: in Objects/classobject.c one needs to replace
	PyDict_SetItem -> PyObject_SetItem
	PyDict_GetItem -> PyObject_GetItem
	PyDict_DelItem -> PyObject_DelItem
and maybe something else (not much).

The overhead is minimal, and as a bonus Python gains the ability to
assign an object of any type (not necessarily inherited from dict) to __dict__.

Motivation:

Sometimes I want to write strange classes, for example a class with
ordered attributes. I know that it is possible to implement this
by redefining __setattr__, etc., but setting __dict__
looks clearer.
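
For comparison, the __setattr__ route to ordered attributes mentioned
above might look like this (a sketch; class and attribute names are
illustrative):

```python
class OrderedAttrs(object):
    """Record attribute names in assignment order via __setattr__."""

    def __init__(self):
        # Bypass our own __setattr__ so _order itself is not recorded.
        object.__setattr__(self, "_order", [])

    def __setattr__(self, name, value):
        if name not in self.__dict__:
            self._order.append(name)
        object.__setattr__(self, name, value)

o = OrderedAttrs()
o.b = 1
o.a = 2
print(o._order)  # ['b', 'a']
```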

Thanks for reading this letter till the end ;-)

-- 
mailto: Stepan Koltsov 


From guido@python.org  Tue Aug 13 14:04:30 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 13 Aug 2002 09:04:30 -0400
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: Your message of "Tue, 13 Aug 2002 14:17:54 +0200."
 <025901c242c3$796b5b50$0900a8c0@spiff>
References: <5BF4D214-AEB0-11D6-9BA3-0030655234CE@cwi.nl>
 <025901c242c3$796b5b50$0900a8c0@spiff>
Message-ID: <200208131304.g7DD4Up29869@pcp02138704pcs.reston01.va.comcast.net>

> This works perfectly well if producers follow the "early uniform
> normalization" rule (everything else is madness).  For some reason,
> your listdir implementation doesn't.

My guess is that it's not his listdir() or filesystem, but the keyboard
driver.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Tue Aug 13 14:01:25 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 13 Aug 2002 09:01:25 -0400
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: Your message of "Tue, 13 Aug 2002 13:32:33 +0200."
 <5BF4D214-AEB0-11D6-9BA3-0030655234CE@cwi.nl>
References: <5BF4D214-AEB0-11D6-9BA3-0030655234CE@cwi.nl>
Message-ID: <200208131301.g7DD1PC29805@pcp02138704pcs.reston01.va.comcast.net>

> Here's a transcript of my Python session. The terminal has been set to 
> render in latin-1. The directory contains one file, "frör" 
> (fr-o-umlaut-r).
> sap!jack- python
> Python 2.3a0 (#32, Aug 12 2002, 15:31:25)
> [GCC 2.95.2 19991024 (release)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
>  >>> import os
>  >>> os.listdir('.')
> ['fro\xcc\x88r']
>  >>> utf8name = os.listdir('.')[0]
>  >>> unicodename = utf8name.decode('utf-8')
>  >>> unicodename
> u'fro\u0308r'
>  >>> print unicodename.encode('latin-1')
> Traceback (most recent call last):
>    File "", line 1, in ?
> UnicodeError: Latin-1 encoding error: ordinal not in range(256)
>  >>>
> 
> Sigh. \u0308 is not in the range(256), but the whole point of 
> encode('latin-1') is to make it so, isn't it? And o-umlaut definitely 
> has a latin-1 encoding. I tried the same with macroman in stead of 
> latin-1 (just to make sure this wasn't a bug in the latin-1 encoder), 
> but still no go.
> 
> What am I doing wrong?

Looks like it isn't you: the filename somehow contains a character
that's not in the Latin-1 subset of Unicode, and no encoding can fix
that for you.  I don't know why -- you'll have to figure out why your
keyboard generates that character when you type o-umlaut.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Tue Aug 13 14:08:53 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 13 Aug 2002 09:08:53 -0400
Subject: [Python-Dev] Bugs #544740: test_commands test fails under Cygwin
In-Reply-To: Your message of "Tue, 13 Aug 2002 08:08:51 EDT."
 <20020813120851.GC2548@tishler.net>
References: <20020813120851.GC2548@tishler.net>
Message-ID: <200208131308.g7DD8rS29896@pcp02138704pcs.reston01.va.comcast.net>

> Neil Norwitz suggested that I discuss the following on python-dev:
> 
>     http://sourceforge.net/tracker/?func=detail&aid=544740&group_id=5470&atid=105470
> 
> The problem is that test_commands does not handle spaces in either user
> or group names.  Although this is probably only an issue under Cygwin,
> this could affect other Unixes too (yup, I'm clutching at straws).
> Anyway, suggestions on how to fix this will be greatly appreciated.

The obvious fix would be a better regular expression.  Please submit a
patch.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From nhodgson@bigpond.net.au  Tue Aug 13 14:12:16 2002
From: nhodgson@bigpond.net.au (Neil Hodgson)
Date: Tue, 13 Aug 2002 23:12:16 +1000
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
References: <200208121148.g7CBm6KD027268@paros.informatik.hu-berlin.de>
Message-ID: <04ff01c242cb$0c0bf620$3da48490@neil>

> Please comment on the PEP. There is an updated patch on
> http://python.org/sf/594001; please comment on the patch as well.

   I received off-list replies to the PEP from about 5 people. All were in
favour, but that doesn't show a great deal of interest. It is hard to place a
good limit on how far this PEP should extend. My initial proposal was just
to allow opening files with Unicode names. The extensions to other functions
suggested on the list were worthwhile, especially listdir, but since NT
supports Unicode in all system calls, it could end up being applied to less
useful calls such as popen and getenv.

   There was a suggestion from David Ascher that supporting a Unicode
version of getcwd would be useful and I agree as this will often feed into
the other file handling calls. This one can't be finessed by checking an
input argument for Unicode, so needs an extra name such as getcwdu. It'd be
a good idea here to work out a naming convention for this distinction now so
it can be used for more functions in the future.

Guido:
> Aren't there some #ifdefs missing? posix_[12]str have code
> that's only relevant for Windows but isn't #ifdef'ed out
> like it is elsewhere.

   I didn't add more #ifdefs, to keep the code shorter. The #ifdefs that exist
are to hide symbols (like _wmkdir) that may only be available on Windows.
The Unicode paths are guarded by unicode_file_names() so will be avoided on
other platforms. It doesn't matter greatly to me if there are additional
compile time guards although taking it further to have the extra (wide)
arguments to posix_[12]str only on Windows would obfuscate the code.

   Neil




From guido@python.org  Tue Aug 13 14:11:18 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 13 Aug 2002 09:11:18 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: Your message of "Tue, 13 Aug 2002 12:55:52 +0200."
 <3D58E5B8.4040905@lemburg.com>
References: 
 <3D58E5B8.4040905@lemburg.com>
Message-ID: <200208131311.g7DDBI929911@pcp02138704pcs.reston01.va.comcast.net>

> Here's an slightly different idea:
> 
>    Bit shifting is hardly ever done on signed data and since
>    Python does not provide an unsigned integer object, most
>    developers stick to the integer object and interpret its value
>    as unsigned object (much like many people interpret strings
>    as having a Latin-1 value). If people use integers that way,
>    they know what they are doing and they are usually not interested
>    in the sign of the value at all, as long as the bits stay the
>    same when they pass the value to various Python APIs (including
>    C APIs).
> 
>    Conclusion: Offer developers a better way to deal with unsigned
>    data, e.g. an unsigned 32-bit integer type as subtype of int and
>    let the bit manipulation operators return this unsigned type.
> 
>    For backward compat. make sure that common parser markers continue
>    to work as they do now and add new ones for unsigned values for
>    future use. PyInt_AS_LONG(unsignedInteger) would return the
>    value of unsignedInteger cast to a signed one and extensions
>    would be happy.
> 
>    Only if a value gets shifted beyond the first 32 bits,
>    convert it to a long.
> 
> That should solve most backward compat problems for bit
> shifters while still unifying ints and longs.

-100.

We are *already* offering developers a way to deal with unsigned data:
use longs.  Bit shifting works just fine on longs, and the results are
positive unless you "or" in a negative number.  Getting a 32-bit
result in Python is trivial (mask with 0xffffffffL).  The Python C API
already supports getting an unsigned C long out of a Python long
(PyLong_AsUnsignedLong()).
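In today's spelling (the 2.x trailing 'L' on the mask is gone), the recipe reads:

```python
# Treat an int as 32-bit unsigned data by masking, as Guido suggests.
value = -1
unsigned = value & 0xFFFFFFFF              # 4294967295, the 32-bit view of -1

# Shifts never overflow; mask again whenever a 32-bit result is wanted.
shifted = (0x80000000 << 1) & 0xFFFFFFFF   # 0: the top bit is shifted out
```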

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Tue Aug 13 15:23:28 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 13 Aug 2002 10:23:28 -0400
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: Your message of "Tue, 13 Aug 2002 23:12:16 +1000."
 <04ff01c242cb$0c0bf620$3da48490@neil>
References: <200208121148.g7CBm6KD027268@paros.informatik.hu-berlin.de>
 <04ff01c242cb$0c0bf620$3da48490@neil>
Message-ID: <200208131423.g7DENSg02154@odiug.zope.com>

> There was a suggestion from David Ascher that supporting a Unicode
> version of getcwd would be useful and I agree as this will often feed into
> the other file handling calls. This one can't be finessed by checking an
> input argument for Unicode, so needs an extra name such as getcwdu. It'd be
> a good idea here to work out a naming convention for this distinction now so
> it can be used for more functions in the future.

It's gonna be ugly anyhow, so appending a 'u' is fine with me.

> Guido:
> > Aren't there some #ifdefs missing? posix_[12]str have code
> > that's only relevant for Windows but isn't #ifdef'ed out
> > like it is elsewhere.
> 
>    I didn't have more #ifdefs to shorten the code. The #ifdefs that exist
> are to hide symbols (like _wmkdir) that may only be available on Windows.
> The Unicode paths are guarded by unicode_file_names() so will be avoided on
> other platforms. It doesn't matter greatly to me if there are additional
> compile time guards although taking it further to have the extra (wide)
> arguments to posix_[12]str only on Windows would obfuscate the code.

Those are all details.  We can finesse that when we get closer to
agreeing on the semantics.  I think code that we know will never be
executed on Unix should be inside #ifdefs.  Maybe we should reconsider
moving the Windows code to a separate file...

--Guido van Rossum (home page: http://www.python.org/~guido/)


From dan@sidhe.org  Tue Aug 13 15:33:19 2002
From: dan@sidhe.org (Dan Sugalski)
Date: Tue, 13 Aug 2002 10:33:19 -0400
Subject: [Python-Dev] Bugs in the python grammar?
In-Reply-To: <200208130911.g7D9B1b27636@pcp02138704pcs.reston01.va.comcast.net>
References: 
 <200208130911.g7D9B1b27636@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

At 5:11 AM -0400 8/13/02, Guido van Rossum wrote:
>> We've been digging through the python grammar, looking to build up a
>>  parser for it, and have come across what look to be bugs:
>>
>>  In http://www.python.org/doc/current/ref/grammar.txt :
>
>I don't know where that file comes from; it's not the official
>grammar.  Fred will fix the typos you found.
>
>This one is correct (we use it to generate our parser):
>
>http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Grammar/Grammar?rev=1.48&content-type=text/vnd.viewcvs-markup

Okay, cool, thanks.
-- 
                                         Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
dan@sidhe.org                         have teddy bears and even
                                       teddy bears get drunk


From martin@v.loewis.de  Tue Aug 13 15:45:36 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 13 Aug 2002 16:45:36 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: <200208130941.g7D9ftP27703@pcp02138704pcs.reston01.va.comcast.net>
References: <93F350A8-AE2D-11D6-B7D6-003065517236@oratrix.com>
 <200208122007.g7CK7li21777@pcp02138704pcs.reston01.va.comcast.net>
 
 <200208130015.g7D0Fel26486@pcp02138704pcs.reston01.va.comcast.net>
 
 <200208130941.g7D9ftP27703@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

Guido van Rossum  writes:

> Aha!  So MBCS is not an encoding: it's an indirection for a variety of
> encodings.  (Is there a way to find out what the encoding is?)

Correct. In Python, locale.getdefaultlocale()[1] returns the encoding;
the underlying API function is GetACP, and Python uses it as

    PyOS_snprintf(encoding, sizeof(encoding), "cp%d", GetACP());

There is a second indirection, the "OEM code page", which they use:
- for on-disk FAT short file names,
- for the cmd.exe window

Python currently offers no access to GetOEMCP().
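For reference, Martin's recipe from the Python side (getdefaultlocale() still exists but is deprecated in current releases, so treat this as a period sketch):

```python
import locale

# The second element is the encoding name; on Windows it is derived from
# GetACP(), e.g. 'cp1252'.  It can be None when no locale can be determined.
encoding = locale.getdefaultlocale()[1]
print(encoding)
```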

> Do you mean that the condition on
> 
> #if defined(HAVE_LANGINFO_H) && defined(CODESET)
> 
> is reliably false on Windows?  Otherwise _locale.setlocale() could set
> it.

Correct. nl_langinfo is a Sun invention (I believe) which made it into
Posix; Microsoft ignores it.

> So as long as they use 8-bit it's not our problem, right.  Another
> reason to avoid prodicing Unicode without a clue that the app expects
> Unicode (alas).  (BTW I find a Unicode argument to os.listdir() a
> sufficient clue.  IOW os.listdir(u".") should return Unicode.)

Indeed, that would be consistent. I deliberately want to leave this
out of PEP 277. On Unix, things are not that clear - as Jack points
out, readlink() and getcwd() also need consideration.

> > Ok, I'll update the PEP.
> 
> To what?  (It would be bad if I convinced you at the same time you
> convinced me of the opposite. :-)

I haven't changed anything yet, and I won't. 

In this terrain, Windows has the cleaner API (they consider file names
as character strings, not as byte strings), so doing the right thing
is easier.

Regards,
Martin


From martin@v.loewis.de  Tue Aug 13 15:48:09 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 13 Aug 2002 16:48:09 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: <5BF4D214-AEB0-11D6-9BA3-0030655234CE@cwi.nl>
References: <5BF4D214-AEB0-11D6-9BA3-0030655234CE@cwi.nl>
Message-ID: 

Jack Jansen  writes:

> I was going to suggest that if we return mixed sets of unicode/string
> values from listdir() we could also do the same thing for platforms
> where FileSystemDefaultEncoding is utf-8, such as MacOSX.

Indeed, on MacOS, I think returning Unicode objects is a safe thing to
do as well.

Regards,
Martin


From guido@python.org  Tue Aug 13 15:51:40 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 13 Aug 2002 10:51:40 -0400
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: Your message of "Tue, 13 Aug 2002 16:45:36 +0200."
 
References: <93F350A8-AE2D-11D6-B7D6-003065517236@oratrix.com> <200208122007.g7CK7li21777@pcp02138704pcs.reston01.va.comcast.net>  <200208130015.g7D0Fel26486@pcp02138704pcs.reston01.va.comcast.net>  <200208130941.g7D9ftP27703@pcp02138704pcs.reston01.va.comcast.net>
 
Message-ID: <200208131451.g7DEpep03988@odiug.zope.com>

> > Aha!  So MBCS is not an encoding: it's an indirection for a variety of
> > encodings.  (Is there a way to find out what the encoding is?)
> 
> Correct. In Python, locale.getdefaultlocale()[1] returns the encoding;
> the underlying API function is GetACP, and Python uses it as
> 
>     PyOS_snprintf(encoding, sizeof(encoding), "cp%d", GetACP());
> 
> There is a second indirection, the "OEM code page", which they use:
> - for on-disk FAT short file names,
> - for the cmd.exe window
> 
> Python currently offers no access to GetOEMCP().
> 
> > Do you mean that the condition on
> > 
> > #if defined(HAVE_LANGINFO_H) && defined(CODESET)
> > 
> > is reliably false on Windows?  Otherwise _locale.setlocale() could set
> > it.
> 
> Correct. nl_langinfo is a Sun invention (I believe) which made it into
> Posix; Microsoft ignores it.
> 
> > So as long as they use 8-bit it's not our problem, right.  Another
> > reason to avoid producing Unicode without a clue that the app expects
> > Unicode (alas).  (BTW I find a Unicode argument to os.listdir() a
> > sufficient clue.  IOW os.listdir(u".") should return Unicode.)
> 
> Indeed, that would be consistent. I deliberately want to leave this
> out of PEP 277. On Unix, things are not that clear - as Jack points
> out, readlink() and getcwd() also need consideration.
> 
> > > Ok, I'll update the PEP.
> > 
> > To what?  (It would be bad if I convinced you at the same time you
> > convinced me of the opposite. :-)
> 
> I haven't changed anything yet, and I won't. 
> 
> In this terrain, Windows has the cleaner API (they consider file names
> as character strings, not as byte strings), so doing the right thing
> is easier.

OK.  I leave this further in your capable hands, Martin!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From martin@v.loewis.de  Tue Aug 13 15:50:59 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 13 Aug 2002 16:50:59 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: <025901c242c3$796b5b50$0900a8c0@spiff>
References: <5BF4D214-AEB0-11D6-9BA3-0030655234CE@cwi.nl>
 <025901c242c3$796b5b50$0900a8c0@spiff>
Message-ID: 

"Fredrik Lundh"  writes:

> As far as I know, there's no standard unicode normalizer in Python.

Maybe that example shows that there should be: the codecs which use
combining characters should then normalize the string on error,
probably to NFC, and retry.

Regards,
Martin


From martin@v.loewis.de  Tue Aug 13 15:54:08 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 13 Aug 2002 16:54:08 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: <200208131301.g7DD1PC29805@pcp02138704pcs.reston01.va.comcast.net>
References: <5BF4D214-AEB0-11D6-9BA3-0030655234CE@cwi.nl>
 <200208131301.g7DD1PC29805@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

Guido van Rossum  writes:

> Looks like it isn't you: the filename somehow contains a character
> that's not in the Latin-1 subset of Unicode, and no encoding can fix
> that for you.  I don't know why -- you'll have to figure out why your
> keyboard generates that character when you type o-umlaut.

As Walter explains, he has \u006f\u0308, which is

\N{LATIN SMALL LETTER O}\N{COMBINING DIAERESIS}

This could be normalized to

\N{LATIN SMALL LETTER O WITH DIAERESIS}

which then can be encoded as Latin-1. This, of course, requires the
databases for normalization (canonical composition and decomposition).
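unicodedata.normalize(), which landed in Python 2.3 shortly after this thread, performs exactly this composition:

```python
import unicodedata

decomposed = "o\u0308"   # LATIN SMALL LETTER O + COMBINING DIAERESIS
composed = unicodedata.normalize("NFC", decomposed)

assert composed == "\u00f6"                  # LATIN SMALL LETTER O WITH DIAERESIS
assert composed.encode("latin-1") == b"\xf6" # now representable in Latin-1
```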

Regards,
Martin


From martin@v.loewis.de  Tue Aug 13 15:59:05 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 13 Aug 2002 16:59:05 +0200
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: 
References: 
Message-ID: 

Jack Jansen  writes:

> If we have only pure signed and pure unsigned converters it would mean
> an extraordinary amount of work, but luckily it seems that that is not
> going to happen.

Now I'm confused: "l" *is* a "pure signed converter", no? I.e. it
won't accept a value above 2**31-1, right?

Regards,
Martin


From martin@v.loewis.de  Tue Aug 13 15:56:33 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 13 Aug 2002 16:56:33 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: <04ff01c242cb$0c0bf620$3da48490@neil>
References: <200208121148.g7CBm6KD027268@paros.informatik.hu-berlin.de>
 <04ff01c242cb$0c0bf620$3da48490@neil>
Message-ID: 

"Neil Hodgson"  writes:

>    There was a suggestion from David Ascher that supporting a Unicode
> version of getcwd would be useful and I agree as this will often feed into
> the other file handling calls. This one can't be finessed by checking an
> input argument for Unicode, so needs an extra name such as getcwdu. It'd be
> a good idea here to work out a naming convention for this distinction now so
> it can be used for more functions in the future.

Alternatively, a flag could do. Alas, it currently isn't in the PEP,
and unless there is easy agreement on how it should work, I think this
must be left for further study.

Regards,
Martin



From guido@python.org  Tue Aug 13 16:04:32 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 13 Aug 2002 11:04:32 -0400
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: Your message of "Tue, 13 Aug 2002 16:54:08 +0200."
 
References: <5BF4D214-AEB0-11D6-9BA3-0030655234CE@cwi.nl> <200208131301.g7DD1PC29805@pcp02138704pcs.reston01.va.comcast.net>
 
Message-ID: <200208131504.g7DF4We04131@odiug.zope.com>

> As Walter explains, he has \u006f\u0308, which is
> 
> \N{LATIN SMALL LETTER O}\N{COMBINING DIAERESIS}
> 
> This could be normalized to
> 
> \N{LATIN SMALL LETTER O WITH DIAERESIS}
> 
> which then can be encoded as Latin-1. This, of course, requires the
> databases for normalization (canonical composition and decomposition).

But if you pass the normalized string (or the Latin-1 string) to
open(), will it find the file?  I.e. if the filesystem has the
unnormalized name stored in its directory, will filesystem requests
normalize filenames before comparing them?

Jack, can you try to do that?  Can you try open('fr\xf6r') in that
directory?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal@lemburg.com  Tue Aug 13 14:56:55 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 13 Aug 2002 15:56:55 +0200
Subject: [Python-Dev] Deprecation warning on integer shifts and such
References:               <3D58E5B8.4040905@lemburg.com> <200208131311.g7DDBI929911@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <3D591027.8070200@lemburg.com>

Guido van Rossum wrote:
>>Here's a slightly different idea:
>>
>>   Bit shifting is hardly ever done on signed data and since
>>   Python does not provide an unsigned integer object, most
>>   developers stick to the integer object and interpret its value
>>   as unsigned object (much like many people interpret strings
>>   as having a Latin-1 value). If people use integers that way,
>>   they know what they are doing and they are usually not interested
>>   in the sign of the value at all, as long as the bits stay the
>>   same when they pass the value to various Python APIs (including
>>   C APIs).
>>
>>   Conclusion: Offer developers a better way to deal with unsigned
>>   data, e.g. an unsigned 32-bit integer type as subtype of int and
>>   let the bit manipulation operators return this unsigned type.
>>
>>   For backward compat. make sure that common parser markers continue
>>   to work as they do now and add new ones for unsigned values for
>>   future use. PyInt_AS_LONG(unsignedInteger) would return the
>>   value of unsignedInteger casted to a signed one and extensions
>>   would be happy.
>>
>>   Only if a value gets shifted beyond the first 32 bits,
>>   convert it to a long.
>>
>>That should solve most backward compat problems for bit
>>shifters while still unifying ints and longs.
> 
> 
> -100.
> 
> We are *already* offering developers a way to deal with unsigned data:
> use longs.  Bit shifting works just fine on longs, and the results are
> positive unless you "or" in a negative number.  Getting a 32-bit
> result in Python is trivial (mask with 0xffffffffL).  The Python C API
> already supports getting an unsigned C long out of a Python long
> (PyLong_AsUnsignedLong()).

You are turning in circles here. Longs are not compatible
to integers at C level. That's what I was trying to
address.

Longs don't offer the performance you'd expect from bit operations,
so they are not a real-life alternative to native 32-bit or 64-bit
integers or bit fields. They are from a language designer's POV,
but then I'd suggest to drop the difference between ints and longs
completely in Py3k instead and make them a single hybrid type for
multi-precision numbers which uses native C number types or arrays
of bytes as necessary.

BTW, what do you mean by:

	"hex()/oct() of negative int will return "
	"a signed string in Python 2.4 and up"

Are you suggesting that hex(0xff000000) returns
"-0x1000000" ?

That looks like another potentially harmful change.

Is it really worth breaking these things just for the sake
of trying to avoid OverflowErrors where a simple explicit
cast by the programmer is all that's needed to avoid them ?

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From guido@python.org  Tue Aug 13 16:15:30 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 13 Aug 2002 11:15:30 -0400
Subject: [Python-Dev] q about __dict__
In-Reply-To: Your message of "Tue, 13 Aug 2002 17:03:21 +0400."
 <20020813130321.GA12404@banana.mx1.ru>
References: <20020813130321.GA12404@banana.mx1.ru>
Message-ID: <200208131515.g7DFFUU05315@odiug.zope.com>

> Anybody, please explain me, is this a bug? :

No and yes.  It's currently defined as a feature -- you can subclass
dict, but it's not always safe to override operations like __getitem__
because Python internally takes shortcuts for dicts used to implement
namespaces.  For dicts used for namespaces, it's only safe to add new
methods; for dicts not used for namespaces, it's safe to override
special methods.
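The shortcut is easy to observe (a CPython-specific sketch; the class names here are illustrative, not from the original message):

```python
class LoudDict(dict):
    def __setitem__(self, key, value):
        raise RuntimeError("overridden __setitem__ called")

class A:
    pass

a = A()
a.__dict__ = LoudDict()   # a dict subclass is accepted as a namespace
a.x = 12                  # CPython stores this via PyDict_SetItem,
                          # silently bypassing LoudDict.__setitem__
print(a.__dict__)         # {'x': 12}
```

Subscript assignment through Python code (a.__dict__['y'] = 1) does raise, which is exactly the asymmetry being discussed.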

> Code
> === begin ===
> class Dict(dict):
> 	def __setitem__(self, *x):
> 		raise Exception, "Please, do not touch me!"
> 
> class A:
> 	def __init__(self, *x):
> 		self.__dict__ = Dict()
> 
> A().x = 12
> ===  end  ===
> doesn't raise Exception, i. e. my __setitem__ function is not called.
> 
> Patching that is simple: in Objects/classobject.c need to replace
> 	PyDict_SetItem -> PyObject_SetItem
> 	PyDict_GetItem -> PyObject_GetItem
> 	PyDict_DelItem -> PyObject_DelItem
> and maybe something else (not much).
> 
> Overhead is minimal, and as bonus python gets ability of assigning
> object of any type (not inherited from dict) to __dict__.

Have you tried this?  Because PyDict_GetItem() doesn't set an
exception condition when the key is not found, a lot of code would
have to be changed.

> Motivation:
> 
> Sometimes I want to write strange classes, for example, a class with
> ordered attributes. I know that it is possible to implement this
> redefining class attributes __setattr__, etc., but setting __dict__
> looks clearer.

For this particular situation (instance variables) I'm not totally
against fixing this, but I don't find it has a high priority.  You can
help by providing a patch that implements your idea above, and showing
some benchmark results (e.g. based on PyBench) that indicate the
minimal performance impact you're claiming.  If you don't feel like
doing this yourself (e.g. because you're not confident about your C
coding skills), ask around on comp.lang.python for help.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Tue Aug 13 16:17:25 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 13 Aug 2002 11:17:25 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: Your message of "Tue, 13 Aug 2002 16:59:05 +0200."
 
References: 
 
Message-ID: <200208131517.g7DFHP205353@odiug.zope.com>

> Jack Jansen  writes:
> 
> > If we have only pure signed and pure unsigned converters it would mean
> > an extraordinary amount of work, but luckily it seems that that is not
> > going to happen.

[MvL]
> Now I'm confused: "l" *is* a "pure signed converter", no? I.e. it
> won't accept a value above 2**31-1, right?

Correct.  I think Jack's worry is that in 2.3, mask expressions can be
negative, and if "k" were a pure unsigned converter, negative masks
would not be accepted.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Tue Aug 13 16:26:26 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 13 Aug 2002 11:26:26 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: Your message of "Tue, 13 Aug 2002 15:56:55 +0200."
 <3D591027.8070200@lemburg.com>
References:  <3D58E5B8.4040905@lemburg.com> <200208131311.g7DDBI929911@pcp02138704pcs.reston01.va.comcast.net>
 <3D591027.8070200@lemburg.com>
Message-ID: <200208131526.g7DFQQe05375@odiug.zope.com>

> > We are *already* offering developers a way to deal with unsigned data:
> > use longs.  Bit shifting works just fine on longs, and the results are
> > positive unless you "or" in a negative number.  Getting a 32-bit
> > result in Python is trivial (mask with 0xffffffffL).  The Python C API
> > already supports getting an unsigned C long out of a Python long
> > (PyLong_AsUnsignedLong()).
> 
> You are turning in circles here. Longs are not compatible
> to integers at C level. That's what I was trying to
> address.

In what sense are longs not compatible to C integers?  Because they
can hold larger values?  PyLong_AsUnsignedLong() does a range check;
Python code can force the value to be in range by using (e.g.)
x&0xffffffffL.

> Longs don't offer the performance you'd expect from bit operations,

Oh puleeeeeeze.  The overhead of the VM, object creation and
deallocation, overflow checks, etc., completely drown the time it
takes to do the measly a<<b.

> so they are not a real-life alternative to native 32-bit or 64-bit
> integers or bit fields.

Of course they aren't, and that's not what we need.  We're talking
here about being able to pass various bits and masks to system and
library calls.

> They are from a language designer's POV,
> but then I'd suggest to drop the difference between ints and longs
> completely in Py3k instead and make them a single hybrid type for
> multi-precision numbers which uses native C number types or arrays
> of bytes as necessary.

Yeah, in Py3k, there will be only one type.  PEP 237 tries to
approximate this without having to change *every* line of code dealing
with ints.  Unless you use isinstance() or type(), eventually you
won't be able to tell the difference (and we'll provide a way to
abstract away from those too, e.g. a baseint class).

> BTW, what do you mean by:
> 
> 	"hex()/oct() of negative int will return "
> 	"a signed string in Python 2.4 and up"
> 
> Are you suggesting that hex(0xff000000) returns
> "-0x1000000" ?

No, because 0xff000000 will be a positive number. :-)

However, hex(-1) will return '-0x1' rather than '0xffffffff'.
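Both halves of that statement are easy to check in any post-2.4 Python:

```python
assert hex(-1) == "-0x1"                      # signed, per PEP 237
assert hex(-1 & 0xFFFFFFFF) == "0xffffffff"   # explicit mask gives the old view
```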

> That looks like another potentially harmful change.

That's why I'm adding warnings now.

I'm frustrated that you apparently didn't read PEP 237 when it was
discussed in the first place.

> Is it really worth breaking these things just for the sake
> of trying to avoid OverflowErrors where a simple explicit
> cast by the programmer is all that's needed to avoid them ?

Yes.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From jriehl@spaceship.com  Tue Aug 13 17:05:51 2002
From: jriehl@spaceship.com (Jonathan Riehl)
Date: Tue, 13 Aug 2002 11:05:51 -0500 (CDT)
Subject: [Python-Dev] PEP 269 will live again.
Message-ID: 

I move to move PEP 269 (pgen module for Python) to zombie monster status
and refer all interested parties to my post today in the parser-sig for
more details.  Apparently I am only interested in parser generators in the
month of August (PEP 269 was drafted in Aug.2001).
-Jon



From mal@lemburg.com  Tue Aug 13 17:12:40 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 13 Aug 2002 18:12:40 +0200
Subject: [Python-Dev] Deprecation warning on integer shifts and such
References:  <3D58E5B8.4040905@lemburg.com> <200208131311.g7DDBI929911@pcp02138704pcs.reston01.va.comcast.net>              <3D591027.8070200@lemburg.com> <200208131526.g7DFQQe05375@odiug.zope.com>
Message-ID: <3D592FF8.2060705@lemburg.com>

Guido van Rossum wrote:
>>>We are *already* offering developers a way to deal with unsigned data:
>>>use longs.  Bit shifting works just fine on longs, and the results are
>>>positive unless you "or" in a negative number.  Getting a 32-bit
>>>result in Python is trivial (mask with 0xffffffffL).  The Python C API
>>>already supports getting an unsigned C long out of a Python long
>>>(PyLong_AsUnsignedLong()).
>>
>>You are turning in circles here. Longs are not compatible
>>to integers at C level. That's what I was trying to
>>address.
> 
> 
> In what sense are longs not compatible to C integers? 

PyInt_Check() doesn't accept longs. PyInt_AS_LONG() returns
garbage.

> I'm frustrated that you apparently didn't read PEP 237 when it was
> discussed in the first place.

I was on vacation at the time you discussed this and I
had never expected that you were actually trying to force
long usage instead of integer usage. My impression was that
you were aiming at providing ways to be able to pass longs
to integer-aware APIs, which is goodness.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From brian@sweetapp.com  Tue Aug 13 17:20:06 2002
From: brian@sweetapp.com (Brian Quinlan)
Date: Tue, 13 Aug 2002 09:20:06 -0700
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: <200208131504.g7DF4We04131@odiug.zope.com>
Message-ID: <001e01c242e5$49697ff0$bd5d4540@Dell2>

Guido van Rossum wrote:
> But if you pass the normalized string (or the Latin-1 string) to
> open(), will it find the file?  

I tried opening a file using both "o\xcc\x88" and "\xc3\xb6". Both
result in the same file being opened.

> I.e. if the filesystem has the
> unnormalized name stored in its directory, will filesystem requests
> normalize filenames before comparing them?

It could be that Apple is decomposing the filenames before comparing
them. Either way works.

Cheers,
Brian



From guido@python.org  Tue Aug 13 17:26:26 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 13 Aug 2002 12:26:26 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: Your message of "Tue, 13 Aug 2002 18:12:40 +0200."
 <3D592FF8.2060705@lemburg.com>
References:  <3D58E5B8.4040905@lemburg.com> <200208131311.g7DDBI929911@pcp02138704pcs.reston01.va.comcast.net> <3D591027.8070200@lemburg.com> <200208131526.g7DFQQe05375@odiug.zope.com>
 <3D592FF8.2060705@lemburg.com>
Message-ID: <200208131626.g7DGQQr08391@odiug.zope.com>

> > In what sense are longs not compatible to C integers? 
> 
> PyInt_Check() doesn't accept longs. PyInt_AS_LONG() returns
> garbage.

Since you were proposing a new type, I don't see how that matters.
(Making unsigned a subtype of int won't work.)

> > I'm frustrated that you apparently didn't read PEP 237 when it was
> > discussed in the first place.
> 
> I was on vacation at the time you discussed this and I
> had never expected that you are actually trying to force
> long usage instead of integer usage. My impression was that
> you were aiming at providing ways to be able to pass longs
> to integer aware APIs which is goodness.

PyInt_AsLong() and the 'i' and 'l' formats to PyArg_Parse* have
accepted longs for a long time.  The proper idiom is either to use
PyArg_Parse* with an 'i' or 'l' format, or to call PyInt_AsLong()
*without* first using PyInt_Check().

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Tue Aug 13 17:36:22 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 13 Aug 2002 12:36:22 -0400
Subject: [Python-Dev] PEP 269 will live again.
In-Reply-To: Your message of "Tue, 13 Aug 2002 11:05:51 CDT."
 
References: 
Message-ID: <200208131636.g7DGaMK08413@odiug.zope.com>

> I move to move PEP 269 (pgen module for Python) to zombie monster status
> and refer all interested parties to my post today in the parser-sig for
> more details.  Apparently I am only interested in parser generators in the
> month of August (PEP 269 was drafted in Aug.2001).

I suppose you're referring to this message:

http://mail.python.org/pipermail/parser-sig/2002-August/000010.html

I have not retired your PEP and am glad you're interested in this
subject again.  Let's try to reach a conclusion before August is over.

I don't think you should try to tell the Jython folks what to do.  A
pgen module that only works in CPython is still valuable.  If you want
to port pgen to Jython and support it as a module, that's fine, but I
don't think you should try to get the Jython developers to use pgen as
their parser.  After all, Jython's *implementation* is *supposed* to
be Javaesque.

Are you interested in implementing PEP 269 as it currently stands?
Then fine, let's do it and get it into Python 2.3.

If you want to expand the scope, I predict that it'll never happen, so
then let's retire the PEP.  It's up to you.

Note that Jeremy has a new Python compiler package (Lib/python/ in the
Python 2.3 CVS tree), which currently uses parse trees as produced by
the old 'parser' module as input, and then restructures them into more
abstract syntax trees.  This compiler is easily retargetable to other
input and output structures though -- I believe Finn Bock already has
a Jython version of it.  I don't know what it generates, I doubt it
generates CPython bytecode, maybe it generates Java source or JVM
assembler; I believe it takes the same parse tree that Jython uses as
input.

I think it would be useful if you use the same form of abstract syntax
trees as Jeremy's parser uses (not the parser module output, but the
restructured abstract syntax trees); I think they are quite flexible
and useful.

If you don't want to do this, you'll have to motivate why your
alternative is better, and also show how Jeremy's compiler package can
be easily adapted to use your form of parse trees.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Tue Aug 13 17:37:20 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 13 Aug 2002 12:37:20 -0400
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: Your message of "Tue, 13 Aug 2002 09:20:06 PDT."
 <001e01c242e5$49697ff0$bd5d4540@Dell2>
References: <001e01c242e5$49697ff0$bd5d4540@Dell2>
Message-ID: <200208131637.g7DGbLA08429@odiug.zope.com>

> Guido van Rossum wrote:
> > But if you pass the normalized string (or the Latin-1 string) to
> > open(), will it find the file?  
> 
> I tried opening a file using both "o\xcc\x88" and "\xc3\xb6". Both
> result in the same file being opened.
> 
> > I.e. if the filesystem has the
> > unnormalized name stored in its directory, will filesystem requests
> > normalize filenames before comparing them?
> 
> It could be that Apple is decomposing the filenames before comparing
> them. Either way works.

Hm, that sucks (either way) -- because you get unnormalized Unicode
out of directory listings, which is harder to turn into local
encodings.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From brian@sweetapp.com  Tue Aug 13 17:50:14 2002
From: brian@sweetapp.com (Brian Quinlan)
Date: Tue, 13 Aug 2002 09:50:14 -0700
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: <200208131637.g7DGbLA08429@odiug.zope.com>
Message-ID: <001f01c242e9$7f45cfd0$bd5d4540@Dell2>

Guido van Rossum wrote:
> > It could be that Apple is decomposing the filenames before comparing
> > them. Either way works.
> 
> Hm, that sucks (either way) -- because you get unnormalized Unicode
> out of directory listings, which is harder to turn into local
> encodings.

Here is a relevant URI:
http://developer.apple.com/techpubs/macosx/Essentials/SystemOverview/FileSystem/File_Encodings_and_Fonts.html

"""In addition, all code that calls BSD system routines should ensure
that the const *char parameters of these routines are in UTF-8 encoding.
All BSD system functions expect their string parameters to be in UTF-8
encoding and nothing else. An additional caveat is that string
parameters for files, paths, and other file-system entities must be in
canonical UTF-8. In a canonical UTF-8 Unicode string, all decomposable
characters are decomposed; for example, é (0x00E9) is represented as e
(0x0065) + ´ (0x0301). To put things in canonical UTF-8 encoding, use the
"file-system representation" APIs defined in Cocoa and Carbon (including
Core Foundation). For example, to get a canonical UTF-8 character string
in Cocoa, use NSString's fileSystemRepresentation method; for
noncanonical UTF-8 strings, use NSString's UTF8String method"""

Cheers,
Brian



From mal@lemburg.com  Tue Aug 13 17:54:49 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 13 Aug 2002 18:54:49 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
References: <001e01c242e5$49697ff0$bd5d4540@Dell2> <200208131637.g7DGbLA08429@odiug.zope.com>
Message-ID: <3D5939D9.4060304@lemburg.com>


Guido van Rossum wrote:
>>Guido van Rossum wrote:
>>
>>>But if you pass the normalized string (or the Latin-1 string) to
>>>open(), will it find the file?  
>>
>>I tried opening a file using both "o\xcc\x88" and "\xc3\xb6". Both
>>result in the same file being opened.
>>
>>
>>>I.e. if the filesystem has the
>>>unnormalized name stored in its directory, will filesystem requests
>>>normalize filenames before comparing them?
>>
>>It could be that Apple is decomposing the filenames before comparing
>>them. Either way works.

The recommended way of doing normalization is to go by
Normalization Form C: Canonical Decomposition,
followed by Canonical Composition.

See http://www.unicode.org/unicode/reports/tr15/#Specification

Note that for proper collation support, Unicode strings must first be
normalized. See http://www.unicode.org/unicode/reports/tr10/#Main_Algorithm

> Hm, that sucks (either way) -- because you get unnormalized Unicode
> out of directory listings, which is harder to turn into local
> encodings.

You can easily normalize it again (provided you have a normalization
lib at hand).
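A minimal sketch of such renormalization, using the
unicodedata.normalize() function that only landed in Python 2.3 --
so this illustrates the idea rather than code available at the time
of this thread:

```python
import unicodedata

# A decomposed name as an OS X directory listing might return it:
# 'o' followed by U+0308 COMBINING DIAERESIS.
decomposed = "fro\u0308r"

# NFC recomposes the pair into the single code point U+00F6,
# which legacy encodings such as Latin-1 can represent.
composed = unicodedata.normalize("NFC", decomposed)
print(composed == "fr\u00f6r")   # True
```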

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From pedroni@inf.ethz.ch  Tue Aug 13 18:41:43 2002
From: pedroni@inf.ethz.ch (Samuele Pedroni)
Date: Tue, 13 Aug 2002 19:41:43 +0200
Subject: [Python-Dev] Re: PEP 269 will live again.
Message-ID: <015401c242f0$b08cd8c0$6d94fea9@newmexico>

[GvR]
>Note that Jeremy has a new Python compiler package (Lib/python/ in the
>Python 2.3 CVS tree), which currently uses parse trees as produced by
>the old 'parser' module as input, and then restructures them into more
>abstract syntax trees.  This compiler is easily retargetable to other
>input and output structures though -- I believe Finn Bock already has
>a Jython version of it.  I don't know what it generates, I doubt it
>generates CPython bytecode, maybe it generates Java source or JVM
>assembler; I believe it takes the same parse tree that Jython uses as
>input.

yup, it is more than a prototype,
the compilers in the current Jython CVS are based on that.

>I think it would be useful if you use the same form of abstract syntax
>trees as Jeremy's parser uses (not the parser module output, but the
>restructured abstract syntax trees); I think they are quite flexible
>and useful.

Yup, and the point of the exercise is to make it possible for
future versions of PyChecker etc. to work with Jython too.

>If you don't want to do this, you'll have to motivate why your
>alternative is better, and also show how Jeremy's compiler package can
>be easily adapted to use your form of parse trees.

yes, ideally it should output a superset of that, or something with
small changes that can be easily backported to the above effort in
Jython; otherwise it is a kind of step backward:

when first proposed, the PEP would have been a further blessing
of the awful parser module output format;

now the efforts of Jeremy and Finn have moved both
Python and Jython a bit away from that and onto a parallel track.

regards.





From ark@research.att.com  Tue Aug 13 19:02:27 2002
From: ark@research.att.com (Andrew Koenig)
Date: Tue, 13 Aug 2002 14:02:27 -0400 (EDT)
Subject: [Python-Dev] type categories
Message-ID: <200208131802.g7DI2Ro27807@europa.research.att.com>

While I was driving to work today, I had a thought about the
iterator/iterable discussion of a few weeks ago.  My impression is
that that discussion was inconclusive, but a few general principles
emerged from it:

	1) Some types are iterators -- that is, they support calls
	   to next() and raise StopIteration when they have no more
	   information to give.

	2) Some types are iterables -- that is, they support calls
	   to __iter__() that yield an iterator as the result.

	3) Every iterator is also an iterable, because iterators are
	   required to implement __iter__() as well as next().

	4) The way to determine whether an object is an iterator
	   is to call its next() method and see what happens.

	5) The way to determine whether an object is an iterable
	   is to call its __iter__() method and see what happens.

I'm uneasy about (4) because if an object is an iterator, calling its
next() method is destructive.  The implication is that you had better
not use this method to test if an object is an iterator until you are
ready to take irrevocable action based on that test.  On the other
hand, calling __iter__() is safe, which means that you can test
nondestructively whether an object is an iterable, which includes
all iterators.
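A nondestructive category test along these lines can be sketched with
hasattr(); note that modern Python spells the iterator method
__next__, so this is an adaptation rather than 2002-era code:

```python
def categorize(obj):
    # An iterator supports both __iter__() and __next__();
    # a plain iterable supports only __iter__().  Neither check
    # consumes any elements, unlike actually calling next().
    if hasattr(obj, "__next__") and hasattr(obj, "__iter__"):
        return "iterator"
    if hasattr(obj, "__iter__"):
        return "iterable"
    return "other"

print(categorize(iter([1, 2])))  # iterator
print(categorize([1, 2]))        # iterable
print(categorize(42))            # other
```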

Here is what I realized this morning.  It may be obvious to you,
but it wasn't to me (until after I realized it, of course):

     ``iterator'' and ``iterable'' are just two of many type
     categories that exist in Python.

Some other categories:

     callable
     sequence
     generator
     class
     instance
     type
     number
     integer
     floating-point number
     complex number
     mutable
     tuple
     mapping
     method
     built-in

As far as I know, there is no uniform method of determining into which
category or categories a particular object falls.  Of course, there
are non-uniform ways of doing so, but in general, those ways are, um,
nonuniform.  Therefore, if you want to check whether an object is in
one of these categories, you haven't necessarily learned much about
how to check if it is in a different one of these categories.

So what I wonder is this:  Has there been much thought about making
these type categories more explicitly part of the type system?


From martin@v.loewis.de  Tue Aug 13 20:43:20 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 13 Aug 2002 21:43:20 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: <200208131423.g7DENSg02154@odiug.zope.com>
References: <200208121148.g7CBm6KD027268@paros.informatik.hu-berlin.de>
 <04ff01c242cb$0c0bf620$3da48490@neil>
 <200208131423.g7DENSg02154@odiug.zope.com>
Message-ID: 

Guido van Rossum  writes:

> Those are all details.  We can finesse that when we get closer to
> agreeing on the semantics.  I think code that we know will never be
> executed on Unix should be inside #ifdefs.  Maybe we should reconsider
> moving the Windows code to a separate file...

Indeed, I had PEP 277 in mind when I created the ntmodule patch :-)

Regards,
Martin


From jason@tishler.net  Tue Aug 13 20:46:29 2002
From: jason@tishler.net (Jason Tishler)
Date: Tue, 13 Aug 2002 15:46:29 -0400
Subject: [Python-Dev] Bugs #544740: test_commands test fails under Cygwin
In-Reply-To: <200208131308.g7DD8rS29896@pcp02138704pcs.reston01.va.comcast.net>
References: <20020813120851.GC2548@tishler.net>
 <200208131308.g7DD8rS29896@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20020813194628.GA2720@tishler.net>

On Tue, Aug 13, 2002 at 09:08:53AM -0400, Guido van Rossum wrote:
> > Anyway, suggestions on how to fix this will be greatly appreciated.
> 
> The obvious fix would be a better regular expression.

I thought of that before I submitted my bug report.  Unfortunately,
deriving the better regular expression is not obvious.

The following is what I have come up with:

pat = r'''d.........             # directory
          \s+\d+                 # number of links
          (\s+\w+)+              # user and group which can contain spaces
          \s+\d+                 # size
***>      \s+\w+\s+\d+\s+[\d:]+  # date <***
          \s+/\.                 # file name
        '''

Unfortunately, I had to make an assumption on the date format in order
to match the user and group names regardless of the number of embedded
spaces.  Is my date regular expression acceptable?  Will it work in
non-US locales?
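For what it's worth, the pattern does match a typical `ls -ld` line
when compiled with re.VERBOSE; the sample line below is made up, and
real output varies with platform and locale:

```python
import re

pat = r'''d.........             # directory
          \s+\d+                 # number of links
          (\s+\w+)+              # user and group, which can contain spaces
          \s+\d+                 # size
          \s+\w+\s+\d+\s+[\d:]+  # date
          \s+/\.                 # file name
        '''

# Hypothetical sample output; user and group contain extra spaces.
sample = "drwxr-xr-x   2 root  wheel  4096 Aug 13 15:30 /."
print(re.search(pat, sample, re.VERBOSE) is not None)   # True
```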

> Please submit a patch.

I will do so, once I get some feedback.

Thanks,
Jason


From mclay@nist.gov  Tue Aug 13 20:45:29 2002
From: mclay@nist.gov (Michael McLay)
Date: Tue, 13 Aug 2002 15:45:29 -0400
Subject: [Python-Dev] type categories
In-Reply-To: <200208131802.g7DI2Ro27807@europa.research.att.com>
References: <200208131802.g7DI2Ro27807@europa.research.att.com>
Message-ID: <200208131545.29856.mclay@nist.gov>

On Tuesday 13 August 2002 02:02 pm, Andrew Koenig wrote:
[...]
> I'm uneasy about (4) because if an object is an iterator, calling its
> next() method is destructive.  The implication is that you had better
> not use this method to test if an object is an iterator until you are
> ready to take irrevocable action based on that test.  

The test would be non-destructive if the test only checks for the existence of 
the next() method. 
	
	 hasattr(f,"next")

> Here is what I realized this morning.  It may be obvious to you,
> but it wasn't to me (until after I realized it, of course):
>
>      ``iterator'' and ``iterable'' are just two of many type
>      categories that exist in Python.
>
> Some other categories:
>
>      callable
>      sequence
>      generator
>      class
>      instance
>      type
>      number
>      integer
>      floating-point number
>      complex number
>      mutable
>      tuple
>      mapping
>      method
>      built-in
>
> As far as I know, there is no uniform method of determining into which
> category or categories a particular object falls.  Of course, there
> are non-uniform ways of doing so, but in general, those ways are, um,
> nonuniform.  Therefore, if you want to check whether an object is in
> one of these categories, you haven't necessarily learned much about
> how to check if it is in a different one of these categories.
>
> So what I wonder is this:  Has there been much thought about making
> these type categories more explicitly part of the type system?

The category names look like general purpose interface names. The addition of 
interfaces has been discussed quite a bit. While many people are interested 
in having interfaces added to Python, there are many design issues that will 
have to be resolved before it happens. Hopefully the removal of the 
class/type wart and the use of interfaces in Zope will hasten the addition of 
interfaces. 

I like your list of the basic Python interfaces. Perhaps a weak version of 
interface definitions could be added to Python prior to a full featured 
capability. The weak version would simply add a __category__ attribute to the 
each type definition. This attribute would reference an object that defines 
the distinguishing features of the category interface.  Enforcement would be 
optional, but at least the definition would be published.  Adding just the 
definition of the type interface would create a direct benefit, but it would 
provide a hook for developers to use in work on optimization and testing.



From martin@v.loewis.de  Tue Aug 13 20:49:33 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 13 Aug 2002 21:49:33 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: <200208131504.g7DF4We04131@odiug.zope.com>
References: <5BF4D214-AEB0-11D6-9BA3-0030655234CE@cwi.nl>
 <200208131301.g7DD1PC29805@pcp02138704pcs.reston01.va.comcast.net>
 
 <200208131504.g7DF4We04131@odiug.zope.com>
Message-ID: 

Guido van Rossum  writes:

> But if you pass the normalized string (or the Latin-1 string) to
> open(), will it find the file?  I.e. if the filesystem has the
> unnormalized name stored in its directory, will filesystem requests
> normalize filenames before comparing them?
> 
> Jack, can you try to do that?  Can you try open('fr\xf6r') in that
> directory?

If my understanding of OS X is correct, then this won't work: OS X
demands UTF-8 for all file names.

The interesting question is whether u"fr\xf6r".encode("utf-8") allows
one to open the file. If that won't work, it could be considered a bug
in OS X, and I trust Apple that they can get such things right (if
they had considered them).

BTW, the same question holds on Windows: If you create a file on NTFS
with \xf6 in it, can you open it by passing \x6f\u0308? I can't try at
the moment...

Regards,
Martin


From brian@sweetapp.com  Tue Aug 13 21:02:21 2002
From: brian@sweetapp.com (Brian Quinlan)
Date: Tue, 13 Aug 2002 13:02:21 -0700
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: 
Message-ID: <002b01c24304$56108a90$bd5d4540@Dell2>

> If my understanding of OS X is correct, then this won't work: OS X
> demands UTF-8 for all file names.

That is correct, at least at the BSD API level.
 
> The interesting question is whether u"fr\xf6r".encode("utf-8") allows
> one to open the file. If that won't work, it could be considered a bug
> in OS X, and I trust Apple that they can get such things right (if
> they had considered them).
 
It will work.

Cheers,
Brian



From martin@v.loewis.de  Tue Aug 13 21:02:07 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 13 Aug 2002 22:02:07 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: <200208131637.g7DGbLA08429@odiug.zope.com>
References: <001e01c242e5$49697ff0$bd5d4540@Dell2>
 <200208131637.g7DGbLA08429@odiug.zope.com>
Message-ID: 

Guido van Rossum  writes:

> > It could be that Apple is decomposing the filenames before comparing
> > them. Either way works.
> 
> Hm, that sucks (either way) -- because you get unnormalized Unicode
> out of directory listings, which is harder to turn into local
> encodings.

Notice that, most likely, Apple *does* normalize them - they just use
Normal Form D (which favours decomposition, instead of using
precomposed characters) - this is what Apple apparently calls
"canonical".

That choice is not surprising - NFD is "more logical", as precomposed
characters are available only arbitrarily (e.g. the WITH TILDE
combinations exist for a, i, e, n, o, u, v, y, but not for, say, x).

The Unicode FAQ
(http://www.unicode.org/unicode/faq/normalization.html) says

Q: Which forms of normalization should I support?

A: The choice of which to use depends on the particular program or
system.  The most commonly supported form is NFC, since it is more
compatible with strings converted from legacy encodings. This is also
the choice for the web, as per the recommendations in "Character Model
for the World Wide Web" from the W3C. The other normalization forms
are useful for other domains.

So I guess Python should at least provide NFC - precisely because of
the legacy encodings.

Regards,
Martin


From martin@v.loewis.de  Tue Aug 13 21:27:29 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 13 Aug 2002 22:27:29 +0200
Subject: [Python-Dev] type categories
In-Reply-To: <200208131802.g7DI2Ro27807@europa.research.att.com>
References: <200208131802.g7DI2Ro27807@europa.research.att.com>
Message-ID: 

Andrew Koenig  writes:

> So what I wonder is this:  Has there been much thought about making
> these type categories more explicitly part of the type system?

Certainly. Such a feature has been called "interface" or "protocol"; I
usually associate with "interface" a static property (a type
implements an interface, by means of a declaration) and with
"protocol" a dynamic property (an object conforms to a protocol, by
acting according to the rules that the protocol sets).

Your question exists in many variations. One of them led to the creation
of the types-sig; another triggered papers titled "Optional Static
Typing", see

http://www.python.org/~guido/static-typing/

The most recent version of an attempt to making interfaces part of
Python is PEP 245,

http://python.org/peps/pep-0245.html

I believe there is agreement by now that there will be a difference
between declared interfaces and implemented protocols: an object may
follow the protocol even if it did not declare the interface, and an
object may violate a protocol even if its type did declare the
interface.

Beyond that, there is little agreement.

Regards,
Martin


From guido@python.org  Tue Aug 13 21:29:09 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 13 Aug 2002 16:29:09 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: Your message of "Tue, 13 Aug 2002 13:46:52 +0200."
 <5BD15FEC-AEB2-11D6-9BA3-0030655234CE@oratrix.com>
References: <5BD15FEC-AEB2-11D6-9BA3-0030655234CE@oratrix.com>
Message-ID: <200208132029.g7DKT9M23788@odiug.zope.com>

> >> If we switch to "k" for integers in the range -2**-31..2**31-1 that
> >> would not be too much work, as a lot of the code is generated (I would
> >> take the quick and dirty approach of using k for all my integers). Only
> >> the hand-written code would have to be massaged by hand.
> >
> > Glad, that's my preferred choice too.  But note that in Python 2.4 and
> > beyond, 'k' will only accept positive inputs, so you'll really have to
> > find a way to mark your signed integer arguments up differently.
> 
> Huh??! Now you've confused me. If "k" means "32 bit mask", why would it 
> be changed in 2.4 not to accept negative values? "-1" is a perfectly 
> normal way to specify "0xffffffff" in C usage...

Hm, in Python I'd hope that people would write 0xffffffff if they want
32 one bits -- -1L has an infinite number of one bits, and on 64-bit
systems, -1 has 64 one-bits instead of 32.  Most masks are formed by
taking a small positive constant (e.g. 1 or 0xff) and shifting it
left.  In Python 2.4 that will always return a positive value.

But if you really don't like this, we could do something different --
'k' could simply give you the lower 32 bits of the value.  (Or the
lower sizeof(long)*8 bits???).

> > In 2.3 (and 2.2.2), I propose the following semantics for 'k': if the
> > argument is a Python int, a signed value within the range
> > [INT_MIN,INT_MAX] is required; if it is a Python long, a nonnegative
> > value in the range [0, 2*INT_MAX+1] is required.  These are the same
> > semantics that are currently used by struct.pack() for 'L', I found
> > out; I like these.
> 
> I don't see the point, really. Why not allow [INT_MIN, 2*INT_MAX+1]? If 
> the "k" specifier is especially meant for bit patterns why not have 
> semantics of "anything goes, unless we are absolutely sure it isn't 
> going to fit"?

In the end, I see two possibilities: lenient, taking the lower N bits,
or strict, requiring [0 .. 2**32-1].  The proposal I made above was an
intermediate move on the way to the strict approach (given the reality
that in 2.3, 1<<31 is negative).
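The lenient option is easy to sketch: masking with 0xffffffff keeps
only the lower 32 bits and always yields a nonnegative result (shown
here with modern-Python semantics, where ints are unbounded):

```python
def lower32(value):
    # Keep only the low 32 bits; the result is always in [0, 2**32-1],
    # so -1 becomes the all-ones 32-bit mask.
    return value & 0xffffffff

print(hex(lower32(-1)))      # 0xffffffff
print(lower32(1 << 31) > 0)  # True: no sign-bit surprise
```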

> > We'll have to niggle about the C type corresponding to 'k'.  Should it
> > be 'int' or 'long'?  It may not matter for you, since you expect to be
> > running on 32-bit hardware forever; but it matters for other potential
> > users of 'k'.  We could also have both 'k' and 'K', where 'k' stores
> > into a C int and 'K' into a C long.
> 
> How about k1 for a byte, k2 for a short, k4 for a long and k8 for a long 
> long?

Hm, the format characters typically correspond to a specific C type.
We already have 'b' for unsigned char and 'B' for signed/unsigned
char, 'h' for unsigned short and 'H' for signed/unsigned short.  These
are unfortunately inconsistent with 'i' for signed int and 'l' for
signed long.

So I'd rather you pick a C type for 'k' (and a policy about range
checks).

> > I also propose to have a C API PyInt_AsUnsignedLong, which will
> > implement the semantics of 'K'.  Like 'i', 'k' will have to do an
> > explicit range test.
> 
> In my proposal these would then probably become PyInt_As1Byte, 
> PyInt_As2Bytes, PyInt_As4Bytes and PyInt_As8Bytes.

And what would their return types be?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal@lemburg.com  Tue Aug 13 22:02:01 2002
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 13 Aug 2002 23:02:01 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
References: <001e01c242e5$49697ff0$bd5d4540@Dell2>	<200208131637.g7DGbLA08429@odiug.zope.com> 
Message-ID: <3D5973C9.5070309@lemburg.com>

Martin v. Loewis wrote:
> Guido van Rossum  writes:
> 
> 
>>>It could be that Apple is decomposing the filenames before comparing
>>>them. Either way works.
>>
>>Hm, that sucks (either way) -- because you get unnormalized Unicode
>>out of directory listings, which is harder to turn into local
>>encodings.
> 
> 
> Notice that, most likely, Apple *does* normalize them - they just use
> Normal Form D (which favours decomposition, instead of using
> precomposed characters) - this is what Apple apparently calls
> "canonical".

Both the decomposition and the composition are called "canonical" --
simply because both operations lead to predefined results (those
defined by the Unicode database).

http://www.unicode.org/unicode/reports/tr15/

has all the details.

As always with Unicode, things are slightly more complicated than
what people are normally used to (but for good reasons). The introduction
of that tech report describes these things in detail. Canonical
equivalence basically means that the graphemes for the Unicode
code points when rendered look the same to the user -- even though
the code point combinations may be different.

Normalization takes care of mapping this visual equivalence to
an algorithm.

Now, if the OS uses canonical equivalence to find file names,
then all possible combinations of code points resulting in the
same sequence of graphemes will give you a match; for a good
reason: because the user of a GUI file manager wouldn't be
able to distinguish between two canonically equivalent file
names.

> That choice is not surprising - NFD is "more logical", as precomposed
> characters are available only arbitrarily (e.g. the WITH TILDE
> combinations exist for a, i, e, n, o, u, v, y, but not for, say, x).

... but in a well-defined manner and that's what's important.

> The Unicode FAQ
> (http://www.unicode.org/unicode/faq/normalization.html) says
> 
> Q: Which forms of normalization should I support?
> 
> A: The choice of which to use depends on the particular program or
> system.  The most commonly supported form is NFC, since it is more
> compatible with strings converted from legacy encodings. This is also
> the choice for the web, as per the recommendations in "Character Model
> for the World Wide Web" from the W3C. The other normalization forms
> are useful for other domains.
> 
> So I guess Python should atleast provide NFC - precisely because of
> the legacy encodings.

At least is good :-) NFC is NFD + canonical composition. Decomposition
isn't all that hard (using unicodedata.decomposition()). For
composition the situation is different: not all information is
available in the unicodedata database (the exclusion list) and
the database also doesn't provide the reverse mapping from
decomposed code points to composed ones. See the Annexes to the
tech report to get an impression of just how hard combining is...
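The asymmetry is visible in the unicodedata module itself:
decomposition() exposes the database mapping directly, while
composition is only reachable through normalization (a modern-Python
sketch, since normalize() was not yet available when this was
written):

```python
import unicodedata

# Canonical decomposition comes straight from the Unicode database:
print(unicodedata.decomposition("\u00e9"))   # '0065 0301'

# There is no inverse composition() call; NFC normalization
# performs the recomposition instead:
print(unicodedata.normalize("NFC", "e\u0301") == "\u00e9")   # True
```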

Still, would be nice to have (written in C for speed, since
this would be a very common operation). Zope Corp. will certainly
be interested in this for Zope3 ;-)

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/



From guido@python.org  Tue Aug 13 22:15:58 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 13 Aug 2002 17:15:58 -0400
Subject: [Python-Dev] type categories
In-Reply-To: Your message of "Tue, 13 Aug 2002 14:02:27 EDT."
 <200208131802.g7DI2Ro27807@europa.research.att.com>
References: <200208131802.g7DI2Ro27807@europa.research.att.com>
Message-ID: <200208132115.g7DLFwL25088@odiug.zope.com>

> While I was driving to work today, I had a thought about the
> iterator/iterable discussion of a few weeks ago.  My impression is
> that that discussion was inconclusive, but a few general principles
> emerged from it:
> 
> 	1) Some types are iterators -- that is, they support calls
> 	   to next() and raise StopIteration when they have no more
> 	   information to give.
> 
> 	2) Some types are iterables -- that is, they support calls
> 	   to __iter__() that yield an iterator as the result.
> 
> 	3) Every iterator is also an iterable, because iterators are
> 	   required to implement __iter__() as well as next().
> 
> 	4) The way to determine whether an object is an iterator
> 	   is to call its next() method and see what happens.
> 
> 	5) The way to determine whether an object is an iterable
> 	   is to call its __iter__() method and see what happens.
> 
> I'm uneasy about (4) because if an object is an iterator, calling its
> next() method is destructive.  The implication is that you had better
> not use this method to test if an object is an iterator until you are
> ready to take irrevocable action based on that test.  On the other
> hand, calling __iter__() is safe, which means that you can test
> nondestructively whether an object is an iterable, which includes
> all iterators.

Alex Martelli introduced the "Look Before You Leap" (LBYL) syndrome
for your uneasiness with (4) (and (5), I might add -- I don't know
that __iter__ is always safe).  He contrasts it with a different
attitude, which might be summarized as "It's easier to ask forgiveness
than permission."  In many cases, there is no reason for LBYL
syndrome, and it can actually cause subtle bugs.  For example, a LBYL
programmer could write

  if not os.path.exists(fn):
    print "File doesn't exist:", fn
    return
  fp = open(fn)
  ...use fp...

A "forgiveness" programmer would write this as follows instead:

  try:
    fp = open(fn)
  except IOError, msg:
    print "Can't open", fn, ":", msg
    return
  ...use fp...

The latter is safer; there are many reasons why the open() call in the
first version could fail despite the exists() test succeeding,
including insufficient permissions, lack of operating resources, a
hard file lock, or another process that removed the file in the mean
time.

While it's not an absolute rule, I tend to dislike interface/protocol
checking as an example of LBYL syndrome.  I prefer to write this:

  def f(x):
    print x[0]

rather than this:

  def f(x):
    if not hasattr(x, "__getitem__"):
      raise TypeError, "%r doesn't support __getitem__" % x
    print x[0]

Admittedly this is an extreme example that looks rather silly, but
similar type checks are common in Python code written by people coming
from languages with stronger typing (and a bit of paranoia).

The exception is when you need to do something different based on the
type of an object and you can't add a method for what you want to do.
But that is relatively rare.
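Even that dispatch-by-type case can often be written in the
forgiveness idiom; here is a small, hypothetical sketch:

```python
def total(x):
    # EAFP dispatch: try the sequence behaviour first, and fall
    # back to treating x as a single number if summing fails.
    try:
        return sum(x)
    except TypeError:
        return x

print(total([1, 2, 3]))   # 6
print(total(5))           # 5
```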

> Here is what I realized this morning.  It may be obvious to you,
> but it wasn't to me (until after I realized it, of course):
> 
>      ``iterator'' and ``iterable'' are just two of many type
>      categories that exist in Python.
> 
> Some other categories:
> 
>      callable
>      sequence
>      generator
>      class
>      instance
>      type
>      number
>      integer
>      floating-point number
>      complex number
>      mutable
>      tuple
>      mapping
>      method
>      built-in

You missed the two that are most commonly needed in practice: string
and file. :-)  I believe that the notion of an informal or "lore" (as
Jim Fulton likes to call it) protocol first became apparent when we
started to use the idea of a "file-like object" as a valid value for
sys.stdout.

> As far as I know, there is no uniform method of determining into which
> category or categories a particular object falls.  Of course, there
> are non-uniform ways of doing so, but in general, those ways are, um,
> nonuniform.  Therefore, if you want to check whether an object is in
> one of these categories, you haven't necessarily learned much about
> how to check if it is in a different one of these categories.
> 
> So what I wonder is this:  Has there been much thought about making
> these type categories more explicitly part of the type system?

I think this has been answered by other respondents.

Interestingly enough, Jim Fulton asked me to critique the Interface
package as it exists in Zope 3, from the perspective of adding
(something like) it to Python 2.3.

This is a descendant of the "scarecrow" proposal,
http://www.foretec.com/python/workshops/1998-11/dd-fulton.html (see
also http://www.zope.org/Members/jim/PythonInterfaces/Summary).

The Zope3 implementation can be viewed here:
http://cvs.zope.org/Zope3/lib/python/Interface/

--Guido van Rossum (home page: http://www.python.org/~guido/)


From Jack.Jansen@oratrix.com  Tue Aug 13 22:14:30 2002
From: Jack.Jansen@oratrix.com (Jack Jansen)
Date: Tue, 13 Aug 2002 23:14:30 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: <200208131301.g7DD1PC29805@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

On dinsdag, augustus 13, 2002, at 03:01 , Guido van Rossum wrote:
>
> Looks like it isn't you: the filename somehow contains a character
> that's not in the Latin-1 subset of Unicode, and no encoding can fix
> that for you.  I don't know why -- you'll have to figure out why your
> keyboard generates that character when you type o-umlaut.

No, it's the way the filesystem stores filenames, apparently.
Or, at least, it's the way the filesystem API's expose those
filenames. Here's a session again (this time I'm using the
terminal in utf-8 mode):

 >>> x = "fr\xc3\xb6r"
 >>> os.listdir(".")
['.DS_Store']
 >>> open(x, "w")

 >>> os.listdir(".")
['.DS_Store', 'fro\xcc\x88r']
 >>> os.path.exists('fro\xcc\x88r')
True
 >>> os.path.exists("fr\xc3\xb6r")
True

If I create a file with an o-umlaut it gets decomposed into an o
and a combining umlaut.

[Jack goes off and wrestles his way through a gazillion websites
with Unicode information]

If I understand the unicode standard (according to unicode.org)
correctly this means that MacOS stores filenames in NFD
normalized form, with all combining characters split out, and
this is the preferred normalized form. Am I correct here?

But, even if NFC is the preferred normalized form (the documents
I saw hinted that this may have been the case in previous
Unicode standards:-): both NFC and NFD renditions of this string
are legal unicode, aren't they? And if they are then both should
be converted to the same latin-1 string, shouldn't they?

Do I misunderstand something, or is this a bug (limitation?)
in the unicode->latin-1 decoder?
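The behaviour can be reproduced without a filesystem at all: the
codec maps code points one at a time, so the combining mark trips it
up until the string is recomposed (sketched with the normalization
API that only appeared later, in Python 2.3):

```python
import unicodedata

nfd = "fro\u0308r"   # the NFD form, as the directory listing returns it

# U+0308 on its own has no Latin-1 equivalent, so the NFD form
# fails to encode...
try:
    nfd.encode("latin-1")
    print("encoded")
except UnicodeEncodeError:
    print("NFD form does not encode")

# ...while the recomposed (NFC) form encodes cleanly.
nfc = unicodedata.normalize("NFC", nfd)
print(nfc.encode("latin-1"))   # b'fr\xf6r'
```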
--
- Jack Jansen
http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution --
Emma Goldman -



From ark@research.att.com  Tue Aug 13 22:27:19 2002
From: ark@research.att.com (Andrew Koenig)
Date: Tue, 13 Aug 2002 17:27:19 -0400 (EDT)
Subject: [Python-Dev] type categories
In-Reply-To: <200208132115.g7DLFwL25088@odiug.zope.com> (message from Guido
 van Rossum on Tue, 13 Aug 2002 17:15:58 -0400)
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208132115.g7DLFwL25088@odiug.zope.com>
Message-ID: <200208132127.g7DLRJO29696@europa.research.att.com>

Guido> Alex Martelli introduced the "Look Before You Leap" (LBYL) syndrome
Guido> for your uneasiness with (4) (and (5), I might add -- I don't know
Guido> that __iter__ is always safe).  He contrasts it with a different
Guido> attitude, which might be summarized as "It's easier to ask forgiveness
Guido> than permission."  In many cases, there is no reason for LBYL
Guido> syndrome, and it can actually cause subtle bugs.  For example, a LBYL
Guido> programmer could write

Guido>   if not os.path.exists(fn):
Guido>     print "File doesn't exist:", fn
Guido>     return
Guido>   fp = open(fn)
Guido>   ...use fp...

Guido> A "forgiveness" programmer would write this as follows instead:

Guido>   try:
Guido>     fp = open(fn)
Guido>   except IOError, msg:
Guido>     print "Can't open", fn, ":", msg
Guido>     return
Guido>   ...use fp...

Guido> The latter is safer; there are many reasons why the open() call in the
Guido> first version could fail despite the exists() test succeeding,
Guido> including insufficient permissions, lack of operating resources, a
Guido> hard file lock, or another process that removed the file in the mean
Guido> time.

Guido> While it's not an absolute rule, I tend to dislike interface/protocol
Guido> checking as an example of LBYL syndrome.  I prefer to write this:

Guido>   def f(x):
Guido>     print x[0]

Guido> rather than this:

Guido>   def f(x):
Guido>     if not hasattr(x, "__getitem__"):
Guido>       raise TypeError, "%r doesn't support __getitem__" % x
Guido>     print x[0]


I completely agree with you so far.  If you have an object that you
know you intend to use in only a single way, it is usually right
to just go ahead and use it that way rather than asking first.

Guido> Admittedly this is an extreme example that looks rather silly,
Guido> but similar type checks are common in Python code written by
Guido> people coming from languages with stronger typing (and a bit of
Guido> paranoia).

Guido> The exception is when you need to do something different based
Guido> on the type of an object and you can't add a method for what
Guido> you want to do.  But that is relatively rare.

Perhaps the reason it's rare is that it's difficult to do.

One of the cases I was thinking of was the built-in * operator,
which does something completely different if one of its operands
is an integer.  Another one was the buffering iterator we were
discussing earlier, which ideally would omit buffering entirely
if asked to buffer a type that already supports multiple iteration.
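That second case can be sketched in a few lines; this is a hypothetical helper (the name, and the use of itertools.tee, which only arrived later in Python 2.4, are illustration rather than anything proposed in the thread):

```python
import itertools

def tee_if_needed(obj, n=2):
    """Return n independent iterations over obj, buffering only
    when obj cannot be iterated more than once."""
    # An object that is not its own iterator (a list, tuple, dict...)
    # already supports multiple iteration: no buffering needed.
    if iter(obj) is not obj:
        return tuple(iter(obj) for _ in range(n))
    # A one-shot iterator (e.g. a generator) must be buffered.
    return itertools.tee(obj, n)
```

The `iter(obj) is not obj` test is exactly the kind of category check under discussion: it asks "is this re-iterable?" rather than "is this one of a fixed list of types?".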

>> Some other categories:

>> callable
>> sequence
>> generator
>> class
>> instance
>> type
>> number
>> integer
>> floating-point number
>> complex number
>> mutable
>> tuple
>> mapping
>> method
>> built-in

Guido> You missed the two that are most commonly needed in practice:
Guido> string and file. :-)

Actually, I thought of them but omitted them to avoid confusion between
a type and a category with a single element.

Guido> I believe that the notion of an informal or "lore" (as Jim
Guido> Fulton likes to call it) protocol first became apparent when we
Guido> started to use the idea of a "file-like object" as a valid
Guido> value for sys.stdout.

OK.  So what I'm asking about is a way of making notions such as
"file-like object" more formal and/or automatic.

Of course, one reason for my interest is my experience with a
language that supports compile-time overloading -- what I'm really
seeing on the horizon is some kind of notion of overloading in
Python, perhaps along the lines of ML's clausal function definitions
(which I think are truly elegant).

Guido> Interestingly enough, Jim Fulton asked me to critique the Interface
Guido> package as it exists in Zope 3, from the perspective of adding
Guido> (something like) it to Python 2.3.

Guido> This is a descendant of the "scarecrow" proposal,
Guido> http://www.foretec.com/python/workshops/1998-11/dd-fulton.html (see
Guido> also http://www.zope.org/Members/jim/PythonInterfaces/Summary).

Guido> The Zope3 implementation can be viewed here:
Guido> http://cvs.zope.org/Zope3/lib/python/Interface/

I'll have a look; thanks!




From Jack.Jansen@oratrix.com  Tue Aug 13 22:34:43 2002
From: Jack.Jansen@oratrix.com (Jack Jansen)
Date: Tue, 13 Aug 2002 23:34:43 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: 
Message-ID: <7B04D38E-AF04-11D6-9AFE-003065517236@oratrix.com>

On dinsdag, augustus 13, 2002, at 11:14 , Jack Jansen wrote:
> If I create a file with an o-umlaut it gets decomposed into an 
> o and a combining umlaut.

After a few more experiments I did manage to confuse the 
filesystem APIs: it turns out ligatures are not correctly 
decomposed. I.e. if you create a file "\uFB03" you cannot open 
it as "ffi".
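This matches the Unicode decomposition tables: the ffi ligature has only a *compatibility* decomposition, so canonical NFD leaves it alone. A sketch with unicodedata (added for Python 2.3), in Python 3 spelling:

```python
import unicodedata

lig = "\ufb03"  # LATIN SMALL LIGATURE FFI

# Canonical decomposition (NFD) leaves the ligature intact, so a
# filesystem that normalizes names to NFD stores u"\uFB03" as-is,
# and a lookup under the name "ffi" cannot match it.
assert unicodedata.normalize("NFD", lig) == lig

# Only compatibility decomposition (NFKD) splits it into "ffi".
assert unicodedata.normalize("NFKD", lig) == "ffi"
```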

--
- Jack Jansen                
http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- 
Emma Goldman -



From Jack.Jansen@oratrix.com  Tue Aug 13 22:51:29 2002
From: Jack.Jansen@oratrix.com (Jack Jansen)
Date: Tue, 13 Aug 2002 23:51:29 +0200
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: <200208132029.g7DKT9M23788@odiug.zope.com>
Message-ID: 

On dinsdag, augustus 13, 2002, at 10:29 , Guido van Rossum wrote:
>> Huh??! Now you've confused me. If "k" means "32 bit mask", why 
>> would it
>> be changed in 2.4 not to accept negative values? "-1" is a perfectly
>> normal way to specify "0xffffffff" in C usage...
>
> Hm, in Python I'd hope that people would write 0xffffffff if they want
> 32 one bits -- -1L has an infinite number of one bits, and on 64-bit
> systems, -1 has 64 one-bits instead of 32.  Most masks are formed by
> taking a small positive constant (e.g. 1 or 0xff) and shifting it
> left.  In Python 2.4 that will always return a positive value.

That is all fine if you're on an island. But if you transcribe 
existing C code to Python, or use examples or manuals written 
for C, then I would think there's no reason not to be lenient.

But (you hear Jack's reasoning collapsing in the distance) I 
haven't checked that Apple still uses -1 to mean "all ones" in 
their sample code. They used to do that a lot, but they may have 
stopped that. I don't know.

> In the end, I see two possibilities: lenient, taking the lower N bits,
> or strict, requiring [0 .. 2**32-1].  The proposal I made above was an
> intermediate move on the way to the strict approach (given the reality
> that in 2.3, 1<<31 is negative).

I would say strict on positive values, lenient on negative ones. 
Not too lenient, of course: -0x100000000L should not be passed
as 0 but give an exception.

This would correspond to everyday use in languages such as C.
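Jack's rule could be sketched like this (a hypothetical helper for illustration, not part of any proposed patch):

```python
def as_uint32(value):
    # Strict on positive values, lenient on negative ones:
    # accept -2**31 .. 2**32-1, folding negatives through two's
    # complement, and reject everything else -- so -0x100000000
    # raises instead of silently becoming 0.
    if not -0x80000000 <= value <= 0xFFFFFFFF:
        raise OverflowError("out of range for 32 bits: %r" % (value,))
    return value & 0xFFFFFFFF
```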

>>> We'll have to niggle about the C type corresponding to 'k'.  
>>> Should it
>>> be 'int' or 'long'?  It may not matter for you, since you 
>>> expect to be
>>> running on 32-bit hardware forever; but it matters for other 
>>> potential
>>> users of 'k'.  We could also have both 'k' and 'K', where 'k' stores
>>> into a C int and 'K' into a C long.
>>
>> How about k1 for a byte, k2 for a short, k4 for a long and k8 
>> for a long
>> long?
>
> Hm, the format characters typically correspond to a specific C type.
> We already have 'b' for unsigned char and 'B' for signed/unsigned
> char, 'h' for unsigned short and 'H' for signed/unsigned short.  These
> are unfortunately inconsistent with 'i' for signed int and 'l' for
> signed long.
>
> So I'd rather you pick a C type for 'k' (and a policy about range
> checks).

ok. How about uint32_t? And, while we're at it, add Q for uint64_t?

>> In my proposal these would then probably become PyInt_As1Byte,
>> PyInt_As2Bytes, PyInt_As4Bytes and PyInt_As8Bytes.
>
> And what would their return types be?

uint8_t, uint16_t, uint32_t and uint64_t.
--
- Jack Jansen                
http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- 
Emma Goldman -



From paul-python@svensson.org  Tue Aug 13 22:49:16 2002
From: paul-python@svensson.org (Paul Svensson)
Date: Tue, 13 Aug 2002 17:49:16 -0400 (EDT)
Subject: [Python-Dev] Deprecation warning on integer shifts and suchA
In-Reply-To: <200208132029.g7DKT9M23788@odiug.zope.com>
Message-ID: 

On Tue, 13 Aug 2002, Guido van Rossum wrote:

>> >> If we switch to "k" for integers in the range -2**-31..2**31-1 that
>> >> would not be too much work, as a lot of the code is generated (I would
>> >> take the quick and dirty approach of using k for all my integers). Only
>> >> the hand-written code would have to be massaged by hand.
>> >
>> > Glad, that's my preferred choice too.  But note that in Python 2.4 and
>> > beyond, 'k' will only accept positive inputs, so you'll really have to
>> > find a way to mark your signed integer arguments up differently.
>>
>> Huh??! Now you've confused me. If "k" means "32 bit mask", why would it
>> be changed in 2.4 not to accept negative values? "-1" is a perfectly
>> normal way to specify "0xffffffff" in C usage...
>
>Hm, in Python I'd hope that people would write 0xffffffff if they want
>32 one bits -- -1L has an infinite number of one bits, and on 64-bit
>systems, -1 has 64 one-bits instead of 32.  Most masks are formed by
>taking a small positive constant (e.g. 1 or 0xff) and shifting it
>left.  In Python 2.4 that will always return a positive value.
>
>But if you really don't like this, we could do something different --
>'k' could simply give you the lower 32 bits of the value.  (Or the
>lower sizeof(long)*8 bits???).

For a mask, it makes some kind of sense to require all the high bits,
those not part of the mask, to be all the same; it makes it less likely
that something important gets lost when they're trimmed off.
But, don't all ones make just as much sense as all zeros ?

Even with unified numbers, -1 (or ~0) seems to be a reasonable
way to spell a bitmask with all bits set, without having to know
how many "all" are.
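Because Python's negative integers behave like an infinite two's-complement string of one bits, that spelling does work at any width; a small illustration:

```python
# ~0 == -1 acts as "all ones" regardless of how wide "all" is:
# masking it to any width yields all ones for that width.
assert ~0 == -1
assert (~0) & 0xFF == 0xFF
assert (~0) & 0xFFFFFFFF == 0xFFFFFFFF
assert (~0) & (2 ** 64 - 1) == 2 ** 64 - 1
```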

	/Paul



From David Abrahams"   <200208132115.g7DLFwL25088@odiug.zope.com>
Message-ID: <0c0501c24311$8cebbdc0$6501a8c0@boostconsulting.com>

From: "Guido van Rossum" 

> Alex Martelli introduced the "Look Before You Leap" (LBYL) syndrome
> for your uneasiness with (4) (and (5), I might add -- I don't know
> that __iter__ is always safe).  He contrasts it with a different
> attitude, which might be summarized as "It's easier to ask forgiveness
> than permission."  In many cases, there is no reason for LBYL
> syndrome, and it can actually cause subtle bugs.

> While it's not an absolute rule, I tend to dislike interface/protocol
> checking as an example of LBYL syndrome.



> The exception is when you need to do something different based on the
> type of an object and you can't add a method for what you want to do.
> But that is relatively rare.

The main reason I want to be able to LBYL (and, AFAICT, it's the same as
Alex's reason) is to support multiple dispatch. In other words, it wouldn't
be user code doing the looking. The best reason to support protocol
introspection is so that we can provide users with a way to write
more-elegant code, instead of messing around with manual type inspection.
What's your position on multiple dispatch?
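For concreteness, here is a minimal sketch of the kind of dispatcher being discussed: first-match on isinstance() checks, written in modern decorator spelling. The names are hypothetical, not an existing API:

```python
class MultiMethod:
    def __init__(self, name):
        self.name = name
        self.overloads = []

    def register(self, *types):
        def decorator(func):
            self.overloads.append((types, func))
            return func
        return decorator

    def __call__(self, *args):
        # First-match dispatch: try overloads in registration order.
        for types, func in self.overloads:
            if len(args) == len(types) and all(
                    isinstance(a, t) for a, t in zip(args, types)):
                return func(*args)
        raise TypeError("%s: no overload for %r" % (self.name, args))

mul = MultiMethod("mul")

@mul.register(int, int)
def _mul_numbers(a, b):
    return a * b

@mul.register(int, list)
def _mul_repeat(n, seq):
    return seq * n
```

Dispatching on concrete types like this is straightforward; dispatching on *categories* ("any sequence") is the part that needs protocol introspection.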

-Dave





From skip@pobox.com  Tue Aug 13 23:30:00 2002
From: skip@pobox.com (Skip Montanaro)
Date: Tue, 13 Aug 2002 17:30:00 -0500
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: <3D5973C9.5070309@lemburg.com>
References: <001e01c242e5$49697ff0$bd5d4540@Dell2>
 <200208131637.g7DGbLA08429@odiug.zope.com>
 
 <3D5973C9.5070309@lemburg.com>
Message-ID: <15705.34920.804857.914875@localhost.localdomain>

    mal> As always with Unicode, things are slightly more complicated than
    mal> what people are normally used to ...

What's the current behavior?  If my program receives an input in utf-8
(let's say it comes from a form on a website), what form will it be in, or
can't I tell?  Is it possible I will get spurious inequalities today if I
compare two different unicode objects which were created from different
sources and in different normal forms?  What about a string and a unicode
object?  Where can I read all about it (Python and unicode normalization)?

Skip



From guido@python.org  Wed Aug 14 03:39:09 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 13 Aug 2002 22:39:09 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: Your message of "Tue, 13 Aug 2002 23:51:29 +0200."
 
References: 
Message-ID: <200208140239.g7E2d9n30799@pcp02138704pcs.reston01.va.comcast.net>

> > In the end, I see two possibilities: lenient, taking the lower N bits,
> > or strict, requiring [0 .. 2**32-1].  The proposal I made above was an
> > intermediate move on the way to the strict approach (given the reality
> > that in 2.3, 1<<31 is negative).
> 
> I would say strict on positive values, lenient on negative ones. 
> Not too lenient, of course: -0x100000000L should not be a passed 
> as 0 but give an exception.

I'm not sure I like this.  On the one hand this is what the struct
module does currently for 'L'.  On the other hand it seems not to
provide any more safety than simply taking the low N bits (using 2's
complement for negative values) and throwing the rest away.

> This would correspond to everyday use in languages such as C.

Actually, C is fairly careful: AFAIK on a 32-bit machine the type of
0xffffffff is unsigned long, so it's not strictly -1, and you'll have
to use a cast somewhere to be able to compare it to an int.

> > Hm, the format characters typically correspond to a specific C type.
> > We already have 'b' for unsigned char and 'B' for signed/unsigned
> > char, 'h' for unsigned short and 'H' for signed/unsigned short.  These
> > are unfortunately inconsistent with 'i' for signed int and 'l' for
> > signed long.
> >
> > So I'd rather you pick a C type for 'k' (and a policy about range
> > checks).
> 
> ok. How about uint32_t? And, while we're at it, add Q for uint64_t?
> 
> >> In my proposal these would then probably become PyInt_As1Byte,
> >> PyInt_As2Bytes, PyInt_As4Bytes and PyInt_As8Bytes.
> >
> > And what would their return types be?
> 
> uint8_t, uint16_t, uint32_t and uint64_t.

Hm.  This is a big deviation from tradition.  Those types aren't
currently used or defined.

How about the following counterproposal.  This also changes some of
the other format codes to be a little more regular.

Code    C type          	Range check

b	unsigned char		0..UCHAR_MAX
B	unsigned char		none **
h	unsigned short		0..USHRT_MAX
H	unsigned short		none **
i	int			INT_MIN..INT_MAX
I *	unsigned int		0..UINT_MAX
l	long			LONG_MIN..LONG_MAX
k *	unsigned long		none
L	long long		LLONG_MIN..LLONG_MAX
K *	unsigned long long	none

Notes:

* New format codes.

** Changed from previous "range-and-a-half" to "none"; the
   range-and-a-half checking wasn't particularly useful.

If you need a uint32 mask, you can use the 'k' format and cast the
unsigned long you got to uint32; this should do the right thing.

If you really prefer your proposal with specific sized types, perhaps
you can show some coding example that would be easier using specific
sizes rather than char/short/int/long/long long?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Wed Aug 14 03:42:12 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 13 Aug 2002 22:42:12 -0400
Subject: [Python-Dev] type categories
In-Reply-To: Your message of "Tue, 13 Aug 2002 17:36:55 EDT."
 <0c0501c24311$8cebbdc0$6501a8c0@boostconsulting.com>
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208132115.g7DLFwL25088@odiug.zope.com>
 <0c0501c24311$8cebbdc0$6501a8c0@boostconsulting.com>
Message-ID: <200208140242.g7E2gCs30811@pcp02138704pcs.reston01.va.comcast.net>

> The main reason I want to be able to LBYL (and, AFAICT, it's the same as
> Alex's reason) is to support multiple dispatch.

But isn't your application one where the types are mapped from C++?
Then you should be able to dispatch on type() of the arguments.  Or am
I misunderstanding, and do you want to make multi-dispatch a standard
paradigm in Python?

> In other words, it wouldn't
> be user code doing the looking. The best reason to support protocol
> introspection is so that we can provide users with a way to write
> more-elegant code, instead of messing around with manual type inspection.
> What's your position on multiple dispatch?

That it's too inefficient in a language with run-time dispatch to even
think about it.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Wed Aug 14 04:16:55 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 13 Aug 2002 23:16:55 -0400
Subject: [Python-Dev] type categories
In-Reply-To: Your message of "Tue, 13 Aug 2002 17:27:19 EDT."
 <200208132127.g7DLRJO29696@europa.research.att.com>
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208132115.g7DLFwL25088@odiug.zope.com>
 <200208132127.g7DLRJO29696@europa.research.att.com>
Message-ID: <200208140316.g7E3GtT30902@pcp02138704pcs.reston01.va.comcast.net>

> Guido> The exception is when you need to do something different based
> Guido> on the type of an object and you can't add a method for what
> Guido> you want to do.  But that is relatively rare.
> 
> Perhaps the reason it's rare is that it's difficult to do.

Perhaps...  Is it the chicken or the egg?

> One of the cases I was thinking of was the built-in * operator,
> which does something completely diferent if one of its operands
> is an integer.

Really?  I suppose you're thinking of sequence repetition.  I consider
that one of my early mistakes (it didn't make it to my "regrets" list
but probably should have).  It would have been much simpler if
sequences simply supported multiplication, and in fact repeated changes
to the implementation (and subtle edge cases of the semantics) are
slowly nudging it in this direction.

> Another one was the buffering iterator we were
> discussing earlier, which ideally would omit buffering entirely
> if asked to buffer a type that already supports multiple iteration.

How do you do that in C++?  I guess you overload the function that
asks for the iterator, and call that function in a template.  I think
in Python we can ask the caller to provide a buffering iterator when a
function needs one.  Since we really have very little power at compile
time, we sometimes need to do a little more work at run time.  But the
resulting language appears to be easier to understand (for most people
anyway) despite the theoretical deficiency.

I'm not quite sure why that is, but I am slowly developing a theory,
based on a remark by Samuele Pedroni; at least I believe it was he who
remarked at some point "Python has only run time", which got me
thinking.  My theory, partially developed though it is, is that it is
much harder (again, for most people :-) to understand in your head
what happens at compile time than it is to understand what goes at run
time.  Or perhaps that understanding *both* is harder than
understanding only one.

But I believe that for most people acquiring a sufficient mental model
for what goes on at run time is simpler than the mental model for what
goes on at compile time.  Possibly this is because compilers really
*do* rely on very sophisticated algorithms (such as deciding which
overloading function is called based upon type information and
available conversions).  Run time on the other hand is dead simple
most of the time -- it has to be, since it has to be executed by a
machine that has a very limited time to make its decisions.

All this reminds me of a remark that I believe is due to John
Ousterhout at the VHLL conference in '94 in Santa Fe, where you & I
first met.  (Strangely it was Perl's Tom Christiansen who was in a
large part responsible for the eclectic program.)  You gave a talk
about ML, and I believe it was in response to your talk that John
remarked that ML was best suited for people with an IQ of over 150.
That rang true to me, since the only other person besides you that I
know who is a serious ML user definitely falls into that category.
And ML is definitely a language that does more than the average
language at compile time.

> >> Some other categories:
> 
> >> callable
> >> sequence
> >> generator
> >> class
> >> instance
> >> type
> >> number
> >> integer
> >> floating-point number
> >> complex number
> >> mutable
> >> tuple
> >> mapping
> >> method
> >> built-in
> 
> Guido> You missed the two that are most commonly needed in practice:
> Guido> string and file. :-)
> 
> Actually, I thought of them but omitted them to avoid confusion between
> a type and a category with a single element.

Can you explain?  Neither string (which has Unicode and 8-bit, plus a
few other objects that are sufficiently string-like to be
regex-searchable, like arrays) nor file (at least in the "lore
protocol" interpretation of file-like object) are categories with a
single element.

> Guido> I believe that the notion of an informal or "lore" (as Jim
> Guido> Fulton likes to call it) protocol first became apparent when we
> Guido> started to use the idea of a "file-like object" as a valid
> Guido> value for sys.stdout.
> 
> OK.  So what I'm asking about is a way of making notions such as
> "file-like object" more formal and/or automatic.

Yeah, that's the holy Grail of interfaces in Python.

> Of course, one reason for my interest is my experience with a
> language that supports compile-time overloading -- what I'm really
> seeing on the horizon is some kind of notion of overloading in
> Python, perhaps along the lines of ML's clausal function definitions
> (which I think are truly elegant).

Honestly, I hadn't read this far ahead when I brought up ML above. :-)

I really hope that the holy grail can be found at run time rather than
compile time.  Python's compile time doesn't have enough information
easily available, and to gather the necessary information is very
expensive (requiring whole-program analysis) and not 100% reliable
(due to Python's extreme dynamic side).

> Guido> Interestingly enough, Jim Fulton asked me to critique the Interface
> Guido> package as it exists in Zope 3, from the perspective of adding
> Guido> (something like) it to Python 2.3.
> 
> Guido> This is a descendant of the "scarecrow" proposal,
> Guido> http://www.foretec.com/python/workshops/1998-11/dd-fulton.html (see
> Guido> also http://www.zope.org/Members/jim/PythonInterfaces/Summary).
> 
> Guido> The Zope3 implementation can be viewed here:
> Guido> http://cvs.zope.org/Zope3/lib/python/Interface/
> 
> I'll have a look; thanks!

BTW the original scarecrow proposal is at
http://www.foretec.com/python/workshops/1998-11/dd-fulton-sum.html

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Wed Aug 14 04:31:40 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 13 Aug 2002 23:31:40 -0400
Subject: [Python-Dev] strange warnings from tempfile.mkstemped.__del__ on HP
Message-ID: <200208140331.g7E3Veh30980@pcp02138704pcs.reston01.va.comcast.net>

Lysator's snake-farm, which does regular builds of CVS Python
checkouts on a variety of uncommon platforms, has started reporting
two warnings that I don't understand.  (Never mind the gettext.py
warnings; they're shallow; someone should fix them.)

The problem is the two exceptions ignored in __del__ methods.  If I
look at the code of the new tempfile.py module and its
test_tempfile.py unittests, I see that there's a class mkstemped
defined in test_tempfile.py, which has a __del__ method that closes
the file descriptor.  The only way I can see this failing with an
AttributeError exception is if the instance never makes it through its
__init__ call.  But in that case I would have expected a failure
reported; the only instantiation of mkstemped() is inside a try/except
where the except clause calls self.failOnException() which causes the
unit tests to fail.  But the unittest doesn't report any failures?!

I don't see this happening on Linux, so it's hard to go beyond
speculation.

--Guido van Rossum (home page: http://www.python.org/~guido/)

------- Forwarded Message

Date:    Tue, 13 Aug 2002 23:04:56 -0400
From:    sfarmer@lysator.liu.se
To:      snake-farm-report@lists.lysator.liu.se
Subject: [farm-report] Build python-HP_UX-B.11.00-9000_829-taylor was successful.

Build test succeeded. Any warnings are appended below.
- --
/mp/slaskdisk/tmp/sfarmer/python/dist/src/Lib/gettext.py:142: DeprecationWarning: hex/oct constants > sys.maxint will return positive values in Python 2.4 and up
  LE_MAGIC = 0x950412de
/mp/slaskdisk/tmp/sfarmer/python/dist/src/Lib/gettext.py:143: DeprecationWarning: hex/oct constants > sys.maxint will return positive values in Python 2.4 and up
  BE_MAGIC = 0xde120495
/mp/slaskdisk/tmp/sfarmer/python/dist/src/Lib/gettext.py:149: DeprecationWarning: hex/oct constants > sys.maxint will return positive values in Python 2.4 and up
  MASK = 0xffffffff
Exception exceptions.AttributeError: "mkstemped instance has no attribute 'fd'"
 in > ignored
Exception exceptions.AttributeError: "mkstemped instance has no attribute 'fd'"
 in > ignored

Stop.

_______________________________________________
Snake-farm-report mailing list
Snake-farm-report@lists.lysator.liu.se
http://lists.lysator.liu.se/mailman/listinfo/snake-farm-report

------- End of Forwarded Message



From guido@python.org  Wed Aug 14 04:39:52 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 13 Aug 2002 23:39:52 -0400
Subject: [Python-Dev] strange warnings from tempfile.mkstemped.__del__ on HP
In-Reply-To: Your message of "Tue, 13 Aug 2002 23:31:40 EDT."
 <200208140331.g7E3Veh30980@pcp02138704pcs.reston01.va.comcast.net>
References: <200208140331.g7E3Veh30980@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <200208140339.g7E3dqS31194@pcp02138704pcs.reston01.va.comcast.net>

> The problem is the two exceptions ignored in __del__ methods.  If I
> look at the code of the new tempfile.py module and its
> test_tempfile.py unittests, I see that there's a class mkstemped
> defined in test_tempfile.py, which has a __del__ method that closes
> the file descriptor.  The only way I can see this failing with an
> AttributeError exception is if the instance never makes it through its
> __init__ call.  But in that case I would have expect a failure
> reported; the only instantiation of mkstemped() is inside a try/except
> where the exceptclause calls self.failOnException() which causes the
> unit tests to fail.  But the unittest doesn't report any failures?!

Mmm, it seems the test script doesn't show the test output.  Maybe one
of the tests is failing, but "make test" doesn't fail as a result?  Or
only the first test run is failing?  "make test" ignores the result of
the first test run (the tests are run twice, once without .pyc files
in place, once with).

--Guido van Rossum (home page: http://www.python.org/~guido/)


From dave@boost-consulting.com  Wed Aug 14 04:21:00 2002
From: dave@boost-consulting.com (David Abrahams)
Date: Tue, 13 Aug 2002 23:21:00 -0400
Subject: [Python-Dev] type categories
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208132115.g7DLFwL25088@odiug.zope.com>              <0c0501c24311$8cebbdc0$6501a8c0@boostconsulting.com>  <200208140242.g7E2gCs30811@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <0ccc01c24341$d839b130$6501a8c0@boostconsulting.com>

From: "Guido van Rossum" 


> > The main reason I want to be able to LBYL (and, AFAICT, it's the same as
> > Alex's reason) is to support multiple dispatch.
>
> But isn't your application one where the types are mapped from C++?

Not all of them, not hardly! Boost.Python is about interoperability, not
just about wrapping C++. My users are writing functions that want to accept
any Python sequence as one argument (for some definition of "sequence").
They'd like to dispatch to different implementations of that function based
on whether that argument is a sequence or a scalar numeric type.

> Then you should be able to dispatch on type() of the arguments.  Or am
> I misunderstanding, and do you want to make multi-dispatch a standard
> paradigm in Python?

Absolutely.

> > In other words, it wouldn't
> > be user code doing the looking. The best reason to support protocol
> > introspection is so that we can provide users with a way to write
> > more-elegant code, instead of messing around with manual type inspection.
> > What's your position on multiple dispatch?
>
> That it's too inefficient in a language with run-time dispatch to even
> think about it.

That's funny, my users are very happy with how fast it works in
Boost.Python. I don't see any reason it should have to be much less
efficient in pure Python for most cases... the important "type categories"
could be builtins. And as others have pointed out, it could even be used to
get certain optimizations.

-Dave

-----------------------------------------------------------
           David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com




From tim.one@comcast.net  Wed Aug 14 05:00:04 2002
From: tim.one@comcast.net (Tim Peters)
Date: Wed, 14 Aug 2002 00:00:04 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: <200208140239.g7E2d9n30799@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

[Guido]
> ...
> Actually, C is fairly careful: AFAIK on a 32-bit machine the type of
> 0xffffffff is unsigned long,

Close:  it's unsigned int.

> so it's not strictly -1,

Not close:  it's nothing at all like -1!  Try this:

#include <stdio.h>
void main() {printf("%d\n", 5 / 0xffffffff);}

If it "acted like" -1 this would print -5 instead of 0 (and if it doesn't
print 0, your compiler is broken).  Maybe more obvious is to do

    printf("%g\n", (double)0xffffffff);

That should print something close to

    4.29497e+09

not

   -1

> and you'll have to use a cast somewhere to be able to compare it
> to an int.

If you want it treated like -1, definitely, because it's not -1.  If you
want it treated like 4294967295, then in the absence of an explict cast the
int you're comparing it to will get silently promoted to unsigned int too
(or with a warning msg, if your compiler is helpful).

>> uint8_t, uint16_t, uint32_t and uint64_t.

> Hm.  This is a big deviation from tradition.  Those types aren't
> currently used or defined.

Nor are they required to exist, not even in C99, where all the new "exact
size" typedefs are optional -- some boxes simply don't have these types.
Most Cray boxes don't have a two-byte type, for example, and some don't have
a 32-bit type.

> ...
> If you really prefer your proposal with specific sized types, perhaps
> you can show some coding example that would be easier using specific
> sizes rather than char/short/int/long/long long?

Since we can't promise to supply specific-sized types, let's cut that short.
You never need specific-sized types, and Python-Dev has had this argument
before.  Whenever it's come up, the code that relied on specific-sized types
got simpler after making it portable.  What you do need is a type *at least*
as big as the size you need in the end (and C99 has required typedefs for
that concept; Python could grow some too).



From guido@python.org  Wed Aug 14 05:05:54 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 00:05:54 -0400
Subject: [Python-Dev] type categories
In-Reply-To: Your message of "Tue, 13 Aug 2002 23:21:00 EDT."
 <0ccc01c24341$d839b130$6501a8c0@boostconsulting.com>
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208132115.g7DLFwL25088@odiug.zope.com> <0c0501c24311$8cebbdc0$6501a8c0@boostconsulting.com> <200208140242.g7E2gCs30811@pcp02138704pcs.reston01.va.comcast.net>
 <0ccc01c24341$d839b130$6501a8c0@boostconsulting.com>
Message-ID: <200208140405.g7E45s731824@pcp02138704pcs.reston01.va.comcast.net>

> That's funny, my users are very happy with how fast it works in
> Boost.Python. I don't see any reason it should have to be much less
> efficient in pure Python for most cases... the important "type categories"
> could be builtins. And as others have pointed out, it could even be used to
> > get certain optimizations.

Time to write a PEP.  Maybe there's an implementation trick you
haven't told us about?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From: David Abrahams <dave@boost-consulting.com>
Subject: Re: [Python-Dev] type categories
References: <200208132115.g7DLFwL25088@odiug.zope.com> <0c0501c24311$8cebbdc0$6501a8c0@boostconsulting.com> <200208140242.g7E2gCs30811@pcp02138704pcs.reston01.va.comcast.net> <0ccc01c24341$d839b130$6501a8c0@boostconsulting.com> <200208140405.g7E45s731824@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <0ce601c24346$ff967f60$6501a8c0@boostconsulting.com>

From: "Guido van Rossum" 


> > That's funny, my users are very happy with how fast it works in
> > Boost.Python. I don't see any reason it should have to be much less
> > efficient in pure Python for most cases... the important "type
> > categories" could be builtins. And as others have pointed out, it
> > could even be used to get certain optimizations.
>
> Time to write a PEP.

I don't know how these things usually work, but isn't it a bit early for
that? I would like to have some discussion about multiple dispatch (and
especially matching criteria) before investing in a formal proposal. That's
what my earlier posting which got banished to the types-sig was trying to
do. Getting a feel for what people are thinking about this, and getting
feedback from those with lots more experience than I in matters Pythonic is
important to me.

> Maybe there's an implementation trick you haven't told us about?

There's not all that much to what I'm doing. I have a really simple-minded
dispatching scheme which checks each overload in sequence, and takes the
first one which can get a match for all arguments. That causes some
problems for people who want to overload on Python float vs. int types, for
example, because each one matches the other. When I get some time I plan to
move to a more sophisticated scheme which rates each match and picks the
best one. It doesn't seem like it should cause a significant slowdown, but
that's just intuition (AKA bullshit) talking. My users generally think

    C++ = fast (but hard)
    Python = slow (but easy)

[no rude remarks from the peanut gallery, please!]
So they don't tend to worry too much about the speed at the Python/C++
boundary, where this mechanism lies. It could be that they don't notice the
cost because they're putting all time-critical functionality completely
inside the C++ part.
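For the curious, the first-match scheme David describes can be sketched in a few lines of Python (all names here are hypothetical; Boost.Python does this in C++):

```python
class Overloaded:
    """First-match dispatch: try each registered overload in order."""

    def __init__(self):
        self.overloads = []          # list of (argument_types, function)

    def add(self, arg_types, func):
        self.overloads.append((arg_types, func))

    def __call__(self, *args):
        for arg_types, func in self.overloads:
            if len(args) == len(arg_types) and all(
                    isinstance(a, t) for a, t in zip(args, arg_types)):
                return func(*args)   # first match wins
        raise TypeError('no matching overload')

f = Overloaded()
f.add((int,), lambda n: 'int')
f.add((float,), lambda x: 'float')
```

Note that the int-vs-float ambiguity David mentions arises in C++ because each type implicitly converts to the other; isinstance() does no conversions, so this toy sketch dodges that particular problem.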

-----------------------------------------------------------
           David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com




From tim.one@comcast.net  Wed Aug 14 05:53:14 2002
From: tim.one@comcast.net (Tim Peters)
Date: Wed, 14 Aug 2002 00:53:14 -0400
Subject: [Python-Dev] strange warnings from tempfile.mkstemped.__del__ on HP
In-Reply-To: <200208140331.g7E3Veh30980@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

[Guido]
> Lysator's snake-farm, which does regular builds of CVS Python
> checkouts on a variety of uncommon platforms, has started reporting
> two warnings that I don't understand.  (Never mind the gettext.py
> warnings; they're shallow; someone should fix them.)

I submitted a patch for that to SF and assigned it to Barry (I have no idea
how to test gettext.py).

> The problem is the two exceptions ignored in __del__ methods.  If I
> look at the code of the new tempfile.py module and its
> test_tempfile.py unittests, I see that there's a class mkstemped
> defined in test_tempfile.py, which has a __del__ method that closes
> the file descriptor.  The only way I can see this failing with an
> AttributeError exception is if the instance never makes it through its
> __init__ call.

I agree, and, indeed, that's what would happen if it did fail during the
call to mkstemped.__init__().  So the call to tempfile._mkstemp_inner()
fails in two test cases (there were two distinct instances of the "no
attribute 'fd'" message), but we don't know which ones.

> ...
> But in that case I would have expected a failure reported; the only
> instantiation of mkstemped() is inside a try/except where the
> except clause calls self.failOnException() which causes the
> unit tests to fail.  But the unittest doesn't report any failures?!

Well, I didn't see *any* test output in the report, neither successes nor
failures, just Python-produced exceptions and warnings.  Maybe the script
only captures stderr?  A failing unittest run *under* regrtest.py doesn't
normally print anything to stderr.  It would have printed this to stdout,
though:

"""
...
test_tempfile
test test_tempfile failed -- errors occurred; run in verbose mode for
details
...
1 test failed:
test_tempfile
"""

So even if we had that, it wouldn't have helped.  stdout from a regrtest -v
run is what we need, or from running test_tempfile.py directly (w/o
regrtest).



From barry@zope.com  Wed Aug 14 06:06:35 2002
From: barry@zope.com (Barry A. Warsaw)
Date: Wed, 14 Aug 2002 01:06:35 -0400
Subject: [Python-Dev] hex constants, bit patterns, PEP 237 warnings and gettext
Message-ID: <15705.58715.533054.676186@anthem.wooz.org>

Ok, I admit that I've only tangentially followed the thread on PEP 237
deprecation warnings, and I've just skimmed PEP 237 but I'm pretty
tired, so I must be missing something.

The deprecation warnings on compiling Lib/gettext.py are complaining
about these three hex constants:

    ...
    # Magic number of .mo files
    LE_MAGIC = 0x950412de
    BE_MAGIC = 0xde120495

    def _parse(self, fp):
        """Override this method to support alternative .mo formats."""
        # We need to & all 32 bit unsigned integers with 0xffffffff for
        # portability to 64 bit machines.
        MASK = 0xffffffff

These really are intended as 32 bit patterns, not signed integers.
Hex constants seem like the most straightforward way to spell such bit
patterns.  If I wanted MASK to be -1 I would have spelled it that way!

What gettext wants to do is to read the first 4 bytes from a file, &
it with the MASK to get a 32 bit pattern and then compare that against
two known patterns to see if we're looking at big-endian or
little-endian.  This is as recommended in the GNU gettext docs.
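A minimal sketch of that check (the helper name is hypothetical; gettext.py itself structures this differently):

```python
import struct

# The two .mo magic numbers from the gettext.py snippet above.
LE_MAGIC = 0x950412de
BE_MAGIC = 0xde120495

def detect_byte_order(first_four_bytes):
    # Read the magic as a little-endian unsigned 32-bit value ('<I').
    magic = struct.unpack('<I', first_four_bytes)[0]
    if magic == LE_MAGIC:
        return 'little'
    if magic == BE_MAGIC:
        return 'big'
    raise ValueError('not a GNU .mo file')
```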

Now, if I add a trailing `L' to each of those constants the warnings
go away, which seems odd to me given that PEP 237 is trying to do away
with the int/long distinction and will eventually make the trailing
`L' illegal!

So I clearly don't understand why I need to add the trailing-L to
quiet the warnings, and PEP 237 doesn't quite help me understand why
hex and oct constants > sys.maxint have to have warnings (I understand
why shifts and such need warnings).  Maybe it's just me, but if I want
a bit pattern, I write a hex constant; I'm never going to write -1 as
0xffffffff.

So if "0x950412de" isn't the right way to write a 32 bit pattern, what
is? "0x950412deL"?  If so, what happens when the trailing-L becomes
illegal?  Seems like I'll be caught in a trap -- help me out! :)

-Barry


From tim.one@comcast.net  Wed Aug 14 06:16:58 2002
From: tim.one@comcast.net (Tim Peters)
Date: Wed, 14 Aug 2002 01:16:58 -0400
Subject: [Python-Dev] hex constants, bit patterns,
 PEP 237 warnings and gettext
In-Reply-To: <15705.58715.533054.676186@anthem.wooz.org>
Message-ID: 

[Barry A. Warsaw]
> ...
> So if "0x950412de" isn't the right way to write a 32 bit pattern,

It isn't today, but will be in 2.4.

> what is? "0x950412deL"?

That's what my gettext.py patch did (along with using 'I' codes in unpack,
and getting rid of all the "& MASK" fiddling) -- check it out, it's already
assigned to you for your convenience <wink>.)

> If so, what happens when the trailing-L becomes illegal?

I think that's more of a Python 3 thing.  But if not:

> Seems like I'll be caught in a trap -- help me out! :)

Easy:  we take the trailing-L away again someday <wink>.  My bet is that
trailing-L will never go away, though (why bother?).



From barry@python.org  Wed Aug 14 06:18:27 2002
From: barry@python.org (Barry A. Warsaw)
Date: Wed, 14 Aug 2002 01:18:27 -0400
Subject: [Python-Dev] hex constants, bit patterns, PEP 237 warnings and gettext
References: <15705.58715.533054.676186@anthem.wooz.org>
Message-ID: <15705.59427.792614.217066@anthem.wooz.org>

>>>>> "BAW" == Barry A Warsaw  writes:

    BAW> These really are intended as 32 bit patterns, not signed
    BAW> integers.  Hex constants seem like the most straightforward
    BAW> way to spell such bit patterns.  If I wanted MASK to be -1 I
    BAW> would have spelled it that way!

>>>>> "TP" == Tim Peters  writes:

    Guido> (Never mind the gettext.py > warnings; they're shallow;
    Guido> someone should fix them.)

    TP> I submitted a patch for that to SF and assigned it to Barry (I
    TP> have no idea how to test gettext.py).

BTW, I grok the other changes in Tim's patch.  struct's got an `I'
code for unpacking unsigned ints now, so it makes perfect sense to use
that instead of `i' w/ masking.
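Concretely, the equivalence the patch relies on:

```python
import struct

data = b'\xde\x12\x04\x95'               # the BE_MAGIC byte sequence
signed = struct.unpack('>i', data)[0]    # 'i': signed 32 bits, negative here
unsigned = struct.unpack('>I', data)[0]  # 'I': unsigned 32-bit bit pattern

assert unsigned == 0xde120495
assert unsigned == signed & 0xffffffff   # 'i' plus masking equals 'I'
```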

-Barry


From martin@v.loewis.de  Wed Aug 14 07:23:42 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 14 Aug 2002 08:23:42 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: <15705.34920.804857.914875@localhost.localdomain>
References: <001e01c242e5$49697ff0$bd5d4540@Dell2>
 <200208131637.g7DGbLA08429@odiug.zope.com>
 
 <3D5973C9.5070309@lemburg.com>
 <15705.34920.804857.914875@localhost.localdomain>
Message-ID: 

Skip Montanaro  writes:

> What's the current behavior?  If my program receives an input in utf-8
> (let's say it comes from a form on a website), what form will it be in, or
> can't I tell?  

In general, you cannot tell in advance - it will depend on the data
source.

W3C advocates "early normalization" towards "NFC", meaning that in the
Internet, you should always see NFC data - unless you are primary data
source, e.g. by reading from a terminal, or after decoding some legacy
encoding. It turns out that most Python codecs will produce NFC
already, so normalization to NFC would be required only for user input,
and - as it turns out - when reading file names on OS X.

> Is it possible I will get spurious inequalities today if I compare
> two different unicode objects which were created from different
> sources and in different normal forms?

If they are in different normal forms, you *will* get inequalities
reliably. In the real world, inequalities will be spurious.

> What about a string and a unicode object?  Where can I read all
> about it (Python and unicode normalization)?

Python does no normalization, so there is nothing to read. For
Unicode, you may want to start with the Normalization FAQ

http://www.unicode.org/unicode/faq/normalization.html
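For reference, the normal forms can be demonstrated with unicodedata.normalize(), which arrived later (in Python 2.3):

```python
import unicodedata

composed = '\u00e9'      # LATIN SMALL LETTER E WITH ACUTE (the NFC form)
decomposed = 'e\u0301'   # 'e' + COMBINING ACUTE ACCENT (the NFD form)

assert composed != decomposed   # they render identically but compare unequal
assert unicodedata.normalize('NFC', decomposed) == composed
assert unicodedata.normalize('NFD', composed) == decomposed
```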

Regards,
Martin


From martin@v.loewis.de  Wed Aug 14 07:28:36 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 14 Aug 2002 08:28:36 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: <7B04D38E-AF04-11D6-9AFE-003065517236@oratrix.com>
References: <7B04D38E-AF04-11D6-9AFE-003065517236@oratrix.com>
Message-ID: 

Jack Jansen  writes:

> After a few more experiments I did manage to confuse the filesystem
> APIs: it turns out ligatures are not correctly decomposed. I.e. if you
> create a file "\uFB03" you cannot open it as "ffi".

LATIN SMALL LIGATURE FFI is a compatibility character. Those are not
normalized under NFD, only under NFKD (in which case it would decay to
ffi). Since NFKD loses information (of typographical nature in this
case), NFKD is only recommended for restricted domains (identifiers
being an explicit example).
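With unicodedata.normalize() (added later, in Python 2.3) the distinction looks like this:

```python
import unicodedata

lig = '\ufb03'   # LATIN SMALL LIGATURE FFI, a compatibility character

assert unicodedata.normalize('NFD', lig) == lig     # canonical: untouched
assert unicodedata.normalize('NFKD', lig) == 'ffi'  # compatibility: decomposed
```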

Regards,
Martin


From martin@v.loewis.de  Wed Aug 14 07:33:13 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 14 Aug 2002 08:33:13 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: 
References: 
Message-ID: 

Jack Jansen  writes:

> If I understand the unicode standard (according to unicode.org)
> correctly this means that MacOS stores filenames in NFD normalized
> form, with all combining characters split out, and this is the
> preferred normalized form. Am I correct here?

You are correct that this is likely the form that OS X uses on-disk,
and at the APIs. This is not really the preferred form - W3C favours
and advocates NFC - precisely because it is easier to transform into
legacy encodings (as you just observed).

> But, even if NFC is the preferred normalized form (the documents I saw
> hinted that this may have been the case in previous Unicode
> standards:-): both NFC and NFD renditions of this string are legal
> unicode, aren't they? And if they are then both should be converted to
> the same latin-1 string, shouldn't they?

Yes, and yes.

> Do I misunderstand something, or this this a bug (limitation?) in the
> unicode->latin-1 decoder?

It's a limitation, in all codecs. Contributions of normalization code
are welcome. Since this is hard work, this is unlikely to be fixed in
Python 2.3 - unless somebody has a really good incentive for fixing
it.

Regards,
Martin


From oren-py-d@hishome.net  Wed Aug 14 08:25:23 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Wed, 14 Aug 2002 10:25:23 +0300
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: ; from martin@v.loewis.de on Tue, Aug 13, 2002 at 09:07:49AM +0200
References:   <20020813061506.GA49563@hishome.net> 
Message-ID: <20020814102523.A20855@hishome.net>

On Tue, Aug 13, 2002 at 09:07:49AM +0200, Martin v. Loewis wrote:
> Oren Tirosh  writes:
> > I think that this will produce the smallest number of
> > incompatibilities for existing code and maintain compatibility with
> > C header files on 32 bit platforms. In this case 0xff000000 will
> > always be interpreted as -16777216 and the 'i' parser will happily
> > convert it to either 0xFF000000 or 0xFFFFFFFFFF000000, depending on
> > the native platform word size - which is probably what the
> > programmer meant.
> 
> This means you suggest that PEP 237 is not implemented, or atleast
> frozen at the current stage.

Not at all! Removing the differences between ints and longs is good. 
My reservations are about the hexadecimal representation.

    - Currently, the '%u', '%x', '%X' and '%o' string formatting
      operators and the hex() and oct() built-in functions behave
      differently for negative numbers: negative short ints are
      formatted as unsigned C long, while negative long ints are
      formatted with a minus sign.  This will be changed to use the
      long int semantics in all cases (but without the trailing 'L'
      that currently distinguishes the output of hex() and oct() for
      long ints).  Note that this means that '%u' becomes an alias for
      '%d'.  It will eventually be removed.

In Python up to 2.2 it's inconsistent between ints and longs:
>>> hex(-16711681)
'0xff00ffff'
>>> hex(-16711681L)
'-0xff0001L'		# ??!?!?

The hex representation of ints gives me useful information about their 
bit structure. After all, it is not immediately apparent to most mortals 
that the number above is a mask for bits 16-23.

The hex representation of longs is something I find quite misleading and 
I think it's also unprecedented.  This wart has bothered me for a long 
time now but I didn't have any use for it so I didn't mind too much. Now 
it is proposed to extend this useless representation to ints so I do.

So we have two elements of the language that are inconsistent. One of 
them is in widespread use and the other is... ahem... 

Which one of them should be changed to conform to the other? 

My proposal: 

On 32 bit platforms:
>>> hex(-16711681)
'0xff00ffff'
>>> hex(-16711681L)
'0xff00ffff'

On 64 bit platforms:
>>> hex(-16711681)
'0xffffffffff00ffffLL'
>>> hex(-16711681L)
'0xffffffffff00ffffLL'

The 'LL' suffix means that this number is to be treated as a 64 bit
*signed* number. This is consistent with the way it is interpreted by 
GCC and other unix compilers on both 32 and 64 bit platforms.  

What to do about numbers from 2**31 to 2**32-1?

>>> hex(4278255615)
0xff00ffffU

The U suffix, also borrowed from C, makes it unambiguous on 32 and 64 bit 
platforms for both Python and C. 

Representation of positive numbers:

 0x00000000   -         0x7fffffff   : unambiguous on all platforms
 0x80000000U  -         0xffffffffU  : representation adds U suffix
0x100000000LL - 0x7fffffffffffffffLL : representation adds LL suffix

Representation of negative numbers:
 0x80000000  - 0xffffffff (-2147483648 to -1):
	8 digits on 32 bit platforms
 0xffffffff80000000LL  - 0xffffffffffffffffLL  (same range):
	16 digits and LL suffix on 64 bit platforms

 other negative numbers: 16 digits and LL suffix on all platforms.

This makes the hex representation of a number informative and consistent 
between int and long on all platforms. It is also consistent with the
C compiler on the same platform. Yes, it will produce a different text
representation of some numbers on different platforms but this conveys
important information about the bit structure of the number which really
is different between platforms. eval()ing it back to a number is still 
consistent.
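The rules above can be sketched as a Python function (purely hypothetical, 32-bit platform assumed, and ignoring the Microsoft 'i64' variant):

```python
def proposed_hex(n):
    """Sketch of the proposed representation, not any real hex()."""
    if n < 0:
        if n >= -2**31:
            return '0x%08x' % (n & 0xffffffff)      # 8 digits, no suffix
        return '0x%016xLL' % (n & (2**64 - 1))      # 16 digits + LL suffix
    if n <= 0x7fffffff:
        return '0x%x' % n                           # unambiguous range
    if n <= 0xffffffff:
        return '0x%08xU' % n                        # U suffix
    return '0x%016xLL' % n                          # LL suffix

assert proposed_hex(-16711681) == '0xff00ffff'
assert proposed_hex(4278255615) == '0xff00ffffU'
assert proposed_hex(4000000000) == '0xee6b2800U'
```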

When converting in the other direction (hex representation to number) 
there is an ambiguous range from 0x80000000 to 0xffffffff.  Should it be 
treated as signed or unsigned?  The current interpretation is signed. PEP
237 proposes to change it to unsigned. I propose to do neither - this range
should be deprecated and some explicit notation should be used instead.

There's no need to be in a hurry about deprecating it, though. The
overwhelming majority of Python code will run on 32 bit platforms for some
time yet.

I propose that on 32 bit platforms this will produce a silent warning. No 
code will break. Running the program with -Wall will inform the programmer 
that the code may not work for some future version of Python.

On 64 bit platforms this will be interpreted the same way as on a 32 bit 
platform (signed 32 bits) but produce a noisy warning.  If the code was 
written on a 64 bit platform and the programmer meant the number to be 
treated as unsigned an explicit U suffix can be added to make it 
unambiguously unsigned. If the code was written on a 32 bit platform and 
the programmer meant the number to be treated as signed it's possible to 
just live with the warning (the code should still run correctly) or add 8 
leading 'F's and an 'LL' suffix to make it unambiguously signed. The 
modified code will run without warning on both 32 and 64 bit platforms.

Notes: 

The number 4000000000 would be represented in hex as 0xEE6B2800U whether 
it's as an int on a 64 bit platform or a long on either 32 or 64 bit 
platforms.  The representation depends only on the numeric value, not the 
type. This proposal therefore does not contradict the purpose of PEP 237
because ints and longs are treated identically.

What's the hex representation of numbers outside the range of 64 bit 
integers? Frankly, I don't care.  I'll go with any proposed solution as
long as eval(hex(x)) == x.

On Microsoft platforms 64 bit literals use the suffix 'i64', not 'LL'.
Python may either use 'LL' exclusively or produce 'i64' on Microsoft
platforms and 'LL' on other platforms. In the latter case it should 
accept either suffix on all platforms.

Yes, this proposal is more complicated and has special treatment for
different ranges but that is because the issue is not trivial and cannot
be brushed aside using a one-size-doesn't-fit-anyone approach. This
reminds me a lot of unicode issues.

What about the L suffix? This proposal adopts the LL and U suffixes from
C and ensures that they are interpreted consistently in both languages.
But the L suffix is not consistent with C for the range 0x80000000L to 
0xFFFFFFFFL. Should the L suffix be deprecated? Should it produce a 
warning for the possibly ambiguous range?

	Oren



From oren-py-d@hishome.net  Wed Aug 14 09:43:34 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Wed, 14 Aug 2002 04:43:34 -0400
Subject: [Python-Dev] type categories
In-Reply-To: <200208132115.g7DLFwL25088@odiug.zope.com>
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208132115.g7DLFwL25088@odiug.zope.com>
Message-ID: <20020814084333.GA86955@hishome.net>

On Tue, Aug 13, 2002 at 05:15:58PM -0400, Guido van Rossum wrote:
> Alex Martelli introduced the "Look Before You Leap" (LBYL) syndrome
> for your uneasiness with (4) (and (5), I might add -- I don't know
> that __iter__ is always safe).  He contrasts it with a different
> attitude, which might be summarized as "It's easier to ask forgiveness
> than permission."  In many cases, there is no reason for LBYL
> syndrome, and it can actually cause subtle bugs.  For example, a LBYL
> programmer could write
> 
>   if not os.path.exists(fn):
>     print "File doesn't exist:", fn
>     return
>   fp = open(fn)
>   ...use fp...
> 
> A "forgiveness" programmer would write this as follows instead:
> 
>   try:
>     fp = open(fn)
>   except IOError, msg:
>     print "Can't open", fn, ":", msg
>     return
>   ...use fp...

So far I have proposed two "forgiveness" solutions to the re-iterability 
issue:

One was to raise an error if .next() is called after StopIteration so an 
attempt to iterate twice over an iterator would fail noisily.  You have 
rejected this idea, probably because too much code depends on the current 
documented behavior.
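That first proposal amounts to a wrapper like this (a sketch; the class name is hypothetical):

```python
class NoisyIter:
    """Fail noisily if .next() is called again after StopIteration,
    instead of silently yielding nothing on a second pass."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._exhausted = False

    def __iter__(self):
        return self

    def __next__(self):
        if self._exhausted:
            raise RuntimeError('iterator used after StopIteration')
        try:
            return next(self._it)
        except StopIteration:
            self._exhausted = True
            raise
```

A first pass over NoisyIter([1, 2]) works normally; trying to iterate the same object again raises RuntimeError rather than quietly producing an empty sequence.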

My other proposed solution is at
http://mail.python.org/pipermail/python-dev/2002-July/026960.html
I suspect it got lost in the noise, though.

	Oren



From Jack.Jansen@oratrix.com  Wed Aug 14 10:29:42 2002
From: Jack.Jansen@oratrix.com (Jack Jansen)
Date: Wed, 14 Aug 2002 11:29:42 +0200
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: <200208140239.g7E2d9n30799@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <5CE274DA-AF68-11D6-AAC0-0030655234CE@oratrix.com>

On Wednesday, August 14, 2002, at 04:39 , Guido van Rossum wrote:
> How about the following counterproposal.  This also changes some of
> the other format codes to be a little more regular.
>
> Code    C type          	Range check
>
> b	unsigned char		0..UCHAR_MAX
> B	unsigned char		none **
> h	unsigned short		0..USHRT_MAX
> H	unsigned short		none **
> i	int			INT_MIN..INT_MAX
> I *	unsigned int		0..UINT_MAX
> l	long			LONG_MIN..LONG_MAX
> k *	unsigned long		none
> L	long long		LLONG_MIN..LLONG_MAX
> K *	unsigned long long	none
>
> Notes:
>
> * New format codes.
>
> ** Changed from previous "range-and-a-half" to "none"; the
>    range-and-a-half checking wasn't particularly useful.

Fine with me.

My only reason for suggesting the uint32_t and friends was because I was 
under the impression that you were unhappy with "unsigned long" having  
a different size on different platforms. I'm perfectly happy with 
char/short/long/long long.
--
- Jack Jansen                
http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma 
Goldman -



From Jack.Jansen@oratrix.com  Wed Aug 14 10:46:52 2002
From: Jack.Jansen@oratrix.com (Jack Jansen)
Date: Wed, 14 Aug 2002 11:46:52 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: 
Message-ID: 

On Wednesday, August 14, 2002, at 08:33 , Martin v. Loewis wrote:
>> Do I misunderstand something, or this this a bug (limitation?) in the
>> unicode->latin-1 decoder?
>
> It's a limitation, in all codecs. Contributions of normalization code
> are welcome. Since this is hard work, this is unlikely to be fixed in
> Python 2.3 - unless somebody has a really good incentive for fixing
> it.

Why is this hard work? I would guess that a simple table lookup would 
suffice, after all there are only a finite number of unicode characters 
that can be split up, and each one can be split up in only a small 
number of ways.

Wouldn't something like

    for c in input:
        if not canbestartofcombiningsequence.has_key(c):
            output.append(c)
            continue
        nlookahead = MAXCHARSTOCOMBINE
        while nlookahead > 1:
            attempt = <lookahead next nlookahead chars from input>
            if combine.has_key(attempt):
                output.append(combine[attempt])
                <skip the lookahead in input>
                break
            nlookahead -= 1
        else:
            output.append(c)

do the trick, if the two dictionaries are initialized intelligently?
		
--
- Jack Jansen                
http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma 
Goldman -



From oren-py-d@hishome.net  Wed Aug 14 11:18:19 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Wed, 14 Aug 2002 06:18:19 -0400
Subject: [Python-Dev] type categories
In-Reply-To: <200208131545.29856.mclay@nist.gov>
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208131545.29856.mclay@nist.gov>
Message-ID: <20020814101819.GA93585@hishome.net>

On Tue, Aug 13, 2002 at 03:45:29PM -0400, Michael McLay wrote:
> > So what I wonder is this:  Has there been much thought about making
> > these type categories more explicitly part of the type system?
> 
> The category names look like general purpose interface names. The addition of 
> interfaces has been discussed quite a bit. While many people are interested 
> in having interfaces added to Python, there are many design issues that will 
> have to be resolved before it happens. 

Nope. Type categories are fundamentally different from interfaces.  An 
interface must be declared by the type while a category can be an 
observation about an existing type. 

Two types that are defined independently in different libraries may in 
fact fit under the same category because they implement the same protocol.
With named interfaces they may in fact be compatible but they will not 
expose the same explicit interface. Requiring them to import the interface 
from a common source starts to sound more like Java than Python and would
introduce dependencies and interface version issues in a language that is 
wonderfully free from such arbitrary complexities.

Python is a dynamic language. It deserves a dynamic type category system,
not static interfaces that must be declared. It's fine to write a class and
somehow say "I intend this class to be in category X, please warn me if I 
write a method that will make it incompatible". But I don't want declarations 
to be a *requirement* for being considered compatible with a protocol. I 
have noticed that a lot of protocols are defined retroactively by 
observation of the behavior of existing code. There shouldn't be any need 
to go tag someone else's code as conforming to a protocol or put a wrapper
around it just to be able to use it.

A category is defined mathematically by a membership predicate. So what we
need for type categories is a system for writing predicates about types.

Standard Python expressions should not be used for defining a category
membership predicate. A Python expression is not a pure function. This
makes it impossible to cache the results of which type belongs to what
category for efficiency. Another problem is that many different expressions 
may be equivalent but if two independently defined categories use equivalent 
predicates they should *be* the same category.  They should be merged at 
runtime just like interned strings. 
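As a toy illustration (all names hypothetical): if a category's membership predicate is "the type has these methods", canonicalizing it as a frozenset makes independently defined but equivalent categories literally equal, and thus cacheable and mergeable:

```python
import io

def category(*required_methods):
    # Canonical form of a "has these methods" predicate: a frozenset,
    # so ordering and duplication in the definition don't matter.
    return frozenset(required_methods)

def satisfies(obj, cat):
    # Membership test: the object provides every required method.
    return all(hasattr(obj, name) for name in cat)

lib_a_readable = category('read', 'readline')
lib_b_readable = category('readline', 'read')   # defined independently

assert lib_a_readable == lib_b_readable         # merged, like interned strings
assert satisfies(io.StringIO(), lib_a_readable)
```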

About a year ago I worked on a system for predicates having a canonical 
representation for security applications. While I was working on it I 
realized that it would be perfect for implementing a type category system
for Python. It would be useful at runtime for error detection and runtime
queries of protocols. It would also be useful at compile time for early
detection of some errors and possibly for optimization. By implementing
an optional strict mode the early error detection could be improved to the
point where it's effectively a static type system.

Just a quick example of the usefulness of canonical predicates: if I
calculate the intersection of two predicates and reduce it to canonical
form it will reduce to the FALSE predicate if no input will satisfy both
predicates. It will be equal to one of the predicate if it is contained
by the other.

I spent countless hours thinking about these issues, probably more than 
most people on this list... I think I have the foundation for a powerful 
yet unobtrusive type category system. Unfortunately it will take me some 
time to put it in writing and I don't have enough free time (who does?)

	Oren



From walter@livinglogic.de  Wed Aug 14 11:47:47 2002
From: walter@livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Wed, 14 Aug 2002 12:47:47 +0200
Subject: [Python-Dev] PEP 293, Codec Error Handling Callbacks
References: <3D1057E8.9090200@livinglogic.de> <3D4E336E.8070700@lemburg.com>	<200208051348.g75DmOv13530@pcp02138704pcs.reston01.va.comcast.net>	<3D4E97A7.7000904@lemburg.com>	<200208051527.g75FR1814634@pcp02138704pcs.reston01.va.comcast.net>	<3D579245.2080306@livinglogic.de>		<3D57DC42.6070300@lemburg.com> 
Message-ID: <3D5A3553.4020705@livinglogic.de>

Martin v. Loewis wrote:

> "M.-A. Lemburg"  writes:
 >
>>What ? That exceptions are immutable ? I think it's a big win that
>>exceptions are in fact mutable -- they are great for transporting
>>extra information up the chain...
> 
> 
> I see. So this is an open issue.

Yes, but I think this is not that much of a problem, because when
the code that catches the exception wants to do something with
exc.args it has to know what the entries mean, which depends on
the type. And if this code knows that it is dealing with a
UnicodeEncodeError it can simply use exc.start instead of
exc.args[2].

Bye,
    Walter Dörwald



From kalle@lysator.liu.se  Wed Aug 14 12:53:03 2002
From: kalle@lysator.liu.se (Kalle Svensson)
Date: Wed, 14 Aug 2002 13:53:03 +0200
Subject: [snake-farm] RE: [Python-Dev] strange warnings from tempfile.mkstemped.__del__ on HP
In-Reply-To: 
References: <200208140331.g7E3Veh30980@pcp02138704pcs.reston01.va.comcast.net> 
Message-ID: <20020814115303.GB2054@i92.ryd.student.liu.se>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

[Tim Peters]
> So even if we had that, it wouldn't have helped.  stdout from a
> regrtest -v run is what we need, or from running test_tempfile.py
> directly (w/o regrtest).

Here you go.

: kalle@taylor [python-HP_UX-B.11.00-9000_829-taylor]$ ; ./python ../python/dist/src/Lib/test/test_tempfile.py 
There are no surprising symbols in the tempfile module ... ok
_once initializes its argument ... ok
_once calls the callback just once ... ok
_once does not modify anything but its argument ... ok
_RandomNameSequence returns a six-character string ... ok
_RandomNameSequence returns no duplicate strings (stochastic) ... ok
_RandomNameSequence supports the iterator protocol ... ok
_candidate_tempdir_list returns a nonempty list of strings ... ok
_candidate_tempdir_list contains the expected directories ... ok
_get_candidate_names returns a _RandomNameSequence object ... ok
_get_candidate_names always returns the same object ... ok
_mkstemp_inner can create files ... ok
_mkstemp_inner can create many files (stochastic) ... FAIL
Exception exceptions.AttributeError: "mkstemped instance has no attribute 'fd'" in > ignored
_mkstemp_inner can create files in a user-selected directory ... ok
_mkstemp_inner creates files with the proper mode ... ok
_mkstemp_inner file handles are not inherited by child processes ... ok
_mkstemp_inner can create files in text mode ... ok
gettempprefix returns a nonempty prefix string ... ok
gettempprefix returns a usable prefix string ... ok
gettempdir returns a directory which exists ... ok
gettempdir returns a directory writable by the user ... ok
gettempdir always returns the same object ... ok
mkstemp can create files ... ok
mkstemp can create directories in a user-selected directory ... ok
mkdtemp can create directories ... ok
mkdtemp can create many directories (stochastic) ... ok
mkdtemp can create directories in a user-selected directory ... ok
mkdtemp creates directories with the proper mode ... ok
mktemp can choose usable file names ... ok
mktemp can choose many usable file names (stochastic) ... ok
mktemp issues a warning when used ... ok
NamedTemporaryFile can create files ... ok
NamedTemporaryFile creates files with names ... ok
A NamedTemporaryFile is deleted when closed ... ok
A NamedTemporaryFile can be closed many times without error ... ok
TemporaryFile can create files ... ok
TemporaryFile creates files with no names (on this system) ... ok
A TemporaryFile can be closed many times without error ... ok

======================================================================
FAIL: _mkstemp_inner can create many files (stochastic)
- ----------------------------------------------------------------------
Traceback (most recent call last):
  File "../python/dist/src/Lib/test/test_tempfile.py", line 295, in test_basic_many
  File "../python/dist/src/Lib/test/test_tempfile.py", line 278, in do_create
  File "../python/dist/src/Lib/test/test_tempfile.py", line 33, in failOnException
  File "/mp/slaskdisk/tmp/sfarmer/python/dist/src/Lib/unittest.py", line 260, in fail
AssertionError: _mkstemp_inner raised exceptions.OSError: [Errno 24] Too many open files: '/tmp/aaU3irrA'

- ----------------------------------------------------------------------
Ran 38 tests in 43.182s

FAILED (failures=1)
Traceback (most recent call last):
  File "../python/dist/src/Lib/test/test_tempfile.py", line 719, in ?
    test_main()
  File "../python/dist/src/Lib/test/test_tempfile.py", line 716, in test_main
    test_support.run_suite(suite)
  File "/mp/slaskdisk/tmp/sfarmer/python/dist/src/Lib/test/test_support.py", line 188, in run_suite
    raise TestFailed(err)
test.test_support.TestFailed: Traceback (most recent call last):
  File "../python/dist/src/Lib/test/test_tempfile.py", line 295, in test_basic_many
  File "../python/dist/src/Lib/test/test_tempfile.py", line 278, in do_create
  File "../python/dist/src/Lib/test/test_tempfile.py", line 33, in failOnException
  File "/mp/slaskdisk/tmp/sfarmer/python/dist/src/Lib/unittest.py", line 260, in fail
AssertionError: _mkstemp_inner raised exceptions.OSError: [Errno 24] Too many open files: '/tmp/aaU3irrA'

Hmm, I wonder how many that is, and how to change it.  I'll look
around.

Peace,
  Kalle
- -- 
Kalle Svensson, http://www.juckapan.org/~kalle/
Student, root and saint in the Church of Emacs.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.6 

iD8DBQE9WkRmdNeA1787sd0RAjJhAJ9uMgUlPxBP93brZSxqhoQ4YkvIbQCg0MXv
HyLJEYRv3/t9X15xuJa5veA=
=cmKE
-----END PGP SIGNATURE-----


From pedroni@inf.ethz.ch  Wed Aug 14 12:54:44 2002
From: pedroni@inf.ethz.ch (Samuele Pedroni)
Date: Wed, 14 Aug 2002 13:54:44 +0200
Subject: [Python-Dev] Multiple dispatch
Message-ID: <002001c24389$61d49ee0$6d94fea9@newmexico>

[David Abrahams]
>I don't know how these things usually work, but isn't it a bit early for
>that? I would like to have some discussion about multiple dispatch (and
>especially matching criteria) before investing in a formal proposal. That's
>what my earlier posting which got banished to the types-sig was trying to
>do. Getting a feel for what people are thinking about this, and getting
>feedback from those with lots more experience than I in matters Pythonic is
>important to me.

I'm interested in multiple dispatch [but have limited
bandwidth]; I have even once written
a pure Python implementation of it (dispatching on classes only).  It's quite
expressive for some designs. But:

[- Jython too internally uses a kind of multiple dispatch
 in order to dispatch to overloaded Java methods.
But such a mechanism is really quite a limited beast
compared to adding multiple dispatch to Python in general. ]

- I'm not sure it is that Pythonic or easy to grasp;
 one remark that I have read sometimes is that
 with multimethods the program logic is easily scattered
 across many places, with so-to-say non-local effects.

- It is yet another paradigm that should be integrated
 with the rest of Python. For example, how does it interact
  with the current single-dispatch methods, if at all?
[It is not just a theoretical question, it influences whether
this can be used to model e.g. the dispatch of Jython for
overloaded Java methods or not, simplifying the picture
or adding confusion]
- Syntax and semantics: in Python, definitions are assignments.
  Now one needs at least a (maybe implicit) "define generic function"
  operation and an "add method to generic function" operation.
  (Should def be abused?)
- Should all function (method) definitions define generic-function
  methods under the hood?

- Do we dispatch only on foo.__class__, or
  do we want to dispatch on protocols/interfaces/categories?
  At the moment these are not first-class in Python.
- How do we solve dispatch ambiguities?  The more
  predictable and uncomplicated, the more Pythonic.

- Sometimes it is useful to substitute functions and methods
  with wrapped re-editions; the equivalents for multimethods
 are at least before/after/around combinators.  I think
 they are useful, but they make the picture more complex.

So the question is more what the most Pythonic way to add
multiple dispatch is; then maybe it is Pythonic enough,
or maybe not.  [It seems a SIG task, but I have not really
written that word <.5 wink>]
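Samuele mentions having written a pure Python implementation of class-based
multiple dispatch.  A minimal illustrative sketch of the idea (not his code,
and written in decorator syntax that postdates this thread; the names
Generic, collide, Ship and Asteroid are invented for the example) might
look like:

```python
# Illustrative sketch only: a generic function that dispatches on the
# classes of all of its arguments.
class Generic:
    def __init__(self):
        self._methods = {}            # (type, ...) -> callable

    def register(self, *types):
        def decorator(func):
            self._methods[types] = func
            return func
        return decorator

    def __call__(self, *args):
        # isinstance() lets subclasses inherit methods; first match wins,
        # so genuine ambiguities are not resolved, merely papered over.
        for types, func in self._methods.items():
            if len(types) == len(args) and all(
                    isinstance(a, t) for a, t in zip(args, types)):
                return func(*args)
        raise TypeError("no applicable method for %r" % (args,))

collide = Generic()

class Ship: pass
class Asteroid: pass

@collide.register(Ship, Asteroid)
def _(a, b):
    return "ship hits asteroid"

@collide.register(Asteroid, Asteroid)
def _(a, b):
    return "asteroids collide"
```

Note that the linear scan above dodges exactly the hard questions listed
earlier: ambiguity resolution and interaction with ordinary single-dispatch
methods.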

Related: Smallscript, CLOS, Dylan, various overloading flavors

Smallscript is interesting because it adds multiple dispatch
to the single-dispatch semantics of Smalltalk, so it
overlaps heavily with our case.  OTOH I have not played
with it and I don't know the details of the actual semantics
[and in general it gives a PL/I-esque impression, at least from far away].

regards.



From barry@zope.com  Wed Aug 14 13:02:30 2002
From: barry@zope.com (Barry A. Warsaw)
Date: Wed, 14 Aug 2002 08:02:30 -0400
Subject: [Python-Dev] hex constants, bit patterns,
 PEP 237 warnings and gettext
References: <15705.58715.533054.676186@anthem.wooz.org>
 
Message-ID: <15706.18134.62932.879188@anthem.wooz.org>

>>>>> "TP" == Tim Peters  writes:

    TP> [Barry A. Warsaw]
    >> ...  So if "0x950412de" isn't the right way to write a 32 bit
    >> pattern,

    TP> It isn't today, but will be in 2.4.

But isn't that wasteful?  Today I have to add the L to my hex
constants, but in a year from now, I can just turn around and remove
them again.  What's the point?

The deeper question is: what's wrong with "0x950412de"?  What bits
have I lost by writing my hex constant this way?  I'm trying to
understand why hex constants > sys.maxint have to be deprecated.

-Barry


From guido@python.org  Wed Aug 14 13:13:05 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 08:13:05 -0400
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: Your message of "Wed, 14 Aug 2002 08:33:13 +0200."
 
References: 
 
Message-ID: <200208141213.g7ECD5V00311@pcp02138704pcs.reston01.va.comcast.net>

> > Do I misunderstand something, or is this a bug (limitation?) in the
> > unicode->latin-1 decoder?
> 
> It's a limitation, in all codecs. Contributions of normalization code
> are welcome. Since this is hard work, this is unlikely to be fixed in
> Python 2.3 - unless somebody has a really good incentive for fixing
> it.

Note that normalization doesn't belong in the codecs (except perhaps
as a separate Unicode->Unicode codec, since codecs seem to be useful
for all string->string transformations).  It's a separate step that
the application has to request; only the app knows whether a
particular Unicode string is already normalized or not, and whether
the expense is useful for the app, or not.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From kalle@lysator.liu.se  Wed Aug 14 13:27:49 2002
From: kalle@lysator.liu.se (Kalle Svensson)
Date: Wed, 14 Aug 2002 14:27:49 +0200
Subject: [snake-farm] RE: [Python-Dev] strange warnings from tempfile.mkstemped.__del__ on HP
In-Reply-To: <20020814115303.GB2054@i92.ryd.student.liu.se>
References: <200208140331.g7E3Veh30980@pcp02138704pcs.reston01.va.comcast.net>  <20020814115303.GB2054@i92.ryd.student.liu.se>
Message-ID: <20020814122749.GD2054@i92.ryd.student.liu.se>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

[me, on the HP-UX snake farm build]
> AssertionError: _mkstemp_inner raised exceptions.OSError: [Errno 24]
> Too many open files: '/tmp/aaU3irrA'
> 
> Hmm, I wonder how many that is, and how to change it.  I'll look
> around.

I've raised maxfiles from 200 to 2048, and the test now runs without
error.

Peace,
  Kalle
- -- 
Kalle Svensson, http://www.juckapan.org/~kalle/
Student, root and saint in the Church of Emacs.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.6 

iD8DBQE9WkypdNeA1787sd0RAhM5AKCGfOftdmz2hGtBtFFCLzTvGQwVfQCbBv6+
wnwrIif0wg3Qf80cI6Lw9cE=
=ErkC
-----END PGP SIGNATURE-----


From oren-py-d@hishome.net  Wed Aug 14 13:24:58 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Wed, 14 Aug 2002 08:24:58 -0400
Subject: [Python-Dev] hex constants, bit patterns, PEP 237 warnings and gettext
In-Reply-To: <15706.18134.62932.879188@anthem.wooz.org>
References: <15705.58715.533054.676186@anthem.wooz.org>  <15706.18134.62932.879188@anthem.wooz.org>
Message-ID: <20020814122458.GA14912@hishome.net>

On Wed, Aug 14, 2002 at 08:02:30AM -0400, Barry A. Warsaw wrote:
> 
> >>>>> "TP" == Tim Peters  writes:
> 
>     TP> [Barry A. Warsaw]
>     >> ...  So if "0x950412de" isn't the right way to write a 32 bit
>     >> pattern,
> 
>     TP> It isn't today, but will be in 2.4.
> 
> But isn't that wasteful?  Today I have to add the L to my hex
> constants, but in a year from now, I can just turn around and remove
> them again.  What's the point?
> The deeper question is: what's wrong with "0x950412de"?  What bits
> have I lost by writing my hex constant this way?  I'm trying to
> understand why hex constants > sys.maxint have to be deprecated.

Unifying ints and longs means that there is no predefined bit width for
numbers. Conceptually they are all infinite. Positive numbers have an
infinite number of leading '0's and negative numbers have an infinite number
of leading 'F's.  Numbers that have fewer than 8/16 digits to the right of
this infinite sequence of '0's or 'F's happen to get a more efficient
internal representation and a different ob_type, but other than that it
should be impossible to tell the difference between an int and a long.

What's wrong with 0x950412de is that with a word width of 32 bits it is 
negative and therefore the invisible bits to the left are all set. With a 
word width of 64 bits or with an infinite width they are cleared.

That's why I propose borrowing the 'U' suffix from C. 0x950412deU would
mean that the bits to the left are cleared. This way you could change your
code only once, document your intentions clearly and get a number that is
guaranteed to be equivalent on Python and C compilers with different native
word sizes.
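Oren's point can be made concrete under unified int semantics (the proposed
'U' suffix itself is of course not Python; the helper function below is
purely illustrative):

```python
# With conceptually infinite width, the literal is simply positive:
n = 0x950412de
assert n > 0

def as_signed32(x):
    """Reinterpret the low 32 bits of x as a two's-complement value."""
    x &= 0xffffffff
    return x - 0x100000000 if x & 0x80000000 else x

# Read as a signed 32-bit word, the same bit pattern is negative --
# i.e. the "invisible bits to the left" are all set:
assert as_signed32(n) < 0
```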

	Oren



From guido@python.org  Wed Aug 14 13:26:58 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 08:26:58 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: Your message of "Wed, 14 Aug 2002 11:29:42 +0200."
 <5CE274DA-AF68-11D6-AAC0-0030655234CE@oratrix.com>
References: <5CE274DA-AF68-11D6-AAC0-0030655234CE@oratrix.com>
Message-ID: <200208141226.g7ECQxt00824@pcp02138704pcs.reston01.va.comcast.net>

> > How about the following counterproposal.  This also changes some of
> > the other format codes to be a little more regular.
> >
> > Code    C type          	Range check
> >
> > b	unsigned char		0..UCHAR_MAX
> > B	unsigned char		none **
> > h	unsigned short		0..USHRT_MAX
> > H	unsigned short		none **
> > i	int			INT_MIN..INT_MAX
> > I *	unsigned int		0..UINT_MAX
> > l	long			LONG_MIN..LONG_MAX
> > k *	unsigned long		none
> > L	long long		LLONG_MIN..LLONG_MAX
> > K *	unsigned long long	none
> >
> > Notes:
> >
> > * New format codes.
> >
> > ** Changed from previous "range-and-a-half" to "none"; the
> >    range-and-a-half checking wasn't particularly useful.
> 
> Fine with me.

OK, I've added this to my TODO list (python.org/sf/595026, assigned to
me -- but if someone else wants to do it, please assign to yourself or
submit a patch and leave a note in the bug item!).

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Wed Aug 14 13:40:47 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 08:40:47 -0400
Subject: [Python-Dev] type categories
In-Reply-To: Your message of "Tue, 13 Aug 2002 23:59:29 EDT."
 <0ce601c24346$ff967f60$6501a8c0@boostconsulting.com>
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208132115.g7DLFwL25088@odiug.zope.com> <0c0501c24311$8cebbdc0$6501a8c0@boostconsulting.com> <200208140242.g7E2gCs30811@pcp02138704pcs.reston01.va.comcast.net> <0ccc01c24341$d839b130$6501a8c0@boostconsulting.com> <200208140405.g7E45s731824@pcp02138704pcs.reston01.va.comcast.net>
 <0ce601c24346$ff967f60$6501a8c0@boostconsulting.com>
Message-ID: <200208141240.g7ECelW00912@pcp02138704pcs.reston01.va.comcast.net>

> > Time to write a PEP.
> 
> I don't know how these things usually work, but isn't it a bit early
> for that?

Not at all.  If you want multiple dispatch to go into the language,
you'll have to educate the rest of us here, both about the advantages,
and how it can be implemented with reasonable, Pythonic semantics.
A PEP is the perfect vehicle for that.  A PEP doesn't have to *start*
as a full formal proposal.  It can go through stages and eventually
end up being rejected before there ever was a full formal proposal,
*or* it will eventually evolve into a full formal proposal.  (For
example, PEPs 245 and 246 are examples of PEPs in the very early
stages.  I expect PEP 245 was too early, but PEP 246 strikes me as
just the right thing to get a meaningful discussion started.)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Wed Aug 14 13:53:39 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 08:53:39 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: Your message of "Wed, 14 Aug 2002 10:25:23 +0300."
 <20020814102523.A20855@hishome.net>
References:   <20020813061506.GA49563@hishome.net> 
 <20020814102523.A20855@hishome.net>
Message-ID: <200208141253.g7ECrdY00941@pcp02138704pcs.reston01.va.comcast.net>

> Not at all! Removing the differences between ints and longs is good. 
> My reservations are about the hexadecimal representation.
> 
>     - Currently, the '%u', '%x', '%X' and '%o' string formatting
>       operators and the hex() and oct() built-in functions behave
>       differently for negative numbers: negative short ints are
>       formatted as unsigned C long, while negative long ints are
>       formatted with a minus sign.  This will be changed to use the
>       long int semantics in all cases (but without the trailing 'L'
>       that currently distinguishes the output of hex() and oct() for
>       long ints).  Note that this means that '%u' becomes an alias for
>       '%d'.  It will eventually be removed.
> 
> In Python up to 2.2 it's inconsistent between ints and longs:
> >>> hex(-16711681)
> '0xff00ffff'
> >>> hex(-16711681L)
> '-0xff0001L'		# ??!?!?
> 
> The hex representation of ints gives me useful information about their 
> bit structure. After all, it is not immediately apparent to most mortals 
> that the number above is a mask for bits 16-23.

If you want to see the bit mask, all you have to do is and it with a
positive mask, e.g. 0xffff to see it as a 16-bit mask, 0xffffffffL for
a 32-bit mask, or 0xffffffffffffffff for a 64-bit mask.  And you can
go higher.
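In code, with unified ints (where hex() of a negative number carries a
minus sign), the masking idiom looks like:

```python
# Guido's suggestion: and the number with a positive mask of the desired
# width to see its bit pattern at that width.
n = -16711681          # hard to recognize as a mask for bits 16-23

assert hex(n & 0xffff) == '0xffff'                           # 16-bit view
assert hex(n & 0xffffffff) == '0xff00ffff'                   # 32-bit view
assert hex(n & 0xffffffffffffffff) == '0xffffffffff00ffff'   # 64-bit view
```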

> The hex representation of longs is something I find quite misleading and 
> I think it's also unprecedented.  This wart has bothered me for a long 
> time now but I didn't have any use for it so I didn't mind too much. Now 
> it is proposed to extend this useless representation to ints so I do.

Just yesterday I got a proposal for a hex calculator that proposed
-0x1 to represent the mathematical value -1 in hex.  I don't think
it's unprecedented at all, although it may be unconventional.

> So we have two elements of the language that are inconsistent. One of 
> them is in widespread use and the other is... ahem... 
> 
> Which one of them should be changed to conform to the other? 
> 
> My proposal: 
> 
> On 32 bit platforms:
> >>> hex(-16711681)
> '0xff00ffff'
> >>> hex(-16711681L)
> '0xff00ffff'
> 
> On 64 bit platforms:
> >>> hex(-16711681)
> '0xffffffffff00ffffLL'
> >>> hex(-16711681L)
> '0xffffffffff00ffffLL'
> 
> The 'LL' suffix means that this number is to be treated as a 64 bit
> *signed* number. This is consistent with the way it is interpreted by 
> GCC and other unix compilers on both 32 and 64 bit platforms.  

-1.

Python doesn't have the concept of 64-bit signed numbers.  It also
doesn't have the 'LL' syntax on input -- or do you propose to add that
too?  Why should the hex representation have to contain the conceptual
size of the number?  Do you propose to add LL to the hex
representations of positive numbers too?

> What to do about numbers from 2**31 to 2**32-1?
> 
> >>> hex(4278255615)
> 0xff00ffffU
> 
> The U suffix, also borrowed from C, makes it unambiguous on 32 and 64 bit 
> platforms for both Python and C. 

Another -1.  Python doesn't have this on input.

> Representation of positive numbers:
> 
>  0x00000000   -         0x7fffffff   : unambiguous on all platforms
>  0x80000000U  -         0xffffffffU  : representation adds U suffix
> 0x100000000LL - 0x7fffffffffffffffLL : representation adds LL suffix

What does the addition of the U or LL suffix give you?  If I really
want to know how many bits there are I can count the digits, right?
And usually the app that does the printing knows in how many bits it
is interested.

> Representation of negative numbers:
>  0x80000000  - 0xffffffff (-2147483648 to -1):
> 	8 digits on 32 bit platforms
>  0xffffffff80000000LL  - 0xffffffffffffffffLL  (same range):
> 	16 digits and LL suffix on 64 bit platforms
> 
>  other negative numbers: 16 digits and LL suffix on all platforms.

And what do you suppose we do with hex(-100**100)?

> This makes the hex representation of a number informative and consistent 
> between int and long on all platforms. It is also consistent with the
> C compiler on the same platform. Yes, it will produce a different text
> representation of some numbers on different platforms but this conveys
> important information about the bit structure of the number which really
> is different between platforms. eval()ing it back to a number is still 
> consistent.

Why is the bit structure so important to you?

> When converting in the other direction (hex representation to number) 
> there is an ambiguous range from 0x80000000 to 0xffffffff.  Should it be 
> treated as signed or unsigned?  The current interpretation is signed. PEP
> 237 proposes to change it to unsigned. I propose to do neither - this range
> should be deprecated and some explicit notation should be used instead.

Now that's really helpful. :-(  What is someone to do who wants to
enter a hex constant they got from some documentation?  E.g. the AIFC
magic number is 0xA2805140.  Why shouldn't I be able to write that?
What's the use of having a discontinuity in our notation?

(I wanted to write much stronger words but I'm trying to respond to
the proposal only.  I guess I'm -1000000 on this.)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Wed Aug 14 13:59:44 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 08:59:44 -0400
Subject: [Python-Dev] tempfile.py
Message-ID: <200208141259.g7ECxiL00996@pcp02138704pcs.reston01.va.comcast.net>

The mkstemp() function in the rewritten tempfile has an argument with
a curious name and default: binary=True.  This caused confusion (even
the docstring in the original patch was confused :-).  It would be
much easier to explain if this was changed to text=False.  That is, to
deviate from the default mode, i.e. use text mode, you'll have to
write mkstemp(text=True) rather than mkstemp(binary=False).

This might require a few changes to the standard library and to
anybody's code who has aggressively started using this, but given the
freshness of the patch I think that's OK.  If anybody sees a good
reason *not* to do this, please let me know (here or on the SF patch,
python.org/sf/589982).
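For illustration, with the proposed spelling (which is what
tempfile.mkstemp ultimately adopted), text mode becomes an explicit
opt-in rather than the negation of a binary default:

```python
import os
import tempfile

# Binary is the default; text mode is requested explicitly.
fd, path = tempfile.mkstemp(text=True)
try:
    os.write(fd, b"hello\n")
finally:
    os.close(fd)
    os.remove(path)
```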

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Wed Aug 14 14:09:19 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 09:09:19 -0400
Subject: [Python-Dev] type categories
In-Reply-To: Your message of "Wed, 14 Aug 2002 06:18:19 EDT."
 <20020814101819.GA93585@hishome.net>
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208131545.29856.mclay@nist.gov>
 <20020814101819.GA93585@hishome.net>
Message-ID: <200208141309.g7ED9Jb01045@pcp02138704pcs.reston01.va.comcast.net>

[Oren]
> Type categories are fundamentally different from interfaces.  An 
> interface must be declared by the type while a category can be an 
> observation about an existing type. 

Yup.  (In Python these have often been called "protocols".  Jim Fulton
calls them "lore protocols".)

> Two types that are defined independently in different libraries may
> in fact fit under the same category because they implement the same
> protocol.  With named interfaces they may in fact be compatible but
> they will not expose the same explicit interface. Requiring them to
> import the interface from a common source starts to sound more like
> Java than Python and would introduce dependencies and interface
> version issues in a language that is wonderfully free from such
> arbitrary complexities.

Hm, I'm not sure if you can solve the version incompatibility problem
by ignoring it. :-)

> Python is a dynamic language. It deserves a dynamic type category
> system, not static interfaces that must be declared. It's fine to
> write a class and somehow say "I intend this class to be in category
> X, please warn me if I write a method that will make it
> incompatible". But I don't want declarations to be a *requirement*
> for being considered compatible with a protocol. I have noticed that
> a lot of protocols are defined retroactively by observation of the
> behavior of existing code. There shouldn't be any need to go tag
> someone else's code as conforming to a protocol or put a wrapper
> around it just to be able to use it.

Are you familiar with Zope's Interface package?  It solves this
problem (nicely, IMO) by allowing you to place an interface
declaration inside a class but also allowing you to make calls to an
interface registry that declare interfaces for pre-existing classes.

> A category is defined mathematically by a membership predicate. So
> what we need for type categories is a system for writing predicates
> about types.

Now I think you've lost me.  How can a category on the one hand be
observed after the fact and on the other hand defined by a rigorous
mathematical definition?  How could a program tell by looking at a
class whether it really is an implementation of a given protocol?

> Standard Python expressions should not be used for defining a
> category membership predicate. A Python expression is not a pure
> function. This makes it impossible to cache the results of which
> type belongs to what category for efficiency. Another problem is
> that many different expressions may be equivalent but if two
> independently defined categories use equivalent predicates they
> should *be* the same category.  They should be merged at runtime
> just like interned strings.

Again you've lost me.  I expect there's something here that you assume
well-known.  Can you please clarify this?  What on earth do you mean
by "A Python expression is not a pure function" ?

> About a year ago I worked on a system for predicates having a
> canonical representation for security applications. . While I was
> working on it I realized that it would be perfect for implementing a
> type category system for Python. It would be useful at runtime for
> error detection and runtime queries of protocols. It would also be
> useful at compile time for early detection of some errors and
> possibly for optimization. By implementing an optional strict mode
> the early error detection could be improved to the point where it's
> effectively a static type system.

So let's see a proposal already.  I can't guess what you are proposing
from this description except that you think highly of your own
invention.  I wouldn't expect you to mention it otherwise, so that's
zero bits of information. :-)

> Just a quick example of the usefulness of canonical predicates: if I
> calculate the intersection of two predicates and reduce it to
> canonical form it will reduce to the FALSE predicate if no input
> will satisfy both predicates. It will be equal to one of the
> predicate if it is contained by the other.
> 
> I spent countless hours thinking about these issues, probably more than 
> most people on this list...

How presumptuous.

> I think I have the foundation for a powerful yet unobtrusive type
> category system. Unfortunately it will take me some time to put it
> in writing and I don't have enough free time (who does?)

I say vaporware. :-)

Tell us about it when you have time.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From fredrik@pythonware.com  Wed Aug 14 14:12:21 2002
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Wed, 14 Aug 2002 15:12:21 +0200
Subject: [Python-Dev] tempfile.py
References: <200208141259.g7ECxiL00996@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <016a01c24394$3a620b80$0900a8c0@spiff>

guido wrote:
> The mkstemp() function in the rewritten tempfile has an argument with
> a curious name and default: binary=True.  This caused confusion (even
> the docstring in the original patch was confused :-).  It would be
> much easier to explain if this was changed to text=False.

fwiw, it would probably be even easier to use/explain if it
used a mode string.





From guido@python.org  Wed Aug 14 14:11:44 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 09:11:44 -0400
Subject: [Python-Dev] hex constants, bit patterns, PEP 237 warnings and gettext
In-Reply-To: Your message of "Wed, 14 Aug 2002 08:02:30 EDT."
 <15706.18134.62932.879188@anthem.wooz.org>
References: <15705.58715.533054.676186@anthem.wooz.org> 
 <15706.18134.62932.879188@anthem.wooz.org>
Message-ID: <200208141311.g7EDBiZ01068@pcp02138704pcs.reston01.va.comcast.net>

>     TP> [Barry A. Warsaw]
>     >> ...  So if "0x950412de" isn't the right way to write a 32 bit
>     >> pattern,
> 
>     TP> It isn't today, but will be in 2.4.
> 
> But isn't that wasteful?  Today I have to add the L to my hex
> constants, but in a year from now, I can just turn around and remove
> them again.  What's the point?

Think 5 years rather than 1 year.

> The deeper question is: what's wrong with "0x950412de"?  What bits
> have I lost by writing my hex constant this way?  I'm trying to
> understand why hex constants > sys.maxint have to be deprecated.

We're not deprecating them.  Instead, the type of hex constants
in range(sys.maxint, 2*sys.maxint+2) will change from int to long, to
be consistent with other hex constants.  Currently:

  >>> 0xf > 0
  True
  >>> 0xffff > 0
  True
  >>> 0xfffffff > 0
  True
  >>> 0xffffffff > 0
  False                      <----------- This anomaly will disappear
  >>> 0xfffffffff > 0
  True
  >>> 0xffffffffffffffff > 0
  True
  >>> 

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Wed Aug 14 14:12:36 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 09:12:36 -0400
Subject: [Python-Dev] hex constants, bit patterns, PEP 237 warnings and gettext
In-Reply-To: Your message of "Wed, 14 Aug 2002 08:02:30 EDT."
 <15706.18134.62932.879188@anthem.wooz.org>
References: <15705.58715.533054.676186@anthem.wooz.org> 
 <15706.18134.62932.879188@anthem.wooz.org>
Message-ID: <200208141312.g7EDCaf01081@pcp02138704pcs.reston01.va.comcast.net>

> I'm trying to understand why hex constants > sys.maxint have to be
> deprecated.

Hm, maybe I should use a different warning category rather than
DeprecationWarning?  Any suggestions?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Wed Aug 14 14:14:21 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 09:14:21 -0400
Subject: [snake-farm] RE: [Python-Dev] strange warnings from tempfile.mkstemped.__del__ on HP
In-Reply-To: Your message of "Wed, 14 Aug 2002 14:27:49 +0200."
 <20020814122749.GD2054@i92.ryd.student.liu.se>
References: <200208140331.g7E3Veh30980@pcp02138704pcs.reston01.va.comcast.net>  <20020814115303.GB2054@i92.ryd.student.liu.se>
 <20020814122749.GD2054@i92.ryd.student.liu.se>
Message-ID: <200208141314.g7EDELl01095@pcp02138704pcs.reston01.va.comcast.net>

> [me, on the HP-UX snake farm build]
> > AssertionError: _mkstemp_inner raised exceptions.OSError: [Errno 24]
> > Too many open files: '/tmp/aaU3irrA'
> > 
> > Hmm, I wonder how many that is, and how to change it.  I'll look
> > around.
> 
> I've raised maxfiles from 200 to 2048, and the test now runs without
> error.

Thanks!  Maybe the test was a little too eager though -- perhaps it
could be happy with creating 100 instead of 1000 files.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From ark@research.att.com  Wed Aug 14 15:08:59 2002
From: ark@research.att.com (Andrew Koenig)
Date: 14 Aug 2002 10:08:59 -0400
Subject: [Python-Dev] type categories
In-Reply-To: <20020814101819.GA93585@hishome.net>
References: <200208131802.g7DI2Ro27807@europa.research.att.com>
 <200208131545.29856.mclay@nist.gov>
 <20020814101819.GA93585@hishome.net>
Message-ID: 

>> The category names look like general purpose interface names. The
>> addition of interfaces has been discussed quite a bit. While many
>> people are interested in having interfaces added to Python, there
>> are many design issues that will have to be resolved before it
>> happens.

Oren> Nope. Type categories are fundamentally different from
Oren> interfaces.  An interface must be declared by the type while a
Oren> category can be an observation about an existing type.

Why?  That is, why can't you imagine making a claim that type
X meets interface Y, even though the author of neither X nor Y
made that claim?

However, now that you bring it up... One difference I see between
interfaces and categories is that I can imagine categories carrying
semantic information to the human reader of the code that is not
actually expressed in the category itself.  As a simple example,
I can imagine a PartialOrdering category that I might like as part
of the specification for an argument to a sort function.

Oren> Two types that are defined independently in different libraries
Oren> may in fact fit under the same category because they implement
Oren> the same protocol.  With named interfaces they may in fact be
Oren> compatible but they will not expose the same explicit
Oren> interface. Requiring them to import the interface from a common
Oren> source starts to sound more like Java than Python and would
Oren> introduce dependencies and interface version issues in a
Oren> language that is wonderfully free from such arbitrary
Oren> complexities.

Why is importing an interface any worse than importing a library?

I see both interfaces and categories as claims about types.  Those
claims might be made by the types' authors, or they might be made by
the types' users.  I see no reason why they should have to be any
more static than the definitions of the types themselves.
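Such an after-the-fact claim need not involve the type's or the interface's
author at all.  A hypothetical sketch of a third-party claim registry (all
names invented for illustration; this is not any existing package's API):

```python
# A user records the claim that a class meets an interface, without
# touching the definition of either.
_claims = {}   # type -> set of interface names

def declare(cls, interface_name):
    _claims.setdefault(cls, set()).add(interface_name)

def claims(obj, interface_name):
    # Honour claims made for any base class too.
    return any(interface_name in _claims.get(base, ())
               for base in type(obj).__mro__)

class Fraction:                       # someone else's class
    def __lt__(self, other): ...

declare(Fraction, "PartialOrdering")  # a user's claim, after the fact

assert claims(Fraction(), "PartialOrdering")
```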

Oren> Python is a dynamic language. It deserves a dynamic type
Oren> category system, not static interfaces that must be
Oren> declared. It's fine to write a class and somehow say "I intend
Oren> this class to be in category X, please warn me if I write a
Oren> method that will make it incompatible". But I don't want
Oren> declarations to be a *requirement* for being considered
Oren> compatible with a protocol. I have noticed that a lot of
Oren> protocols are defined retroactively by observation of the
Oren> behavior of existing code. There shouldn't be any need to go tag
Oren> someone else's code as conforming to a protocol or put a wrapper
Oren> around it just to be able to use it.

Oren> A category is defined mathematically by a membership
Oren> predicate. So what we need for type categories is a system for
Oren> writing predicates about types.
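
In today's Python, such a membership predicate can be sketched directly; the
`Category` class and the `Sized` example below are hypothetical illustrations
of the idea, not part of any actual proposal:

```python
class Category:
    """A type category defined by a membership predicate over types."""
    def __init__(self, pred):
        self.pred = pred

    def __contains__(self, tp):
        # membership is an observation about the type, not a declaration
        return self.pred(tp)

# a category observed about existing types, with no declaration needed
Sized = Category(lambda tp: hasattr(tp, '__len__'))

print(list in Sized)   # True
print(int in Sized)    # False
```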

Indeed, that's what I was thinking about initially.  Guido pointed out
that the notion could be expanded to making concrete assertions about
the interface to a class.  I had originally considered that those
assertions could be just that--assertions, but then when Guido started
talking about interfaces, I realized that my original thought of
expressing satisfaction of a predicate by inheriting it could be
extended by simply adding methods to those predicates.  Of course,
this technique has the disadvantage that it's not easy to add base
classes to a class after it has been defined.

Oren> Standard Python expressions should not be used for defining a
Oren> category membership predicate. A Python expression is not a pure
Oren> function. This makes it impossible to cache the results of which
Oren> type belongs to what category for efficiency. Another problem is
Oren> that many different expressions may be equivalent but if two
Oren> independently defined categories use equivalent predicates they
Oren> should *be* the same category.  They should be merged at runtime
Oren> just like interned strings.

Yes.

Oren> About a year ago I worked on a system for predicates having a
Oren> canonical representation for security applications. While I
Oren> was working on it I realized that it would be perfect for
Oren> implementing a type category system for Python. It would be
Oren> useful at runtime for error detection and runtime queries of
Oren> protocols. It would also be useful at compile time for early
Oren> detection of some errors and possibly for optimization. By
Oren> implementing an optional strict mode the early error detection
Oren> could be improved to the point where it's effectively a static
Oren> type system.

Oren> Just a quick example of the usefulness of canonical predicates:
Oren> if I calculate the intersection of two predicates and reduce it
Oren> to canonical form it will reduce to the FALSE predicate if no
Oren> input will satisfy both predicates. It will be equal to one of
Oren> the predicate if it is contained by the other.
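
A toy model of that reduction, assuming predicates are kept in a canonical
form as sets of atomic requirements (all names here are hypothetical, chosen
only to illustrate the intersection and containment properties):

```python
# canonical form: a predicate is a frozenset of atomic requirements,
# each written as (name, True) or its negation (name, False)
FALSE = None  # the predicate no input can satisfy

def intersect(p, q):
    """Conjunction of two canonical predicates."""
    if p is FALSE or q is FALSE:
        return FALSE
    r = p | q
    # contradictory requirements reduce to the FALSE predicate
    for name, required in r:
        if (name, not required) in r:
            return FALSE
    return r

iterable = frozenset([('__iter__', True)])
sized    = frozenset([('__len__', True)])
unsized  = frozenset([('__len__', False)])

print(intersect(sized, unsized))            # None (the FALSE predicate)
print(intersect(iterable, sized) >= sized)  # True: the result contains 'sized'
```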

Oren> I spent countless hours thinking about these issues, probably
Oren> more than most people on this list... I think I have the
Oren> foundation for a powerful yet unobtrusive type category
Oren> system. Unfortunately it will take me some time to put it in
Oren> writing and I don't have enough free time (who does?)

Is there room to scribble it in the margin somewhere?  


-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark


From sholden@holdenweb.com  Wed Aug 14 15:06:07 2002
From: sholden@holdenweb.com (Steve Holden)
Date: Wed, 14 Aug 2002 10:06:07 -0400
Subject: [Python-Dev] hex constants, bit patterns, PEP 237 warnings and gettext
References: <15705.58715.533054.676186@anthem.wooz.org>               <15706.18134.62932.879188@anthem.wooz.org>  <200208141312.g7EDCaf01081@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <009d01c2439b$c2e562c0$6300000a@holdenweb.com>

[Oren]
> > I'm trying to understand why hex constants > sys.maxint have to
> > be deprecated.
> 
[Guido]
> Hm, maybe I should use a different warning category rather than
> DeprecationWarning?  Any suggestions?
> 

SignExtensionWarning? IntegerPrecisionWarning?

regards
-----------------------------------------------------------------------
Steve Holden                                 http://www.holdenweb.com/
Python Web Programming                http://pydish.holdenweb.com/pwp/
-----------------------------------------------------------------------






From barry@zope.com  Wed Aug 14 15:53:10 2002
From: barry@zope.com (Barry A. Warsaw)
Date: Wed, 14 Aug 2002 10:53:10 -0400
Subject: [Python-Dev] hex constants, bit patterns, PEP 237 warnings and gettext
References: <15705.58715.533054.676186@anthem.wooz.org>
 
 <15706.18134.62932.879188@anthem.wooz.org>
 <20020814122458.GA14912@hishome.net>
Message-ID: <15706.28374.168828.437841@anthem.wooz.org>

>>>>> "OT" == Oren Tirosh  writes:

    OT> What's wrong with 0x950412de is that with a word width of 32
    OT> bits it is negative and therefore the invisible bits to the
    OT> left are all set. With a word width of 64 bits or with an
    OT> infinite width they are cleared.

My point is that if I write a hex constant I never think about it as a
negative number; it's always an unsigned bit pattern.  I know Python
currently disagrees when the bit pattern is 32-bits in width and the
top bit is set, and that PEP 237 is the roadmap to get there from
here.

>>>>> "GvR" == Guido van Rossum  writes:

    >> I'm trying to understand why hex constants > sys.maxint have to
    >> be deprecated.

    GvR> Hm, maybe I should use a different warning category rather
    GvR> than DeprecationWarning?  Any suggestions?

I think that would help a lot, yes.  We had a lively internal
discussion this morning about it and we came up with FutureWarning.
Maybe Guido will come up with a better name, but I don't think it
should be DeprecationWarning.  The code that causes the warning isn't
being deprecated, its semantics are destined to be changed, and that
seems like an important distinction.
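
The distinction is observable in code; here is a minimal sketch, in today's
Python, of raising and catching the (then-proposed) FutureWarning category:

```python
import warnings

# FutureWarning flags constructs whose semantics will change, as opposed
# to DeprecationWarning, which flags constructs that will go away
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.warn("hex/oct constants > sys.maxint will return "
                  "positive values in the future", FutureWarning)

print(caught[0].category.__name__)  # FutureWarning
```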

-Barry


From guido@python.org  Wed Aug 14 15:58:32 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 10:58:32 -0400
Subject: [Python-Dev] tempfile.py
In-Reply-To: Your message of "Wed, 14 Aug 2002 15:12:21 +0200."
 <016a01c24394$3a620b80$0900a8c0@spiff>
References: <200208141259.g7ECxiL00996@pcp02138704pcs.reston01.va.comcast.net>
 <016a01c24394$3a620b80$0900a8c0@spiff>
Message-ID: <200208141458.g7EEwWB27193@odiug.zope.com>

> guido wrote:
> > The mkstemp() function in the rewritten tempfile has an argument with
> > a curious name and default: binary=True.  This caused confusion (even
> > the docstring in the original patch was confused :-).  It would be
> > much easier to explain if this was changed to text=False.
> 
> fwiw, it would probably be even easier to use/explain if it
> used a mode string.

The [Named]TemporaryFile() functions do that.  mkstemp() returns an
OS-level file descriptor.  I guess the 'binary' flag is coming from
Windows thinking, where you have to add os.O_BINARY to the open()
flags for binary mode.

I'll change mkstemp() to having a text=False argument instead.
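
For what it's worth, that is the spelling today's tempfile ended up with; a
minimal usage sketch (modern Python shown):

```python
import os
import tempfile

# mkstemp returns a raw OS-level file descriptor plus the path;
# text=False (the default) opens the file in binary mode
fd, path = tempfile.mkstemp(suffix=".bin", text=False)
try:
    os.write(fd, b"hello")
finally:
    os.close(fd)

with open(path, "rb") as f:
    data = f.read()
os.remove(path)
print(data)  # b'hello'
```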

--Guido van Rossum (home page: http://www.python.org/~guido/)


From barry@python.org  Wed Aug 14 15:59:51 2002
From: barry@python.org (Barry A. Warsaw)
Date: Wed, 14 Aug 2002 10:59:51 -0400
Subject: [Python-Dev] hex constants, bit patterns, PEP 237 warnings and gettext
References: <15705.58715.533054.676186@anthem.wooz.org>
 
 <15706.18134.62932.879188@anthem.wooz.org>
 <200208141312.g7EDCaf01081@pcp02138704pcs.reston01.va.comcast.net>
 <009d01c2439b$c2e562c0$6300000a@holdenweb.com>
Message-ID: <15706.28775.905259.545599@anthem.wooz.org>

>>>>> "SH" == Steve Holden  writes:

    SH> SignExtensionWarning? IntegerPrecisionWarning?

ItsGonnaBeABitDifferentWarning

:)

-Barry


From guido@python.org  Wed Aug 14 16:16:05 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 11:16:05 -0400
Subject: [Python-Dev] hex constants, bit patterns, PEP 237 warnings and gettext
In-Reply-To: Your message of "Wed, 14 Aug 2002 10:59:51 EDT."
 <15706.28775.905259.545599@anthem.wooz.org>
References: <15705.58715.533054.676186@anthem.wooz.org>  <15706.18134.62932.879188@anthem.wooz.org> <200208141312.g7EDCaf01081@pcp02138704pcs.reston01.va.comcast.net> <009d01c2439b$c2e562c0$6300000a@holdenweb.com>
 <15706.28775.905259.545599@anthem.wooz.org>
Message-ID: <200208141516.g7EFG5327345@odiug.zope.com>

>     SH> SignExtensionWarning? IntegerPrecisionWarning?
> 
> ItsGonnaBeABitDifferentWarning

Let it be FutureWarning.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From nas@python.ca  Wed Aug 14 16:54:40 2002
From: nas@python.ca (Neil Schemenauer)
Date: Wed, 14 Aug 2002 08:54:40 -0700
Subject: [Python-Dev] type categories
In-Reply-To: <0ce601c24346$ff967f60$6501a8c0@boostconsulting.com>; from dave@boost-consulting.com on Tue, Aug 13, 2002 at 11:59:29PM -0400
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208132115.g7DLFwL25088@odiug.zope.com> <0c0501c24311$8cebbdc0$6501a8c0@boostconsulting.com> <200208140242.g7E2gCs30811@pcp02138704pcs.reston01.va.comcast.net> <0ccc01c24341$d839b130$6501a8c0@boostconsulting.com> <200208140405.g7E45s731824@pcp02138704pcs.reston01.va.comcast.net> <0ce601c24346$ff967f60$6501a8c0@boostconsulting.com>
Message-ID: <20020814085440.A31966@glacier.arctrix.com>

David Abrahams wrote:
> There's not all that much to what I'm doing. I have a really simple-minded
> dispatching scheme which checks each overload in sequence, and takes the
> first one which can get a match for all arguments.

Can you explain in more detail how the matching is done?  Wouldn't
having some kind of type declarations be a precondition to implementing
multiple dispatch?

  Neil


From tim@zope.com  Wed Aug 14 17:10:43 2002
From: tim@zope.com (Tim Peters)
Date: Wed, 14 Aug 2002 12:10:43 -0400
Subject: [snake-farm] RE: [Python-Dev] strange warnings from tempfile.mkstemped.__del__ on HP
In-Reply-To: <200208141314.g7EDELl01095@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

[Guido]
> Thanks!  Maybe the test was a little too eager though -- perhaps it
> could be happy with creating 100 instead of 1000 files.

Just noting that this change has been made in current CVS, so there
shouldn't be a need to boost the HP default anymore.



From pedroni@inf.ethz.ch  Wed Aug 14 17:17:14 2002
From: pedroni@inf.ethz.ch (Samuele Pedroni)
Date: Wed, 14 Aug 2002 18:17:14 +0200
Subject: [Python-Dev] multiple dispatch (ii)
Message-ID: <007901c243ae$0db918c0$6d94fea9@newmexico>

Here is my old code;
it's kind of an alpha-quality prototype,
no syntax sugar, no integration, pure Python.

The "_redispatch" mechanism is the moral
equivalent of

class A:
  def meth(self): ...

class B(A):
  def meth(self):
     A.meth(self)

it is used both for call-next-method functionality
(that means super for multiple dispatch)
and to solve ambiguities.

(this is pre-2.2 stuff; nowadays
the mro of the actual argument type can be used
to solve ambiguities (like CLOS and Dylan do); if you add
interfaces/protocols to the picture you should
decide how to merge them into the mro, if applicable)

[it uses memoization and so you can't fiddle
with __bases__]

#test_mdisp.py:

print "** mdisp test"
import mdisp

class Panel: pass

class PadPanel(Panel): pass

class Specific: pass

present = mdisp.Generic()

panel = PadPanel()
spec = Specific()

def pan(p,o):
    print "generic panel present"

def pad(p,o):
    print "pad panel present"

def speci(p,o):
    print "generic panel  present"

def padspeci(p,o):
    print "pad panel  present"

present.add_method((Panel,mdisp.Any),pan)

present(panel,spec)

present.add_method((Panel,Specific),speci)

present(panel,spec)

present.add_method((PadPanel,mdisp.Any),pad)

try:
    present(panel,spec)
except mdisp.AmbiguousMethodError:
    print "ambiguity"

print "_redispatch = (None,Any)",

present(panel,spec,_redispatch=(None,mdisp.Any))

present.add_method((PadPanel,Specific),padspeci)

present(panel,spec)

print "* again... panel:obj tierule"

present=mdisp.Generic("panel:obj")

present.add_method((Panel,mdisp.Any),pan)

present(panel,spec)

present.add_method((Panel,Specific),speci)

present(panel,spec)

present.add_method((PadPanel,mdisp.Any),pad)

try:
    present(panel,spec)
except mdisp.AmbiguousMethodError:
    print "ambiguity"

present.add_method((PadPanel,Specific),padspeci)

present(panel,spec)

OUTPUT
** mdisp test
generic panel present
generic panel  present
ambiguity
_redispatch = (None,Any) pad panel present
pad panel  present
* again... panel:obj tierule
generic panel present
generic panel  present
pad panel present
pad panel  present

#actual mdisp.py:

import types
import re

def class_of(obj):
    if type(obj) is types.InstanceType:
        return obj.__class__
    else:
        return type(obj)

NonComparable = None
class Any: pass

def class_le(cl1,cl2):
    if cl1 == cl2: return 1
    if cl2 == Any: return 1
    try:
        cl_lt = issubclass(cl1,cl2)
        cl_gt = issubclass(cl2,cl1)
        if not (cl_lt or cl_gt): return NonComparable
        return cl_lt
    except:
        return NonComparable

def classes_tuple_le(tup1,tup2):
    if len(tup1) != len(tup2): return NonComparable
    tup_le = 0
    tup_gt = 0
    for cl1,cl2 in zip(tup1,tup2):
        cl_le = class_le(cl1,cl2)
        if cl_le == NonComparable:
            return NonComparable
        if cl_le:
            tup_le |= 1
        else:
            tup_gt |= 1
        if tup_le and tup_gt: return NonComparable
    return tup_le

def classes_tuple_le_ex(tup1,tup2, tierule = None):
    if len(tup1) != len(tup2): return NonComparable
    if not tierule: tierule = (len(tup1),)
    last = 0
    for upto in tierule:
        sl1 = tup1[last:upto]
        sl2 = tup2[last:upto]
        last = upto
        if sl1 == sl2: continue
        if len(sl1) == 1:
            return class_le(sl1[0],sl2[0])
        sl_le = 0
        sl_gt = 0
        for cl1,cl2 in zip(sl1,sl2):
            cl_le = class_le(cl1,cl2)
            if cl_le == NonComparable:
                return NonComparable
            if cl_le:
                sl_le |= 1
            else:
                sl_gt |= 1
            if sl_le and sl_gt: return NonComparable
        return sl_le
    return 1

_id_regexp = re.compile("\w+")

def build_tierule(patt):
    tierule = []
    last = 0
    for uni in patt.split(':'):
        c = 0
        for arg in uni.split(','):
            if not _id_regexp.match(arg): raise ValueError, "invalid Generic (tierule) pattern"
            c += 1
        last += c
        tierule.append(last)
    return tierule

def forge_classes_tuple(model,tup):
    return tuple ( map ( lambda (m,cl): m or cl,
                 zip(model,tup)))

class GenericDispatchError(TypeError): pass

class NoApplicableMethodError(GenericDispatchError): pass

class AmbiguousMethodError(GenericDispatchError): pass

class Generic:
    def __init__(self,args=None):
        self.cache = {}
        self.methods = {}
        if args:
            self.args = args
            self.tierule = build_tierule(args)
        else:
            self.args = "???"
            self.tierule = None

    def add_method(self,cltup,func):
        self.methods[cltup] = func
        new_meth = (cltup,func)
        self.cache[cltup] = new_meth
        for d_cltup,(meth_cltup,meth_func) in self.cache.items():
            if classes_tuple_le(d_cltup,cltup):
                le = classes_tuple_le_ex(cltup,meth_cltup,self.tierule)
                if le == NonComparable:
                    del self.cache[d_cltup]
                elif le:
                    self.cache[d_cltup] = new_meth

    def __call__(self,*args,**kw):
        redispatch = kw.get('_redispatch',None)
        d_cltup = map(class_of,args)
        if redispatch:
            d_cltup = forge_classes_tuple(redispatch,d_cltup)
        else:
            d_cltup = tuple(d_cltup)

        if self.cache.has_key(d_cltup):
            return self.cache[d_cltup][1](*args) # 1 retrieves func

        cands = []
        for cltup in self.methods.keys():
            if d_cltup == cltup:
                return self.methods[cltup](*args)
            if classes_tuple_le(d_cltup,cltup): # applicable?
                i = len(cands)
                app = not i
                i -= 1
                while i>=0:
                    cand = cands[i]
                    le = classes_tuple_le_ex(cltup,cand,self.tierule)
                    #print cltup,"<=",cand,"?",le
                    if le == NonComparable:
                        app = 1
                    elif le:
                        if cand != cltup:
                            app = 1
                            #print "remove",cand
                            del cands[i]
                    i -= 1
                if app:
                    cands.append(cltup)
                #print cands
        if len(cands) == 0:
            raise NoApplicableMethodError
        if len(cands)>1:
            raise AmbiguousMethodError
        cltup = cands[0]
        func = self.methods[cltup]
        self.cache[d_cltup] = (cltup,func)
        return func(*args)





From martin@v.loewis.de  Wed Aug 14 19:35:41 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 14 Aug 2002 20:35:41 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: 
References: 
Message-ID: 

Jack Jansen  writes:

> Why is this hard work? I would guess that a simple table lookup would
> suffice, after all there are only a finite number of unicode
> characters that can be split up, and each one can be split up in only
> a small number of ways.

Canonical decomposition requires more than that: you not only need to
apply the canonical decomposition mapping, but also need to put the
resulting characters into canonical order (if more than one combining
character applies to a base character).

In addition, a naïve implementation will consume large amounts of
memory. Hangul decomposition is better done algorithmically, as we
are talking about 11172 precombined characters for Hangul alone.

> Wouldn't something like
> for c in input:
> 	if not canbestartofcombiningsequence.has_key(c):
> 		output.append(c)
>       nlookahead = MAXCHARSTOCOMBINE
>       while nlookahead > 1:
> 		attempt = lookahead next nlookahead bytes from input
> 		if combine.has_key(attempt):
> 			output.append(combine[attempt])
> 			skip the lookahead in input
> 			break
> 	else:
> 		output.append(c)
> do the trick, if the two dictionaries are initialized intelligently?

No, that doesn't do canonical ordering. There is a lot more to
normalization; the hard work is really in understanding what has to be
done.
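
The machinery being discussed later became available as
`unicodedata.normalize`; a sketch (modern Python) of the equivalence problem
and the normalization fix:

```python
import unicodedata

composed   = "\u00e9"    # 'é' as a single precombined character
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# the two spellings are canonically equivalent but compare unequal...
print(composed == decomposed)  # False

# ...until both are brought to the same normalization form
nfc = unicodedata.normalize("NFC", decomposed)
print(nfc == composed)         # True
```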

Regards,
Martin


From martin@v.loewis.de  Wed Aug 14 19:46:04 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 14 Aug 2002 20:46:04 +0200
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: <20020814102523.A20855@hishome.net>
References: 
 
 <20020813061506.GA49563@hishome.net>
 
 <20020814102523.A20855@hishome.net>
Message-ID: 

Oren Tirosh  writes:

> >>> hex(-16711681)
> '0xff00ffff'
> >>> hex(-16711681L)
> '-0xff0001L'		# ??!?!?
[...] 
> The hex representation of longs is something I find quite misleading and 
> I think it's also unprecedented.  This wart has bothered me for a long 
> time now but I didn't have any use for it so I didn't mind too much. Now 
> it is proposed to extend this useless representation to ints so I do.

I don't find it misleading - in fact, the C representation is
misleading: 0xff00ffff looks like a positive number (it does not have
a sign) - this is misleading, as the number is, in fact, negative.

The representation is not misleading: it does not make you believe it
is something that it actually isn't. It might be surprising, but after
thinking about it, it should be clear that it is correct: -N is the
number that, when added to N, gives zero. Indeed:

>>> -16711681L+0xff0001L
0L

If you want the bitmask for the lowest 32 bits, you can write

>>> hex(-16711681L & (2**32-1))
'0xFF00FFFFL'

Notice that -16711681 is a number with an infinite amount of leading
ones - just as 16711681 is a number with an infinite amount of leading
zeroes.
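
Martin's masking idiom can be wrapped in a tiny helper (the name `hex32` is
hypothetical; modern Python shown, so there is no trailing `L`):

```python
def hex32(n):
    """Render n as its unsigned 32-bit hex bit pattern."""
    # masking with 2**32 - 1 keeps only the lowest 32 bits,
    # turning the infinite leading ones of a negative into zeroes
    return hex(n & 0xFFFFFFFF)

print(hex32(-16711681))  # 0xff00ffff
print(hex32(16711681))   # 0xff0001
```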

Regards,
Martin


From guido@python.org  Wed Aug 14 19:49:25 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 14:49:25 -0400
Subject: [Python-Dev] Alternative implementation of interning
Message-ID: <200208141849.g7EInPj21457@odiug.zope.com>

python/sf/576101

I think Oren did a good job on this.  Could somebody please do an
independent review of the code before I check it in?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From paul-python@svensson.org  Wed Aug 14 20:21:57 2002
From: paul-python@svensson.org (Paul Svensson)
Date: Wed, 14 Aug 2002 15:21:57 -0400 (EDT)
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: 
Message-ID: 

On 14 Aug 2002, Martin v. Loewis wrote:

>Oren Tirosh  writes:
>
>> >>> hex(-16711681)
>> '0xff00ffff'
>> >>> hex(-16711681L)
>> '-0xff0001L'		# ??!?!?
>[...]
>> The hex representation of longs is something I find quite misleading and
>> I think it's also unprecedented.  This wart has bothered me for a long
>> time now but I didn't have any use for it so I didn't mind too much. Now
>> it is proposed to extend this useless representation to ints so I do.
>
>I don't find it misleading - in fact, the C representation is
>misleading: 0xff00ffff looks like a positive number (it does not have
>a sign) - this is misleading, as the number is, in fact, negative.
>
>The representation is not misleading: it does not make you believe it
>is something that it actually isn't. It might be surprising, but after
>thinking about it, it should be clear that it is correct: -N is the
>number that, when added to N, gives zero. Indeed:
>
>>>> -16711681L+0xff0001L
>0L
>
>If you want the bitmask for the lowest 32 bits, you can write
>
>>>> hex(-16711681L & (2**32-1))
>'0xFF00FFFFL'
>
>Notice that -16711681 is a number with an infinite amount of leading
>ones - just as 16711681 is a number with an infinite amount of leading
>zeroes.

Just a thought: if it's true that those using hex() and %x are more
interested in the bit values than the numerical value of the whole number,
would a format like ~0xff000 be easier to interpret (and stop this debate)?
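
A sketch of such a formatter (hypothetical name, modern Python): a negative
number is shown as the complement `~` of a finite positive bit pattern
instead of being sign-extended:

```python
def inv_hex(n):
    """Show negatives in '~' notation: n == ~(~n), and ~n is non-negative."""
    if n >= 0:
        return hex(n)
    # ~n flips the infinite leading ones into leading zeroes
    return "~" + hex(~n)

print(inv_hex(16711681))   # 0xff0001
print(inv_hex(-16711681))  # ~0xff0000
```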

	/Paul



From martin@v.loewis.de  Wed Aug 14 20:35:01 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 14 Aug 2002 21:35:01 +0200
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: 
References: 
Message-ID: 

Paul Svensson  writes:

> Just a thought: if it's true that those using hex() and %x are more
> interested in the bit values than the numerical value of the whole
> number, would a format like ~0xff000 be easier to interpret (and
> stop this debate)?

I like this.

Regards,
Martin


From guido@python.org  Wed Aug 14 20:39:00 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 15:39:00 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: Your message of "Wed, 14 Aug 2002 15:21:57 EDT."
 
References: 
Message-ID: <200208141939.g7EJd0F23995@odiug.zope.com>

> Just a thought: if it's true that those using hex() and %x are more
> interested in the bit values than the numerical value of the whole number,
> would a format like ~0xff000 be easier to interpret (and stop this debate)?

Hmm...  It has a perverse Pythonic smell...  But I fear it would
introduce more backwards incompatibilities, because it would have to
apply to longs as well, and hence change the output whenever a
negative long is converted to hex or octal.  (And what about %u?
Should "%u" % -1 return "~0" too?)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From ark@research.att.com  Wed Aug 14 20:43:29 2002
From: ark@research.att.com (Andrew Koenig)
Date: 14 Aug 2002 15:43:29 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <200208070227.g772R4i07678@oma.cosc.canterbury.ac.nz>
References: <200208070227.g772R4i07678@oma.cosc.canterbury.ac.nz>
Message-ID: 

>> PS: is pure substring testing such a common idiom?
>> I have not found so many
>> matches for   find\(.*\)\s*>  in the std lib

Greg> For more generality, maybe

Greg>   re in string

Greg> should be made to work too, where re is a regular
Greg> expression object?

Then the core language would have to know about regular
expressions, right?

-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark


From Jack.Jansen@oratrix.com  Wed Aug 14 20:52:00 2002
From: Jack.Jansen@oratrix.com (Jack Jansen)
Date: Wed, 14 Aug 2002 21:52:00 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: <200208141213.g7ECD5V00311@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <4C515BCF-AFBF-11D6-8B4E-003065517236@oratrix.com>

On woensdag, augustus 14, 2002, at 02:13 , Guido van Rossum wrote:
> Note that normalization doesn't belong in the codecs (except perhaps
> as a separate Unicode->Unicode codec, since codecs seem to be useful
> for all string->string transformations).  It's a separate step that
> the application has to request; only the app knows whether a
> particular Unicode string is already normalized or not, and whether
> the expense is useful for the app, or not.

I don't like this, I don't like it at all.

Python jumps through hoops to make 'jack' and u'jack' compare
identical and be interchangeable in dict keys and what have you,
and now suddenly I find out that there's two ways to say u'jäck'
and they won't compare equal. Not good.

I sympathise with the fact that this is difficult (although I
still don't understand why: whereas when you want to create the
decomposed version I can imagine there's N! ways to notate a
character with N combining chars, I would think there's one and
only one way to write a combined character), but that shouldn't
stop us at least planning to fix this.

And I don't think the burden should fall on the application.
That same reasoning could have been followed for making ascii
and unicode-ascii-subset compare equal: the application will
know it has to convert ascii to unicode before comparing.
--
- Jack Jansen
http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution --
Emma Goldman -



From guido@python.org  Wed Aug 14 21:13:09 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 16:13:09 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: Your message of "Wed, 14 Aug 2002 15:43:29 EDT."
 
References: <200208070227.g772R4i07678@oma.cosc.canterbury.ac.nz>
 
Message-ID: <200208142013.g7EKD9v29275@odiug.zope.com>

> Greg>   re in string
> 
> Greg> should be made to work too, where re is a regular
> Greg> expression object?
> 
> Then the core language would have to know about regular
> expressions, right?

Um, yes.  That kills the idea (unless you want to write this as
"string in re", which almost makes sense :-).

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Wed Aug 14 21:18:22 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 16:18:22 -0400
Subject: [Python-Dev] SET_LINENO killer
Message-ID: <200208142018.g7EKIMw29325@odiug.zope.com>

python/sf/587993

Looks like Michael Hudson did an *outstanding* and very thorough job
on this.  Does anybody see a reason why I shouldn't let him check this
in?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From paul@svensson.org  Wed Aug 14 21:30:23 2002
From: paul@svensson.org (Paul Svensson)
Date: Wed, 14 Aug 2002 16:30:23 -0400 (EDT)
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: <200208141939.g7EJd0F23995@odiug.zope.com>
Message-ID: 

On Wed, 14 Aug 2002, Guido van Rossum wrote:

>> Just a thought: if it's true that those using hex() and %x are more
>> interested in the bit values than the numerical value of the whole number,
>> would a format like ~0xff000 be easier to interpret (and stop this debate)?
>
>Hmm...  It has a perverse Pythonic smell...  But I fear it would
>introduce more backwards incompatibilities, because it would have to
>apply to longs as well, and hence change the output whenever a
>negative long is converted to hex or octal.  (And what about %u?
>Should "%u" % -1 return "~0" too?)

Didn't you say "%u" would be going away?
You're right about octal, that would be nice to change, too.
Maybe the right time to do the change would be when the L goes away,
since that would be similarly invasive?

	/Paul



From skip@pobox.com  Wed Aug 14 21:39:01 2002
From: skip@pobox.com (Skip Montanaro)
Date: Wed, 14 Aug 2002 15:39:01 -0500
Subject: [Python-Dev] Alternative implementation of interning
In-Reply-To: <200208141849.g7EInPj21457@odiug.zope.com>
References: <200208141849.g7EInPj21457@odiug.zope.com>
Message-ID: <15706.49125.814428.988008@localhost.localdomain>

    Guido> I think Oren did a good job on this.  Could somebody please do an
    Guido> independent review of the code before I check it in?

Since I haven't actually looked at the patch yet, this doesn't qualify as a
review, but how about renaming PyString_InternInPlace to
PyString_InternImmortal?  My guess is "InPlace" refers to some structural
difference between mortal and immortal interned strings which doesn't give
the programmer any hints about intended usage of either function.

Skip


From ark@research.att.com  Wed Aug 14 21:34:55 2002
From: ark@research.att.com (Andrew Koenig)
Date: Wed, 14 Aug 2002 16:34:55 -0400 (EDT)
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <200208142013.g7EKD9v29275@odiug.zope.com> (message from Guido
 van Rossum on Wed, 14 Aug 2002 16:13:09 -0400)
References: <200208070227.g772R4i07678@oma.cosc.canterbury.ac.nz>
  <200208142013.g7EKD9v29275@odiug.zope.com>
Message-ID: <200208142034.g7EKYt508950@europa.research.att.com>

Greg> re in string
>> 
Greg> should be made to work too, where re is a regular
Greg> expression object?
>> 
>> Then the core language would have to know about regular
>> expressions, right?

Guido> Um, yes.  That kills the idea (unless you want to write this as
Guido> "string in re", which almost makes sense :-).

> Or unless the notion of ``x in y'' could be reinterpreted
in terms of a new attribute that strings, chars, and regexps
would share.

> That is, I can imagine defining ``x in y'' analogously to ``x+y''
as follows:

   If x has an attribute __in__, then ``x in y'' means ``x.__in__(y)''

   Otherwise, if y has an attribute __rin__, then ``x in y'' means
   ``y.__rin__(x)''

and so on.
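
A sketch of that dispatch (the `__in__`/`__rin__` names are only the
hypothetical protocol from this message, and `contains` stands in for what
the `in` operator itself would do; modern Python shown):

```python
import re

def contains(x, y):
    """Hypothetical protocol: 'x in y' tries x.__in__(y), then
    y.__rin__(x), before falling back to ordinary containment."""
    if hasattr(x, "__in__"):
        return x.__in__(y)
    if hasattr(y, "__rin__"):
        return y.__rin__(x)
    return x in y

class Pattern:
    def __init__(self, pat):
        self.rx = re.compile(pat)
    def __in__(self, s):
        # a regex "is in" a string if it matches somewhere in it
        return self.rx.search(s) is not None

print(contains(Pattern(r"\d+"), "abc123"))  # True
print(contains("b", "abc"))                 # True
```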

This is an example of the kind of situation where I imagine type
categories would be useful.




From guido@python.org  Wed Aug 14 22:01:14 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 17:01:14 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: Your message of "Wed, 14 Aug 2002 16:30:23 EDT."
 
References: 
Message-ID: <200208142101.g7EL1Ej32188@odiug.zope.com>

> >> Just a thought: if it's true that those using hex() and %x are more
> >> interested in the bit values than the numerical value of the whole number,
> >> would a format like ~0xff000 be easier to interpret (and stop this debate)?
> >
> >Hmm...  It has a perverse Pythonic smell...  But I fear it would
> >introduce more backwards incompatibilities, because it would have to
> >apply to longs as well, and hence change the output whenever a
> >negative long is converted to hex or octal.  (And what about %u?
> >Should "%u" % -1 return "~0" too?)
> 
> Didn't you say "%u" would be going away ?

Yes, but not any time soon.

> You're right about octal, that would be nice to change, too.
> Maybe the right time to do the change would be when the L goes away,
> since that would be similarly invasive ?

I see, you meant this idea for Python 3000, not for 2.3 or even 2.4.
That's fine, but doesn't help for the immediate pain.
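For readers who want the bit-pattern view today, masking to a fixed width sidesteps the sign-rendering question entirely (shown here with present-day Python 3 semantics):

```python
# Masking a negative number to a fixed width yields its
# two's-complement bit pattern, independent of how hex()
# chooses to render the sign.
n = -256
print(hex(n))           # -0x100  (sign-magnitude rendering)
print(hex(n & 0xFFFF))  # 0xff00  (16-bit two's-complement view)
```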

--Guido van Rossum (home page: http://www.python.org/~guido/)


From skip@pobox.com  Wed Aug 14 22:04:49 2002
From: skip@pobox.com (Skip Montanaro)
Date: Wed, 14 Aug 2002 16:04:49 -0500
Subject: [Python-Dev] Alternative implementation of interning
Message-ID: <15706.50673.81267.900261@localhost.localdomain>

A couple minor nits from scanning the patch:

* Probably makes no difference, but it seems oddly asymmetric to fiddle with
  the interned string's refcount in string_dealloc, call PyObject_DelItem,
  then not restore the refcount to zero.

* Should be Py_DECREF(keys) (not Py_XDECREF(keys)) in
  _Py_ReleaseInternedStrings.  If you've gotten that far keys can't be
  NULL.  If you're worried about keys being NULL, you should check it before
  the for loop (PyMapping_Size() will barf on a NULL arg).

Also, regarding the name of PyString_InternInPlace, I see now that's the
original name.  I suggest that name be deprecated in favor of
PyString_InternImmortal with a macro defined in stringobject.h for
compatibility.

Skip



From guido@python.org  Wed Aug 14 22:12:28 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 17:12:28 -0400
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: Your message of "Wed, 14 Aug 2002 16:34:55 EDT."
 <200208142034.g7EKYt508950@europa.research.att.com>
References: <200208070227.g772R4i07678@oma.cosc.canterbury.ac.nz>  <200208142013.g7EKD9v29275@odiug.zope.com>
 <200208142034.g7EKYt508950@europa.research.att.com>
Message-ID: <200208142112.g7ELCSQ32576@odiug.zope.com>

> Or unless the notion of ``x in y'' could be reinterpreted
> in terms of a new attribute that strings, chars, and regexps
> would share.
> 
>    That is, I can imagine defining ``x in y'' analogously to ``x+y''
> as follows:
> 
>    If x has an attribute __in__, then ``x in y'' means ``x.__in__(y)''
> 
>    Otherwise, if y has an attribute __rin__, then ``x in y'' means
>    ``y.__rin__(x)''
> 
> and so on.
> 
> This is an example of the kind of situation where I imagine type
> categories would be useful.

It is already done this way, except the attribute is called
__contains__ and we only ask the right argument for it: "x in y" calls
"y.__contains__(x)" [if it exists; otherwise there's a fallback that
loops over y's items comparing them to x].
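That existing protocol in miniature (standard Python behavior, nothing hypothetical):

```python
class EvenNumbers:
    # "x in y" consults y.__contains__(x) when it exists.
    def __contains__(self, x):
        return isinstance(x, int) and x % 2 == 0

evens = EvenNumbers()
print(2 in evens)  # True
print(3 in evens)  # False
```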

I suppose we could add __rcontains__ that was tried next, analogously
to __add__ and __radd__; or maybe it could be called __in__
instead. :-)

Unfortunately that would be a significant change in internals.
I'm not convinced that this particular example is worth that
(especially since chars are already taken care of -- they're just
1-char strings).

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Wed Aug 14 22:14:04 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 17:14:04 -0400
Subject: [Python-Dev] Alternative implementation of interning
In-Reply-To: Your message of "Wed, 14 Aug 2002 15:39:01 CDT."
 <15706.49125.814428.988008@localhost.localdomain>
References: <200208141849.g7EInPj21457@odiug.zope.com>
 <15706.49125.814428.988008@localhost.localdomain>
Message-ID: <200208142114.g7ELE4x32586@odiug.zope.com>

> Since I haven't actually looked at the patch yet, this doesn't qualify as a
> review, but how about renaming PyString_InternInPlace to
> PyString_InternImmortal?  My guess is "InPlace" refers to some structural
> difference between mortal and immortal interned strings which doesn't give
> the programmer any hints about intended usage of either function.

Better still, I think we could safely make all interned strings mortal
-- I don't see any use for immortal strings.  (I see a use for immoral
strings but that's a topic for over a couple beers. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From skip@pobox.com  Wed Aug 14 22:26:00 2002
From: skip@pobox.com (Skip Montanaro)
Date: Wed, 14 Aug 2002 16:26:00 -0500
Subject: [Python-Dev] Alternative implementation of interning
In-Reply-To: <200208142114.g7ELE4x32586@odiug.zope.com>
References: <200208141849.g7EInPj21457@odiug.zope.com>
 <15706.49125.814428.988008@localhost.localdomain>
 <200208142114.g7ELE4x32586@odiug.zope.com>
Message-ID: <15706.51944.872639.806768@localhost.localdomain>

    >> ... how about renaming PyString_InternInPlace to
    >> PyString_InternImmortal?

    Guido> Better still, I think we could safely make all interned strings
    Guido> mortal -- I don't see any use for immortal strings.

Wasn't this part of the original discussion?  Extension modules are free to
call PyString_InternInPlace and may well expect immortal strings, so for
backward compatibility, the functionality probably has to remain for a time,
yes?

Of course, I'm speaking with my fake expert hat on.  I've never even
considered interning a string, immortal, immoral, or otherwise.

Skip


From ark@research.att.com  Wed Aug 14 22:22:35 2002
From: ark@research.att.com (Andrew Koenig)
Date: 14 Aug 2002 17:22:35 -0400
Subject: [Python-Dev] type categories
In-Reply-To: <200208140316.g7E3GtT30902@pcp02138704pcs.reston01.va.comcast.net>
References: <200208131802.g7DI2Ro27807@europa.research.att.com>
 <200208132115.g7DLFwL25088@odiug.zope.com>
 <200208132127.g7DLRJO29696@europa.research.att.com>
 <200208140316.g7E3GtT30902@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

>> Perhaps the reason it's rare is that it's difficult to do.

Guido> Perhaps...  Is it the chicken or the egg?

Did you hear about the two philosophers in the diner?  One ordered a
chicken-salad sandwich and the other ordered an egg-salad sandwich,
because they wanted to see which would come first.

>> One of the cases I was thinking of was the built-in * operator,
>> which does something completely different if one of its operands
>> is an integer.

Guido> Really?  I suppose you're thinking of sequence repetition.

Right.

Guido> I consider that one of my early mistakes (it didn't make it to
Guido> my "regrets" list but probably should have).  It would have
Guido> been much simpler if sequences simply supported multiplication,
Guido> and in fact repeated changes to the implementation (and subtle
Guido> edge cases of the semantics) are slowly nudging in this
Guido> direction.

It's still a plausible example, I think.

>> Another one was the buffering iterator we were discussing earlier,
>> which ideally would omit buffering entirely if asked to buffer a
>> type that already supports multiple iteration.

Guido> How do you do that in C++?  I guess you overload the function
Guido> that asks for the iterator, and call that function in a
Guido> template.  I think in Python we can ask the caller to provide a
Guido> buffering iterator when a function needs one.  Since we really
Guido> have very little power at compile time, we sometimes need to do
Guido> a little more work at run time.  But the resulting language
Guido> appears to be easier to understand (for most people anyway)
Guido> despite the theoretical deficiency.

I understand that, I think.

The C++ library has a notion of ``iterator traits'' that is implemented
by a template class named, of all things, iterator_traits.  So, for example,
if T is an iterator type, then iterator_traits<T>::value_type is the
type that dereferencing an object of type T will yield.  To reveal what
operations an iterator supports, iterator_traits<T>::iterator_category
is one of the following five types, depending on the iterator:

        input_iterator_tag
        output_iterator_tag
        forward_iterator_tag
        bidirectional_iterator_tag
        random_access_iterator_tag

Each of the last three of these types is derived from the one before it.
It is possible to instantiate objects of any of these types, but the
objects carry no information beyond their type and identity.

Now, suppose you want to implement an algorithm that requires a
bidirectional iterator, but can be done more efficiently with a
random-access iterator.  Then you might write something like this:

        // The bidirectional-iterator case
        template<class It>
        void foo_aux(It begin, It end, bidirectional_iterator_tag) {
                // ...
        }

        // The random-access-iterator case
        template<class It>
        void foo_aux(It begin, It end, random_access_iterator_tag) {
                // ...
        }

and then you can select the appropriate algorithm at compile time this way:

        template<class It>
        void foo(It begin, It end) {
                foo_aux(begin, end,
                   typename iterator_traits<It>::iterator_category());
        }

This code creates an extra object (the anonymous object created by the
expression ``typename iterator_traits<It>::iterator_category()'') for the
sole purpose of using its type to distinguish between the two overloaded
versions of foo_aux.  This distinction is made at compile time, and if
the compiler is smart enough, it will also optimize away the empty,
anonymous object.

So this is an example of what I mean by ``dispatching based on a type
category.''  In C++ it's done at compile time, but what I care about
in the context of Python is not when it is done, but rather how
convenient it is to express.  (I don't think the C++ mode of
expression is particularly convenient, but at least it's possible.)
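A rough run-time analogue of the same dispatch in Python might check capabilities instead of tag types (a sketch; the function names are invented):

```python
def foo(seq):
    # Run-time analogue of the C++ tag dispatch above: pick the
    # random-access algorithm when the object supports indexing,
    # otherwise fall back to the general (one-pass) case.
    if hasattr(seq, "__getitem__"):
        return "random-access path"
    return "general path"

print(foo([1, 2, 3]))        # random-access path
print(foo(iter([1, 2, 3])))  # general path
```

The choice happens at run time, which matches Guido's point: the dispatch is trivial to understand, at the cost of doing the test on every call.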

Guido> I'm not quite sure why that is, but I am slowly developing a
Guido> theory, based on a remark by Samuele Pedroni; at least I
Guido> believe it was he who remarked at some point "Python has only
Guido> run time", which got me thinking.  My theory, partially
Guido> developed though it is, is that it is much harder (again, for
Guido> most people :-) to understand in your head what happens at
Guido> compile time than it is to understand what goes at run time.
Guido> Or perhaps that understanding *both* is harder than
Guido> understanding only one.

I have no problem believing that.

Guido> But I believe that for most people acquiring a sufficient
Guido> mental model for what goes on at run time is simpler than the
Guido> mental model for what goes on at compile time.  Possibly this
Guido> is because compilers really *do* rely on very sophisticated
Guido> algorithms (such as deciding which overloading function is
Guido> called based upon type information and available conversions).
Guido> Run time on the other hand is dead simple most of the time --
Guido> it has to be, since it has to be executed by a machine that has
Guido> a very limited time to make its decisions.

That's OK with me.  But I'd still like a less ad-hoc way of making
those run-time tests.

Guido> All this reminds me of a remark that I believe is due to John
Guido> Ousterhout at the VHLL conference in '94 in Santa Fe, where you & I
Guido> first met.  (Strangely it was Perl's Tom Christiansen who was in a
Guido> large part responsible for the eclectic program.)  You gave a talk
Guido> about ML, and I believe it was in response to your talk that John
Guido> remarked that ML was best suited for people with an IQ of over 150.

I'm still not convinced that's necessarily true -- I think it depends
a great deal on how ML is taught.  I do believe that most of what has
been written about ML is hard to follow for people who have grown up
in the imperative world, but I don't think it has to be that way.

Guido> That rang true to me, since the only other person besides you
Guido> that I know who is a serious ML user definitely falls into that
Guido> category.

Thanks for the compliment!

Guido> And ML is definitely a language that does more than the average
Guido> language at compile time.

Yes.  One of the reasons I find it interesting, incidentally, is that
it still manages to generate surprisingly efficient machine code.

>> Actually, I thought of them but omitted them to avoid confusion
>> between a type and a category with a single element.

Guido> Can you explain?  Neither string (which has Unicode and 8-bit,
Guido> plus a few other objects that are sufficiently string-like to
Guido> be regex-searchable, like arrays) nor file (at least in the
Guido> "lore protocol" interpretation of file-like object) are
Guido> categories with a single element.

Fair enough.  I just didn't have examples at my fingertips and thought
at first that using those examples would confuse matters.  I don't
mind using them.

Guido> I believe that the notion of an informal or "lore" (as Jim
Guido> Fulton likes to call it) protocol first became apparent when we
Guido> started to use the idea of a "file-like object" as a valid
Guido> value for sys.stdout.

>> OK.  So what I'm asking about is a way of making notions such as
>> "file-like object" more formal and/or automatic.

Guido> Yeah, that's the holy Grail of interfaces in Python.

Cool!  (I care much less about type checking because, as I mentioned
in another message, there are uncheckable things such as being an
order relation that I would like to use for dispatching anyway).

>> Of course, one reason for my interest is my experience with a
>> language that supports compile-time overloading -- what I'm really
>> seeing on the horizon is some kind of notion of overloading in
>> Python, perhaps along the lines of ML's clausal function
>> definitions (which I think are truly elegant).

Guido> Honestly, I hadn't read this far ahead when I brought up ML
Guido> above. :-)

:-)

Guido> I really hope that the holy grail can be found at run time
Guido> rather than compile time.  Python's compile time doesn't have
Guido> enough information easily available, and to gather the
Guido> necessary information is very expensive (requiring
Guido> whole-program analysis) and not 100% reliable (due to Python's
Guido> extreme dynamic side).

I have no problem with that.  So here's a simple example of ML's clausal
functions:

        fun len([]) = 0
          | len(h::t) = len(t) + 1

Here, [] is an empty list, and h::t is ML's way of spelling cons(h,t).
The two clauses (one per line) are checked in order *at run time* --
we're dispatching on the *value*, not the type of the argument.
If you like, this example is equivalent to the following:

        fun len(x) = if x = [] then 0 else len(tl(x))+1

(well, not really, but only an ML expert will see why, and it's not germane)

In the Python domain, I imagine something like this:

        def f(arg: Category1):
                ....
        or  f(arg: Category2):
                ....
        or  f(arg: Category3):

I would like the implementation to try each version of f until it
finds one that passes the constraints, and then execute that one.  If
none of them fits the bill, then it should throw an exception.
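One way to approximate this today is a dispatcher that tries predicate-guarded clauses in order; the `dispatch` helper and the guards below are invented for illustration, not an existing facility:

```python
def dispatch(clauses):
    # Try each (guard, body) pair in order; run the first body whose
    # guard accepts the argument, mirroring the clausal style above.
    def f(arg):
        for guard, body in clauses:
            if guard(arg):
                return body(arg)
        raise TypeError("no clause matched %r" % (arg,))
    return f

# An ML-ish len() in this style: one clause per "shape" of input.
length = dispatch([
    (lambda x: x == [], lambda x: 0),
    (lambda x: isinstance(x, list), lambda x: 1 + length(x[1:])),
])

print(length([10, 20, 30]))  # 3
```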

-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark


PS: Please forgive the erratic replies -- apparently our mail gateway
decided to hang onto a bunch of messages for a day or so...


From ark@research.att.com  Wed Aug 14 22:23:54 2002
From: ark@research.att.com (Andrew Koenig)
Date: Wed, 14 Aug 2002 17:23:54 -0400 (EDT)
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <200208142112.g7ELCSQ32576@odiug.zope.com> (message from Guido
 van Rossum on Wed, 14 Aug 2002 17:12:28 -0400)
References: <200208070227.g772R4i07678@oma.cosc.canterbury.ac.nz>  <200208142013.g7EKD9v29275@odiug.zope.com>
 <200208142034.g7EKYt508950@europa.research.att.com> <200208142112.g7ELCSQ32576@odiug.zope.com>
Message-ID: <200208142123.g7ELNsN10274@europa.research.att.com>

Guido> It is already done this way, except the attribute is called
Guido> __contains__ and we only ask the right argument for it: "x in y" calls
Guido> "y.__contains__(x)" [if it exists; otherwise there's a fallback that
Guido> loops over y's items comparing them to x].

Ah, that's why you said that it could be done backwards.


From martin@v.loewis.de  Wed Aug 14 22:25:46 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 14 Aug 2002 23:25:46 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: <4C515BCF-AFBF-11D6-8B4E-003065517236@oratrix.com>
References: <4C515BCF-AFBF-11D6-8B4E-003065517236@oratrix.com>
Message-ID: 

Jack Jansen  writes:

> I sympathise with the fact that this is difficult (although I still
> don't understand why

Feel free to contribute. Answering all your questions already took
considerable time (to answer the previous one, I did an hour of online
research, just because I had never looked into normalization in that
level of detail - to find out you need a print copy of the Unicode
standard, which I have only at the university library).

Regards,
Martin


From oren-py-d@hishome.net  Wed Aug 14 22:28:25 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Thu, 15 Aug 2002 00:28:25 +0300
Subject: [Python-Dev] type categories
In-Reply-To: <200208141309.g7ED9Jb01045@pcp02138704pcs.reston01.va.comcast.net>; from guido@python.org on Wed, Aug 14, 2002 at 09:09:19AM -0400
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208131545.29856.mclay@nist.gov> <20020814101819.GA93585@hishome.net> <200208141309.g7ED9Jb01045@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20020815002825.A2241@hishome.net>

On Wed, Aug 14, 2002 at 09:09:19AM -0400, Guido van Rossum wrote:
...
> Now I think you've lost me.  How can a category on the one hand be
... 
> Again you've lost me.  I expect there's something here that you assume
...

Oh dear. Here we go again. I'm afraid that it may take several frustrating 
iterations just to get our terminology and assumptions in sync and be able 
to start talking about the actual issues.

> > Type categories are fundamentally different from interfaces.  An 
> > interface must be declared by the type while a category can be an 
> > observation about an existing type. 
> 
> Yup.  (In Python these have often been called "protocols".  Jim Fulton
> calls them "lore protocols".)

Nope. For me protocols are conventions to follow for performing a certain 
task.  A type category is a formally defined set of types.  

For example, the 'iterable' protocol defines conventions for a programmer
to follow for doing iteration.  The 'iterable' category is a set defined
by the membership predicate "hasattr(t, '__iter__')".  The types in the
'iterable' category presumably conform to the 'iterable' protocol so there 
is a mapping between protocols and type categories but it's not quite 1:1.

Protocols live in documentation and lore. Type categories live in the same 
place where vector spaces and other formal systems live.
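A category-as-membership-predicate can be written down directly; here 'iterable' uses exactly the predicate given above (the Category class is a sketch, not a proposed API):

```python
class Category:
    # A type category: a set of types defined by a membership predicate.
    def __init__(self, pred):
        self.pred = pred
    def __contains__(self, t):
        return bool(self.pred(t))

iterable = Category(lambda t: hasattr(t, "__iter__"))

print(list in iterable)  # True
print(int in iterable)   # False
```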

> > Two types that are defined independently in different libraries may
> > in fact fit under the same category because they implement the same
> > protocol.  With named interfaces they may in fact be compatible but
> > they will not expose the same explicit interface. Requiring them to
> > import the interface from a common source starts to sound more like
> > Java than Python and would introduce dependencies and interface
> > version issues in a language that is wonderfully free from such
> > arbitrary complexities.
> 
> Hm, I'm not sure if you can solve the version incompatibility problem
> by ignoring it. :-)

Oops, I meant interface version *numbers*, not interface versions.  A
version number is a unidimensional entity.  Variations on protocols and
subprotocols have many dimensions. I find that set theory ("an object that 
has a method called foo and another method called bar") works better than
arithmetic ("an object with version number 2.13 of interface voom").

> Are you familiar with Zope's Interface package?  It solves this
> problem (nicely, IMO) by allowing you to place an interface
> declaration inside a class but also allowing you to make calls to an
> interface registry that declare interfaces for pre-existing classes.

I don't like the bureaucracy of declaring interfaces and maintaining
registries.  I like the ad-hoc nature of Python protocols and I want a
type system that gives me the tools to use it better, not replace it with 
something more formal.

> > A category is defined mathematically by a membership predicate. So
> > what we need for type categories is a system for writing predicates
> > about types.
> 
> Now I think you've lost me.  How can a category on the one hand be
> observed after the fact and on the other hand defined by a rigorous
> mathematical definition?  How could a program tell by looking at a
> class whether it really is an implementation of a given protocol?

A category is defined mathematically. A protocol is a somewhat more fuzzy
meatspace concept.  A protocol can be associated with a category with
reasonable accuracy so the result of a set operation on categories is
reasonably applicable to the associated protocols. 

Even a human can't always tell whether a class is *really* an implementation
of a given protocol.  But many protocols can be inferred with pretty good
accuracy from the presence of methods or members. You can always add a 
member as a flag indicating compliance with a certain protocol if that is
not enough.

My basic assumption is that programmers are fundamentally lazy. It hasn't
ever failed me so far.

This way there is no need to declare all the protocols a class conforms to.
This is important since in many cases the protocol is only "discovered" 
later.  The user of the class knows what protocol is expected and only 
needs to declare that.  It should reduce the tendency to use relatively
coarse-grained "fat" interfaces because there is no need to declare every
minor protocol the type conforms to - it may be observed by users of this
type using a type category.

> > Standard Python expressions should not be used for defining a
> > category membership predicate. A Python expression is not a pure
> > function. This makes it impossible to cache the results of which
> > type belongs to what category for efficiency. Another problem is
> > that many different expressions may be equivalent but if two
> > independently defined categories use equivalent predicates they
> > should *be* the same category.  They should be merged at runtime
> > just like interned strings.
> 
> Again you've lost me.  I expect there's something here that you assume
> well-known.  Can you please clarify this?  What on earth do you mean
> by "A Python expression is not a pure function" ?

A function whose result depends only on its inputs and has no side effects.
In this case I would add "and can be evaluated without triggering any 
Python code". Set operations on membership predicates, caching and other
optimizations need such guarantees.


	Oren



From faassen@vet.uu.nl  Wed Aug 14 22:51:23 2002
From: faassen@vet.uu.nl (Martijn Faassen)
Date: Wed, 14 Aug 2002 23:51:23 +0200
Subject: [Python-Dev] type categories
In-Reply-To: 
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208131545.29856.mclay@nist.gov> <20020814101819.GA93585@hishome.net> 
Message-ID: <20020814215122.GA31835@vet.uu.nl>

Andrew Koenig wrote:
> >> The category names look like general purpose interface names. The
> >> addition of interfaces has been discussed quite a bit. While many
> >> people are interested in having interfaces added to Python, there
> >> are many design issues that will have to be resolved before it
> >> happens.
> 
> Oren> Nope. Type categories are fundamentally different from
> Oren> interfaces.  An interface must be declared by the type while a
> Oren> category can be an observation about an existing type.
> 
> Why?  That is, why can't you imagine making a claim that type
> X meets interface Y, even though the author of neither X nor Y
> made that claim?

That's entirely possible, and as Guido mentioned earlier in the thread,
the Zope 3 interface package allows that. I think that still currently
doesn't work with built-in types yet, but that's an implementation detail,
not a fundamental problem.

(it's in Interface.Implements, the implements() function)
 
> However, now that you bring it up... One difference I see between
> interfaces and categories is that I can imagine categories carrying
> semantic information to the human reader of the code that is not
> actually expressed in the category itself.  As a simple example,
> I can imagine a PartialOrdering category that I might like as part
> of the specification for an argument to a sort function.

But isn't that exactly what interfaces are? Of course you may not want
to make all interfaces explicit as it is too much programming overhead;
that's in part what's nice about a dynamically typed language. However,
an interface does carry semantic information to the human reader of the
code that is not actually expressed in the category itself. By making
interfaces explicit the human reader can also write code that introspects
interface information.

Or do you mean sometimes it is not useful to make interfaces explicit
at all, as you're never going to introspect on them anyway? I'd say
they may still be useful as documentation, in which case they seem to work
like your 'category'. Or of course you can not specify them at all.

Regards,

Martijn



From guido@python.org  Wed Aug 14 22:54:35 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 17:54:35 -0400
Subject: [Python-Dev] Alternative implementation of interning
In-Reply-To: Your message of "Wed, 14 Aug 2002 16:26:00 CDT."
 <15706.51944.872639.806768@localhost.localdomain>
References: <200208141849.g7EInPj21457@odiug.zope.com> <15706.49125.814428.988008@localhost.localdomain> <200208142114.g7ELE4x32586@odiug.zope.com>
 <15706.51944.872639.806768@localhost.localdomain>
Message-ID: <200208142154.g7ELsZZ00310@odiug.zope.com>

>     Guido> Better still, I think we could safely make all interned strings
>     Guido> mortal -- I don't see any use for immortal strings.
> 
> Wasn't this part of the original discussion?  Extension modules are free to
> call PyString_InternInPlace and may well expect immortal strings, so for
> backward compatibility, the functionality probably has to remain for a time,
> yes?

In core Python, there are two common usage patterns for interning.

The most common pattern uses PyString_InternFromString() to intern
some string constant that will be used as a frequent key
(e.g. "__class__") and then stores the resulting object in a static
variable.  Those strings are immortal because the static variable has
a reference that is never released.

The other common pattern uses PyString_InternInPlace() to intern a
string object (usually a function argument) that's being used as a
dictionary key or attribute name, in the hope that the dict lookup
will be faster.  In these cases, the dict will keep the interned
string alive as long as it makes sense, and when it's no longer a key
in the dict, there's no point in having the interned object around.
(It's also fairly pointless since PyObject_SetAttr() already does
this; even that seems questionable and should probably be done by the
setattro handler of individual object types.)
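Both patterns rely on interning returning one canonical object so that key comparisons can short-circuit on identity; in present-day Python the same facility is exposed at the Python level as sys.intern:

```python
import sys

# Two equal strings built separately may be distinct objects;
# interning maps both to one canonical object, so dict lookups can
# compare by identity before falling back to character comparison.
a = sys.intern("attribute_" + "name")
b = sys.intern("attribute_name")
assert a == b
assert a is b  # same canonical object after interning
```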

Making such keys mortal might cause some churning, if non-existing
keys are frequently constructed (say, from user data) and then thrown
away -- each time the key is thrown away it is removed from the
interned dict now, and each time it is recreated and used as a key, it
is interned again -- to no avail.  But I think that's pretty rare (a
non-existing key) and it certainly isn't going to cause any breakage.

I expect that the usage patterns in 3rd party extensions are pretty
much the same.

Tim once posted a theoretical example that depended on interned
strings staying alive while no user object references a particular
string object.  But that was highly theoretical.

> Of course, I'm speaking with my fake expert hat on.  I've never even
> considered interning a string, immortal, immoral, or otherwise.

:-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From ark@research.att.com  Wed Aug 14 22:55:47 2002
From: ark@research.att.com (Andrew Koenig)
Date: Wed, 14 Aug 2002 17:55:47 -0400 (EDT)
Subject: [Python-Dev] type categories
In-Reply-To: <20020814215122.GA31835@vet.uu.nl> (message from Martijn Faassen
 on Wed, 14 Aug 2002 23:51:23 +0200)
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208131545.29856.mclay@nist.gov> <20020814101819.GA93585@hishome.net>  <20020814215122.GA31835@vet.uu.nl>
Message-ID: <200208142155.g7ELtll10574@europa.research.att.com>

>> However, now that you bring it up... One difference I see between
>> interfaces and categories is that I can imagine categories carrying
>> semantic information to the human reader of the code that is not
>> actually expressed in the category itself.  As a simple example,
>> I can imagine a PartialOrdering category that I might like as part
>> of the specification for an argument to a sort function.

Martijn> But isn't that exactly what interfaces are?

Not really.  I can see how an interface can claim that a particular
method exists, but not how it can claim that the method implements a
function that is antisymmetric and transitive.


From aahz@pythoncraft.com  Wed Aug 14 23:11:16 2002
From: aahz@pythoncraft.com (Aahz)
Date: Wed, 14 Aug 2002 18:11:16 -0400
Subject: [snake-farm] RE: [Python-Dev] strange warnings from tempfile.mkstemped.__del__ on HP
In-Reply-To: <200208141314.g7EDELl01095@pcp02138704pcs.reston01.va.comcast.net>
References: <200208140331.g7E3Veh30980@pcp02138704pcs.reston01.va.comcast.net>  <20020814115303.GB2054@i92.ryd.student.liu.se> <20020814122749.GD2054@i92.ryd.student.liu.se> <200208141314.g7EDELl01095@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20020814221116.GA5763@panix.com>

On Wed, Aug 14, 2002, Guido van Rossum wrote:
>Kalle:
>> [Kalle, on the HP-UX snake farm build]
>>>
>>> AssertionError: _mkstemp_inner raised exceptions.OSError: [Errno 24]
>>> Too many open files: '/tmp/aaU3irrA'
>>> 
>>> Hmm, I wonder how many that is, and how to change it.  I'll look
>>> around.
>> 
>> I've raised maxfiles from 200 to 2048, and the test now runs without
>> error.
> 
> Thanks!  Maybe the test was a little too eager though -- perhaps it
> could be happy with creating 100 instead of 1000 files.

Hrm.  Aren't there OSes with a default limit of 63 open files per
process?  I'm pretty sure there are some with 127 (signed char).
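The per-process cap being discussed can be inspected (and, up to the hard limit, raised) from Python on Unix via the resource module:

```python
import resource

# RLIMIT_NOFILE is the maximum number of open file descriptors
# this process may hold; "soft" is the current limit, "hard" the
# ceiling to which the soft limit may be raised.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard)
```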
-- 
Aahz (aahz@pythoncraft.com)           <*>         http://www.pythoncraft.com/

Project Vote Smart: http://www.vote-smart.org/


From faassen@vet.uu.nl  Wed Aug 14 23:12:51 2002
From: faassen@vet.uu.nl (Martijn Faassen)
Date: Thu, 15 Aug 2002 00:12:51 +0200
Subject: [Python-Dev] type categories
In-Reply-To: <200208140316.g7E3GtT30902@pcp02138704pcs.reston01.va.comcast.net>
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208132115.g7DLFwL25088@odiug.zope.com> <200208132127.g7DLRJO29696@europa.research.att.com> <200208140316.g7E3GtT30902@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20020814221251.GB31835@vet.uu.nl>

Guido van Rossum wrote:
> > Guido> The exception is when you need to do something different based
> > Guido> on the type of an object and you can't add a method for what
> > Guido> you want to do.  But that is relatively rare.
> > 
> > Perhaps the reason it's rare is that it's difficult to do.
> 
> Perhaps...  Is it the chicken or the egg?

Once you've defined interfaces you do end up using them this way,
I've found in my own experiences. It can be more clear than the
alternative if you have a set of objects in some collection that
fall apart in a number of kinds -- 'content' versus 'container'
type things in Zope for instance. It's nice to be able to say
'is this a container' without having to think about implementation 
inheritance hierarchies or trying to call a method that should exist on
a container and not on a content object.

And of course Zope3 uses interfaces in more advanced ways to associate
objects together automatically -- a view for a content object is looked up
automatically by interface, and you can automatically hook adapters that
translate one interface to another together by looking them up in
an interface registry as well.

[snip]
> BTW the original scarecrow proposal is at 
> http://www.foretec.com/python/workshops/1998-11/dd-fulton-sum.html

I recall looking at that for the first time and not understanding too 
much about the reasoning behind it, but by now I have some decent experience
with the descendant of that behind me (the interface package in Zope),
and it's quite nice. Many people seem to react to interfaces by 
associating them with static types and then rejecting the notion, but Python
interface checking is just as run-time as anything else.

By the way, the Twisted people are starting to use interfaces in their 
package; a home grown very simple implementation at first but they are 
trying to stay compatible with the Zope ones and are looking into adopting
the Zope interface package proper. When I first discussed interfaces
with some Twisted developers a year ago or so their thinking seemed
quite negative, but they seem to be changing their minds, at least slowly.
That's a good sign for interfaces, and I imagine it will happen with
more people.

Interfaces in Python are almost too trivial to understand, but surprisingly
useful. I imagine this is why so many smart Python users don't get it;
they either reject the notion because it seems too trivial and 'therefore
useless', or because they think it must involve far more complication
(static typing) and therefore it's too complicated and not in the spirit
of Python. :)

Regards,

Martijn



From martin@v.loewis.de  Wed Aug 14 23:42:46 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 15 Aug 2002 00:42:46 +0200
Subject: [Python-Dev] type categories
In-Reply-To: <20020815002825.A2241@hishome.net>
References: <200208131802.g7DI2Ro27807@europa.research.att.com>
 <200208131545.29856.mclay@nist.gov>
 <20020814101819.GA93585@hishome.net>
 <200208141309.g7ED9Jb01045@pcp02138704pcs.reston01.va.comcast.net>
 <20020815002825.A2241@hishome.net>
Message-ID: 

Oren Tirosh  writes:

> Nope. For me protocols are conventions to follow for performing a certain 
> task.  A type category is a formally defined set of types.  

ODP (Reference Model For Open Distributed Processing, ISO 10746)
defines that a type is a predicate; it implies a set (of which it is
the characteristic function).

By your definition, a type category, as a formally-defined means to
determine whether something belongs to the category, is a predicate,
and thus still a type.

> For example, the 'iterable' protocol defines conventions for a programmer
> to follow for doing iteration.  The 'iterable' category is a set defined
> by the membership predicate "hasattr(t, '__iter__')".  

It is not so clear that this is what defines the iterable category. It
could also be defined as "what the programmer can use for doing
iteration, by means of the iterable protocol".

> Protocols live in documentation and lore. Type categories live in the same 
> place where vector spaces and other formal systems live.

By that definition, I'd say that Andrew's list enumerates protocols,
not type categories: they all live in lore, not in a formalism.

> A category is defined mathematically. A protocol is a somewhat more fuzzy
> meatspace concept.  

A protocol can certainly be formalized, if there is need. Of all the
possible interaction sequences, you define those that follow the
protocol. Then, an object that follows the protocol in all interaction
sequences in which it participates is said to implement the protocol.

> > Again you've lost me.  I expect there's something here that you assume
> > well-known.  Can you please clarify this?  What on earth do you mean
> > by "A Python expression is not a pure function" ?
> 
> A function whose result depends only on its inputs and has no side effects.
> In this case I would add "and can be evaluated without triggering any 
> Python code".

For being a pure function, requiring that it does not trigger Python
code seems a bit too restrictive.

In any case, I think it is incorrect to say that a Python expression
is not a function. Instead, it is correct to say that it is not
necessarily a function. There are certainly expressions that are
functions.

Regards,
Martin



From martin@v.loewis.de  Wed Aug 14 23:45:43 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 15 Aug 2002 00:45:43 +0200
Subject: [Python-Dev] type categories
In-Reply-To: <200208142155.g7ELtll10574@europa.research.att.com>
References: <200208131802.g7DI2Ro27807@europa.research.att.com>
 <200208131545.29856.mclay@nist.gov>
 <20020814101819.GA93585@hishome.net>
 
 <20020814215122.GA31835@vet.uu.nl>
 <200208142155.g7ELtll10574@europa.research.att.com>
Message-ID: 

Andrew Koenig  writes:

> Martijn> But isn't that exactly what interfaces are?
> 
> Not really.  I can see how an interface can claim that a particular
> method exists, but not how it can claim that the method implements a
> function that is antisymmetric and transitive.

An interface can certainly claim such things, in its documentation -
and indeed, the documentation of interfaces typically associates
certain semantics with the objects implementing the interface (and in
some cases, even semantics for objects using the interface).

Of course, there is typically no way to automatically *validate* such
claims; you can only validate conformance to signatures. It turns out
that, in Python, you cannot even do that.

Regards,
Martin



From greg@cosc.canterbury.ac.nz  Thu Aug 15 00:06:52 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 15 Aug 2002 11:06:52 +1200 (NZST)
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: <20020814102523.A20855@hishome.net>
Message-ID: <200208142306.g7EN6qU26268@oma.cosc.canterbury.ac.nz>

Oren Tirosh :

> In Python up to 2.2 it's inconsistent between ints and longs:
> >>> hex(-16711681)
> '0xff00ffff'
> >>> hex(-16711681L)
> '-0xff0001L'		# ??!?!?

The more I think about it, the more I like the suggestion
that was made of representing this as 

  1x00ffff

which both makes the bit pattern apparent and unambiguously
indicates the sign, all without any assumptions about length.
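A rough sketch of how such a formatter might behave (hex1x is a hypothetical name; the idea is that the infinite run of leading 1-bits of a negative number is folded into the "1x" prefix):

```python
def hex1x(n):
    # fold the endless run of leading 1-bits of a negative number into
    # a "1x" prefix; non-negative numbers keep the ordinary "0x" form
    if n >= 0:
        return "0x%x" % n
    digits = ""
    while n != -1:                    # -1 is nothing but 1-bits
        digits = "%x" % (n & 0xF) + digits
        n >>= 4
    return "1x" + digits              # so hex1x(-1) is just "1x"
```

For example, hex1x(-16711681) gives "1x00ffff": the bit pattern is visible and the sign is unambiguous, with no assumed word length.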

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From greg@cosc.canterbury.ac.nz  Thu Aug 15 00:51:43 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 15 Aug 2002 11:51:43 +1200 (NZST)
Subject: [Python-Dev] type categories
In-Reply-To: 
Message-ID: <200208142351.g7ENphE26483@oma.cosc.canterbury.ac.nz>

Andrew Koenig :

> In the Python domain, I imagine something like this:
> 
>         def f(arg: Category1):
>                 ....
>         or  f(arg: Category2):
>                 ....
>         or  f(arg: Category3):
> 
> I would like the implementation to try each version of f until it
> finds one that passes the constraints

Would all the versions of f have to be written together
like that? I think when most people talk of multiple
dispatch they have something more flexible in mind.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From greg@cosc.canterbury.ac.nz  Thu Aug 15 01:01:19 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 15 Aug 2002 12:01:19 +1200 (NZST)
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <200208142013.g7EKD9v29275@odiug.zope.com>
Message-ID: <200208150001.g7F01JJ26495@oma.cosc.canterbury.ac.nz>

> > Greg>   re in string
> > 
> > Greg> should be made to work too, where re is a regular
> > Greg> expression object?
> > 
> > Then the core language would have to know about regular
> > expressions, right?
> 
> Um, yes.  That kills the idea (unless you want to write this as
> "string in re", which almost makes sense :-).

Maybe there should be an __in__ method that gets called
on the left operand if the __contains__ of the right
operand doesn't know what to do?
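A sketch of that fallback (the __in__ protocol is hypothetical, and contains() stands in for what the interpreter's dispatch would do):

```python
import re

class Regex:
    def __init__(self, pattern):
        self.re = re.compile(pattern)
    def __in__(self, s):
        # the left operand decides membership when the right one can't
        return self.re.search(s) is not None

def contains(container, item):
    # try the right operand's __contains__ first (the normal 'in' path),
    # then fall back to an __in__ method on the left operand
    try:
        return item in container
    except TypeError:
        if hasattr(item, '__in__'):
            return item.__in__(container)
        raise
```

With this dispatch, contains("spam and eggs", Regex("egg")) works even though str.__contains__ rejects a non-string left operand.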

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From ark@research.att.com  Thu Aug 15 01:50:02 2002
From: ark@research.att.com (Andrew Koenig)
Date: 14 Aug 2002 20:50:02 -0400
Subject: [Python-Dev] type categories
In-Reply-To: <200208142351.g7ENphE26483@oma.cosc.canterbury.ac.nz>
References: <200208142351.g7ENphE26483@oma.cosc.canterbury.ac.nz>
Message-ID: 

Greg> Would all the versions of f have to be written together like
Greg> that?

I'm not sure.  In ML, they do, but in ML, the tests are on
values, not types (ML has neither inheritance nor overloading).
Obviously, it would be nice not to have to write the versions
of f together, but I haven't thought about how such a feature
would be defined or implemented.

Greg> I think when most people talk of multiple
Greg> dispatch they have something more flexible in mind.

Probably true.

-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark


From guido@python.org  Thu Aug 15 02:07:37 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 21:07:37 -0400
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: Your message of "Thu, 15 Aug 2002 11:06:52 +1200."
 <200208142306.g7EN6qU26268@oma.cosc.canterbury.ac.nz>
References: <200208142306.g7EN6qU26268@oma.cosc.canterbury.ac.nz>
Message-ID: <200208150107.g7F17bO02260@pcp02138704pcs.reston01.va.comcast.net>

> > In Python up to 2.2 it's inconsistent between ints and longs:
> > >>> hex(-16711681)
> > '0xff00ffff'
> > >>> hex(-16711681L)
> > '-0xff0001L'		# ??!?!?
> 
> The more I think about it, the more I like the suggestion
> that was made of representing this as 
> 
>   1x00ffff
> 
> which both makes the bit pattern apparent and unambiguously
> indicates the sign, all without any assumptions about length.

That won't help with %o, %u or %x.

I don't expect there will be much of a need to write negative hex
constants in practice: people only end up creating negative numbers
using hex constants because they want to represent 32-bit bit patterns
in a signed 32-bit int.  In Python 2.4, the recommended way will be to
write 0xffffffff and not worry about the fact that it's a positive
long; extensions that take bit masks will be fixed by then to deal
with this just fine (probably through the 'k' format code in
PyArg_Parse*).

The issue of printing negative hex constants is more a theoretical
issue: hex(-1) has to return *something*, and 0xffffffff simply isn't
acceptable.  I'd like it to return something that evaluates back to -1
when used in a Python expression, so "-0x1" and "~0x0" are still the
best candidates.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Thu Aug 15 02:15:32 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 21:15:32 -0400
Subject: [snake-farm] RE: [Python-Dev] strange warnings from tempfile.mkstemped.__del__ on HP
In-Reply-To: Your message of "Wed, 14 Aug 2002 18:11:16 EDT."
 <20020814221116.GA5763@panix.com>
References: <200208140331.g7E3Veh30980@pcp02138704pcs.reston01.va.comcast.net>  <20020814115303.GB2054@i92.ryd.student.liu.se> <20020814122749.GD2054@i92.ryd.student.liu.se> <200208141314.g7EDELl01095@pcp02138704pcs.reston01.va.comcast.net>
 <20020814221116.GA5763@panix.com>
Message-ID: <200208150115.g7F1FW602313@pcp02138704pcs.reston01.va.comcast.net>

> Hrm.  Aren't there OSes with a default limit of 63 open files per
> process?  I'm pretty sure there are some with 127 (signed char).

We'll deal with those as we encounter them.  All the mainstream
platforms go much beyond that (just as we have left the 640 KB limit
behind us :-).

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Thu Aug 15 02:34:03 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 21:34:03 -0400
Subject: [Python-Dev] type categories
In-Reply-To: Your message of "Thu, 15 Aug 2002 00:12:51 +0200."
 <20020814221251.GB31835@vet.uu.nl>
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208132115.g7DLFwL25088@odiug.zope.com> <200208132127.g7DLRJO29696@europa.research.att.com> <200208140316.g7E3GtT30902@pcp02138704pcs.reston01.va.comcast.net>
 <20020814221251.GB31835@vet.uu.nl>
Message-ID: <200208150134.g7F1Y3102354@pcp02138704pcs.reston01.va.comcast.net>

> Interfaces in Python are almost too trivial to understand, but
> surprisingly useful. I imagine this is why so many smart Python
> users don't get it; they either reject the notion because it seems
> too trivial and 'therefore useless', or because they think it must
> involve far more complication (static typing) and therefore it's too
> complicated and not in the spirit of Python. :)

No, I think it's because they only work well if they are used
pervasively (not necessarily everywhere).  That's why they work in
Zope: not only does almost everything in Zope have an interface, but
interfaces are used to implement many Zope features.

I haven't made up my mind yet whether Python could benefit as much as
Zope, but I am cautiously looking into adding something derived from
Zope's interface package.  Jim & I have rather different ideas on what
the ideal interfaces API should look like though, so it'll be a while.
Maybe I should pull down the Twisted interfaces package and see how I
like their subset (I'm sure it must be a subset -- the Zope package is
a true kitchen sink :-).

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Thu Aug 15 02:35:37 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 21:35:37 -0400
Subject: [Python-Dev] type categories
In-Reply-To: Your message of "Wed, 14 Aug 2002 17:55:47 EDT."
 <200208142155.g7ELtll10574@europa.research.att.com>
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208131545.29856.mclay@nist.gov> <20020814101819.GA93585@hishome.net>  <20020814215122.GA31835@vet.uu.nl>
 <200208142155.g7ELtll10574@europa.research.att.com>
Message-ID: <200208150135.g7F1ZbE02366@pcp02138704pcs.reston01.va.comcast.net>

> Not really.  I can see how an interface can claim that a particular
> method exists, but not how it can claim that the method implements a
> function that is antisymmetric and transitive.

That's done in the docs, usually.  Zope even has the notion of a
"marker" interface -- an interface that says "this object has property
such-and-such" but which does not assert any methods or attributes.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From ark@research.att.com  Thu Aug 15 02:38:18 2002
From: ark@research.att.com (Andrew Koenig)
Date: Wed, 14 Aug 2002 21:38:18 -0400 (EDT)
Subject: [Python-Dev] type categories
In-Reply-To: <200208150135.g7F1ZbE02366@pcp02138704pcs.reston01.va.comcast.net>
 (message from Guido van Rossum on Wed, 14 Aug 2002 21:35:37 -0400)
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208131545.29856.mclay@nist.gov> <20020814101819.GA93585@hishome.net>  <20020814215122.GA31835@vet.uu.nl>
 <200208142155.g7ELtll10574@europa.research.att.com> <200208150135.g7F1ZbE02366@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <200208150138.g7F1cIC12290@europa.research.att.com>

>> Not really.  I can see how an interface can claim that a particular
>> method exists, but not how it can claim that the method implements a
>> function that is antisymmetric and transitive.

Guido> That's done in the docs, usually.  Zope even has the notion of a
Guido> "marker" interface -- an interface that says "this object has property
Guido> such-and-such" but which does not assert any methods or attributes.

So perhaps what I mean by a category is the set of all types that
implement a particular marker interface.




From tim.one@comcast.net  Thu Aug 15 02:50:02 2002
From: tim.one@comcast.net (Tim Peters)
Date: Wed, 14 Aug 2002 21:50:02 -0400
Subject: [Python-Dev] FW: multimethod-0.1
Message-ID: 

I haven't studied this, but from a quick glance it looks competent.

-----Original Message-----
From: python-list-admin@python.org
    On Behalf Of Aric Coady 
Sent: Wednesday, August 14, 2002 8:59 PM
To: python-announce@python.org
Cc: python-list@python.org
Subject: ANN: multimethod-0.1


Multimethod-0.1 is another python module for implementing multimethods 
(a.k.a.  generic functions, multiple-argument method dispatch).  This 
one features:

- support for Python2.2 type/class unification
- a precedence graph for more efficient dispatching
- a best-fit resolution algorithm, in which the method closest in 
inheritance distance is called
- a versatile 'call-next-method' or 'super' function.

Available at http://bent-arrow.com/python and the Vaults of Parnassus.

-Coady


-- 
http://mail.python.org/mailman/listinfo/python-list


From guido@python.org  Thu Aug 15 03:24:07 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 22:24:07 -0400
Subject: [Python-Dev] type categories
In-Reply-To: Your message of "Wed, 14 Aug 2002 20:50:02 EDT."
 
References: <200208142351.g7ENphE26483@oma.cosc.canterbury.ac.nz>
 
Message-ID: <200208150224.g7F2O7C02510@pcp02138704pcs.reston01.va.comcast.net>

> Greg> Would all the versions of f have to be written together like
> Greg> that?
> 
> I'm not sure.  In ML, they do, but in ML, the tests are on
> values, not types (ML has neither inheritance nor overloading).
> Obviously, it would be nice not to have to write the versions
> of f together, but I haven't thought about how such a feature
> would be defined or implemented.
> 
> Greg> I think when most people talk of multiple
> Greg> dispatch they have something more flexible in mind.
> 
> Probably true.

I can see how it could be done using some additional syntax similar to
what ML uses, e.g.:

  def f(a: Cat1):
      ...code for Cat1...
  else f(a: Cat2):
      ...code for Cat2...
  else f(a: Cat3):
      ...code for Cat3...

Don't take this syntax too seriously!  I just mean that there is a
single statement that provides the different alternative versions.

Another approach would be more in the spirit of properties in 2.2:

  def f1(a: Cat1):
    ...code for Cat1...

  def f2(a: Cat2):
    ...code for Cat2...

  def f3(a: Cat3):
    ...code for Cat3...

  f = multimethod(f1, f2, f3)

(There could be a way to spell this without having the type
declaration syntax in the argument list, and do it in the
multimethod() call instead, e.g. with keyword arguments or passing a
list of tuples: [(f1, Cat1), (f2, Cat2), ...].  I suppose this could
be extended to more arguments as well.)

It might also be possible to modify a multimethod dynamically,
e.g. later one could write:

  def f4(a: Cat4):
    ...code for Cat4...

  f.add(f4)

This is more in the spirit of Python than your original proposal,
which appeared like the compiler would have to gather all the
definitions from different places and fuse them.  That would be too
complex for Python's simple-minded compiler!
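A minimal runnable sketch of the multimethod() idea (single-argument dispatch only; the names and the (function, type) registration are illustrative):

```python
class multimethod:
    # dispatch on the type of the single argument; implementations are
    # tried in registration order with isinstance()
    def __init__(self, *cases):            # cases: (function, type) pairs
        self.cases = list(cases)
    def add(self, func, typ):              # extend the multimethod later
        self.cases.append((func, typ))
    def __call__(self, a):
        for func, typ in self.cases:
            if isinstance(a, typ):
                return func(a)
        raise TypeError("no implementation for %r" % type(a))

def f1(a): return "Cat1 code"
def f2(a): return "Cat2 code"
f = multimethod((f1, int), (f2, str))
```

Calling f(3) runs f1, and a later f.add(f3, float) extends the dispatch table dynamically, matching the f.add(f4) step in the text.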

--Guido van Rossum (home page: http://www.python.org/~guido/)


From greg@cosc.canterbury.ac.nz  Thu Aug 15 03:39:42 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 15 Aug 2002 14:39:42 +1200 (NZST)
Subject: [Python-Dev] type categories
In-Reply-To: <20020815002825.A2241@hishome.net>
Message-ID: <200208150239.g7F2dgm27093@oma.cosc.canterbury.ac.nz>

> I don't like the bureacracy of declaring interfaces and maintaining 
> registries. I like the ad-hoc nature of Python protocols and I want a 
> type system that gives me the tools to use it better, not replace it with 
> something more formal.

But you seem to want *something* that's more formal.
How formal exactly do you have in mind? How does it
differ from what Zope does? (Which I know nothing
about, by the way...)

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From greg@cosc.canterbury.ac.nz  Thu Aug 15 03:42:16 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 15 Aug 2002 14:42:16 +1200 (NZST)
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: <4C515BCF-AFBF-11D6-8B4E-003065517236@oratrix.com>
Message-ID: <200208150242.g7F2gGq27116@oma.cosc.canterbury.ac.nz>

Jack Jansen :

> Python jumps through hoops to make 'jack' and u'jack' compare
> identical and be interchangeable in dict keys and what have you,
> and now suddenly I find out that there's two ways to say u'jäck'
> and they won't compare equal. Not good.

To me, this says that Python should pick one of the
canonical forms and make sure all its Unicode strings
are normalised to it. (Or at least make it appear
as if they are.)

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+



From greg@cosc.canterbury.ac.nz  Thu Aug 15 03:58:09 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 15 Aug 2002 14:58:09 +1200 (NZST)
Subject: [Python-Dev] Deprecation warning on integer shifts and such
In-Reply-To: <200208150107.g7F17bO02260@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <200208150258.g7F2w8Y27320@oma.cosc.canterbury.ac.nz>

Guido:

> In Python 2.4, the recommended way will be to write 0xffffffff and not
> worry about the fact that it's a positive long;

Yes, it won't be so much of an issue then. But you can still get a
negative long from a positive one when bit twiddling by complementing,
meaning that you have to remember to mask the result before displaying
it as hex, or end up with a hex representation that displays the bit
pattern in a way that's hard to interpret.

That's the usage I had in mind when I mentioned the 1x notation --
for display, not for input.

But thinking about it now, it would be better to provide a new
function for hexifying that you could tell how many bits you're
interested in, and it would show you that many, unsigned. Maybe
also a new format operator for this as well.
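Such a function might look like this (hexn is a hypothetical name):

```python
def hexn(value, bits):
    # show exactly `bits` bits of the value, unsigned, so the bit
    # pattern of a negative number is displayed directly
    mask = (1 << bits) - 1
    return "0x%0*x" % (bits // 4, value & mask)
```

For instance, hexn(-1, 32) gives "0xffffffff" without the caller having to remember the masking step.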

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From guido@python.org  Thu Aug 15 04:02:27 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 14 Aug 2002 23:02:27 -0400
Subject: [Python-Dev] Alternative implementation of interning
In-Reply-To: Your message of "Wed, 14 Aug 2002 16:04:49 CDT."
 <15706.50673.81267.900261@localhost.localdomain>
References: <15706.50673.81267.900261@localhost.localdomain>
Message-ID: <200208150302.g7F32RX03115@pcp02138704pcs.reston01.va.comcast.net>

> * Probably makes no difference, but it seems oddly asymmetric to fiddle with
>   the interned string's refcount in string_dealloc, call PyObject_DelItem,
>   then not restore the refcount to zero.

That's unnecessary because the next line executed simply frees the
object.  free() doesn't check the refcount.  It's a bit optimistic in
that it doesn't check the DelItem for an error; but that's a separate
issue, and I don't know what it should do when it gets an error at
that point.  Probably call Py_FatalError(); if it wanted to recover,
it would have to call PyErr_Fetch() / PyErr_Restore() around the
DelItem() call, because we're in a dealloc handler here and that
shouldn't change the exception state.

> * Should be Py_DECREF(keys) (not Py_XDECREF(keys)) in
>   _Py_ReleaseInternedStrings.  If you've gotten that far keys can't be
>   NULL.  If you're worried about keys being NULL, you should check it before
>   the for loop (PyMapping_Size() will barf on a NULL arg).

You're right.  Also, I think it should use PyDict_Keys() and
PyDict_Size() -- it knows that interned is a dict so all the hoopla
that PyMapping_Keys() adds is unnecessary.  Maybe the best thing to do
is to remove _Py_ReleaseInternedStrings() and let Barry worry about
how to implement it the next time he wants to use Insure++.

> Also, regarding the name of PyString_InternInPlace, I see now that's the
> original name.  I suggest that name be deprecated in favor of
> PyString_InternImmortal with a macro defined in stringobject.h for
> compatibility.

Yeah, if we keep the immortality feature at all.

BTW, it looks like Oren was careless with error checking in a few
places.  The whole patch needs to checked carefully.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From greg@cosc.canterbury.ac.nz  Thu Aug 15 04:13:53 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 15 Aug 2002 15:13:53 +1200 (NZST)
Subject: [Python-Dev] type categories
In-Reply-To: <200208150224.g7F2O7C02510@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <200208150313.g7F3Dr727504@oma.cosc.canterbury.ac.nz>

Guido:

> I can see how it could be done using some additional syntax similar to
> what ML uses, e.g.:
> 
>   def f(a: Cat1):
>       ...code for Cat1...
>   else f(a: Cat2):
>       ...code for Cat2...
>   else f(a: Cat3):
>       ...code for Cat3...

As long as all the implementations have to be in one place,
this is equivalent to

  def f(a):
    if belongstocategory(a, Cat1):
      ...
    elif belongstocategory(a, Cat2):
      ...
    elif belongstocategory(a, Cat3):
      ...

so you're not gaining much from the new syntax.

> It might also be possible to modify a multimethod dynamically,
> e.g. later one could write:
> 
>   def f4(a: Cat4):
>     ...code for Cat4...
> 
>   f.add(f4)

This sort of scheme makes me uneasy, because it means that any module
can change the behaviour of any call of f() in any other
module. Currently, if you know the types involved in a method call,
you can fairly easily track down in the source which piece of code
will be called. With this sort of generic function, that will no
longer be possible. It's kind of like an "import *" in reverse -- you
won't know what's coming from where, and you can get things that
you never even asked for.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From aahz@pythoncraft.com  Thu Aug 15 05:10:25 2002
From: aahz@pythoncraft.com (Aahz)
Date: Thu, 15 Aug 2002 00:10:25 -0400
Subject: [Python-Dev] Mac forever! (was Re: The memo of pickle)
Message-ID: <20020815041025.GA9165@panix.com>

On Sun, Aug 11, 2002, Tim Peters wrote:
> 
> Cool!  I was just wondering the other day whether there are any Mac users
> left apart from Jack and Guido's brother.  It's a landslide .

I'm an OS X user, does that count?  <0.8 wink>

Incidentally, O'Reilly is looking for presentations for the OS X
conference in San Jose end of September.  I'll post the e-mail address
of the program chair if more than one person wants it -- or you can do
as I did and look at the web site.
-- 
Aahz (aahz@pythoncraft.com)           <*>         http://www.pythoncraft.com/

Project Vote Smart: http://www.vote-smart.org/


From David Abrahams" 
Message-ID: <115c01c24418$76902fe0$6501a8c0@boostconsulting.com>

From: "Tim Peters" 
> I haven't studied this, but from a quick glance it looks competent.
>
>
> Multimethod-0.1 is another python module for implementing multimethods
> (a.k.a.  generic functions, multiple-argument method dispatch).  This
> one features:
>
> - support for Python2.2 type/class unification
> - a precedence graph for more efficient dispatching
> - a best-fit resolution algorithm, in which the method closest in
> inheritance distance is called
> - a versatile 'call-next-method' or 'super' function.
>
> Available at http://bent-arrow.com/python and the Vaults of Parnassus.
>
> -Coady

It's a good start, but from the docs it doesn't appear to deal with:

a. Type categories -- it seems as though the only way for a multimethod
implementation to match an actual argument is if the formal argument has an
inheritance relationship with it.

b. Implicit conversions -- If I declare a function that accepts a Python
int, can I pass a Python float?

I think both of the above are important for any Python multimethod
implementation.

-----------------------------------------------------------
           David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com





From David Abrahams"                 <200208150224.g7F2O7C02510@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <115d01c24418$772766d0$6501a8c0@boostconsulting.com>

From: "Guido van Rossum" 
to more arguments as well.)
> 
> It might also be possible to modify a multimethod dynamically,
> e.g. later one could write:
> 
>   def f4(a: Cat4):
>     ...code for Cat4...
> 
>   f.add(f4)
> 
> This is more in the spirit of Python than your original proposal,
> which appeared like the compiler would have to gather all the
> definitions from different places and fuse them.  That would be too
> complex for Python's simple-minded compiler!

This is most like what I had in mind.

-----------------------------------------------------------
           David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com




From oren-py-d@hishome.net  Thu Aug 15 07:30:31 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Thu, 15 Aug 2002 09:30:31 +0300
Subject: [Python-Dev] type categories
In-Reply-To: <200208150138.g7F1cIC12290@europa.research.att.com>; from ark@research.att.com on Wed, Aug 14, 2002 at 09:38:18PM -0400
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208131545.29856.mclay@nist.gov> <20020814101819.GA93585@hishome.net>  <20020814215122.GA31835@vet.uu.nl> <200208142155.g7ELtll10574@europa.research.att.com> <200208150135.g7F1ZbE02366@pcp02138704pcs.reston01.va.comcast.net> <200208150138.g7F1cIC12290@europa.research.att.com>
Message-ID: <20020815093031.A9040@hishome.net>

On Wed, Aug 14, 2002 at 09:38:18PM -0400, Andrew Koenig wrote:
> >> Not really.  I can see how an interface can claim that a particular
> >> method exists, but not how it can claim that the method implements a
> >> function that is antisymmetric and transitive.
> 
> Guido> That's done in the docs, usually.  Zope even has the notion of a
> Guido> "marker" interface -- an interface that says "this object has property
> Guido> such-and-such" but which does not assert any methods or attributes.
> 
> So perhaps what I mean by a category is the set of all types that
> implement a particular marker interface.

I propose that any method or attribute may serve as a marker. This makes it
possible to use an existing practice as a marker so protocols can be
defined retroactively for an existing code base. It's also possible, of 
course, to add an attribute called 'has_property_such_and_such' to serve 
as an explicit marker.

A type category is defined by a predicate that tests for the presence of
one or more markers.  Predicates can test not only for the presence of
markers but also for the type category of the marker object and for call 
signatures. When optional type checking is implemented they should also be
able to test for the categories of arguments and return values.

A new category may be defined as a union or intersection of two existing
categories. This is done by ANDing or ORing the membership predicates of
the two categories and reducing them back to canonical form. Canonizing
a predicate is done by conversion into Disjunctive Normal Form, elimination 
of redundant terms and products, sorting and a few other steps.

A global dictionary of canonical predicates is kept (similar to interning
of strings) so any equivalent categories are merged. Each type object
can store a cache of categories in which it is a member so evaluation of
a membership predicate only needs to be done once for each type.

This may sound complicated, but here's how it might work in practice:

Extracting a category from an existing class:
foobarlike = like(FooBar)

The members of the foobarlike category are any classes that implement the
same methods and attributes as FooBar, whether or not they are actually
descended from it. They may be defined independently in another library.
FooBar may be an abstract class used just as a template for a category.

Asserting that a class must be a member of a category:

class SomeClass:
   __category__ = like(AnotherClass)
   ...

At the end of the class definition it will be checked whether it really is
a member of that category (like(SomeClass) issubsetof like(AnotherClass)).
This attribute is inherited by subclasses.  Any subclass of this class
will be checked whether it is still a member of the category.  A subclass
may also override this attribute:

class InheritImplementationButNotTheCategoryCheckFrom(SomeClass):
   __category__ = some_other_category
   ...

class AddAdditionalRestrictionsTo(SomeClass):
   __category__ = __category__ & like(YetAnotherClass)

If there is a conflict between the two categories the new category will
reduce to the empty set and an error will be generated. The error can be
quite informative by extracting a category from the new class, subtracting
it from the defined category and printing the difference.
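To make this concrete, here is a minimal sketch of what like() and the category algebra could look like. Everything in it (the Category class, using hasattr over the template's public names as the membership test) is an illustrative assumption, not part of the proposal; the canonicalization and predicate-interning machinery described above is omitted.

```python
class Category:
    """A set of types defined by a membership predicate."""
    def __init__(self, predicate, name):
        self.predicate = predicate
        self.name = name

    def __contains__(self, cls):
        return self.predicate(cls)

    def __and__(self, other):
        # Intersection: a type must satisfy both predicates.
        return Category(lambda c: self.predicate(c) and other.predicate(c),
                        '(%s & %s)' % (self.name, other.name))

    def __or__(self, other):
        # Union: a type may satisfy either predicate.
        return Category(lambda c: self.predicate(c) or other.predicate(c),
                        '(%s | %s)' % (self.name, other.name))

def like(template):
    """Category of all classes exposing the template's public names."""
    markers = [n for n in dir(template) if not n.startswith('_')]
    return Category(lambda c: all(hasattr(c, n) for n in markers),
                    'like(%s)' % template.__name__)

class FooBar:
    def foo(self): pass
    def bar(self): pass

class Unrelated:              # not descended from FooBar
    def foo(self): pass
    def bar(self): pass
    def extra(self): pass

foobarlike = like(FooBar)
print(Unrelated in foobarlike)   # True: membership is structural, not by inheritance
print(int in foobarlike)         # False
```

Note that the sketch skips reduction to canonical form, so equivalent categories built in different ways would not compare equal here; that is exactly the part the interning dictionary is meant to solve.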

When a backward compatible change is made to a protocol (e.g. adding a new
method) any modules that use the old category should still work because
the new category is a subcategory of the old one. When a non backward
compatible change is made (e.g. removing a method, changing its call
signature) existing code may still run without complaining depending on
the category it uses to do the checking. If it's a wider category that
doesn't check for the method it should be ok.

A non backward compatible change must change the exposed interface. This
may be ensured by adding an attribute or method that serves as an explicit
marker and includes a version number or is renamed in some other way when
making incompatible changes. Category union may be used to check for two
incompatible versions that are known to implement a common subset even if
it has never been given a name, etc.

	Oren


From martin@strakt.com  Thu Aug 15 08:26:46 2002
From: martin@strakt.com (Martin Sjögren)
Date: 15 Aug 2002 09:26:46 +0200
Subject: [Python-Dev] string.find() again (was Re: timsort for jython)
In-Reply-To: <200208142112.g7ELCSQ32576@odiug.zope.com>
References: <200208070227.g772R4i07678@oma.cosc.canterbury.ac.nz>
 
 <200208142013.g7EKD9v29275@odiug.zope.com>
 <200208142034.g7EKYt508950@europa.research.att.com>
 <200208142112.g7ELCSQ32576@odiug.zope.com>
Message-ID: <1029396406.30031.3.camel@ratthing-b3cf>

ons 2002-08-14 klockan 23.12 skrev Guido van Rossum:
> Unfortunately that would be a significant change in internal shit.

Just curious, is "internal shit" a technical term in Python? ;-)

*ducks and starts running*


/Martin

-- 
Martin Sjögren
  martin@strakt.com              ICQ : 41245059
  Phone: +46 (0)31 7710870       Cell: +46 (0)739 169191
  GPG key: http://www.strakt.com/~martin/gpg.html


From oren-py-d@hishome.net  Thu Aug 15 09:08:01 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Thu, 15 Aug 2002 04:08:01 -0400
Subject: [Python-Dev] type categories
In-Reply-To: 
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208131545.29856.mclay@nist.gov> <20020814101819.GA93585@hishome.net> 
Message-ID: <20020815080801.GA66070@hishome.net>

On Wed, Aug 14, 2002 at 10:08:59AM -0400, Andrew Koenig wrote:
> >> The category names look like general purpose interface names. The
> >> addition of interfaces has been discussed quite a bit. While many
> >> people are interested in having interfaces added to Python, there
> >> are many design issues that will have to be resolved before it
> >> happens.
> 
> Oren> Nope. Type categories are fundamentally different from
> Oren> interfaces.  An interface must be declared by the type while a
> Oren> category can be an observation about an existing type.
> 
> Why?  That is, why can't you imagine making a claim that type
> X meets interface Y, even though the author of neither X nor Y
> made that claim?

It's not a failure of imagination, it's a failure of terminology. In
contexts where the term 'interface' is used (Java, COM, etc) it usually 
means something you explicitly expose from your objects. I find that the 
term 'category' implies something you observe after the fact without 
modifying the object - "these objects both have property so-and-so, let's 
group them together and call it a category".

> However, now that you bring it up... One difference I see between
> interfaces and categories is that I can imagine categories carrying
> semantic information to the human reader of the code that is not
> actually expressed in the category itself.  As a simple example,
> I can imagine a PartialOrdering category that I might like as part
> of the specification for an argument to a sort function.

You can define any category you like and attach a semantic meaning to it
as long as you can write a membership predicate for the category. It may
be based on a marker that the type must have or, in case you can't change
the type (e.g. a builtin type) you can write a membership predicate that
also tests for some set of specific types. 

> Oren> A category is defined mathematically by a membership
> Oren> predicate. So what we need for type categories is a system for
> Oren> writing predicates about types.
> 
> Indeed, that's what I was thinking about initially.  Guido pointed out
> that the notion could be expanded to making concrete assertions about
> the interface to a class.  I had originally considered that those
> assertions could be just that--assertions, but then when Guido started
> talking about interfaces, I realized that my original thought of
> expressing satisfaction of a predicate by inheriting it could be
> extended by simply adding methods to those predicates.  Of course,
> this technique has the disadvantage that it's not easy to add base
> classes to a class after it has been defined.

That's why the intelligence should be in the membership predicate, not in 
the classes it selects. Nothing needs to be changed about types. 
Conceptually, categories apply to *references*, not to *objects*. They help 
you ensure that during execution certain references may only point to 
objects from a limited category of types so that the operations you perform 
on them are meaningful (though not necessarily correct). A situation that 
may lead to a reference pointing to an object outside the valid category 
should be detected as early as possible. Detecting this during compilation 
is great. On module import is good. At runtime it's ok.

Can you think of a better name than 'categories' to describe a set of 
types selected by a membership predicate? 

	Oren



From oren-py-d@hishome.net  Thu Aug 15 09:40:13 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Thu, 15 Aug 2002 04:40:13 -0400
Subject: [Python-Dev] type categories
In-Reply-To: 
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208131545.29856.mclay@nist.gov> <20020814101819.GA93585@hishome.net> <200208141309.g7ED9Jb01045@pcp02138704pcs.reston01.va.comcast.net> <20020815002825.A2241@hishome.net> 
Message-ID: <20020815084013.GB66070@hishome.net>

On Thu, Aug 15, 2002 at 12:42:46AM +0200, Martin v. Loewis wrote:
> Oren Tirosh  writes:
> 
> > Nope. For me protocols are conventions to follow for performing a certain 
> > task.  A type category is a formally defined set of types.  
> 
> ODP (Reference Model For Open Distributed Processing, ISO 10746)
> defines that a type is a predicate; it implies a set (of which it is
> the characteristic function).

A type is a predicate about an object. A category is a predicate about a
type.

Objects have a type. References have a category. 

Well, Python references currently all have the 'any' category because 
Python has no type checking. Any Python reference may point to an object 
of any type. 

In a dynamically typed language there is no such thing as an 'integer
variable' but it can be simulated by a reference that may only point to
objects in the 'integer' category.
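Such a checked reference can be simulated today with a small wrapper class. This is only an illustration of the idea; the Ref class and its set() method are invented for this sketch, not proposed API.

```python
class Ref:
    """A reference that may only point to objects in a given category.

    'category' is any predicate over objects; an assignment outside the
    category is detected at the point of assignment rather than at the
    (possibly much later) point of use.
    """
    def __init__(self, category, value=None):
        self.category = category
        if value is not None:
            self.set(value)

    def set(self, value):
        if not self.category(value):
            raise TypeError('%r is outside the reference category' % (value,))
        self.value = value

# Simulate an 'integer variable': a reference restricted to ints.
integer = lambda obj: isinstance(obj, int)
r = Ref(integer)
r.set(42)          # fine
try:
    r.set('spam')  # rejected as early as possible
except TypeError:
    print('caught assignment outside category')
```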

> It is not so clear that this is what defines the iterable category. It
> could also be defined as "the programmer can use to for doing
> iteration, by means of the iterable protocol".

Your definition is not formal and cannot be evaluated by a program.  

The iterable category matches the set of types implementing the iterable
protocol with reasonable accuracy. It doesn't have to be perfect to be 
useful.

> > Protocols live in documentation and lore. Type categories live in the same 
> > place where vector spaces and other formal systems live.
> 
> By that definition, I'd say that Andrew's list enumerates protocols,
> not type categories: they all live in lore, not in a formalism.

Exactly.
 
> For being a pure function, requiring that it does not trigger Python
> code seems a bit too restrictive.

That's not a formal requirement; it's for robustness and efficiency. 

	Oren 



From mwh@python.net  Thu Aug 15 10:42:29 2002
From: mwh@python.net (Michael Hudson)
Date: 15 Aug 2002 10:42:29 +0100
Subject: [Python-Dev] type categories
In-Reply-To: Greg Ewing's message of "Thu, 15 Aug 2002 15:13:53 +1200 (NZST)"
References: <200208150313.g7F3Dr727504@oma.cosc.canterbury.ac.nz>
Message-ID: <2mfzxg5ugq.fsf@starship.python.net>

Greg Ewing  writes:

> As long as all the implementations have to be in one place,
> this is equivalent to
> 
>   def f(a):
>     if belongstocategory(a, Cat1):
>       ...
>     elif belongstocategory(a, Cat2):
>       ...
>     elif belongstocategory(a, Cat3):
>       ...
> 
> so you're not gaining much from the new syntax.

Good point.

> > It might also be possible to modify a multimethod dynamically,
> > e.g. later one could write:
> > 
> >   def f4(a: Cat4):
> >     ...code for Cat4...
> > 
> >   f.add(f4)
> 
> This sort of scheme makes me uneasy, because it means that any module
> can change the behaviour of any call of f() in any other
> module.

True, but I don't think this is a problem in practice with CLOS, is it?

I mean, you can currently do

import mod

mod.func = my_func # evil cackle!

but you don't.

> Currently, if you know the types involved in a method call,
> you can fairly easily track down in the source which piece of code
> will be called. With this sort of generic function, that will no
> longer be possible. 

I would sincerely hope that any core implementation of such an idea
would be introspective enough to allow finding method implementations.
Obviously this would only work at run time, but it would be a help
(imagine running under pdb).

Cheers,
M.

-- 
  Well, yes.  I don't think I'd put something like "penchant for anal
  play" and "able to wield a buttplug" in a CV unless it was relevant
  to the gig being applied for...
                                 -- Matt McLeod, alt.sysadmin.recovery


From Jack.Jansen@oratrix.com  Wed Aug 14 20:52:00 2002
From: Jack.Jansen@oratrix.com (Jack Jansen)
Date: Wed, 14 Aug 2002 21:52:00 +0200
Subject: [Python-Dev] PEP 277 (unicode filenames): please review
In-Reply-To: <200208141213.g7ECD5V00311@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <4C515BCF-AFBF-11D6-8B4E-003065517236@oratrix.com>

On woensdag, augustus 14, 2002, at 02:13 , Guido van Rossum wrote:
> Note that normalization doesn't belong in the codecs (except perhaps
> as a separate Unicode->Unicode codec, since codecs seem to be useful
> for all string->string transformations).  It's a separate step that
> the application has to request; only the app knows whether a
> particular Unicode string is already normalized or not, and whether
> the expense is useful for the app, or not.

I don't like this, I don't like it at all.

Python jumps through hoops to make 'jack' and u'jack' compare
identical and be interchangeable in dict keys and what have you,
and now suddenly I find out that there's two ways to say u'jäck'
and they won't compare equal. Not good.

I sympathise with the fact that this is difficult (although I
still don't understand why: whereas when you want to create the
decomposed version I can imagine there's N! ways to notate a
character with N combining chars, I would think there's one and
only one way to write a combined character), but that shouldn't
stop us at least planning to fix this.

And I don't think the burden should fall on the application.
That same reasoning could have been followed for making ascii
and unicode-ascii-subset compare equal: the application will
know it has to convert ascii to unicode before comparing.
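The two ways to say u'jäck' are the precomposed character and the base letter plus a combining diaeresis. In Python versions from 2.3 on, the unicodedata module exposes the Unicode normalization forms that make the two comparable:

```python
import unicodedata

composed   = u'j\u00e4ck'   # single LATIN SMALL LETTER A WITH DIAERESIS
decomposed = u'ja\u0308ck'  # 'a' followed by COMBINING DIAERESIS

print(composed == decomposed)   # False: the raw code point sequences differ

# Normalizing both to NFC (the composed form) makes them compare equal.
nfc_a = unicodedata.normalize('NFC', composed)
nfc_b = unicodedata.normalize('NFC', decomposed)
print(nfc_a == nfc_b)           # True
```

As Guido notes above, the open question is not whether this transformation exists but whether the runtime or the application should apply it.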
--
- Jack Jansen               http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -



From guido@python.org  Thu Aug 15 13:13:52 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 15 Aug 2002 08:13:52 -0400
Subject: [Python-Dev] type categories
In-Reply-To: Your message of "Thu, 15 Aug 2002 09:30:31 +0300."
 <20020815093031.A9040@hishome.net>
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208131545.29856.mclay@nist.gov> <20020814101819.GA93585@hishome.net>  <20020814215122.GA31835@vet.uu.nl> <200208142155.g7ELtll10574@europa.research.att.com> <200208150135.g7F1ZbE02366@pcp02138704pcs.reston01.va.comcast.net> <200208150138.g7F1cIC12290@europa.research.att.com>
 <20020815093031.A9040@hishome.net>
Message-ID: <200208151213.g7FCDqc04131@pcp02138704pcs.reston01.va.comcast.net>

> Extracting a category from an existing class:
> foobarlike = like(FooBar)
> 
> The members of the foobarlike category are any classes that
> implement the same methods and attributes as FooBar, whether or not
> they are actually descended from it.  They may be defined
> independently in another library.

This seems fairly useless in practice -- you almost never want to use
*all* methods and attributes of a class as the characteristic.  (Even
if you skip names starting with _.)

> FooBar may be an abstract class used just as a template for a category.

Then the like() syntax seems unnecessary.  It then becomes similar to
Zope's Interfaces.

> Asserting that a class must be a member of a category:
> 
> class SomeClass:
>    __category__ = like(AnotherClass)
>    ...

In Zope:

  class SomeClass:
    __implements__ = AnotherClass

By convention, AnotherClass usually has a name that indicates it
is an interface: IAnotherClass.

> At the end of the class definition it will be checked whether it
> really is a member of that category (like(SomeClass) issubsetof
> like(AnotherClass)) This attribute is inherited by subclasses.  Any
> subclass of this class will be checked whether it is still a member
> of the category.

I've been mulling over another way to spell this; perhaps you can
add categories to the inheritance list:

  class SomeClass(IAnotherClass):
    ...

There's ambiguity here though: extending an interface already uses the
same syntax:

  class IExtendedClass(IAnotherClass):
    ...

Disambiguating based on name conventions seems wrong and unpythonic.
In C++, abstract classes are those that have one or more abstract
methods; maybe we can borrow from that.

> A subclass
> may also override this attribute:
> 
> class InheritImplementationButNotTheCategoryCheckFrom(SomeClass):
>    __category__ = some_other_category
>    ...

My alternative spelling idea currently has no way to do this; but one
is needed, and preferably one that's not too ugly.

> class AddAdditionalRestrictionsTo(SomeClass):
>    __category__ = __category__ & like(YetAnotherClass)

There's a (shallow) problem here, in that __category__ is not
initially in your class's namespace: at the start of executing the
class statement, you begin with an empty local namespace.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Thu Aug 15 13:20:06 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 15 Aug 2002 08:20:06 -0400
Subject: [Python-Dev] type categories
In-Reply-To: Your message of "Thu, 15 Aug 2002 04:40:13 EDT."
 <20020815084013.GB66070@hishome.net>
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208131545.29856.mclay@nist.gov> <20020814101819.GA93585@hishome.net> <200208141309.g7ED9Jb01045@pcp02138704pcs.reston01.va.comcast.net> <20020815002825.A2241@hishome.net> 
 <20020815084013.GB66070@hishome.net>
Message-ID: <200208151220.g7FCK6b04165@pcp02138704pcs.reston01.va.comcast.net>

> In a dynamically typed language there is no such thing as an 'integer
> variable' but it can be simulated by a reference that may only point to
> objects in the 'integer' category.

This seems a game with words.  I don't see the difference between an
integer variable and a reference that must point to an integer.
(Well, I see a difference, in the sharing semantics, but that's just
the difference between a value and a pointer in C.  They're both
variables.)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From faassen@vet.uu.nl  Thu Aug 15 13:31:43 2002
From: faassen@vet.uu.nl (Martijn Faassen)
Date: Thu, 15 Aug 2002 14:31:43 +0200
Subject: [Python-Dev] type categories
In-Reply-To: <20020815093031.A9040@hishome.net>
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208131545.29856.mclay@nist.gov> <20020814101819.GA93585@hishome.net>  <20020814215122.GA31835@vet.uu.nl> <200208142155.g7ELtll10574@europa.research.att.com> <200208150135.g7F1ZbE02366@pcp02138704pcs.reston01.va.comcast.net> <200208150138.g7F1cIC12290@europa.research.att.com> <20020815093031.A9040@hishome.net>
Message-ID: <20020815123143.GA2880@vet.uu.nl>

Oren Tirosh wrote:
> On Wed, Aug 14, 2002 at 09:38:18PM -0400, Andrew Koenig wrote:
> > >> Not really.  I can see how an interface can claim that a particular
> > >> method exists, but not how it can claim that the method implements a
> > >> function that is antisymmetric and transitive.
> > 
> > Guido> That's done in the docs, usually.  Zope even has the notion of a
> > Guido> "marker" interface -- an interface that says "this object has property
> > Guido> such-and-such" but which does not assert any methods or attributes.
> > 
> > So perhaps what I mean by a category is the set of all types that
> > implement a particular marker interface.
> 
> I propose that any method or attribute may serve as a marker. This makes it
> possible to use an existing practice as a marker so protocols can be
> defined retroactively for an existing code base. It's also possible, of 
> course, to add an attribute called 'has_property_such_and_such' to serve 
> as an explicit marker.

This is an interesting idea. I'd say you could plug such a 
thing into an interface system by making 'interface.isImplementedBy()' 
call some hooks that may dynamically claim an object implements
an interface, based on its methods and attributes.

I'm not sure if it's a good idea, as if you're going to state this in
code anyway it seems to me it's clearer to actually explicitly use marker
interfaces instead of writing some code that guesses based on the presence
of particular attributes, but it's definitely an interesting idea.
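A sketch of what such a hook mechanism might look like. The Interface class, add_implements_hook(), and the attribute-guessing hook below are all hypothetical, and real Zope interfaces work differently; this only illustrates how explicit declaration and after-the-fact observation could coexist behind one isImplementedBy() call.

```python
# Registry of hooks that may dynamically claim membership.
_hooks = []

def add_implements_hook(hook):
    _hooks.append(hook)

class Interface:
    def __init__(self, *names):
        self.names = names   # attribute names this interface promises

    def isImplementedBy(self, obj):
        # An explicit declaration wins...
        declared = getattr(obj, '__implements__', ())
        if not isinstance(declared, tuple):
            declared = (declared,)
        if self in declared:
            return True
        # ...otherwise any registered hook may claim membership.
        return any(hook(self, obj) for hook in _hooks)

ISequence = Interface('__len__', '__getitem__')

# A hook that guesses based on the presence of the interface's attributes.
add_implements_hook(lambda iface, obj:
                    all(hasattr(obj, n) for n in iface.names))

print(ISequence.isImplementedBy([1, 2, 3]))  # True, via the hook
print(ISequence.isImplementedBy(42))         # False
```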

[snip] 
> A new category may be defined as a union or intersection of two existing
> categories. This is done by ANDing or ORing the membership predicates of
> the two categories and reducing them back to canonical form.

This is similar to some ideas I came up with a couple of years ago on the
types-SIG, and Guido told me to talk about it at a conference over some beer,
instead. :)

http://mail.python.org/pipermail/types-sig/1999-December/000633.html

http://mail.python.org/pipermail/types-sig/2000-January/000923.html
(see bottom: Interfaces can be implied:)

and here's Guido's finding my idea too absurd:
http://mail.python.org/pipermail/types-sig/2000-January/000932.html

But they're still interesting ideas. :) You can basically deduce what
attributes (and methods) are on an interface by just giving a whole bunch 
of classes that you claim implement the interface, for instance.

Regards,

Martijn



From ark@research.att.com  Thu Aug 15 14:02:22 2002
From: ark@research.att.com (Andrew Koenig)
Date: 15 Aug 2002 09:02:22 -0400
Subject: [Python-Dev] type categories
In-Reply-To: <20020815080801.GA66070@hishome.net>
References: <200208131802.g7DI2Ro27807@europa.research.att.com>
 <200208131545.29856.mclay@nist.gov>
 <20020814101819.GA93585@hishome.net>
 
 <20020815080801.GA66070@hishome.net>
Message-ID: 

>> Why?  That is, why can't you imagine making a claim that type
>> X meets interface Y, even though the author of neither X nor Y
>> made that claim?

Oren> It's not a failure of imagination, it's a failure of
Oren> terminology. In contexts where the term 'interface' is used
Oren> (Java, COM, etc) it usually means something you explicitly
Oren> expose from your objects. I find that the term 'category'
Oren> implies something you observe after the fact without modifying
Oren> the object - "these objects both have property so-and-so, let's
Oren> group them together and call it a category".

But what if it is possible to express property so-and-so as an
interface?  It's like observing that a particular set, that someone
else defined, is a group, so now all the group theorems apply to it.
Similarly, if someone has defined a class, and I happen to notice that
that class is really a reversible iterator, I would like a way saying
so that will let anyone who wants to use that class in a context that
requires a reversible iterator to do so.

>> However, now that you bring it up... One difference I see between
>> interfaces and categories is that I can imagine categories carrying
>> semantic information to the human reader of the code that is not
>> actually expressed in the category itself.  As a simple example,
>> I can imagine a PartialOrdering category that I might like as part
>> of the specification for an argument to a sort function.

Oren> You can define any category you like and attach a semantic
Oren> meaning to it as long as you can write a membership predicate
Oren> for the category. It may be based on a marker that the type must
Oren> have or, in case you can't change the type (e.g. a builtin type)
Oren> you can write a membership predicate that also tests for some
Oren> set of specific types.

Or perhaps a membership predicate that tests whether a type satisfies
a particular interface.

Oren> A category is defined mathematically by a membership
Oren> predicate. So what we need for type categories is a system for
Oren> writing predicates about types.

And, perhaps, a way for defining predicates that determine whether
types meet interfaces.

>> Indeed, that's what I was thinking about initially.  Guido pointed
>> out that the notion could be expanded to making concrete assertions
>> about the interface to a class.  I had originally considered that
>> those assertions could be just that--assertions, but then when
>> Guido started talking about interfaces, I realized that my original
>> thought of expressing satisfaction of a predicate by inheriting it
>> could be extended by simply adding methods to those predicates.  Of
>> course, this technique has the disadvantage that it's not easy to
>> add base classes to a class after it has been defined.

Oren> That's why the intelligence should be in the membership
Oren> predicate, not in the classes it selects. Nothing needs to be
Oren> changed about types.  Conceptually, categories apply to
Oren> *references*, not to *objects*.

I don't see why categories should not also apply to class objects.

Oren> They help you ensure that during execution certain references
Oren> may only point to objects from a limited category of types so
Oren> that the operations you perform on them are meaningful (though
Oren> not necessarily correct). A situation that may lead to a
Oren> reference pointing to an object outside the valid category
Oren> should be detected as early as possible. Detecting this during
Oren> compilation is great. On module import is good. At runtime it's
Oren> ok.

Agreed.

Oren> Can you think of a better name than 'categories' to describe
Oren> a set of types selected by a membership predicate?

Not offhand.

-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark


From faassen@vet.uu.nl  Thu Aug 15 13:58:53 2002
From: faassen@vet.uu.nl (Martijn Faassen)
Date: Thu, 15 Aug 2002 14:58:53 +0200
Subject: [Python-Dev] type categories
In-Reply-To: <20020815080801.GA66070@hishome.net>
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208131545.29856.mclay@nist.gov> <20020814101819.GA93585@hishome.net>  <20020815080801.GA66070@hishome.net>
Message-ID: <20020815125853.GB2880@vet.uu.nl>

Oren Tirosh wrote:
> On Wed, Aug 14, 2002 at 10:08:59AM -0400, Andrew Koenig wrote:
[snip]
> > Why?  That is, why can't you imagine making a claim that type
> > X meets interface Y, even though the author of neither X nor Y
> > made that claim?
> 
> It's not a failure of imagination, it's a failure of terminology. In
> contexts where the term 'interface' is used (Java, COM, etc) it usually 
> means something you explicitly expose from your objects. I find that the 
> term 'category' implies something you observe after the fact without 
> modifying the object - "these objects both have property so-and-so, let's 
> group them together and call it a category".

Okay, but with Python interfaces as we know them (the Scarecrow-descended
interfaces in use in Zope 2 and Zope 3), you can say things like this
(with some current limitations concerning basic types). Usually, however,
one does use the __implements__ class attribute, but that's because it's
often clearer and easier when one is writing a new class.

[snip]
> That's why the intelligence should be in the membership predicate, not in 
> the classes it selects. Nothing needs to be changed about types. 
> Conceptually, categories apply to *references*, not to *objects*. They help 
> you ensure that during execution certain references may only point to 
> objects from a limited category of types so that the operations you perform 
> on them are meaningful (though not necessarily correct).

This is quite different from the Zope interfaces approach, I think. 
Zope interfaces do talk about objects, not references. This is leaning
towards a more static feel of typing (even though it may be quite different
from static typing in the details), which I think should be clearly
marked as quite independent from a discussion on interfaces. 

Though I'd still call your beasties Interfaces and not categories, even
though you want to use them in a statically typed way -- but please let's
not reject simple interfaces just because we may want to do something
complicated and involved with static typing later.

Regards,

Martijn



From ark@research.att.com  Thu Aug 15 14:06:21 2002
From: ark@research.att.com (Andrew Koenig)
Date: 15 Aug 2002 09:06:21 -0400
Subject: [Python-Dev] type categories
In-Reply-To: <200208150313.g7F3Dr727504@oma.cosc.canterbury.ac.nz>
References: <200208150313.g7F3Dr727504@oma.cosc.canterbury.ac.nz>
Message-ID: 

>> I can see how it could be done using some additional syntax similar to
>> what ML uses, e.g.:

Guido> def f(a: Cat1):
Guido>    ...code for Cat1...
Guido> else f(a: Cat2):
Guido>    ...code for Cat2...
Guido> else f(a: Cat3):
Guido>    ...code for Cat3...

Greg> As long as all the implementations have to be in one place,
Greg> this is equivalent to

Greg>   def f(a):
Greg>     if belongstocategory(a, Cat1):
Greg>       ...
Greg>     elif belongstocategory(a, Cat2):
Greg>       ...
Greg>     elif belongstocategory(a, Cat3):
Greg>       ...

Greg> so you're not gaining much from the new syntax.

I'm not so sure.  The code is already somewhat simpler here, and it
would be substantially simpler in examples such as

        def arctan(x):
            ...
        else arctan(y, x):
            ...

>> It might also be possible to modify a multimethod dynamically,
>> e.g. later one could write:
>> 
>> def f4(a: Cat4):
>> ...code for Cat4...
>> 
>> f.add(f4)

Greg> This sort of scheme makes me uneasy, because it means that any module
Greg> can change the behaviour of any call of f() in any other
Greg> module.

It makes me uneasy because the behavior of programs might depend on the
order in which modules are loaded.  That's why I didn't suggest a way
of defining the variations on f in separate places.
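For illustration, here is a sketch of a multimethod object with an add() method along the lines quoted above. Since annotated signatures like f4(a: Cat4) don't exist, this sketch passes the category predicate explicitly; the first-match rule makes behaviour depend on registration (i.e. load) order, which is exactly the worry.

```python
class MultiMethod:
    """Dispatch on the category of the argument; first match wins.

    Implementations are tried in registration order, so the result can
    depend on the order in which modules added their variants.
    """
    def __init__(self):
        self.implementations = []   # list of (category predicate, function)

    def add(self, category, func):
        self.implementations.append((category, func))

    def __call__(self, a):
        for category, func in self.implementations:
            if category(a):         # i.e. belongstocategory(a, category)
                return func(a)
        raise TypeError('no implementation matches %r' % (a,))

f = MultiMethod()
f.add(lambda a: isinstance(a, int), lambda a: 'int code')
f.add(lambda a: isinstance(a, str), lambda a: 'str code')

print(f(3))        # 'int code'
print(f('x'))      # 'str code'

# A later module can extend f -- the dynamic (and contentious) part:
f.add(lambda a: isinstance(a, float), lambda a: 'float code')
print(f(1.5))      # 'float code'
```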

-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark


From faassen@vet.uu.nl  Thu Aug 15 14:08:25 2002
From: faassen@vet.uu.nl (Martijn Faassen)
Date: Thu, 15 Aug 2002 15:08:25 +0200
Subject: [Python-Dev] type categories
In-Reply-To: <200208150134.g7F1Y3102354@pcp02138704pcs.reston01.va.comcast.net>
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208132115.g7DLFwL25088@odiug.zope.com> <200208132127.g7DLRJO29696@europa.research.att.com> <200208140316.g7E3GtT30902@pcp02138704pcs.reston01.va.comcast.net> <20020814221251.GB31835@vet.uu.nl> <200208150134.g7F1Y3102354@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20020815130825.GC2880@vet.uu.nl>

Guido van Rossum wrote:
> > Interfaces in Python are almost too trivial to understand, but
> > surprisingly useful. I imagine this is why so many smart Python
> > users don't get it; they either reject the notion because it seems
> > too trivial and 'therefore useless', or because they think it must
> > involve far more complication (static typing) and therefore it's too
> > complicated and not in the spirit of Python. :)
> 
> No, I think it's because they only work well if they are used
> pervasively (not necessarily everywhere).  That's why they work in
> Zope: not only does almost everything in Zope have an interface, but
> interfaces are used to implement many Zope features.

That is not true for Zope 2, and I do use interfaces in Zope 2. Zope 2
certainly doesn't use interfaces pervasively.

I use them pervasively in some of my own code, which is a framework
on top of Zope; that may weaken my argument, but it's still not
true that interfaces are only useful when used pervasively.
They're definitely a lot more powerful if you do use them that way, of course.

I also think it may help that one can declare a class implements an
interface outside said class itself, in a different section of the code.
I do not have any practical experience with that outside some Zope 3
hackery, however, so I can't really defend this one very well.

> I haven't made up my mind yet whether Python could benefit as much as
> Zope, but I am cautiously looking into adding something derived from
> Zope's interface package.  Jim & I have rather different ideas on what
> the ideal interfaces API should look like though, so it'll be a while.
> Maybe I should pull down the Twisted interfaces package and see how I
> like their subset (I'm sure it must be a subset -- the Zope package is
> a true kitchen sink :-).

It's an extremely small subset and very trivial, and last I checked
they used 'implements' in a different way than Zope, unfortunately (I pointed
it out and they may have fixed that by now, not sure).

But if you are looking for another API then the Twisted version doesn't
help (except for the inadvertent 'implements()' difference).

I don't consider the Zope 3 interface package to be a kitchen sink
myself, but I've been working with it for a while now. I would note
that some of its extensibility and introspection features are quite
useful when implementing Schema (a special kind of interface with
descriptions of non-method attributes). If a new package is to
be designed I hope that those use cases will be taken into account.

Regards,

Martijn



From ark@research.att.com  Thu Aug 15 14:09:05 2002
From: ark@research.att.com (Andrew Koenig)
Date: 15 Aug 2002 09:09:05 -0400
Subject: [Python-Dev] type categories
In-Reply-To: <20020815123143.GA2880@vet.uu.nl>
References: <200208131802.g7DI2Ro27807@europa.research.att.com>
 <200208131545.29856.mclay@nist.gov>
 <20020814101819.GA93585@hishome.net>
 
 <20020814215122.GA31835@vet.uu.nl>
 <200208142155.g7ELtll10574@europa.research.att.com>
 <200208150135.g7F1ZbE02366@pcp02138704pcs.reston01.va.comcast.net>
 <200208150138.g7F1cIC12290@europa.research.att.com>
 <20020815093031.A9040@hishome.net> <20020815123143.GA2880@vet.uu.nl>
Message-ID: 

Martijn> Oren Tirosh wrote:

Oren> I propose that any method or attribute may serve as a
Oren> marker. This makes it possible to use an existing practice as a
Oren> marker so protocols can be defined retroactively for an existing
Oren> code base. It's also possible, of course, to add an attribute
Oren> called 'has_property_such_and_such' to serve as an explicit
Oren> marker.

Martijn> This is an interesting idea. I'd say you could plug such a
Martijn> thing into an interface system, by making
Martijn> 'interface.isImplementedBy()' calling some hooks that may
Martijn> dynamically claim an object implements an interface, based on
Martijn> methods and attributes.

In that case, a marker is really just an interface with a single element.
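A marker check along these lines is only a few lines of present-day Python. This is a sketch of the idea being discussed, not anyone's actual patch; `in_category`, `FileLike`, and the marker tuple are invented names:

```python
# Sketch only: an object satisfies a "category" if it carries every
# marker attribute; existing practice (method names) serves as markers.

def in_category(obj, markers):
    return all(hasattr(obj, name) for name in markers)

class FileLike:
    def read(self, n=-1):
        return ""
    def close(self):
        pass

readable = ("read", "close")   # retroactive protocol for existing code

print(in_category(FileLike(), readable))
print(in_category(object(), readable))
```

A single-element marker tuple is then exactly the "interface with a single element" described above.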

-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark


From oren-py-d@hishome.net  Thu Aug 15 14:13:35 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Thu, 15 Aug 2002 09:13:35 -0400
Subject: [Python-Dev] type categories
In-Reply-To: <200208151220.g7FCK6b04165@pcp02138704pcs.reston01.va.comcast.net>
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208131545.29856.mclay@nist.gov> <20020814101819.GA93585@hishome.net> <200208141309.g7ED9Jb01045@pcp02138704pcs.reston01.va.comcast.net> <20020815002825.A2241@hishome.net>  <20020815084013.GB66070@hishome.net> <200208151220.g7FCK6b04165@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20020815131335.GA4567@hishome.net>

On Thu, Aug 15, 2002 at 08:20:06AM -0400, Guido van Rossum wrote:
> > In a dynamically typed language there is no such thing as an 'integer
> > variable' but it can be simulated by a reference that may only point to
> > objects in the 'integer' category.
> 
> This seems a game with words.  I don't see the difference between an
> integer variable and a reference that must point to an integer.
> (Well, I see a difference, in the sharing semantics, but that's just
> the difference between a value and an pointer in C.  They're both
> variables.)

In C a pointer and a value are both "objects".  But Python references are 
not objects. In a language where almost everything is an object they are a 
conspicuous exception. A slot in a list is bound to an object but there is 
no introspectable object that represents the slot *itself*.

And yes, sharing semantics make a big difference.

My basic distinction is that type categories are not a property of objects. 
An object is what it is. It doesn't need "type checking". Type categories 
are useful to check *references* and ensure that operations on a reference 
are meaningful. A useful type checking system can be built that makes no 
change at all to objects and types, only applying tests to references. The
__category__ attribute I proposed for classes is not much more than a 
convenient way to spell:

class Foo:
    ...

assert Foo in category

The category is not stored inside the class. It is an observation about 
the class, not a property of the class.
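The "observation from outside" spelling can be sketched with an ordinary object implementing `__contains__`; the `Category` class here is an illustration of the proposal, not an implementation from it:

```python
# A minimal sketch of the "assert Foo in category" spelling: the category
# object observes the class from the outside; the class stores nothing.

class Category:
    def __init__(self, *required):
        self.required = required     # attribute names that must exist

    def __contains__(self, cls):
        # an observation about the class, not a property stored on it
        return all(hasattr(cls, name) for name in self.required)

container = Category("__len__", "__getitem__")

class Foo:
    def __len__(self):
        return 0
    def __getitem__(self, i):
        raise IndexError(i)

assert Foo in container
```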

	Oren


From jacobs@penguin.theopalgroup.com  Thu Aug 15 14:55:04 2002
From: jacobs@penguin.theopalgroup.com (Kevin Jacobs)
Date: Thu, 15 Aug 2002 09:55:04 -0400 (EDT)
Subject: [Python-Dev] type categories
In-Reply-To: <20020815131335.GA4567@hishome.net>
Message-ID: 

[I'm just jumping into this thread -- please forgive me if my reply does not
make sense in the context of the past discussions on this thread -- I've
only had time to read part of the archive.]

On Thu, 15 Aug 2002, Oren Tirosh wrote:
> In C a pointer and a value are both "objects".  But Python references are 
> not objects.

References are so transparent that you can treat them as 'instance
aliases'.

> In a language where almost everything is an object they are a 
> conspicous exception.

What semantics do you propose for "reference objects"?

> A slot in a list is bound to an object but there is no introspectable
> object that represents the slot *itself*.

Of course there is -- slots bind a memory location in an object via a
descriptor object stored in its class.

> And yes, sharing semantics make a big difference. My basic distinction is
> that type categories are not a property of objects.

Why not deconstruct these ideas a little more and really explore what we
want, and not how we want to go about implementing it.  Let us consider the
following characteristics:

  1) Static interfaces/categories vs. dynamic interfaces/categories?

  i.e., does a particular instance always implement a given interface or
        category, or can it change over the lifetime of an object (due to
        state changes within the object, or changes within the
        interface/category registry system).
        
  2) Unified interface/category registry vs. distributed interface/category
                                          registry?

  i.e., are the supported interfaces/categories intrinsic to a class or are
        they reflections of the desires of the code wishing to use an
        instance?  For example, there are many possible and sensible subsets
        of the "file-like" interface.  Must any class implementing
        "file-like" objects know about and advertize that it implements a
        predefined set of interfaces?  Or, should the code wishing to use
        the object instance be able to construct a specialized
        interface/category query with which to check the specific
        capabilities of the instance?

  3) (related) Should interfaces/categories of a particular class/instance
     be enumerable?

  4) Should interfaces/categories be monotonic with respect to inheritance?

  i.e., is it sufficient that the base class of an instance implements a given
        interface/category for a derived instance to implement that
        interface/category.

  
> An object is what it is. It doesn't need "type checking". Type categories 
> are useful to check *references* and ensure that operations on a reference 
> are meaningful.

This distinction is meaningless for Python.  Object references are not
typed, and thus any reference can potentially be to any type.  Putting too
fine a point on the semantic difference between the type of an object and
the type of an object referred to by some reference is just playing games.

> A useful type checking system can be built that makes no 
> change at all to objects and type, only applying tests to references. The
> __category__ attribute I proposed for classes is not much more than a 
> convenient way to spell:
> 
> class Foo:
>     ...
> 
> assert Foo in category
> 
> The category is not stored inside the class. It is an observation about 
> the class, not a property of the class.

Given this description, I am guessing that these are your answers to my
previous questions:

1) dynamic, since your categories are not fixed at class creation
2) ?, either is possible, though I suspect you are advocating a standard
   unified category registry.
3) Yes, depending on implementation details
4) No

Please correct my guesses if I am mistaken.

Thanks,
-Kevin

--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19         E-mail: jacobs@theopalgroup.com
Fax:   (216) 986-0714              WWW:    http://www.theopalgroup.com



From pedroni@inf.ethz.ch  Thu Aug 15 14:58:46 2002
From: pedroni@inf.ethz.ch (Samuele Pedroni)
Date: Thu, 15 Aug 2002 15:58:46 +0200
Subject: [Python-Dev] FW: multimethod-0.1
Message-ID: <001801c24463$e06d3660$6d94fea9@newmexico>

[David Abrahams]
>> I haven't studied this, but from a quick glance it looks competent.
>>
>>
>> Multimethod-0.1 is another python module for implementing multimethods
>> (a.k.a.  generic functions, multiple-argument method dispatch).  This
>> one features:
>>
>> - support for Python2.2 type/class unification

It works only with new-style classes and types, not with old-style
classes; I don't know if that's a problem. For old-style classes the
mutability of __bases__ becomes a problem.

>> - a precedence graph for more efficient dispatching

>> - a best-fit resolution algorithm, in which the method closest in
>> inheritance distance is called

This makes me uneasy: either we go the CLOS way, where arguments further
to the left take priority, or the Dylan way, i.e. in the face of ambiguity
throw an exception (in that case we could add a mechanism to
force dispatch on a supplied supertype, like my _redispatch).


>> - a versatile 'call-next-method' or 'super' function.

FYI, this one uses dictionaries on frames

>It's a good start, but from the docs it doesn't appear to deal with:

it's a start

>a. Type categories -- it seems as though the only way for a multimethod
>implementation to match an actual argument is if the formal argument has an
>inheritance relationship with it.

but it's really an orthogonal problem (the concrete problem wrt
multidispatch is just how to merge categories into the mro);
making categories/protocols first-class is a different can of worms.

>b. Implicit conversions -- If I declare a function that accepts a Python
>int, can I pass a Python float?

maybe if it accepts a float you can pass an int. I see the issue,
but I don't know if this should be the default on the Python side in general.
It should be configurable per multimethod.

c. there should be a version that can be put in a class and behave
like a method (e.g. to implement the moral equivalent of overloading),
producing versions bound or unbound in the first argument when retrieved
through the class or its instances. It should probably take a function to
be called in case no matching method is found; this can be used to
redispatch to the super classes, or something similar, unless we want to
redefine how the whole single-dispatch mechanism works.
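A rough sketch of the kind of dispatch under discussion, assuming invented names throughout (this is not the multimethod-0.1 package). It walks each argument's MRO with the leftmost argument varying slowest, which gives CLOS-style leftmost-argument priority, and takes a fallback function for the no-match case:

```python
# Rough sketch only -- not the multimethod-0.1 package.

from itertools import product

class MultiMethod:
    def __init__(self, fallback=None):
        self.table = {}              # (type, ...) -> implementation
        self.fallback = fallback     # called when no signature matches

    def register(self, *types):
        def decorator(func):
            self.table[types] = func
            return func
        return decorator

    def __call__(self, *args):
        actual = tuple(type(a) for a in args)
        # product() varies the rightmost position fastest, so every
        # combination using a more derived type of the leftmost argument
        # is tried before any combination using one of its bases
        for candidate in product(*(t.__mro__ for t in actual)):
            impl = self.table.get(candidate)
            if impl is not None:
                return impl(*args)
        if self.fallback is not None:
            return self.fallback(*args)
        raise TypeError("no matching method for %r" % (actual,))

add = MultiMethod(fallback=lambda a, b: NotImplemented)

@add.register(int, int)
def add_ints(a, b):
    return a + b
```

With this resolution order a `bool` argument (a subclass of `int`) still reaches the `(int, int)` implementation, while unregistered combinations fall through to the fallback.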

regards.



From ark@research.att.com  Thu Aug 15 15:02:31 2002
From: ark@research.att.com (Andrew Koenig)
Date: 15 Aug 2002 10:02:31 -0400
Subject: [Python-Dev] type categories
In-Reply-To: 
References: 
Message-ID: 

>> In a language where almost everything is an object they are a 
>> conspicous exception.

Kevin> What semantics do you propose for "reference objects"?

I think the idea is to constrain the circumstances under which a
reference can be created in the first place.

-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark


From guido@python.org  Thu Aug 15 15:11:30 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 15 Aug 2002 10:11:30 -0400
Subject: [Python-Dev] type categories
In-Reply-To: Your message of "Thu, 15 Aug 2002 14:58:53 +0200."
 <20020815125853.GB2880@vet.uu.nl>
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208131545.29856.mclay@nist.gov> <20020814101819.GA93585@hishome.net>  <20020815080801.GA66070@hishome.net>
 <20020815125853.GB2880@vet.uu.nl>
Message-ID: <200208151411.g7FEBUk01289@odiug.zope.com>

I think we may have retired the types-sig a week or two too
early... :-)

This kind of discussion is great for sharpening our intellect; the
types-sig had outbursts like this maybe twice a year.  But I have yet
to see something come out of it that was practical enough to be added
to Python "now".  Maybe Zope Interfaces are our best bet.  They surely
have been used and refined for almost four years now...

--Guido van Rossum (home page: http://www.python.org/~guido/)



From oren-py-d@hishome.net  Thu Aug 15 15:53:02 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Thu, 15 Aug 2002 10:53:02 -0400
Subject: [Python-Dev] type categories
In-Reply-To: 
References: <20020815131335.GA4567@hishome.net> 
Message-ID: <20020815145302.GA15633@hishome.net>

On Thu, Aug 15, 2002 at 09:55:04AM -0400, Kevin Jacobs wrote:
> [I'm just jumping into this thread -- please forgive me if my reply does not
> make sense in the context of the past discussions on this thread -- I've
> only had time to read part of the archive.]
> 
> On Thu, 15 Aug 2002, Oren Tirosh wrote:
> > In C a pointer and a value are both "objects".  But Python references are 
> > not objects.
> 
> References are so transparent that you can treat them as 'instance
> aliases'.
> 
> > In a language where almost everything is an object they are a 
> > conspicous exception.
> 
> What semantics do you propose for "reference objects"?

No no no! I am not proposing anything like that.

What I'm saying is that interfaces/categories/whateveryouwannacallit are
more about references to objects than about the objects themselves, and I
pointed out that references are not even Python objects.

Two references to the same object may have very different expectations about 
what they are pointing to. I went a step further and decided to completely 
decouple it from the object: All the intelligence is in the category that 
makes observations about the object's form without requiring any change to 
the objects or types.

	Oren


From guido@python.org  Thu Aug 15 17:13:31 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 15 Aug 2002 12:13:31 -0400
Subject: [Python-Dev] Re: SET_LINENO killer
In-Reply-To: Your message of "Wed, 14 Aug 2002 14:51:25 EDT."
Message-ID: <200208151613.g7FGDVF16382@odiug.zope.com>

> python/sf/587993
> 
> Looks like Michael Hudson did an *outstanding* and very thorough job
> on this.  Does anybody see a reason why I shouldn't let him check this
> in?

OK, Michael's checked it in, after some comments from Martin.  Woo
hoo!

But here's some sad news.  I only see a speed increase of 0.5%!  I
believe that when we first looked at this patch the speedup was about
5%...  Worse, Tim claims that on his Windows box it's actually 5%
slower.  What happened?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Thu Aug 15 17:20:37 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 15 Aug 2002 12:20:37 -0400
Subject: [Python-Dev] Alternative implementation of interning
In-Reply-To: Your message of "Wed, 14 Aug 2002 23:02:27 EDT."
Message-ID: <200208151620.g7FGKb216411@odiug.zope.com>

Again: python/sf/576101

I'd like to make all interned strings mortal; this allows some
simplifications to the patch.  This would mean that in the following
example:

  x = intern('12345'*4)
  nx = id(x)
  del x
  ...do something else...
  y = intern('12345'*4)
  ny = id(y)

nx doesn't necessarily equal ny any more.  This is a backward
incompatibility but I'm willing to break programs that rely on this;
it sounds highly unlikely that the author of any such code as exists
would mind it being broken.
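For what it's worth, mortal interning is what later Pythons shipped; in present-day Python the example reads as below (`sys.intern` replaced the builtin), and the only guarantee worth relying on is identity of equal interned strings while both are alive:

```python
import sys

x = sys.intern('12345' * 4)
nx = id(x)
del x                        # with mortal interning, the string may die here

y = sys.intern('12345' * 4)
ny = id(y)

# nx may or may not equal ny -- implementation-dependent, don't rely on it.
# What does hold: interning an equal string returns the same object.
assert sys.intern('12345' * 4) is y
```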

Opinions?

(Reminder: this is python-dev, not types-sig. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mwh@python.net  Thu Aug 15 17:22:59 2002
From: mwh@python.net (Michael Hudson)
Date: 15 Aug 2002 17:22:59 +0100
Subject: [Python-Dev] Re: SET_LINENO killer
In-Reply-To: Guido van Rossum's message of "Thu, 15 Aug 2002 12:13:31 -0400"
References: <200208151613.g7FGDVF16382@odiug.zope.com>
Message-ID: <2meld0f5wc.fsf@starship.python.net>

Guido van Rossum  writes:

> > python/sf/587993
> > 
> > Looks like Michael Hudson did an *outstanding* and very thorough job
> > on this.  Does anybody see a reason why I shouldn't let him check this
> > in?
> 
> OK, Michael's checked it in, after some comments from Martin.  Woo
> hoo!

Hurrah!

> But here's some sad news.  I only see a speed increase of 0.5%!  I
> believe that when we first looked at this patch the speedup was about
> 5%...  Worse, Tim claims that on his Windows box it's actually 5%
> slower.  What happened?

Beats me.  I still see a healthy speed up:

Before:

$ ./python ../Lib/test/pystone.py 
Pystone(1.1) time for 50000 passes = 3.99
This machine benchmarks at 12531.3 pystones/second

After:

$ ./python ../Lib/test/pystone.py 
Pystone(1.1) time for 50000 passes = 3.65
This machine benchmarks at 13698.6 pystones/second

(which is nosing on for 10% faster, actually).

You're not testing a debug vs a release build or anything like that
are you?

Cheers,
M.

-- 
  That's why the smartest companies use Common Lisp, but lie about it
  so all their competitors think Lisp is slow and C++ is fast.  (This
  rumor has, however, gotten a little out of hand. :)
                                        -- Erik Naggum, comp.lang.lisp


From nas@python.ca  Thu Aug 15 17:37:12 2002
From: nas@python.ca (Neil Schemenauer)
Date: Thu, 15 Aug 2002 09:37:12 -0700
Subject: [Python-Dev] Re: SET_LINENO killer
In-Reply-To: <200208151613.g7FGDVF16382@odiug.zope.com>; from guido@python.org on Thu, Aug 15, 2002 at 12:13:31PM -0400
References: <200208151613.g7FGDVF16382@odiug.zope.com>
Message-ID: <20020815093712.A4116@glacier.arctrix.com>

Guido van Rossum wrote:
> But here's some sad news.  I only see a speed increase of 0.5%!

Based on pystone, the current CVS tree seems to be about 8% faster on my
machine than it was two days ago.

  Neil


From tim@zope.com  Thu Aug 15 17:46:25 2002
From: tim@zope.com (Tim Peters)
Date: Thu, 15 Aug 2002 12:46:25 -0400
Subject: [Python-Dev] Alternative implementation of interning
In-Reply-To: <200208151620.g7FGKb216411@odiug.zope.com>
Message-ID: 

[Guido]
> I'd like to make all interned strings mortal; this allows some
> simplifications to the patch.  This would mean that in the following
> example:
>
>   x = intern('12345'*4)
>   nx = id(x)
>   del x
>   ...do something else...
>   y = intern('12345'*4)
>   ny = id(y)
>
> nx doesn't necessarily equal ny any more.  This is a backward
> incompatibility but I'm willing to break programs that rely on this;
> it sounds highly unlikely that the author of any such code as exists
> would mind it being broken.
>
> Opinions?

As the only person to have posted an example relying on this behavior, it's
OK by me if that example breaks -- it was made up just to illustrate the
possibility and raise a caution flag.  I don't take it seriously.

> (Reminder: this is python-dev, not types-sig. :-)

Oops!  I guess I should take it more seriously then.



From tim@zope.com  Thu Aug 15 17:54:29 2002
From: tim@zope.com (Tim Peters)
Date: Thu, 15 Aug 2002 12:54:29 -0400
Subject: [Python-Dev] Re: SET_LINENO killer
In-Reply-To: <2meld0f5wc.fsf@starship.python.net>
Message-ID: 

[Michael Hudson]
> Beats me.  I still see a healthy speed up:
>
> Before:
>
> $ ./python ../Lib/test/pystone.py
> Pystone(1.1) time for 50000 passes = 3.99
> This machine benchmarks at 12531.3 pystones/second
>
> After:
>
> $ ./python ../Lib/test/pystone.py
> Pystone(1.1) time for 50000 passes = 3.65
> This machine benchmarks at 13698.6 pystones/second
>
> (which is nosing on for 10% faster, actually).
>
> You're not testing a debug vs a release build or anything like that
> are you?

I'm not, but I was comparing -O times (in release builds).  Three runs
before patch:

C:\Code\python\PCbuild>python -O ../lib/test/pystone.py
Pystone(1.1) time for 50000 passes = 3.49756
This machine benchmarks at 14295.7 pystones/second

C:\Code\python\PCbuild>python -O ../lib/test/pystone.py
Pystone(1.1) time for 50000 passes = 3.49881
This machine benchmarks at 14290.6 pystones/second

C:\Code\python\PCbuild>python -O ../lib/test/pystone.py
Pystone(1.1) time for 50000 passes = 3.52653
This machine benchmarks at 14178.2 pystones/second


Three runs after patch:

C:\Code\python\PCbuild>python -O  ../lib/test/pystone.py
Pystone(1.1) time for 50000 passes = 3.74291
This machine benchmarks at 13358.6 pystones/second

C:\Code\python\PCbuild>python -O  ../lib/test/pystone.py
Pystone(1.1) time for 50000 passes = 3.74544
This machine benchmarks at 13349.6 pystones/second

C:\Code\python\PCbuild>python   ../lib/test/pystone.py
Pystone(1.1) time for 50000 passes = 3.74487
This machine benchmarks at 13351.6 pystones/second


Three runs after commenting out the new

		if (tstate->c_tracefunc != NULL && !tstate->tracing) {
			/* see maybe_call_line_trace
			   for expository comments */
			maybe_call_line_trace(opcode,
					      tstate->c_tracefunc,
					      tstate->c_traceobj,
					      f, &instr_lb, &instr_ub);
		}

on the eval-loop critical path:

C:\Code\python\PCbuild>python   ../lib/test/pystone.py
Pystone(1.1) time for 50000 passes = 3.59444
This machine benchmarks at 13910.4 pystones/second

C:\Code\python\PCbuild>python   ../lib/test/pystone.py
Pystone(1.1) time for 50000 passes = 3.59211
This machine benchmarks at 13919.4 pystones/second

C:\Code\python\PCbuild>python   ../lib/test/pystone.py
Pystone(1.1) time for 50000 passes = 3.59742
This machine benchmarks at 13898.9 pystones/second


OTOH, MSVC 6 has been generating faster ceval.c code than gcc for a long
time; given how touchy this is, maybe it's just time for gcc to win 587 coin
flips in a row.



From guido@python.org  Thu Aug 15 18:31:04 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 15 Aug 2002 13:31:04 -0400
Subject: [Python-Dev] Re: SET_LINENO killer
In-Reply-To: Your message of "Thu, 15 Aug 2002 17:22:59 BST."
 <2meld0f5wc.fsf@starship.python.net>
References: <200208151613.g7FGDVF16382@odiug.zope.com>
 <2meld0f5wc.fsf@starship.python.net>
Message-ID: <200208151731.g7FHV4F04526@odiug.zope.com>

> > But here's some sad news.  I only see a speed increase of 0.5%!  I
> > believe that when we first looked at this patch the speedup was about
> > 5%...  Worse, Tim claims that on his Windows box it's actually 5%
> > slower.  What happened?
> 
> Beats me.  I still see a healthy speed up:
> 
> Before:
> 
> $ ./python ../Lib/test/pystone.py 
> Pystone(1.1) time for 50000 passes = 3.99
> This machine benchmarks at 12531.3 pystones/second
> 
> After:
> 
> $ ./python ../Lib/test/pystone.py 
> Pystone(1.1) time for 50000 passes = 3.65
> This machine benchmarks at 13698.6 pystones/second
> 
> (which is nosing on for 10% faster, actually).
> 
> You're not testing a debug vs a release build or anything like that
> are you?

Absolutely not.  I did a "cvs update -D yesterday" and built a
Python binary.  That was only about 1.5% slower than today's binary
built in the same directory minutes earlier.

On the other hand, on a different machine, I had a checkout that was
approximately 2 days old, and there the latest checkout was about 6%
faster (all without -O).  Perhaps something else has happened in the
last two days that actually is responsible for the speedup?

I'm also happy to report that with current cvs, -O makes almost no
difference: it's only 0.18% faster.  Current cvs without -O is about
the same speed as two days ago with -O.

So it's still a mystery.  What happened yesterday that could have
caused a speedup?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tismer@tismer.com  Thu Aug 15 18:43:20 2002
From: tismer@tismer.com (Christian Tismer)
Date: Thu, 15 Aug 2002 19:43:20 +0200
Subject: [Python-Dev] Re: SET_LINENO killer
References: <200208151613.g7FGDVF16382@odiug.zope.com>              <2meld0f5wc.fsf@starship.python.net> <200208151731.g7FHV4F04526@odiug.zope.com>
Message-ID: <3D5BE838.3050400@tismer.com>

Guido van Rossum wrote:
...

> So it's still a mystery.  What happened yesterday that could have
> caused a speedup?

I haven't looked into it, but I have spent weeks
speeding up the main loop, and I can tell you that
it shows some fractal behavior, at least under
Windows. It can make a difference when small
code moves happen, or some variable vanishes and the
compiler decides to re-arrange registers, change
code ordering, or especially do folding. I have seen
changes of mine cause the compiler to
re-use a common code sequence, which cost one
more jump and made things slower.

Just to tell you: it isn't always under your control,
and something that *should* run faster is actually
slower, especially when you're fiddling with
fractions of a percent.

SET_LINENO was so cheap, that after some tests,
I decided to keep it in, also since I found it
useful for line-wise interrupts.

Btw., with the new patch, how is tracing done now?
(sorry, I could read the sources but I'm under pressure)

cheers - chris

-- 
Christian Tismer             :^)   
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/




From pedroni@inf.ethz.ch  Thu Aug 15 18:37:39 2002
From: pedroni@inf.ethz.ch (Samuele Pedroni)
Date: Thu, 15 Aug 2002 19:37:39 +0200
Subject: [Python-Dev] Alternative implementation of interning
Message-ID: <001201c24482$7458a940$6d94fea9@newmexico>

In Jython, as long as we want to support Java 1.1
(and AFAIK Finn still does), we cannot make interned
strings always mortal.
So it is OK if CPython goes this route, but the Python
manual should either say that it is unspecified whether
intern results are mortal or immortal, or say nothing on the subject
(right now it explicitly says immortal).

regards.



From walter@livinglogic.de  Thu Aug 15 18:58:16 2002
From: walter@livinglogic.de (=?ISO-8859-15?Q?Walter_D=F6rwald?=)
Date: Thu, 15 Aug 2002 19:58:16 +0200
Subject: [Python-Dev] mimetypes patch #554192
Message-ID: <3D5BEBB8.7080904@livinglogic.de>

Patch http://www.python.org/sf/554192 adds a function to
mimetypes.py that returns all known extensions for a mimetype,
e.g.

 >>> import mimetypes
 >>> mimetypes.guess_all_extensions("image/jpeg")
['.jpg', '.jpe', '.jpeg']

Martin v. Loewis and I were discussing whether it would make
sense to make the helper method add_type (which is used for
adding a mapping between one type and one extension) visible
on the module level.
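Both pieces did eventually land in the stdlib: `guess_all_extensions` and a module-level `add_type(type, ext, strict=True)`. A sketch of that usage, with a made-up extension so it doesn't clash with the built-in tables:

```python
import mimetypes

mimetypes.init()
# register a new mapping at module level; strict=True puts it in the
# standard types table rather than the common (non-standard) one
mimetypes.add_type("image/jpeg", ".myjpg", strict=True)

print(mimetypes.guess_type("photo.myjpg"))
print(mimetypes.guess_all_extensions("image/jpeg"))
```

The mapping works in both directions: `guess_type` resolves the new extension, and `guess_all_extensions` includes it among the known extensions for the type.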

Any comments?

Bye,
    Walter Dörwald



From guido@python.org  Thu Aug 15 19:04:55 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 15 Aug 2002 14:04:55 -0400
Subject: [Python-Dev] Alternative implementation of interning
In-Reply-To: Your message of "Thu, 15 Aug 2002 19:37:39 +0200."
 <001201c24482$7458a940$6d94fea9@newmexico>
References: <001201c24482$7458a940$6d94fea9@newmexico>
Message-ID: <200208151804.g7FI4tr04779@odiug.zope.com>

> In Jython as long as we want to support Java 1.1
> (and AFAIK Finn still will) we cannot make interned
> string always mortal.
> So it is OK if CPython goes this route, but the Python
> manual should say that it is unspecified whether
> intern results are mortal or immortal or nothing on the subject
> (now it explicitly says immortal).

That's okay.  Immortality of interned strings is mostly an issue for
very long running server processes that take connections from
arbitrary clients; the issue is that arbitrary client data
accidentally gets immortalized because it is tried as an attribute
name or mapping key.  While Jython *could* be used in JSP server
setups, I expect that most long-running Python servers are using
CPython and a framework like Zope, Twisted or Quixote.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From aahz@pythoncraft.com  Thu Aug 15 19:04:11 2002
From: aahz@pythoncraft.com (Aahz)
Date: Thu, 15 Aug 2002 14:04:11 -0400
Subject: [Python-Dev] Mutable exceptions? (was Re: PEP 293, Codec Error Handling Callbacks)
Message-ID: <20020815180411.GA24780@panix.com>

On Mon, Aug 12, 2002, Martin v. Loewis wrote:
> "M.-A. Lemburg"  writes:
>> 
>> What ? That exceptions are immutable ? I think it's a big win that
>> exceptions are in fact mutable -- they are great for transporting
>> extra information up the chain...
> 
> I see. So this is an open issue.

This looks like an issue that potentially deserves more community
feedback, so I'm ripping it out and starting a new thread: should
exception objects be treated as mutable as exceptions get caught and
re-raised?

(I'm not suggesting any code changes, just trying to get a feel for what
"standard practice" ought to be, partly for the book I'm writing.)
-- 
Aahz (aahz@pythoncraft.com)           <*>         http://www.pythoncraft.com/

Project Vote Smart: http://www.vote-smart.org/


From aahz@pythoncraft.com  Thu Aug 15 19:05:57 2002
From: aahz@pythoncraft.com (Aahz)
Date: Thu, 15 Aug 2002 14:05:57 -0400
Subject: [Python-Dev] Alternative implementation of interning
In-Reply-To: <001201c24482$7458a940$6d94fea9@newmexico>
References: <001201c24482$7458a940$6d94fea9@newmexico>
Message-ID: <20020815180557.GB24780@panix.com>

On Thu, Aug 15, 2002, Samuele Pedroni wrote:
>
> In Jython as long as we want to support Java 1.1 (and AFAIK Finn still
> will) we cannot make interned string always mortal.  So it is OK if
> CPython goes this route, but the Python manual should say that it is
> unspecified whether intern results are mortal or immortal or nothing
> on the subject (now it explicitly says immortal).

Isn't this irrelevant, anyway, because Jython doesn't implement
CPython's refcount mechanisms?
-- 
Aahz (aahz@pythoncraft.com)           <*>         http://www.pythoncraft.com/

Project Vote Smart: http://www.vote-smart.org/


From guido@python.org  Thu Aug 15 19:15:51 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 15 Aug 2002 14:15:51 -0400
Subject: [Python-Dev] Mutable exceptions? (was Re: PEP 293, Codec Error Handling Callbacks)
In-Reply-To: Your message of "Thu, 15 Aug 2002 14:04:11 EDT."
 <20020815180411.GA24780@panix.com>
References: <20020815180411.GA24780@panix.com>
Message-ID: <200208151815.g7FIFpn04821@odiug.zope.com>

> This looks like an issue that potentially deserves more community
> feedback, so I'm ripping it out and starting a new thread: should
> exception objects be treated as mutable as exceptions get caught and
> re-raised?

(A new thread in python-dev hardly counts as "community feedback".)

I'd say definitely.  Code like this looks reasonable to me:

  def some_function(arg):
    try:
      call_some_other_function(arg)
    except SomeExpectedExceptionClass, obj:
      obj.add_context(arg)
      raise

Then some outer piece of code catches exceptions and produces a
traceback augmented by information added by various calls to
add_context().
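A runnable sketch of that pattern; `SomeExpectedExceptionClass` and `add_context` are the illustrative names from the snippet above, and this particular implementation of them is assumed, not taken from any real library:

```python
# Sketch: the in-flight exception object is mutated with extra context
# at each level before being re-raised; an outer handler reads it all.

class SomeExpectedExceptionClass(Exception):
    def __init__(self, message):
        Exception.__init__(self, message)
        self.context = []            # accumulated by intermediate frames

    def add_context(self, info):
        self.context.append(info)

def call_some_other_function(arg):
    raise SomeExpectedExceptionClass("low-level failure")

def some_function(arg):
    try:
        call_some_other_function(arg)
    except SomeExpectedExceptionClass as obj:
        obj.add_context(arg)         # augment, then re-raise the same object
        raise

try:
    some_function("frobnicate spam")
except SomeExpectedExceptionClass as exc:
    caught = exc

print(caught, caught.context)
```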

--Guido van Rossum (home page: http://www.python.org/~guido/)


From barry@python.org  Thu Aug 15 19:19:24 2002
From: barry@python.org (Barry A. Warsaw)
Date: Thu, 15 Aug 2002 14:19:24 -0400
Subject: [Python-Dev] mimetypes patch #554192
References: <3D5BEBB8.7080904@livinglogic.de>
Message-ID: <15707.61612.844119.819432@anthem.wooz.org>

>>>>> "WD" == Walter Dörwald writes:

    WD> Martin v. Loewis and I were discussing whether it would make
    WD> sense to make the helper method add_type (which is used for
    WD> adding a mapping between one type and one extension) visible
    WD> on the module level.

    WD> Any comments?

+1 on add_type() being public, but it should probably have a strict
flag to decide whether to add the new entry to the standard types dict
or the common types dict.
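A rough sketch of how such a module-level helper with a strict flag might look (the table names are illustrative, not the module's actual internals):

```python
# Sketch of a module-level add_type() with a strict flag.
types_map = {}      # standard, registered types
common_types = {}   # non-standard but commonly seen types

def add_type(type, ext, strict=True):
    # strict=True adds to the standard table, strict=False to the
    # common-types table, as suggested above
    table = types_map if strict else common_types
    table[ext] = type

add_type('application/x-foo', '.foo')
add_type('image/pjpeg', '.jpg', strict=False)
assert types_map['.foo'] == 'application/x-foo'
assert common_types['.jpg'] == 'image/pjpeg'
```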

-Barry


From barry@python.org  Thu Aug 15 19:21:38 2002
From: barry@python.org (Barry A. Warsaw)
Date: Thu, 15 Aug 2002 14:21:38 -0400
Subject: [Python-Dev] Mutable exceptions? (was Re: PEP 293, Codec Error Handling Callbacks)
References: <20020815180411.GA24780@panix.com>
Message-ID: <15707.61746.849095.773761@anthem.wooz.org>

>>>>> "A" == Aahz   writes:

    >> "M.-A. Lemburg"  writes:
    >> What ? That exceptions are immutable ? I think it's a big win
    >> that exceptions are in fact mutable -- they are great for
    >> transporting extra information up the chain...
    >> I see. So this is an open issue.

    A> This looks like an issue that potentially deserves more
    A> community feedback, so I'm ripping it out and starting a new
    A> thread: should exception objects be treated as mutable as
    A> exceptions get caught and re-raised?

    A> (I'm not suggesting any code changes, just trying to get a feel
    A> for what "standard practice" ought to be, partly for the book
    A> I'm writing.)

MAL's right, it /is/ occasionally useful to do this.  A call higher up
the chain may have more information about the failing condition, and
it can be useful to augment the exception object with this extra
information.  That's one of the reasons why exception classes are so
much nicer than exception strings!

-Barry


From nas@python.ca  Thu Aug 15 20:02:31 2002
From: nas@python.ca (Neil Schemenauer)
Date: Thu, 15 Aug 2002 12:02:31 -0700
Subject: [Python-Dev] Alternative implementation of interning
In-Reply-To: <20020815180557.GB24780@panix.com>; from aahz@pythoncraft.com on Thu, Aug 15, 2002 at 02:05:57PM -0400
References: <001201c24482$7458a940$6d94fea9@newmexico> <20020815180557.GB24780@panix.com>
Message-ID: <20020815120231.A4727@glacier.arctrix.com>

Aahz wrote:
> Isn't this irrelevant, anyway, because Jython doesn't implement
> CPython's refcount mechanisms?

I don't think mark & sweep or copying GC saves you.  If Jython keeps a
reference to interned strings then the GC cannot free that memory.

  Neil


From pedroni@inf.ethz.ch  Thu Aug 15 19:51:54 2002
From: pedroni@inf.ethz.ch (Samuele Pedroni)
Date: Thu, 15 Aug 2002 20:51:54 +0200
Subject: [Python-Dev] Alternative implementation of interning
References: <001201c24482$7458a940$6d94fea9@newmexico>  <200208151804.g7FI4tr04779@odiug.zope.com>
Message-ID: <005a01c2448c$d37a19e0$6d94fea9@newmexico>

From: Guido van Rossum 
> > In Jython as long as we want to support Java 1.1
> > (and AFAIK Finn still will) we cannot make interned
> > string always mortal.
> > So it is OK if CPython goes this route, but the Python
> > manual should say that it is unspecified whether
> > intern results are mortal or immortal or nothing on the subject
> > (now it explicitly says immortal).
>
> That's okay.  Immortality of interned strings is mostly an issue for
> very long running server processes that take connections from
> arbitrary clients; the issue is that arbitrary client data
> accidentally gets immortalized because it is tried as an attribute
> name or mapping key.  While Jython *could* be used in JSP server
> setups, I expect that most long-running Python servers are using
> CPython and a framework like Zope, Twisted or Quixote.

Ok, thinking a bit more, it's a kind of trade-off ('is' speed for Python
strings, and two refs plus an int vs. a ref, a boolean and an int of space
required per Python string, which is somewhat VM-dependent and should be
measured).  We could make the Python interned strings mortal, but anyway:

- we use Java interned strings (immortal anyway) for class, module, and
  instance dictionaries.
- for the rest, Python interned strings are just the result of intern(),
  with the property that the wrapped Java string is also a Java interned
  one (so immortal).

so the point for us is a bit muddy.

regards.






From dave@boost-consulting.com  Thu Aug 15 22:02:04 2002
From: dave@boost-consulting.com (David Abrahams)
Date: Thu, 15 Aug 2002 17:02:04 -0400
Subject: [Python-Dev] type categories
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <200208132115.g7DLFwL25088@odiug.zope.com> <0c0501c24311$8cebbdc0$6501a8c0@boostconsulting.com> <200208140242.g7E2gCs30811@pcp02138704pcs.reston01.va.comcast.net> <0ccc01c24341$d839b130$6501a8c0@boostconsulting.com> <200208140405.g7E45s731824@pcp02138704pcs.reston01.va.comcast.net> <0ce601c24346$ff967f60$6501a8c0@boostconsulting.com> <20020814085440.A31966@glacier.arctrix.com>
Message-ID: <13b001c2449f$08423240$6501a8c0@boostconsulting.com>

From: "Neil Schemenauer" 


> David Abrahams wrote:
> > There's not all that much to what I'm doing. I have a really
> > simple-minded dispatching scheme which checks each overload in
> > sequence, and takes the first one which can get a match for all
> > arguments.
>
> Can you explain in more detail how the matching is done?  Wouldn't
> having some kind of type declarations be a precondition to implementing
> multiple dispatch?

Since in Boost.Python we are ultimately wrapping C++ function and member
function pointers, the type declarations are available to us. For each C++
type, any number of from_python converters may be registered with the
system. Each converter can have its own matching criterion. For example,
there is a pre-registered converter for each of the built-in C++ integral
types which checks the source object's tp_int field to decide
convertibility. When you wrap a C++ class, a from_python converter is
registered whose convertibility criterion checks to see if the source
object is one of my extension classes, then asks if it contains a C++
object of the appropriate type. Since we have C++ types corresponding to
some of the built-in Python types (e.g. list, dict, str), the
convertibility criterion for those just checks to see whether the Python
object has the appropriate type. However, we're not limited to matching
precise types: we could easily make a C++ type called "sequence" whose
converter would match any Python sequence (if we could decide exactly what
constitutes a Python sequence <.02 wink>).
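A toy model of that first-match overload resolution with per-argument match criteria (all names invented; the real Boost.Python machinery is C++ and far richer):

```python
# Each converter carries its own convertibility predicate, loosely
# following the description above.
class Converter:
    def __init__(self, matches):
        self.matches = matches   # predicate: can this Python object convert?

int_conv = Converter(lambda obj: isinstance(obj, int))
seq_conv = Converter(lambda obj: hasattr(obj, '__getitem__'))

def dispatch(overloads, args):
    # take the first overload whose converters all match, in sequence
    for convs, impl in overloads:
        if len(convs) == len(args) and all(
                c.matches(a) for c, a in zip(convs, args)):
            return impl(*args)
    raise TypeError('no matching overload')

overloads = [
    ([int_conv, int_conv], lambda a, b: a + b),
    ([seq_conv], lambda s: len(s)),
]
assert dispatch(overloads, (2, 3)) == 5
assert dispatch(overloads, ([10, 20, 30],)) == 3
```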

HTH,
Dave

P.S. If you want even /more/ gory details, just ask: I have plenty ;-)

-----------------------------------------------------------
           David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com





From David Abrahams"  
Message-ID: <14c201c244b2$8cc651f0$6501a8c0@boostconsulting.com>

From: "Andrew Koenig" 

> Greg> so you're not gaining much from the new syntax.
>
> I'm not so sure.  The code is already somewhat simpler here, and it
> would be substantially simpler in examples such as
>
>         def arctan(x):
>             ...
>         else arctan(y, x):
>             ...
>
> >> It might also be possible to modify a multimethod dynamically,
> >> e.g. later one could write:
> >>
> >> def f4(a: Cat4):
> >> ...code for Cat4...
> >>
> >> f.add(f4)
>
> Greg> This sort of scheme makes me uneasy, because it means that any
> Greg> module can change the behaviour of any call of f() in any other
> Greg> module.
>
> It makes me uneasy because the behavior of programs might depend on the
> order in which modules are loaded.  That's why I didn't suggest a way
> of defining the variations on f in separate places.

This concern seems most un-pythonic to my eye, since there are already all
kinds of ways any module can change the behavior of any call in another
module. The most direct way is by rebinding the implementation of another
module's function. Python is a dynamic language, and that is usually seen
as a strength.

More importantly, though, forcing all the definitions to be in one place
prevents an important (you might even say the most important) use case: the
author of a new type should be able to provide a multimethod
implementation corresponding to that type. For example, if I write a
rational number class, I should be able to plug in a corresponding arctan
implementation.
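A minimal sketch of such an openly extensible multimethod, along the lines of the f.add(f4) idea quoted earlier (all names invented):

```python
import math

class MultiMethod:
    def __init__(self, name):
        self.name = name
        self.impls = []        # list of (type signature, implementation)

    def add(self, types, func):
        self.impls.append((types, func))

    def __call__(self, *args):
        # first implementation whose signature matches the arguments wins
        for types, func in self.impls:
            if len(types) == len(args) and all(
                    isinstance(a, t) for a, t in zip(args, types)):
                return func(*args)
        raise TypeError('%s: no match for %r' % (self.name, args))

arctan = MultiMethod('arctan')
arctan.add((float,), math.atan)
arctan.add((float, float), math.atan2)

# a later module defining a new type can plug in its own implementation
class Rational:
    def __init__(self, num, den):
        self.num, self.den = num, den

arctan.add((Rational,), lambda r: math.atan(r.num / r.den))
assert arctan(Rational(1, 1)) == math.atan(1.0)
```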

I'm extra-surprised to see that Andy's uneasy about this, since a C++
feature which (colloquially) bears his name was purpose-built to make this
sort of thing possible. Koenig lookup raises a similar issue: that the
behavior of a function call can be changed depending on which headers are
#included, and even the order they're #included in.

[I personally have many other concerns about how that feature worked out in
C++ - paper available on request - but the Python implementation being
suggested here suffers none of those problems because of its explicit
nature]

-Dave





From greg@cosc.canterbury.ac.nz  Fri Aug 16 01:18:41 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 16 Aug 2002 12:18:41 +1200 (NZST)
Subject: [Python-Dev] type categories
In-Reply-To: <2mfzxg5ugq.fsf@starship.python.net>
Message-ID: <200208160018.g7G0IfK05482@oma.cosc.canterbury.ac.nz>

> I mean, you can currently do
> 
> import mod
> 
> mod.func = my_func # evil cackle!

That's my point -- f.add(method) is just like doing that.

As with import *, no doubt the problems can be managed with
appropriate discipline. But it's not clear to me what sort of
discipline is needed.

Suppose you have a module H defining class Hobbit, and a class E
defining class Elf. Now you want to be able to add hobbits and elves,
but you don't want to clutter up either H or E with stuff concerning
the other one, so you put it in a third module M.

Now suppose module X creates a Hobbit, and module Y creates an Elf. In
the course of processing, they both end up in module Z, which adds
them together.

It's not clear how you can tell, when looking at Z, where to look to
find out what method will be called -- even if you know you're dealing
with a Hobbit and an Elf.

There's another problem, too -- who is responsible for *importing*
module M? It's not E or H, neither of which knows about the
other. It's not X, which only knows about Hobbits, or Y, which only
knows about Elves. It's not Z, which doesn't know about either of them
-- it just gets two things to add together.

So, unless you arbitrarily insert an import of M into one of these
modules, for no reason that's apparent from looking at that module --
M won't be imported at all, and the method it contains won't be added
to the generic function.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From greg@cosc.canterbury.ac.nz  Fri Aug 16 01:48:38 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 16 Aug 2002 12:48:38 +1200 (NZST)
Subject: [Python-Dev] type categories
In-Reply-To: <20020815145302.GA15633@hishome.net>
Message-ID: <200208160048.g7G0mc605574@oma.cosc.canterbury.ac.nz>

Oren Tirosh :

> Two references to the same object may have very different expectations
> about what they are pointing to.

It sounds a bit odd to talk about references having expectations.

I think all Oren is trying to say is that different pieces of code may
have different requirements of the same object, or the same piece of
code at different times, and that it's not practical to precalculate
all the requirements that might exist and put information about them
in the object itself or its class, or anywhere else for that matter.
So he wants to check the requirements procedurally whenever such
a check is needed.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From greg@cosc.canterbury.ac.nz  Fri Aug 16 01:51:51 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 16 Aug 2002 12:51:51 +1200 (NZST)
Subject: [Python-Dev] Alternative implementation of interning
In-Reply-To: <200208151620.g7FGKb216411@odiug.zope.com>
Message-ID: <200208160051.g7G0ppf05579@oma.cosc.canterbury.ac.nz>

>  x = intern('12345'*4)
>  nx = id(x)
>  del x
>  ...do something else...
>  y = intern('12345'*4)
>  ny = id(y)
>
> nx doesn't necessarily equal ny any more.  This is a backward
> incompatibility

If you wrote something like that *expecting* the strings
to be immortal, there would be no reason to bother with
the ids -- just keep references to the strings themselves.

If you *weren't* expecting them to be immortal, there would
be no reason to expect the ids to be equal anyway.

So I agree -- it's not a problem.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From greg@cosc.canterbury.ac.nz  Fri Aug 16 02:10:31 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 16 Aug 2002 13:10:31 +1200 (NZST)
Subject: [Python-Dev] type categories
In-Reply-To: <14c201c244b2$8cc651f0$6501a8c0@boostconsulting.com>
Message-ID: <200208160110.g7G1AVD09783@cosc.canterbury.ac.nz>

David Abrahams :

> This concern seems most un-pythonic to my eye, since there are already
> all kinds of ways any module can change the behavior of any call in
> another module.

Yes, but most of the time you don't have to use them!
With this feature, it would be the *normal* way of using
it.

> forcing all the definitions to be in one place prevents an important
> (you might even say the most important) use case: the author of a new
> type should be able to provide a a multimethod implementation
> corresponding to that type.

You can get that without the notion of a generic function as a
separate entity. Just have a dispatch mechanism that looks in all the
arguments of a call for a method to use, instead of just the first
one.

That would be relatively tractable, since at least you'd know that the
method must be found in one of the argument classes somewhere.

It also doesn't suffer from the who-imports-the-module problem, since
someone must have imported it in order to get an object of that class
in the first place.

The use case that this doesn't cover is where you're not defining a
new class, just trying to add behaviour to handle a previously
unanticipated combination of existing classes.  The organisational
problems involved in that aren't unique to Python, and seem to me an
inherent feature of the problem itself. Where does functionality
belong that isn't owned by any class?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From greg@cosc.canterbury.ac.nz  Fri Aug 16 02:26:10 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 16 Aug 2002 13:26:10 +1200 (NZST)
Subject: [Python-Dev] type categories
In-Reply-To: <14c201c244b2$8cc651f0$6501a8c0@boostconsulting.com>
Message-ID: <200208160126.g7G1QA410865@cosc.canterbury.ac.nz>

David Abrahams :

> Koenig lookup raises a similar issue: that the behavior of a function
> call can be changed depending on which headers are #included, and even
> the order they're #included in.

But at least you can, in principle, figure out what will be done by a
particular call in the source, by examining the files included by that
source file.

With the proposed generic function mechanism in Python, that wouldn't
be true.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From David Abrahams" 
Message-ID: <14db01c244c3$0cc231c0$6501a8c0@boostconsulting.com>

From: "Greg Ewing" 


> David Abrahams :
>
> > This concern seems most un-pythonic to my eye, since there are already
> > all kinds of ways any module can change the behavior of any call in
> > another module.
>
> Yes, but most of the time you don't have to use them!
> With this feature, it would be the *normal* way of using
> it.

I don't understand. You still don't have to use it. Nobody would force you
to add or encourage multimethod overloads. In fact, I think it would be
most appropriate if multimethods meant to be overloaded had to be declared
explicitly.

> > forcing all the definitions to be in one place prevents an important
> > (you might even say the most important) use case: the author of a new
> > type should be able to provide a a multimethod implementation
> > corresponding to that type.
>
> You can get that without the notion of a generic function as a
> separate entity. Just have a dispatch mechanism that looks in all the
> arguments of a call for a method to use, instead of just the first
> one.
>
> That would be relatively tractable, since at least you'd know that the
> method must be found in one of the argument classes somewhere.

Nooooo-o-o-o-o....!

(Sorry, I'm overreacting... but just a little)

That approach suffers from all the problems of Koenig lookup in C++.
Namely, if I provide a method foo in my class, and two different modules
are invoking "foo", whose idea of the "foo" semantics am I implementing?
That really becomes a problem for authors of generic functions (the ones
that call the multimethods) because every time they call a function it
essentially reserves the name of that function for the semantics they
intend it to have. This is currently, IMO, one of the most intractable problems in
C++ and I'd hate to see Python go down that path.** If you want to see the
gory details, ask me to send you my paper about it.

> It also doesn't suffer from the who-imports-the-module problem, since
> someone must have imported it in order to get an object of that class
> in the first place.

I don't think that's a serious problem. Multimethod definitions that apply
to a given type will typically be supplied by the same module as the type.

> The use case that this doesn't cover is where you're not defining a
> new class, just trying to add behaviour to handle a previously
> unanticipated combination of existing classes.  The organisational
> problems involved in that aren't unique to Python, and seem to me an
> inherent feature of the problem itself. Where does functionality
> belong that isn't owned by any class?

Often there's behavior associated with combinations of classes from the
same package or module. It's reasonable to supply that at module scope.
Besides the practical problems mentioned above, I think it's unnatural to
try to tie MULTImethod implementations to a single class. When you try to
generalize that arrangement to two arguments, you end up with something
like the __add__/__radd__ system, and generalizing it to three arguments is
next to impossible.

Where to supply multimethods that work for types defined in different
modules/packages is an open question, but it's a question that applies to
both the class-scope and module-scope approaches.

-Dave

** Python is very nice about using explicit qualification to associate
semantics with implementation (i.e. we write self.foo(x) and not just
foo(x)), and this would be a major break with that tradition. Explicit is
better than implicit.



From David Abrahams" 
Message-ID: <150601c244c3$f43264d0$6501a8c0@boostconsulting.com>

From: "Greg Ewing" 


> David Abrahams :
> 
> > Koenig lookup raises a similar issue: that the behavior of a function
> > call can be changed depending on which headers are #included, and even
> > the order they're #included in.
> 
> But at least you can, in principle, figure out what will be done by a
> particular call in the source, by examining the files included by that
> source file.
> 
> With the proposed generic function mechanism in Python, that wouldn't
> be true.

Like anything in Python, you can figure out what will happen by 

a. examining all the source that will be executed
b. examining the state of things at runtime.

What's new?

-----------------------------------------------------------
           David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com





From greg@cosc.canterbury.ac.nz  Fri Aug 16 03:11:09 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 16 Aug 2002 14:11:09 +1200 (NZST)
Subject: [Python-Dev] type categories
In-Reply-To: <14db01c244c3$0cc231c0$6501a8c0@boostconsulting.com>
Message-ID: <200208160211.g7G2B9Q11860@cosc.canterbury.ac.nz>

David Abrahams :

> I think it's unnatural to try to tie MULTImethod implementations to a
> single class.

I'm not sure what you mean by that. What I was talking about wouldn't
be tied to a single class. Any given method implementation would have
to reside in some class, but the dispatch mechanism would be
symmetrical with respect to all its arguments.

> When you try to generalize that arrangement to two arguments, you
> end up with something like the __add__/__radd__ system, and
> generalizing it to three arguments is next to impossible.

But it's exactly the same problem as implementing a generic function
dispatch mechanism. If you can solve one, you can solve the other.

I'm talking about replacing

  f(a, b, c)

where f is a generic function, with

  (a, b, c).f

(not necessarily that syntax, but that's more or less what it would
mean.) The dispatch mechanism -- whatever it is -- is the same,
but the generic function entity itself doesn't exist.

> Where to supply multimethods that work for types defined in
> different modules/packages is an open question, but it's a question
> that applies to both the class-scope and module-scope approaches

The class-scope approach would be saying effectively that
you're not allowed to have a method that doesn't belong in 
any class -- you have to pick a class and put it there.

That doesn't solve the problem, I know, but at least it
would be explicit about not solving it!

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From David Abrahams" 
Message-ID: <154301c244ca$0c338220$6501a8c0@boostconsulting.com>

From: "Greg Ewing" 


> David Abrahams :
>
> > I think it's unnatural to try to tie MULTImethod implementations to a
> > single class.
>
> I'm not sure what you mean by that. What I was talking about wouldn't
> be tied to a single class. Any given method implementation would have
> to reside in some class,

"reside in" is approximately equivalent to what I meant by "tied to". I
think it's unnatural to force users to associate a function designed to be
considered symmetrically over a combination of types (and "type
categories", I hope) with a single one of those types.

That approach also prevents another important use-case:

I want to use type X with generic function F, and can write a plausible
implementation of some multimethod call used by F for X, but the author of
X didn't supply it.

> > When you try to generalize that arrangement to two arguments, you
> > end up with something like the __add__/__radd__ system, and
> > generalizing it to three arguments is next to impossible.
>
> But it's exactly the same problem as implementing a generic function
> dispatch mechanism. If you can solve one, you can solve the other.
>
> I'm talking about replacing
>
>   f(a, b, c)
>
> where f is a generic function, with
>
>   (a, b, c).f
>
> (not necessarily that syntax, but that's more or less what it would
> mean.) The dispatch mechanism -- whatever it is -- is the same,
> but the generic function entity itself doesn't exist.

Are you talking about allowing the "self" argument of a multimethod to
appear in any position in the argument list? Otherwise you get a
proliferation of __add__, __radd__, __r2add__, etc. methods.

> > Where to supply multimethods that work for types defined in
> > different modules/packages is an open question, but it's a question
> > that applies to both the class-scope and module-scope approaches
>
> The class-scope approach would be saying effectively that
> you're not allowed to have a method that doesn't belong in
> any class -- you have to pick a class and put it there.
>
> That doesn't solve the problem, I know, but at least it
> would be explicit about not solving it!

And that's progress?

Anyway, you've managed to avoid the most important problem with this
approach (and this is a way of rephrasing my analogy to the problems with
Koenig lookup in C++): it breaks namespaces. When a module author defines a
generic function he shouldn't be forced to go to some name distribution
authority to find a unique name: some_module.some_function should be enough
to ensure uniqueness. Class authors had better be able to say precisely
which module's idea of "some_function" they're implementing. If you want
class authors to write something like:

    def some_module.some_function(T1: x, T2: y)

within the class body, I guess it's OK with me, but it seems rather
pointless to force the association with a class in that case, since the
really important association is with the module defining the generic
function's semantics. Explicit is better than implicit.

-----------------------------------------------------------
           David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com




From sholden@holdenweb.com  Fri Aug 16 03:36:01 2002
From: sholden@holdenweb.com (Steve Holden)
Date: Thu, 15 Aug 2002 22:36:01 -0400
Subject: [Python-Dev] CGIHTTPServer interactions with Internet Explorer
Message-ID: <07da01c244cd$ac5d3810$6300000a@holdenweb.com>

I'm currently researching some changes needed to solve a couple of bugs
(430160 and 428345) where Internet Explorer (ironically in the name of
Netscape compatibility, though as far as I can see Netscape stopped doing
this at about release 2) will send an extra CRLF over and above the
advertised Content-Length in a POST method input stream.

If the server closes the socket before removing this input, IE somehow gets
confused, and will (usually) send a second POST request, (most often)
followed by a GET request. [This had me tearing my hair out for three days
when writing PWP].

With Kevin Altis' help I have what appears to be a basic fix for
CGIHTTPServer, but there are a couple of points I'd appreciate some advice
on.

1) Although the basic code can use select() to ensure the input stream is no
longer readable (and therefore presumably flushed), I'm not confident enough
about the modifications to assert that they'll work when assembled with
Forking or Threading mixins. If anyone knows the code well enough to offer
an opinion it would be helpful.
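For what it's worth, the draining step itself might look something like this (a sketch only, and deliberately silent on the mixin question):

```python
import select
import socket

def drain_input(sock, timeout=0.1):
    # Read and discard any bytes still pending on the connection, so a
    # buggy client (e.g. IE's stray CRLF after a POST) never sees an
    # abrupt close with unread input.
    while True:
        ready, _, _ = select.select([sock], [], [], timeout)
        if not ready:
            return              # input stream is quiet: nothing to flush
        if not sock.recv(4096):
            return              # peer closed its end

# simulate the stray CRLF with a connected socket pair
client, server = socket.socketpair()
client.sendall(b'\r\n')         # the extra bytes beyond Content-Length
drain_input(server)
assert select.select([server], [], [], 0)[0] == []   # nothing left to read
client.close(); server.close()
```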

2) I understand that the appropriate RFC mandates that SCRIPTS must not read
more than Content-Length bytes and believe this is the relevant quote:

> > When a CGI gets a POSTed request, the "message-body" appears on standard
> > input:
> >
> >   6.2. Request Message-Bodies
> >
> >    As there may be a data entity attached to the request, there
> MUST be a
> >    system defined method for the script to read these data.
> Unless defined
> >    otherwise, this will be via the 'standard input' file descriptor.
> >
> >    If the CONTENT_LENGTH value (see section 6.1.2) is non-NULL,
> the server
> >    MUST supply at least that many bytes to scripts on the standard input
> >    stream. Scripts are not obliged to read the data. Servers MAY signal
> >    an EOF condition after CONTENT_LENGTH bytes have been read, but are
> >    not obligated to do so. Therefore, scripts MUST NOT attempt to read
> >    more than CONTENT_LENGTH bytes, even if more data are available.

Clearly this would be significant for HTTP/1.1. Technically the change would
be the *server* reading the extra bytes and not the *script*. Under HTTP/1.0
I suspect I can assume nothing will break. I'm less happy if a persistent
connection is invoked, since I'm just sucking on the socket until it comes
up empty. This could clearly interfere with a request with a "Connection:
Keep-Alive" header. Does anyone know whether IE uses this header when it's
indulging in the error behavior?

The current first-round patch is available at

https://sourceforge.net/tracker/?func=detail&aid=430160&group_id=5470&atid=105470

if anyone wants to test it and let me know of any problems or suggestions
for improvement.

regards
-----------------------------------------------------------------------
Steve Holden                                 http://www.holdenweb.com/
Python Web Programming                http://pydish.holdenweb.com/pwp/
-----------------------------------------------------------------------






From greg@cosc.canterbury.ac.nz  Fri Aug 16 04:42:13 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 16 Aug 2002 15:42:13 +1200 (NZST)
Subject: [Python-Dev] type categories
In-Reply-To: <154301c244ca$0c338220$6501a8c0@boostconsulting.com>
Message-ID: <200208160342.g7G3gDn13556@cosc.canterbury.ac.nz>

David Abrahams :

> I think it's unnatural to force users to associate a function
> designed to be considered symmetrically over a combination of types
> ... with a single one of those types.

I agree, but the alternative seems to be for it to reside
"out there" somewhere in an unknown location from the point
of view of code which uses it.

> Are you talking about allowing the "self" argument of a multimethod to
> appear in any position in the argument list?

Something like that. I haven't really thought it through
very much. Maybe the method name could be systematically
modified to indicate which argument position is "self",
or something like that.

> > That doesn't solve the problem, I know, but at least it
> > would be explicit about not solving it!
> 
> And that's progress?

Maybe. I don't know. At least it would generalise and make available
to the user what's already there in an ad-hoc way to deal with numeric
types.

> Class authors had better be able to say precisely
> which module's idea of "some_function" they're implementing. If you want
> class authors to write something like:
> 
>     def some_module.some_function(T1: x, T2: y)

That would only be an issue if T1 and T2 were already both using the
name some_function for incompatible purposes. That can happen now
anyway with multiple inheritance -- name clashes can occur whenever
you inherit from two pre-existing classes. I don't see that it's any
more of a problem here.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From barry@python.org  Fri Aug 16 07:50:51 2002
From: barry@python.org (Barry A. Warsaw)
Date: Fri, 16 Aug 2002 02:50:51 -0400
Subject: [Python-Dev] gmane.org
Message-ID: <15708.41163.759579.616038@anthem.wooz.org>

I don't know how many of you are hip to this yet, but in case you
occasionally feel overwhelmed by traffic on the various python lists,
or if you just prefer a news interface over your mail reader, or if
you occasionally want to drop in on a thread for a list you don't
normally follow, you should check out gmane.org.  This domain is run
by Lars Magne Ingebrigtsen, the guy who wrote the GNUS mail and news
reader for Emacs.  It's basically a non-expiring mail/news gateway for
mailing lists.

Among the tons of lists it carries, it's got all the Python lists I've
heard of and many I haven't, although it takes a little figuring out
the mapping (e.g. python-list <-> gmane.comp.python.general).  There's
a web page that can help you find the list you're interested in.

Check out www.gmane.org for details.

-Barry


From walter@livinglogic.de  Fri Aug 16 09:27:17 2002
From: walter@livinglogic.de (=?ISO-8859-15?Q?Walter_D=F6rwald?=)
Date: Fri, 16 Aug 2002 10:27:17 +0200
Subject: [Python-Dev] Mutable exceptions? (was Re: PEP 293, Codec Error
 Handling Callbacks)
References: <20020815180411.GA24780@panix.com> <200208151815.g7FIFpn04821@odiug.zope.com>
Message-ID: <3D5CB765.80403@livinglogic.de>

Guido van Rossum wrote:

>>This looks like an issue that potentially deserves more community
>>feedback, so I'm ripping it out and starting a new thread: should
>>exception objects be treated as mutable as exceptions get caught and
>>re-raised?
> 
> 
> (A new thread in python-dev hardly counts as "community feedback".)
> 
> I'd say definitely.  Code like this looks reasonable to me:
> 
>   def some_function(arg):
>     try:
>       call_some_other_function(arg)
>     except SomeExpectedExceptionClass, obj:
>       obj.add_context(arg)
>       raise
> 
> Then some outer piece of code catches exceptions and produces a
> traceback augmented by information added by various calls to
> add_context().

So, if add_context() changes any exception attribute that was
originally specified in the constructor and is thus part of
the args attribute, should this change be reflected in the
args attribute?

Bye,
    Walter Dörwald



From mwh@python.net  Fri Aug 16 09:36:16 2002
From: mwh@python.net (Michael Hudson)
Date: 16 Aug 2002 09:36:16 +0100
Subject: [Python-Dev] Re: SET_LINENO killer
In-Reply-To: "Tim Peters"'s message of "Thu, 15 Aug 2002 12:54:29 -0400"
References: 
Message-ID: <2m65ybb3pb.fsf@starship.python.net>

"Tim Peters"  writes:

> [Michael Hudson]
> > Beats me.  I still see a healthy speed up:
> >
> > Before:
> >
> > $ ./python ../Lib/test/pystone.py
> > Pystone(1.1) time for 50000 passes = 3.99
> > This machine benchmarks at 12531.3 pystones/second
> >
> > After:
> >
> > $ ./python ../Lib/test/pystone.py
> > Pystone(1.1) time for 50000 passes = 3.65
> > This machine benchmarks at 13698.6 pystones/second
> >
> > (which is nosing on for 10% faster, actually).
> >
> > You're not testing a debug vs a release build or anything like that
> > are you?
> 
> I'm not, but I was comparing -O times (in release builds).

Ah.  FWIW gcc makes my patch a small win even with -O.

> Three runs before patch:
[...]
> This machine benchmarks at 14295.7 pystones/second
[...]
> Three runs after patch:
[...]
> This machine benchmarks at 13351.6 pystones/second

Ouch!

> Three runs after commenting out the new
[...]
> on the eval-loop critical path:
[...]
> This machine benchmarks at 13910.4 pystones/second

This makes no sense; after you've commented out the trace stuff, the
only difference left is that the switch is smaller!

Actually, there are some other changes, like always updating
f->f_lasti, and allocating 8 more bytes on the stack.  Does commenting
out the definition of instr_lb & instr_ub make any difference?

> OTOH, MSVC 6 has been generating faster ceval.c code than gcc for a long
> time; given how touchy this is, maybe it's just time for gcc to win 587 coin
> flips in a row .

Does reading assembly give any clues?  Not that I'd really expect
anyone to read all of the main loop...

I'm baffled.  Perhaps you can put SET_LINENO back in for the Windows
build <1e-6 wink>.

Cheers,
M.

-- 
  Programming languages should be designed not by piling feature on
  top of feature, but by removing the weaknesses and restrictions
  that make the additional features appear necessary.
               -- Revised(5) Report on the Algorithmic Language Scheme


From mwh@python.net  Fri Aug 16 10:20:23 2002
From: mwh@python.net (Michael Hudson)
Date: 16 Aug 2002 10:20:23 +0100
Subject: [Python-Dev] type categories
In-Reply-To: Greg Ewing's message of "Fri, 16 Aug 2002 12:18:41 +1200 (NZST)"
References: <200208160018.g7G0IfK05482@oma.cosc.canterbury.ac.nz>
Message-ID: <2m4rdvjh2g.fsf@starship.python.net>

Greg Ewing  writes:

> > I mean, you can currently do
> > 
> > import mod
> > 
> > mod.func = my_func # evil cackle!
> 
> That's my point -- f.add(method) is just like doing that.
> 
> [stuff]

You raise reasonable questions.  The thought that occurs to me is that
people using CLOS must face similar issues, and I don't hear of them
as being great hold ups.  I know CL a bit, but I've never really used
CLOS in anger -- anyone else?

Some of the issues might be smaller because CL is not a Lisp-1 and
modularization works differently, I guess.

Cheers,
M.

-- 
  First of all, email me your AOL password as a security measure. You
  may find that won't be able to connect to the 'net for a while. This
  is normal. The next thing to do is turn your computer upside down
  and shake it to reboot it.                     -- Darren Tucker, asr


From walter@livinglogic.de  Fri Aug 16 12:35:41 2002
From: walter@livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Fri, 16 Aug 2002 13:35:41 +0200
Subject: [Python-Dev] mimetypes patch #554192
References: <3D5BEBB8.7080904@livinglogic.de> <15707.61612.844119.819432@anthem.wooz.org>
Message-ID: <3D5CE38D.9080905@livinglogic.de>

Barry A. Warsaw wrote:
>>>>>>"WD" == Walter Dörwald  writes:
>>>>>
> 
>     WD> Martin v. Loewis and I were discussing whether it would make
>     WD> sense to make the helper method add_type (which is used for
>     WD> adding a mapping between one type and one extension) visible
>     WD> on the module level.
> 
>     WD> Any comments?
> 
> +1 on add_types() being public, but it should probably have a strict
> flag to decide whether to add the new entry to the standard types dict
> or the common types dict.

OK, so we probably need a reverse mapping for common_types too, but 
shouldn't we consider common_types to be fixed?

Maybe we should add a guess_all_types too, so we can handle duplicate 
extensions, e.g.
 >>> mimetypes.guess_all_types(".cdf")
['application/x-cdf', 'application/x-netcdf']

This would of course require changing the initialization of types_map
from a dict constant to many calls to add_type.

Even better would be if we could assign priorities to the mappings,
so that, e.g., for image/jpeg the preferred extension is .jpeg.
Then guess_type() and guess_extension() would return the preferred
mimetype/extension.
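A minimal sketch of how the proposed guess_all_types could work, assuming the registry is built through add_type calls rather than a dict literal. Both the function name and the reverse mapping here follow the proposal in this thread, not the actual mimetypes API:

```python
# Hypothetical registry; add_type() accumulates duplicate extensions
# instead of overwriting them, which a dict-literal types_map cannot do.
_ext_to_types = {}

def add_type(mimetype, ext):
    _ext_to_types.setdefault(ext, []).append(mimetype)

def guess_all_types(ext):
    """Return every MIME type registered for an extension."""
    return sorted(_ext_to_types.get(ext, []))

add_type('application/x-cdf', '.cdf')
add_type('application/x-netcdf', '.cdf')
```

With priorities, add_type could take an extra rank argument and guess_type/guess_extension would return the lowest-ranked entry.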

Bye,
    Walter Dörwald



From pedroni@inf.ethz.ch  Fri Aug 16 14:04:15 2002
From: pedroni@inf.ethz.ch (Samuele Pedroni)
Date: Fri, 16 Aug 2002 15:04:15 +0200
Subject: [Python-Dev] Multimethods (quel horreur?)
Message-ID: <003101c24525$6cfcab80$6d94fea9@newmexico>

[f,g,... are functions ;  T1,T2,T3 are type tuples a.k.a multi-method
signatures, T1
References: <200208150313.g7F3Dr727504@oma.cosc.canterbury.ac.nz>
 
 <14c201c244b2$8cc651f0$6501a8c0@boostconsulting.com>
Message-ID: 

ark> It makes me uneasy because the behavior of programs might depend
ark> on the order in which modules are loaded.  That's why I didn't
ark> suggest a way of defining the variations on f in separate places.

David> This concern seems most un-pythonic to my eye, since there are
David> already all kinds of ways any module can change the behavior of
David> any call in another module. The most direct way is by
David> rebinding the implementation of another module's
David> function. Python is a dynamic language, and that is usually
David> seen as a strength.

Indeed.  What concerns me is not dynamic behavior, but order-dependent
behavior that might be occurring behind the scenes.  I would really like
to be confident that if I write

        import x, y

it has the same effect as

        import y, x

I understand that there is no guarantee of that property now, but I suspect
that most people write programs in a way that does guarantee it.  I would
hate to see the language evolve in ways that make it substantially more
difficult to avoid such order dependencies, so I am reluctant to propose
a feature that would increase that difficulty.

David> More importantly, though, forcing all the definitions to be in
David> one place prevents an important (you might even say the most
David> important) use case: the author of a new type should be able to
David> provide a multimethod implementation corresponding to that
David> type. For example, if I write a rational number class, I should
David> be able to plug in a corresponding arctan implementation.

Yes.  I'm not saying such a feature shouldn't exist; just that I
don't know what form it should take.

David> I'm extra-surprised to see that Andy's uneasy about this, since
David> a C++ feature which (colloquially) bears his name was
David> purpose-built to make this sort of thing possible. Koenig
David> lookup raises a similar issue: that the behavior of a function
David> call can be changed depending on which headers are #included,
David> and even the order they're #included in.

The C++ #include mechanism, based as it is on copying source text,
offers almost no hope of sensible behavior without active
collaboration from programmers.

David> [I personally have many other concerns about how that feature
David> worked out in C++ - paper available on request - but the Python
David> implementation being suggested here suffers none of those
David> problems because of its explicit nature]

Which doesn't mean it can't suffer from other problems.  In
particular, if I know that modules x and y overload the same function,
and I want to be sure that x's case is tested first, one would think I
could ensure it by writing

        import x, y

But in fact I can't, because someone else may have imported y already,
in which case the second import is a no-op.

-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark


From guido@python.org  Fri Aug 16 14:17:40 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 16 Aug 2002 09:17:40 -0400
Subject: [Python-Dev] Mutable exceptions? (was Re: PEP 293, Codec Error Handling Callbacks)
In-Reply-To: Your message of "Fri, 16 Aug 2002 10:27:17 +0200."
 <3D5CB765.80403@livinglogic.de>
References: <20020815180411.GA24780@panix.com> <200208151815.g7FIFpn04821@odiug.zope.com>
 <3D5CB765.80403@livinglogic.de>
Message-ID: <200208161317.g7GDHeD25565@pcp02138704pcs.reston01.va.comcast.net>

> > I'd say definitely.  Code like this looks reasonable to me:
> > 
> >   def some_function(arg):
> >     try:
> >       call_some_other_function(arg)
> >     except SomeExpectedExceptionClass, obj:
> >       obj.add_context(arg)
> >       raise
> > 
> > Then some outer piece of code catches exceptions and produces a
> > traceback augmented by information added by various calls to
> > add_context().
> 
> So, if add_context() changes any exception attribute that was
> originally specified in the constructor and is thus part of
> the args attribute, should this change be reflected in the
> args attribute?

Usually yes, but that's up to the class that defines add_context().
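A sketch of what such a class might look like, in modern syntax. The class name and the choice to mirror the accumulated context into .args are assumptions; as noted, that choice belongs to the class author:

```python
# Hypothetical exception class supporting the add_context() idiom
# quoted above.
class SomeExpectedError(Exception):
    def __init__(self, message):
        super().__init__(message)
        self.context = []

    def add_context(self, arg):
        # Record a frame of context; the caller re-raises afterwards.
        self.context.append(arg)
        # Keep .args in sync so str()/repr() reflect the added context.
        self.args = (self.args[0], tuple(self.context))

def call_some_other_function(arg):
    raise SomeExpectedError("failure")

def some_function(arg):
    try:
        call_some_other_function(arg)
    except SomeExpectedError as obj:
        obj.add_context(arg)
        raise
```

An outer handler then sees both the original message and every call site that annotated the exception on its way up.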

--Guido van Rossum (home page: http://www.python.org/~guido/)


From David Abrahams
References: <14c201c244b2$8cc651f0$6501a8c0@boostconsulting.com>
Message-ID: <166701c24525$1859a5b0$6501a8c0@boostconsulting.com>

From: "Andrew Koenig" 


> ark> It makes me uneasy because the behavior of programs might depend
> ark> on the order in which modules are loaded.  That's why I didn't
> ark> suggest a way of defining the variations on f in separate places.
>
> David> This concern seems most un-pythonic to my eye, since there are
> David> already all kinds of ways any module can change the behavior of
> David> any call in another module. The most direct way is by
> David> rebinding the implementation of another module's
> David> function. Python is a dynamic language, and that is usually
> David> seen as a strength.
>
> Indeed.  What concerns me is not dynamic behavior, but order-dependent
> behavior that might be occurring behind the scenes.  I would really like
> to be confident that if I write
>
>         import x, y
>
> it has the same effect as
>
>         import y, x
>
> I understand that there is no guarantee of that property now, but I suspect
> that most people write programs in a way that does guarantee it.  I would
> hate to see the language evolve in ways that make it substantially more
> difficult to avoid such order dependencies, so I am reluctant to propose
> a feature that would increase that difficulty.

Oh, easily solved: "in the face of ambiguity, refuse the temptation to
guess".
There should be a best match rule, and if there are two best matches, it's
an error.

-----------------------------------------------------------
           David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com




From mwh@python.net  Fri Aug 16 14:28:48 2002
From: mwh@python.net (Michael Hudson)
Date: 16 Aug 2002 14:28:48 +0100
Subject: [Python-Dev] type categories
In-Reply-To: Andrew Koenig's message of "16 Aug 2002 09:11:33 -0400"
References: <200208150313.g7F3Dr727504@oma.cosc.canterbury.ac.nz>  <14c201c244b2$8cc651f0$6501a8c0@boostconsulting.com> 
Message-ID: <2mn0rn9blb.fsf@starship.python.net>

Andrew Koenig  writes:

> ark> It makes me uneasy because the behavior of programs might depend
> ark> on the order in which modules are loaded.  That's why I didn't
> ark> suggest a way of defining the variations on f in separate places.
> 
> David> This concern seems most un-pythonic to my eye, since there are
> David> already all kinds of ways any module can change the behavior of
> David> any call in another module. The most direct way is by
> David> rebinding the implementation of another module's
> David> function. Python is a dynamic language, and that is usually
> David> seen as a strength.
> 
> Indeed.  What concerns me is not dynamic behavior, but order-dependent
> behavior that might be occurring behind the scenes.  I would really like
> to be confident that if I write
> 
>         import x, y
> 
> it has the same effect as
> 
>         import y, x
> 
> I understand that there is no guarantee of that property now, but I suspect
> that most people write programs in a way that does guarantee it.  I would
> hate to see the language evolve in ways that make it substantially more
> difficult to avoid such order dependencies, so I am reluctant to propose
> a feature that would increase that difficulty.

I may be getting lost in subthreads here, but are we still talking
about multimethods?  If we are, then surely any sane multimethod
system's method resolution has to be independent of the order of
method definition.  There are ways of doing this.

An implementation along the lines of:

def match(args, spec):
  for a, t in zip(args, spec):
    if not isinstance(a, t):
      return False
  else:
    return True

class multi_method:
  def __init__(self):
    self.methods = []
  def add(self, f, typespec):
    self.methods.append((f, typespec))
  def __call__(self, *args):
    for meth, typespec in self.methods:
      if match(args, typespec):
        return meth(*args)

is just insane.
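For contrast, a rough sketch of an order-independent resolver: collect every applicable signature, keep only the most specific ones, and refuse to guess on ties. All names here are invented, and real systems (CLOS, Cecil) use considerably more refined specificity rules:

```python
def _more_specific(spec_a, spec_b):
    # spec_a beats spec_b if each of its types is a subclass of the
    # corresponding type in spec_b.
    return all(issubclass(a, b) for a, b in zip(spec_a, spec_b))

class MultiMethod:
    def __init__(self):
        self.methods = []          # (typespec, function) pairs

    def add(self, typespec, func):
        self.methods.append((typespec, func))

    def __call__(self, *args):
        # Every registered signature the arguments satisfy.
        applicable = [(spec, f) for spec, f in self.methods
                      if len(spec) == len(args)
                      and all(isinstance(a, t) for a, t in zip(args, spec))]
        # Drop any signature strictly less specific than another match.
        best = [(spec, f) for spec, f in applicable
                if not any(spec != other and _more_specific(other, spec)
                           for other, _ in applicable)]
        if not best:
            raise TypeError("no applicable method")
        if len(best) > 1:
            raise TypeError("ambiguous call: %d best matches" % len(best))
        return best[0][1](*args)
```

Registering (int, object) before or after (object, object) gives the same answer, and a call matching both (int, object) and (object, int) is rejected as ambiguous rather than resolved by definition order.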

If I've missed the source of concern, I'm sorry...

Cheers,
M.

-- 
  incidentally, asking why things are "left out of the language" is
  a good sign that the asker is fairly clueless.
                                        -- Erik Naggum, comp.lang.lisp


From mcherm@destiny.com  Fri Aug 16 14:30:51 2002
From: mcherm@destiny.com (Michael Chermside)
Date: Fri, 16 Aug 2002 09:30:51 -0400
Subject: [Python-Dev] Re: gmane.org
Message-ID: <3D5CFE8B.3070207@destiny.com>

 > [www.gmane.org offers mailing list <--> news gateway]

Thanks very much! I've always wanted to be able to use a newsreader to 
follow python newsgroups (c.p.l for instance) but since my source of 
connectivity doesn't provide access to news, I've had to make do with a 
limited "newsreader" I wrote which plucked its input from 
mail.python.org/pipermail.

I don't really need the non-expiring feature of gmane, and its mail <--> 
news gateway isn't as important to me, but the fact that it's an open 
news server is really appreciated!

-- Michael Chermside




From ark@research.att.com  Fri Aug 16 14:37:11 2002
From: ark@research.att.com (Andrew Koenig)
Date: Fri, 16 Aug 2002 09:37:11 -0400 (EDT)
Subject: [Python-Dev] type categories
In-Reply-To: <166701c24525$1859a5b0$6501a8c0@boostconsulting.com>
 (dave@boost-consulting.com)
References: <200208150313.g7F3Dr727504@oma.cosc.canterbury.ac.nz><14c201c244b2$8cc651f0$6501a8c0@boostconsulting.com>  <166701c24525$1859a5b0$6501a8c0@boostconsulting.com>
Message-ID: <200208161337.g7GDbB509592@europa.research.att.com>

David> Oh, easily solved: "in the face of ambiguity, refuse the
David> temptation to guess".  There should be a best match rule, and
David> if there are two best matches, it's an error.

In the ML example I showed earlier:

       fun len([]) = 0
         | len(h::t) = len(t) + 1

ordering is crucial: As long as the argument is not empty, both cases
match, so the language is defined to test the clauses in sequence.
My intuition is that people will often want to define category tests
to be done in a particular order.  There is no problem with such ordering
as long as all of the tests are specified together.

Once the tests are distributed, ordering becomes a problem, because
one person's intentional order dependency is another person's
ambiguity.  Which means that how one specifies distributed tests will
probably be different from how one specifies tests all in one place.

That's yet another reason I think it may be right to consider the
two problems separately.



From tim.one@comcast.net  Fri Aug 16 14:45:31 2002
From: tim.one@comcast.net (Tim Peters)
Date: Fri, 16 Aug 2002 09:45:31 -0400
Subject: [Python-Dev] type categories
In-Reply-To: <2mn0rn9blb.fsf@starship.python.net>
Message-ID: 

FYI,

    http://www.cs.washington.edu/research/projects/cecil/www/pubs/

has lots of good papers from the Cecil project, a pioneering
multiple-dispatch language.  Or you could save time reading and learn by
repeating their early mistakes .



From martin@strakt.com  Fri Aug 16 14:55:05 2002
From: martin@strakt.com (Martin =?ISO-8859-1?Q?Sj=F6gren?=)
Date: 16 Aug 2002 15:55:05 +0200
Subject: [Python-Dev] type categories
In-Reply-To: <200208161337.g7GDbB509592@europa.research.att.com>
References: <200208150313.g7F3Dr727504@oma.cosc.canterbury.ac.nz><14c201c244b2$8cc651f0$6501a8c0@boostconsulting.com>
 
 <166701c24525$1859a5b0$6501a8c0@boostconsulting.com>
 <200208161337.g7GDbB509592@europa.research.att.com>
Message-ID: <1029506105.4254.3.camel@ratthing-b3cf>

On Fri 2002-08-16 at 15.37, Andrew Koenig wrote:
> David> Oh, easily solved: "in the face of ambiguity, refuse the
> David> temptation to guess".  There should be a best match rule, and
> David> if there are two best matches, it's an error.
>
> In the ML example I showed earlier:
>
>        fun len([]) = 0
>          | len(h::t) = len(t) + 1
>
> ordering is crucial: As long as the argument is not empty, both cases
> match, so the language is defined to test the clauses in sequence.
> My intuition is that people will often want to define category tests
> to be done in a particular order.  There is no problem with such ordering
> as long as all of the tests are specified together.

What does "not empty" mean in this context? "not []"? Does h::t match []
or does [2] match []? Why is the ordering crucial? In Haskell:

f [] = 0
f (x:xs) = 1 + f xs

is totally equivalent with:

f (x:xs) = 1 + f xs
f [] = 0

Of course, if the different patterns overlap, THEN ordering is crucial,
I just find it odd that [] and h::t would overlap...


Martin

-- 
Martin Sjögren
  martin@strakt.com              ICQ : 41245059
  Phone: +46 (0)31 7710870       Cell: +46 (0)739 169191
  GPG key: http://www.strakt.com/~martin/gpg.html


From ark@research.att.com  Fri Aug 16 15:18:34 2002
From: ark@research.att.com (Andrew Koenig)
Date: 16 Aug 2002 10:18:34 -0400
Subject: [Python-Dev] type categories
In-Reply-To: <1029506105.4254.3.camel@ratthing-b3cf>
References: <200208150313.g7F3Dr727504@oma.cosc.canterbury.ac.nz>
 
 <14c201c244b2$8cc651f0$6501a8c0@boostconsulting.com>
 
 <166701c24525$1859a5b0$6501a8c0@boostconsulting.com>
 <200208161337.g7GDbB509592@europa.research.att.com>
 <1029506105.4254.3.camel@ratthing-b3cf>
Message-ID: 

>> In the ML example I showed earlier:

>> fun len([]) = 0
>> | len(h::t) = len(t) + 1

>> ordering is crucial: As long as the argument is not empty, both cases
>> match, so the language is defined to test the clauses in sequence.
>> My intuition is that people will often want to define category tests
>> to be done in a particular order.  There is no problem with such ordering
>> as long as all of the tests are specified together.

Martin> What does "not empty" mean in this context? "not []"? Does h::t match []
Martin> or does [2] match []? Why is the ordering crucial? In Haskell:

Martin> f [] = 0
Martin> f (x:xs) = 1 + f xs

Martin> is totally equivalent with:

Martin> f (x:xs) = 1 + f xs
Martin> f [] = 0

I'm sorry, you're right.  In this particular example, there is no
overlap, so order doesn't matter.  However, the general point still
stands: ML patterns are order-sensitive in cases where there is
overlap.

-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark


From barry@python.org  Fri Aug 16 15:19:07 2002
From: barry@python.org (Barry A. Warsaw)
Date: Fri, 16 Aug 2002 10:19:07 -0400
Subject: [Python-Dev] Re: gmane.org
References: <3D5CFE8B.3070207@destiny.com>
Message-ID: <15709.2523.467857.187371@anthem.wooz.org>

>>>>> "MC" == Michael Chermside  writes:

    MC> I don't really need the non-expiring feature of gmane, and its
    MC> mail <--> news gateway isn't as important to me, but the fact
    MC> that it's an open news server is really appreciated!

The other interesting thing is that it's faster than my ISP's
newsfeed, at least for c.l.py and c.l.py.a!

-Barry


From mwh@python.net  Fri Aug 16 15:11:45 2002
From: mwh@python.net (Michael Hudson)
Date: 16 Aug 2002 15:11:45 +0100
Subject: [Python-Dev] Re: type categories
References: <200208150313.g7F3Dr727504@oma.cosc.canterbury.ac.nz>  <14c201c244b2$8cc651f0$6501a8c0@boostconsulting.com>  <166701c24525$1859a5b0$6501a8c0@boostconsulting.com> <200208161337.g7GDbB509592@europa.research.att.com>
Message-ID: 

Andrew Koenig  writes:

> David> Oh, easily solved: "in the face of ambiguity, refuse the
> David> temptation to guess".  There should be a best match rule, and
> David> if there are two best matches, it's an error.
> 
> In the ML example I showed earlier:
> 
>        fun len([]) = 0
>          | len(h::t) = len(t) + 1
> 
> ordering is crucial: As long as the argument is not empty, both cases
> match, so the language is defined to test the clauses in sequence.
> My intuition is that people will often want to define category tests
> to be done in a particular order.  There is no problem with such ordering
> as long as all of the tests are specified together.

If multimethods make it into Python, I think (hope!) it's a safe bet
that they will look more like CLOS's multimethods than ML's pattern
matching.

Cheers,
M.

-- 
  ZAPHOD:  You know what I'm thinking?
    FORD:  No.
  ZAPHOD:  Neither do I.  Frightening isn't it?
                   -- The Hitch-Hikers Guide to the Galaxy, Episode 11




From pedroni@inf.ethz.ch  Fri Aug 16 15:22:46 2002
From: pedroni@inf.ethz.ch (Samuele Pedroni)
Date: Fri, 16 Aug 2002 16:22:46 +0200
Subject: [Python-Dev] type categories
Message-ID: <005d01c24530$6511b040$6d94fea9@newmexico>

>FYI,
>
>   http://www.cs.washington.edu/research/projects/cecil/www/pubs/
>
>has lots of good papers from the Cecil project, a pioneering
>multiple-dispatch language.  Or you could save time reading and learn by
>repeating their early mistakes .

it's prototype-based, not class-based, so not everything is 
relevant, but at least the survey part
(not the algorithm description) is relevant to the discussion at hand:

Efficient Multiple and Predicate Dispatching

http://www.cs.washington.edu/research/projects/cecil/www/pubs/dispatching.html

btw predicate dispatching is a generalization of multimethod dispatch,
but it is still not the same as the ML pattern-matching form of function
definition:

"""Functions can actually perform pattern matching on the argument. The form:
fun f (x:t1):t2 => (case x
    of pat_1 => exp_1
    | ...
    | pat_n => exp_n)
can be written directly as:
    fun f pat_1 = exp_1
    | ...
    | f pat_n = exp_n

"""  [http://www.cs.cornell.edu/riccardo/prog-smlnj/notes-011001.pdf]
For 'case', order is relevant, as Andrew Koenig said.

I don't think it makes sense to generalize this in a non-local
way. From the point of view of predicate dispatch, all
the individual patterns would be different "predicates",
and so potentially ambiguous.

And you people are driving me mad: you cannot even
agree on the terminology and thereby avoid the tiniest
of non-problems. Argh 

everybody-should-do-his-homework'ly y'rs.



From ark@research.att.com  Fri Aug 16 15:35:56 2002
From: ark@research.att.com (Andrew Koenig)
Date: 16 Aug 2002 10:35:56 -0400
Subject: [Python-Dev] type categories
In-Reply-To: <2mn0rn9blb.fsf@starship.python.net>
References: <200208150313.g7F3Dr727504@oma.cosc.canterbury.ac.nz>
 
 <14c201c244b2$8cc651f0$6501a8c0@boostconsulting.com>
 
 <2mn0rn9blb.fsf@starship.python.net>
Message-ID: 

Michael> I may be getting lost in subthreads here, but are we still
Michael> talking about multimethods?

Well, I started by talking about type categories and ways of
writing programs that tested them.  Dave Abrahams said, in effect,
that I was really just talking about multimethods.  I'm still
not convinced.

-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark


From ark@research.att.com  Fri Aug 16 15:48:10 2002
From: ark@research.att.com (Andrew Koenig)
Date: Fri, 16 Aug 2002 10:48:10 -0400 (EDT)
Subject: [Python-Dev] Python build trouble with the new gcc/binutils
Message-ID: <200208161448.g7GEmAm19971@europa.research.att.com>

I can build Python 2.2.1 just fine on my Solaris 2.8 machine using gcc
3.1.1 and binutils 2.12.1

If I install either binutils 2.13 or the just-released gcc 3.2, I can
no longer build Python -- it dumps core quite far into the build process.

I don't really have a clue as to whether it's a gcc problem, a binutils
problem, or a Python problem.  Any suggestions as to how to proceed?


From barry@python.org  Fri Aug 16 16:04:16 2002
From: barry@python.org (Barry A. Warsaw)
Date: Fri, 16 Aug 2002 11:04:16 -0400
Subject: [Python-Dev] Python build trouble with the new gcc/binutils
References: <200208161448.g7GEmAm19971@europa.research.att.com>
Message-ID: <15709.5232.302092.575564@anthem.wooz.org>

>>>>> "AK" == Andrew Koenig  writes:

    AK> I can build Python 2.2.1 just fine on my Solaris 2.8 machine
    AK> using gcc 3.1.1 and binutils 2.12.1

    AK> If I install either binutils 2.13 or the just-released gcc
    AK> 3.2, I can no longer build Python -- it dumps core quite far
    AK> into the build process.

Stack trace?

    AK> I don't really have a clue as to whether it's a gcc problem, a
    AK> binutils problem, or a Python problem.  Any suggestions as to
    AK> how to proceed?

I started to try to build Py2.3cvs w/ gcc 3.2 but I had problems with
the gcc build.  I foolishly attempted to install it with an
alternative suffix so it wouldn't interfere with my existing gcc, but
that seems like a broken process.  I'm trying again, but it takes a
while to build gcc.

-Barry


From guido@python.org  Fri Aug 16 16:09:00 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 16 Aug 2002 11:09:00 -0400
Subject: [Python-Dev] Python build trouble with the new gcc/binutils
In-Reply-To: Your message of "Fri, 16 Aug 2002 10:48:10 EDT."
 <200208161448.g7GEmAm19971@europa.research.att.com>
References: <200208161448.g7GEmAm19971@europa.research.att.com>
Message-ID: <200208161509.g7GF90s06272@pcp02138704pcs.reston01.va.comcast.net>

> I can build Python 2.2.1 just fine on my Solaris 2.8 machine using gcc
> 3.1.1 and binutils 2.12.1
> 
> If I install either binutils 2.13 or the just-released gcc 3.2, I can
> no longer build Python -- it dumps core quite far into the build process.
> 
> I don't really have a clue as to whether it's a gcc problem, a binutils
> problem, or a Python problem.  Any suggestions as to how to proceed?

I haven't heard this before.  I guess gcc 3.2 is brand new?  I'm not
generally following gcc releases except from hearsay.

I suppose you *did* do a "make clean" before trying with a different
compiler?  Maybe even re-run configure.

If that doesn't help, try turning off optimization (edit the generated
Makefile to delete the "-O3" option, them make clean).  If that helps,
it must be a gcc optimizer problem.

If that doesn't help, it's still most likely to be a gcc or binutils
problem.

A SourceForge bug report might be in order regardless.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From oren-py-d@hishome.net  Fri Aug 16 16:22:58 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Fri, 16 Aug 2002 11:22:58 -0400
Subject: [Python-Dev] Alternative implementation of interning
In-Reply-To: 
References: <200208151620.g7FGKb216411@odiug.zope.com> 
Message-ID: <20020816152258.GA99140@hishome.net>

On Thu, Aug 15, 2002 at 12:46:25PM -0400, Tim Peters wrote:
> As the only person to have posted an example relying on this behavior, it's
> OK by me if that example breaks -- it was made up just to illustrate the
> possibility and raise a caution flag.  I don't take it seriously.

In Python it's easier to just use the string so there is no real incentive 
to use the id.  I would say that making the result of the intern() builtin
mortal is probably safe.

The problem is in C extension modules. In C there is an incentive to rely
on the immortality of interned strings because it makes the code simpler.
There was an example of this in the Mac import code. PyString_InternInPlace 
should probably create immortal interned strings for backward compatibility 
(and be deprecated, of course).

Maybe PyString_Intern should be renamed to PyString_InternReference to
make it more obvious that it modifies the pointer "in place".
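The Python-level guarantee is easy to demonstrate with a small sketch; the immortality question above only bites C code that caches the bare pointer past the string's lifetime, which no pure-Python example can show:

```python
import sys

# Interning collapses equal strings built at runtime into one shared
# object, so identity tests become safe and cheap.
a = sys.intern("some_" + "identifier")
b = sys.intern("".join(["some_", "identifier"]))  # built at runtime
assert a is b
```

Whether that shared object is mortal (freed once nothing references it) or immortal is exactly the compatibility question for C extensions discussed above.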

	Oren



From ark@research.att.com  Fri Aug 16 16:36:04 2002
From: ark@research.att.com (Andrew Koenig)
Date: Fri, 16 Aug 2002 11:36:04 -0400 (EDT)
Subject: [Python-Dev] Python build trouble with the new gcc/binutils
In-Reply-To: <200208161509.g7GF90s06272@pcp02138704pcs.reston01.va.comcast.net>
 (message from Guido van Rossum on Fri, 16 Aug 2002 11:09:00 -0400)
References: <200208161448.g7GEmAm19971@europa.research.att.com> <200208161509.g7GF90s06272@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <200208161536.g7GFa4j20160@europa.research.att.com>

Guido> I haven't heard this before.  I guess gcc 3.2 is brand new?  I'm not
Guido> generally following gcc releases except from hearsay.

gcc 3.2 was released yesterday.

Guido> I suppose you *did* do a "make clean" before trying with a different
Guido> compiler?  Maybe even re-run configure.

Whenever I build Python, I start by unpacking the source-code
distribution into an empty directory, so I am quite confident that
there are no dregs left over from earlier builds.

Guido> If that doesn't help, try turning off optimization (edit the
Guido> generated Makefile to delete the "-O3" option, then make
Guido> clean).  If that helps, it must be a gcc optimizer problem.

I'll give that a try.

Guido> If that doesn't help, it's still most likely to be a gcc or
Guido> binutils problem.

There was definitely a problem with binutils 2.13 -- when handed
the libtcl8.3.so distributed by ActiveState, it dumps core.
However, that problem does not occur if I build libtcl8.3.so from
the tcl source distribution.  Nor does it occur with binutils 2.12.

Guido> A SourceForge bug report might be in order regardless.

I'll file one once I have identified the failure conditions more accurately.
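
[Editor's sketch: since the failure depends on exactly which gcc and
binutils combination is installed, the first datum a bug report needs is
the toolchain versions in use.  A small helper along these lines (modern
Python's subprocess module, which postdates this thread) records them;
the tool names are just the usual GNU ones.]

```python
import subprocess

def tool_version(cmd):
    """Return the first line of `cmd --version`, or None if cmd is absent."""
    try:
        proc = subprocess.run([cmd, "--version"],
                              capture_output=True, text=True)
    except FileNotFoundError:
        return None
    return proc.stdout.splitlines()[0] if proc.stdout else None

# Record the compiler/assembler/linker combination before filing a report.
for tool in ("gcc", "ld", "as"):
    print(tool, "->", tool_version(tool))
```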


From ark@research.att.com  Fri Aug 16 17:38:24 2002
From: ark@research.att.com (Andrew Koenig)
Date: 16 Aug 2002 12:38:24 -0400
Subject: [Python-Dev] Python build trouble with the new gcc/binutils
In-Reply-To: <15709.5232.302092.575564@anthem.wooz.org>
References: <200208161448.g7GEmAm19971@europa.research.att.com>
 <15709.5232.302092.575564@anthem.wooz.org>
Message-ID: 

>>>>>> "AK" == Andrew Koenig  writes:

AK> I can build Python 2.2.1 just fine on my Solaris 2.8 machine
AK> using gcc 3.1.1 and binutils 2.12.1

AK> If I install either binutils 2.13 or the just-released gcc
AK> 3.2, I can no longer build Python -- it dumps core quite far
AK> into the build process.

Barry> Stack trace?

OK, here's what I've been able to find so far.

It fails at the point in the installation process where it is trying to do this:



$ CC='gcc' LDSHARED='gcc -shared' OPT='-DNDEBUG -g -O3 -Wall -Wstrict-prototypes' ./python -E ./setup.py build  
running build
running build_ext
skipping 'struct' extension (up-to-date)
Segmentation Fault - core dumped



Here's the back trace:




#0  __register_frame_info_bases (begin=0xfed50000, ob=0xfed50000, tbase=0x0, 
    dbase=0x0) at /tmp/build1165/gcc-3.1.1/gcc/unwind-dw2-fde.c:83
#1  0xfed517ec in frame_dummy ()
   from /export/spurr1/homes1/ark/Python-2.2.1/build/lib.solaris-2.7-sun4u-2.2/struct.so
#2  0xfed516d4 in _init ()
   from /export/spurr1/homes1/ark/Python-2.2.1/build/lib.solaris-2.7-sun4u-2.2/struct.so
#3  0xff3bc174 in ?? ()
#4  0xff3c0a8c in ?? ()
#5  0xff3c0ba8 in ?? ()
#6  0x0007b384 in _PyImport_GetDynLoadFunc (fqname=0x2 <Address 0x2 out of bounds>, shortname=0x1002c8 "", pathname=0xffbedbd8 "/export/spurr1/homes1/ark/Python-2.2.1/build/lib.solaris-2.7-sun4u-2.2/struct.so", fp=0xfc1d8) at Python/dynload_shlib.c:90
#7  0x0006edd8 in _PyImport_LoadDynamicModule (name=0xffbee0c8 "struct", pathname=0xffbedbd8 "/export/spurr1/homes1/ark/Python-2.2.1/build/lib.solaris-2.7-sun4u-2.2/struct.so", fp=0xfc1d8) at Python/importdl.c:42
#8  0x0006b63c in load_module (name=0xffbee0c8 "struct", fp=0xfc1d8, buf=0xffbedbd8 "/export/spurr1/homes1/ark/Python-2.2.1/build/lib.solaris-2.7-sun4u-2.2/struct.so", type=3) at Python/import.c:1365
#9  0x0006c4e4 in import_submodule (mod=0xe08b8, subname=0xffbee0c8 "struct", fullname=0xffbee0c8 "struct") at Python/import.c:1895
#10 0x0006c008 in load_next (mod=0xe08b8, altmod=0xe08b8, p_name=0xffbee0c8, buf=0xffbee0c8 "struct", p_buflen=0xffbee0c4) at Python/import.c:1751
#11 0x0006de54 in import_module_ex (name=0x0, globals=0xe08b8, locals=0x0, fromlist=0x0) at Python/import.c:1602
#12 0x0006d024 in PyImport_ImportModuleEx (name=0x268cac "struct", globals=0x0, locals=0x0, fromlist=0x0) at Python/import.c:1643
#13 0x000bca6c in builtin___import__ (self=0x0, args=0x268cac) at Python/bltinmodule.c:40
#14 0x000ba3a0 in PyCFunction_Call (func=0x101ca8, arg=0x29d638, kw=0x0) at Objects/methodobject.c:69
#15 0x00047ed0 in eval_frame (f=0x2c0198) at Python/ceval.c:2004
#16 0x00048b38 in PyEval_EvalCodeEx (co=0x2603a8, globals=0x2c0198, locals=0x0, args=0x2a74a8, argcount=2, kws=0x2a74b0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2585
#17 0x00049f28 in fast_function (func=0x0, pp_stack=0x2a74a8, n=2782384, na=2, nk=0) at Python/ceval.c:3161
#18 0x00047de8 in eval_frame (f=0x2a7350) at Python/ceval.c:2024
#19 0x00048b38 in PyEval_EvalCodeEx (co=0x283138, globals=0x2a7350, locals=0x0, args=0x28d124, argcount=1, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2585
#20 0x000aaaa4 in function_call (func=0x27bd78, arg=0x28d118, kw=0x0) at Objects/funcobject.c:374
#21 0x00095d08 in PyObject_Call (func=0x27bd78, arg=0x28d118, kw=0x0) at Objects/abstract.c:1684
#22 0x0009e64c in instancemethod_call (func=0x27bd78, arg=0x28d118, kw=0x0) at Objects/classobject.c:2276
#23 0x00095d08 in PyObject_Call (func=0x27bd78, arg=0x28d118, kw=0x0) at Objects/abstract.c:1684
#24 0x00049fec in do_call (func=0x10ffb8, pp_stack=0xffbeebe0, na=1, nk=2674968) at Python/ceval.c:3262
#25 0x00047d30 in eval_frame (f=0x2b3f48) at Python/ceval.c:2027
#26 0x00048b38 in PyEval_EvalCodeEx (co=0x261510, globals=0x2b3f48, locals=0x0, args=0x1c5684, argcount=1, kws=0x1c5688, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2585
#27 0x00049f28 in fast_function (func=0x0, pp_stack=0x1c5684, n=1857160, na=1, nk=0) at Python/ceval.c:3161
#28 0x00047de8 in eval_frame (f=0x1c5520) at Python/ceval.c:2024
#29 0x00048b38 in PyEval_EvalCodeEx (co=0x2821d0, globals=0x1c5520, locals=0x0, args=0x175fa8, argcount=1, kws=0x175fac, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2585
#30 0x00049f28 in fast_function (func=0x0, pp_stack=0x175fa8, n=1531820, na=1, nk=0) at Python/ceval.c:3161
#31 0x00047de8 in eval_frame (f=0x175e50) at Python/ceval.c:2024
#32 0x00048b38 in PyEval_EvalCodeEx (co=0x2495c0, globals=0x175e50, locals=0x0, args=0x1762fc, argcount=2, kws=0x176304, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2585
#33 0x00049f28 in fast_function (func=0x0, pp_stack=0x1762fc, n=1532676, na=2, nk=0) at Python/ceval.c:3161
#34 0x00047de8 in eval_frame (f=0x1761a8) at Python/ceval.c:2024
#35 0x00048b38 in PyEval_EvalCodeEx (co=0x20c908, globals=0x1761a8, locals=0x0, args=0x1734b8, argcount=2, kws=0x1734c0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2585
#36 0x00049f28 in fast_function (func=0x0, pp_stack=0x1734b8, n=1520832, na=2, nk=0) at Python/ceval.c:3161
#37 0x00047de8 in eval_frame (f=0x173360) at Python/ceval.c:2024
#38 0x00048b38 in PyEval_EvalCodeEx (co=0x293e50, globals=0x173360, locals=0x0, args=0x298c00, argcount=1, kws=0x298c04, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2585
#39 0x00049f28 in fast_function (func=0x0, pp_stack=0x298c00, n=2722820, na=1, nk=0) at Python/ceval.c:3161
#40 0x00047de8 in eval_frame (f=0x298aa8) at Python/ceval.c:2024
#41 0x00048b38 in PyEval_EvalCodeEx (co=0x2495c0, globals=0x298aa8, locals=0x0, args=0x15e008, argcount=2, kws=0x15e010, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2585
#42 0x00049f28 in fast_function (func=0x0, pp_stack=0x15e008, n=1433616, na=2, nk=0) at Python/ceval.c:3161
#43 0x00047de8 in eval_frame (f=0x15deb0) at Python/ceval.c:2024
#44 0x00048b38 in PyEval_EvalCodeEx (co=0x233e28, globals=0x15deb0, locals=0x0, args=0x1325f4, argcount=1, kws=0x1325f8, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2585
#45 0x00049f28 in fast_function (func=0x0, pp_stack=0x1325f4, n=1254904, na=1, nk=0) at Python/ceval.c:3161
#46 0x00047de8 in eval_frame (f=0x132488) at Python/ceval.c:2024
#47 0x00048b38 in PyEval_EvalCodeEx (co=0x25d8a8, globals=0x132488, locals=0x0, args=0x28, argcount=0, kws=0x273fd4, kwcount=5, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2585
#48 0x00049f28 in fast_function (func=0x0, pp_stack=0x28, n=2572284, na=0, nk=5) at Python/ceval.c:3161
#49 0x00047de8 in eval_frame (f=0x273e80) at Python/ceval.c:2024
#50 0x00048b38 in PyEval_EvalCodeEx (co=0x2618a8, globals=0x273e80, locals=0x0, args=0x1130b0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2585
#51 0x00049f28 in fast_function (func=0x0, pp_stack=0x1130b0, n=1126576, na=0, nk=0) at Python/ceval.c:3161
#52 0x00047de8 in eval_frame (f=0x112f60) at Python/ceval.c:2024
#53 0x00048b38 in PyEval_EvalCodeEx (co=0x267978, globals=0x112f60, locals=0x0, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2585
#54 0x0004b79c in PyEval_EvalCode (co=0x267978, globals=0x1114b8, locals=0x0) at Python/ceval.c:483
#55 0x00076fe8 in run_node (n=0xffe20, filename=0x1114b8 "", globals=0x1114b8, locals=0x1114b8, flags=0x1114b8) at Python/pythonrun.c:1079
#56 0x000766e8 in PyRun_SimpleFileExFlags (fp=0xfc1d8, filename=0xffbefe48 "./setup.py", closeit=1, flags=0xffbefcec) at Python/pythonrun.c:685
#57 0x0001c544 in Py_Main (argc=0, argv=0x1) at Modules/main.c:364

-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark

From guido@python.org  Fri Aug 16 17:44:22 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 16 Aug 2002 12:44:22 -0400
Subject: [Python-Dev] Alternative implementation of interning
In-Reply-To: Your message of "Fri, 16 Aug 2002 11:22:58 EDT."
 <20020816152258.GA99140@hishome.net>
References: <200208151620.g7FGKb216411@odiug.zope.com>
 <20020816152258.GA99140@hishome.net>
Message-ID: <200208161644.g7GGiMm13995@pcp02138704pcs.reston01.va.comcast.net>

> On Thu, Aug 15, 2002 at 12:46:25PM -0400, Tim Peters wrote:
> > As the only person to have posted an example relying on this behavior, it's
> > OK by me if that example breaks -- it was made up just to illustrate the
> > possibility and raise a caution flag.  I don't take it seriously.

[Oren]
> In Python it's easier to just use the string so there is no real incentive
> to use the id.  I would say that making the result of the intern() builtin
> mortal is probably safe.

OK, there seems consensus on this one.

> The problem is in C extension modules. In C there is an incentive to rely
> on the immortality of interned strings because it makes the code simpler.
> There was an example of this in the Mac import code. PyString_InternInPlace
> should probably create immortal interned strings for backward compatibility
> (and deprecated, of course)

But the vast majority of C code does *not* depend on this.  I'd rather
keep PyString_InternInPlace(), so we don't have to change all call
locations, only the very rare ones that rely on this (Martin found
another two).
Maybe we can even detect the abusing cases by putting a test in
PyString_InternInPlace() like this:

    if (s->ob_refcnt == 1) {
        PyErr_Warn(PyExc_DeprecationWarning,
                   "interning won't keep your string alive");
        PyErr_Clear();  /* In case the warning was an error, ignore it */
        Py_INCREF(s);   /* Make s immortal */
    }

> Maybe PyString_Intern should be renamed to PyString_InternReference to
> make it more obvious that it modifies the pointer "in place".

The perfect name for that API already exists: PyString_InternInPlace(). :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From ark@research.att.com  Fri Aug 16 17:53:26 2002
From: ark@research.att.com (Andrew Koenig)
Date: 16 Aug 2002 12:53:26 -0400
Subject: [Python-Dev] Python build trouble with the new gcc/binutils
In-Reply-To: <200208161509.g7GF90s06272@pcp02138704pcs.reston01.va.comcast.net>
References: <200208161448.g7GEmAm19971@europa.research.att.com>
 <200208161509.g7GF90s06272@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

Guido> If that doesn't help, try turning off optimization (edit the generated
Guido> Makefile to delete the "-O3" option, then make clean).  If that helps,
Guido> it must be a gcc optimizer problem.

Guido> If that doesn't help, it's still most likely to be a gcc or binutils
Guido> problem.

It doesn't help.
-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark

From zack@codesourcery.com  Fri Aug 16 18:28:59 2002
From: zack@codesourcery.com (Zack Weinberg)
Date: Fri, 16 Aug 2002 10:28:59 -0700
Subject: [Python-Dev] Python build trouble with the new gcc/binutils
In-Reply-To: 
References: <200208161448.g7GEmAm19971@europa.research.att.com>
 <15709.5232.302092.575564@anthem.wooz.org>
Message-ID: <20020816172858.GA30840@codesourcery.com>

On Fri, Aug 16, 2002 at 12:38:24PM -0400, Andrew Koenig wrote:
>
> #0  __register_frame_info_bases (begin=0xfed50000, ob=0xfed50000, tbase=0x0,
>     dbase=0x0) at /tmp/build1165/gcc-3.1.1/gcc/unwind-dw2-fde.c:83

Er, is the directory name misleading, or have you picked up libgcc_s.so
from 3.1.1?  In theory that shouldn't be a problem; in practice it
could well be the problem.

> #1  0xfed517ec in frame_dummy ()
>    from /export/spurr1/homes1/ark/Python-2.2.1/build/lib.solaris-2.7-sun4u-2.2/struct.so
> #2  0xfed516d4 in _init ()
>    from /export/spurr1/homes1/ark/Python-2.2.1/build/lib.solaris-2.7-sun4u-2.2/struct.so
> #3  0xff3bc174 in ?? ()
> #4  0xff3c0a8c in ?? ()
> #5  0xff3c0ba8 in ?? ()
> #6  0x0007b384 in _PyImport_GetDynLoadFunc (fqname=0x2 <Address 0x2 out of bounds>,
>     shortname=0x1002c8 "",
>     pathname=0xffbedbd8 "/export/spurr1/homes1/ark/Python-2.2.1/build/lib.solaris-2.7-sun4u-2.2/struct.so", fp=0xfc1d8) at Python/dynload_shlib.c:90

Can you disable all use of dynamic loading and try the build again?
Unfortunately, the only practical way to do this seems to be to edit
configure.in and force DYNLOADFILE to be dynload_stub.o (right before
the line saying AC_MSG_RESULT($DYNLOADFILE)), then regenerate
configure.  (Might be a good idea to add an --enable switch.)

This will obviously not get you an installable build, but it will let
us narrow down the problem a bit.

zw

From ark@research.att.com  Fri Aug 16 18:34:05 2002
From: ark@research.att.com (Andrew Koenig)
Date: Fri, 16 Aug 2002 13:34:05 -0400 (EDT)
Subject: [Python-Dev] Python build trouble with the new gcc/binutils
In-Reply-To: <20020816172858.GA30840@codesourcery.com> (message from Zack
 Weinberg on Fri, 16 Aug 2002 10:28:59 -0700)
References: <200208161448.g7GEmAm19971@europa.research.att.com>
 <15709.5232.302092.575564@anthem.wooz.org>
 <20020816172858.GA30840@codesourcery.com>
Message-ID: <200208161734.g7GHY5421747@europa.research.att.com>

Zack> On Fri, Aug 16, 2002 at 12:38:24PM -0400, Andrew Koenig wrote:
>> #0  __register_frame_info_bases (begin=0xfed50000, ob=0xfed50000, tbase=0x0,
>>     dbase=0x0) at /tmp/build1165/gcc-3.1.1/gcc/unwind-dw2-fde.c:83

Zack> Er, is the directory name misleading, or have you picked up
Zack> libgcc_s.so from 3.1.1?  In theory that shouldn't be a problem; in
Zack> practice it could well be the problem.

This particular test was done with gcc 3.1.1 and binutils 2.13.

As I said at the beginning of the discussion, if I use gcc 3.1.1
and binutils 2.12.1, the Python install works.  If I use gcc 3.2
*or* binutils 2.13, the Python install fails.

From barry@python.org  Fri Aug 16 18:36:26 2002
From: barry@python.org (Barry A. Warsaw)
Date: Fri, 16 Aug 2002 13:36:26 -0400
Subject: [Python-Dev] Python build trouble with the new gcc/binutils
References: <200208161448.g7GEmAm19971@europa.research.att.com>
 <15709.5232.302092.575564@anthem.wooz.org>
 <20020816172858.GA30840@codesourcery.com>
Message-ID: <15709.14362.115758.559982@anthem.wooz.org>

One quick problem that I ran into when trying to build with gcc 3.2,
installed in /usr/local/bin on a RH 7.3 system: I had to use
--with-cxx=/usr/local/bin/c++ otherwise I got this error:

-------------------- snip snip --------------------
% ./configure --with-pydebug
checking MACHDEP... linux2
checking for --without-gcc... no
checking for --with-cxx=... no
checking for c++... c++
checking for C++ compiler default output... a.out
checking whether the C++ compiler works... configure: error: cannot run C++ compiled programs.
If you meant to cross compile, use `--host'.
-------------------- snip snip --------------------

Even though:

% c++ --version
c++ (GCC) 3.2
Copyright (C) 2002 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

No time to dig into this right now, but I thought I'd report it.

-Barry

From guido@python.org  Fri Aug 16 18:36:40 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 16 Aug 2002 13:36:40 -0400
Subject: [Python-Dev] Python build trouble with the new gcc/binutils
In-Reply-To: Your message of "Fri, 16 Aug 2002 13:34:05 EDT."
 <200208161734.g7GHY5421747@europa.research.att.com>
References: <200208161448.g7GEmAm19971@europa.research.att.com>
 <15709.5232.302092.575564@anthem.wooz.org>
 <20020816172858.GA30840@codesourcery.com>
 <200208161734.g7GHY5421747@europa.research.att.com>
Message-ID: <200208161736.g7GHaee16089@pcp02138704pcs.reston01.va.comcast.net>

> As I said at the beginning of the discussion, if I use gcc 3.1.1
> and binutils 2.12.1, the Python install works.  If I use gcc 3.2
> *or* binutils 2.13, the Python install fails.

I'm guessing that gcc 3.2 somehow also installs binutils 2.13 and that
the bug is in the latter.

--Guido van Rossum (home page: http://www.python.org/~guido/)

PS: Mail to guido@python.org works again.  A comcast outage caused the
forwarding service to bounce, probably from 11:30 till 1:30 EDT today.
Bad Exim!  Thanks to Barry for the quick fix.

From ark@research.att.com  Fri Aug 16 18:42:24 2002
From: ark@research.att.com (Andrew Koenig)
Date: Fri, 16 Aug 2002 13:42:24 -0400 (EDT)
Subject: [Python-Dev] Python build trouble with the new gcc/binutils
In-Reply-To: <20020816172858.GA30840@codesourcery.com> (message from Zack
 Weinberg on Fri, 16 Aug 2002 10:28:59 -0700)
References: <200208161448.g7GEmAm19971@europa.research.att.com>
 <15709.5232.302092.575564@anthem.wooz.org>
 <20020816172858.GA30840@codesourcery.com>
Message-ID: <200208161742.g7GHgOT21944@europa.research.att.com>

Zack> Can you disable all use of dynamic loading and try the build again?

I can, but I'm not sure it will help.  I had a very similar problem
earlier, which I definitely traced to a bug in gnu ld:  If you say

	ld libtcl8.3.so

where libtcl8.3.so is as distributed by ActiveTcl, it dumps core.

What I found was that setup.py was ultimately invoking the gnu linker
which, in turn, was causing the problem by crashing.  So it may be that
the problem is still in the linker; I just don't have a clue as to
where.

Anyway, I'll try some experiments and see if I can find something
interesting.
From ark@research.att.com  Fri Aug 16 18:52:19 2002
From: ark@research.att.com (Andrew Koenig)
Date: Fri, 16 Aug 2002 13:52:19 -0400 (EDT)
Subject: [Python-Dev] Python build trouble with the new gcc/binutils
In-Reply-To: <200208161736.g7GHaee16089@pcp02138704pcs.reston01.va.comcast.net>
 (message from Guido van Rossum on Fri, 16 Aug 2002 13:36:40 -0400)
References: <200208161448.g7GEmAm19971@europa.research.att.com>
 <15709.5232.302092.575564@anthem.wooz.org>
 <20020816172858.GA30840@codesourcery.com>
 <200208161734.g7GHY5421747@europa.research.att.com>
 <200208161736.g7GHaee16089@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <200208161752.g7GHqJs21999@europa.research.att.com>

>> As I said at the beginning of the discussion, if I use gcc 3.1.1
>> and binutils 2.12.1, the Python install works.  If I use gcc 3.2
>> *or* binutils 2.13, the Python install fails.

Guido> I'm guessing that gcc 3.2 somehow also installs binutils 2.13
Guido> and that the bug is in the latter.

I see no evidence that gcc 3.2 installs binutils 2.13.  In particular,
if I install gcc 3.2, *then* install binutils 2.12.1, it still fails.

From ark@research.att.com  Fri Aug 16 19:15:36 2002
From: ark@research.att.com (Andrew Koenig)
Date: Fri, 16 Aug 2002 14:15:36 -0400 (EDT)
Subject: [Python-Dev] Python build trouble with the new gcc/binutils
In-Reply-To: <20020816172858.GA30840@codesourcery.com> (message from Zack
 Weinberg on Fri, 16 Aug 2002 10:28:59 -0700)
References: <200208161448.g7GEmAm19971@europa.research.att.com>
 <15709.5232.302092.575564@anthem.wooz.org>
 <20020816172858.GA30840@codesourcery.com>
Message-ID: <200208161815.g7GIFaw22059@europa.research.att.com>

Zack> This will obviously not get you an installable build, but it
Zack> will let us narrow down the problem a bit.

I've narrowed it down somewhat.

Apparently what happens is that setup.py successfully installs the
"struct" extension (or so it thinks), then crashes when it is trying
to import it.  After it has done so, I can duplicate the crash by
executing ./python and then typing

	import struct

so I don't have to run setup.py at all to cause the crash at that
point.

I'm trying to rebuild without dynamic loading now, to see what happens.

From ark@research.att.com  Fri Aug 16 19:43:05 2002
From: ark@research.att.com (Andrew Koenig)
Date: Fri, 16 Aug 2002 14:43:05 -0400 (EDT)
Subject: [Python-Dev] Python build trouble with the new gcc/binutils
In-Reply-To: <20020816172858.GA30840@codesourcery.com> (message from Zack
 Weinberg on Fri, 16 Aug 2002 10:28:59 -0700)
References: <200208161448.g7GEmAm19971@europa.research.att.com>
 <15709.5232.302092.575564@anthem.wooz.org>
 <20020816172858.GA30840@codesourcery.com>
Message-ID: <200208161843.g7GIh5X22326@europa.research.att.com>

Zack> Can you disable all use of dynamic loading and try the build
Zack> again?  Unfortunately, the only practical way to do this seems
Zack> to be to edit configure.in and force DYNLOADFILE to be
Zack> dynload_stub.o (right before the line saying
Zack> AC_MSG_RESULT($DYNLOADFILE)), then regenerate configure.  (Might
Zack> be a good idea to add an --enable switch.)

As I sort of expected, this makes the crash go away.  However, it is
now replaced by lots of messages like

building 'grp' extension
gcc -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC -I. -I/export/spurr1/homes1/ark/Python-2.2.1/./Include -I/usr/local/include -IInclude/ -c /export/spurr1/homes1/ark/Python-2.2.1/Modules/grpmodule.c -o build/temp.solaris-2.7-sun4u-2.2/grpmodule.o
gcc -shared build/temp.solaris-2.7-sun4u-2.2/grpmodule.o -L/usr/local/lib -o build/lib.solaris-2.7-sun4u-2.2/grp.so
WARNING: removing "grp" since importing it failed

If I put back the dynamic loading stuff and rebuild everything from
scratch, I again get a python that crashes when I try to import
struct.

It occurs to me that the traceback from that might be useful.
Needless to say, it is much shorter than the earlier one.
I must say that the "Address 0x2 out of bounds" note makes me
suspicious.  Here's the traceback:

#0  __register_frame_info_bases (begin=0xfed40000, ob=0xfed40000, tbase=0x0, dbase=0x0) at /tmp/build1165/gcc-3.1.1/gcc/unwind-dw2-fde.c:83
#1  0xfed417ec in frame_dummy () from /export/spurr1/homes1/ark/test-python/Python-2.2.1/build/lib.solaris-2.7-sun4u-2.2/struct.so
#2  0xfed416d4 in _init () from /export/spurr1/homes1/ark/test-python/Python-2.2.1/build/lib.solaris-2.7-sun4u-2.2/struct.so
#3  0xff3bc174 in ?? ()
#4  0xff3c0a8c in ?? ()
#5  0xff3c0ba8 in ?? ()
#6  0x0007b384 in _PyImport_GetDynLoadFunc (fqname=0x2 <Address 0x2 out of bounds>, shortname=0x1002d0 "", pathname=0xffbeedb8 "/export/spurr1/homes1/ark/test-python/Python-2.2.1/build/lib.solaris-2.7-sun4u-2.2/struct.so", fp=0xfc1e0) at Python/dynload_shlib.c:90
#7  0x0006edd8 in _PyImport_LoadDynamicModule (name=0xffbef2a8 "struct", pathname=0xffbeedb8 "/export/spurr1/homes1/ark/test-python/Python-2.2.1/build/lib.solaris-2.7-sun4u-2.2/struct.so", fp=0xfc1e0) at Python/importdl.c:42
#8  0x0006b63c in load_module (name=0xffbef2a8 "struct", fp=0xfc1e0, buf=0xffbeedb8 "/export/spurr1/homes1/ark/test-python/Python-2.2.1/build/lib.solaris-2.7-sun4u-2.2/struct.so", type=3) at Python/import.c:1365
#9  0x0006c4e4 in import_submodule (mod=0xe08c0, subname=0xffbef2a8 "struct", fullname=0xffbef2a8 "struct") at Python/import.c:1895
#10 0x0006c008 in load_next (mod=0xe08c0, altmod=0xe08c0, p_name=0xffbef2a8, buf=0xffbef2a8 "struct", p_buflen=0xffbef2a4) at Python/import.c:1751
#11 0x0006de54 in import_module_ex (name=0x0, globals=0xe08c0, locals=0x111548, fromlist=0xe08c0) at Python/import.c:1602
#12 0x0006d024 in PyImport_ImportModuleEx (name=0x182eec "struct", globals=0x111548, locals=0x111548, fromlist=0xe08c0) at Python/import.c:1643
#13 0x000bca6c in builtin___import__ (self=0x0, args=0x182eec) at Python/bltinmodule.c:40
#14 0x000ba3a0 in PyCFunction_Call (func=0x101cb0, arg=0x10a440, kw=0x0) at Objects/methodobject.c:69
#15 0x00095d08 in PyObject_Call (func=0x101cb0, arg=0x10a440, kw=0x0) at Objects/abstract.c:1684
#16 0x00049c9c in PyEval_CallObjectWithKeywords (func=0x101cb0, arg=0x10a440, kw=0x0) at Python/ceval.c:3049
#17 0x00047810 in eval_frame (f=0x186898) at Python/ceval.c:1839
#18 0x00048b38 in PyEval_EvalCodeEx (co=0x18d7b0, globals=0x186898, locals=0x0, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2585
#19 0x0004b79c in PyEval_EvalCode (co=0x18d7b0, globals=0x111548, locals=0x0) at Python/ceval.c:483
#20 0x00076fe8 in run_node (n=0xffd68, filename=0x111548 "", globals=0x111548, locals=0x111548, flags=0x111548) at Python/pythonrun.c:1079
#21 0x00076450 in PyRun_InteractiveOneFlags (fp=0xffffffff, filename=0xc0e70 "", flags=0xffbefcf4) at Python/pythonrun.c:590
#22 0x000761f0 in PyRun_InteractiveLoopFlags (fp=0xfc1b0, filename=0xc0e70 "", flags=0xffbefcf4) at Python/pythonrun.c:526
#23 0x00077c78 in PyRun_AnyFileExFlags (fp=0xfc1b0, filename=0xc0e70 "", closeit=0, flags=0xffbefcf4) at Python/pythonrun.c:489
#24 0x0001c544 in Py_Main (argc=1, argv=0x1) at Modules/main.c:364

From guido@python.org  Fri Aug 16 19:51:18 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 16 Aug 2002 14:51:18 -0400
Subject: [Python-Dev] Python build trouble with the new gcc/binutils
In-Reply-To: Your message of "Fri, 16 Aug 2002 14:15:36 EDT."
 <200208161815.g7GIFaw22059@europa.research.att.com>
References: <200208161448.g7GEmAm19971@europa.research.att.com>
 <15709.5232.302092.575564@anthem.wooz.org>
 <20020816172858.GA30840@codesourcery.com>
 <200208161815.g7GIFaw22059@europa.research.att.com>
Message-ID: <200208161851.g7GIpIE19367@pcp02138704pcs.reston01.va.comcast.net>

> I've narrowed it down somewhat.
>
> Apparently what happens is that setup.py successfully installs the
> "struct" extension (or so it thinks), then crashes when it is trying
> to import it.  After it has done so, I can duplicate the crash by
> executing ./python and then typing
>
> import struct
>
> so I don't have to run setup.py at all to cause the crash at that
> point.
>
> I'm trying to rebuild without dynamic loading now, to see what happens.

I'm sure the 'struct' module is implicated only because it happens to
be the first module that setup.py tries to build.

This points to a problem with the dynamic linker.
--Guido van Rossum (home page: http://www.python.org/~guido/) From zack@codesourcery.com Fri Aug 16 19:56:46 2002 From: zack@codesourcery.com (Zack Weinberg) Date: Fri, 16 Aug 2002 11:56:46 -0700 Subject: [Python-Dev] Python build trouble with the new gcc/binutils In-Reply-To: <200208161815.g7GIFaw22059@europa.research.att.com> References: <200208161448.g7GEmAm19971@europa.research.att.com> <15709.5232.302092.575564@anthem.wooz.org> <20020816172858.GA30840@codesourcery.com> <200208161815.g7GIFaw22059@europa.research.att.com> Message-ID: <20020816185646.GC30840@codesourcery.com> On Fri, Aug 16, 2002 at 02:15:36PM -0400, Andrew Koenig wrote: > Zack> This will obviously not get you an installable build, but it > Zack> will let us narrow down the problem a bit. > > I've narrowed it down somewhat. > > Apparently what happens is that setup.py successfully installs the > "struct" extension (or so it thinks), then crashes when it is trying > to import it. After it has done so, I can duplicate the crash by > executing ./python and then typing > > import struct > > so I don't have to run setup.py at all to cause the crash at that > point. > > I'm trying to rebuild without dynamic loading now, to see what happens. Another thing to try is gcc 3.2 with the Sun assembler and linker. It could be that 3.2 triggers a bug in all extant versions of GNU ld on Solaris. zw From pedroni@inf.ethz.ch Fri Aug 16 19:53:26 2002 From: pedroni@inf.ethz.ch (Samuele Pedroni) Date: Fri, 16 Aug 2002 20:53:26 +0200 Subject: [Python-Dev] type categories (redux?) Message-ID: <007401c24556$47b5dc80$6d94fea9@newmexico> [Andrew Koenig] >Michael> I may be getting lost in subthreads here, but are we still >Michael> talking about multimethods? > >Well, I started by talking about type categories and ways of >writing programs that tested them. Dave Abrahams said, in effect, >that I was really just talking about multimethods. I'm still >not convinced. On one hand we have [0. 
destructuring pattern matching which is a kind of local control-flow construct] 1. Multiple dispatch [and predicate dispatch] which are about generic functions defined as a set of tuples (function,signature), and where, given a tuple of arguments, one find the applicable function subset based on the signatures and then tries to induce a total/partial order on the subset based on the arguments and calls the inferior. The signatures typically involve types(classes) and the order is about subtype(subclass) relationshisps. [again see http://www.cs.washington.edu/research/projects/cecil/www/pubs/dispatching.html ] On the other hand: Dave Abrahams wants multiple dispatch and also wants to dispatch e.g. on arg being a mapping. Now to define in Python what a mapping is as first-class object/formal language construct is a kind of Holy Grail. So the discussion i) type categories as (Zope) interfaces ii) type categories in terms of hasattr [not totally safe but used in practice] iii) type categories in terms of predicates etc It is maybe worth to underline that [this was somehow implicit in much of the discussion]: no amount of formalism and what-not can make an approach extending Python as-is safer than (ii) unless with the addition of some kind of explicitness (explicit tagging or labeling, explicit predicate (forced) assertion, explicit interfaces with central register or not). IMHO Dave can wait for a long long time or go the pragmatical route: *) either integrating (Zope) interfaces in the dispatch model *) or adopting some minimal form of predicate dispatching too, noticing that once you have multi-method dispatch you can define e.g. defgeneric ismapping addmethod ismapping(obj: Any): return False addmethod ismapping(obj: dict): return True [in CLIM e.g. 
the notion of procols is defined in this way too (or through abstract superclasses)] and you gain some flexibility because a user - composing a system - can redefine this for a part of the type hierarchy in terms of hasattr or what-not ... or define ismapping for a pre-existent type a posteriori. It's a bit more flexible than the registry approach in Zope interfaces (if I understand that correctly). But you don't have a direct notion of subcategory (this can be a problem or not) *) or a mixture of the two approaches [about which I admit I should think more] [I start to feel that Python obession about not being explicit about protocols has gone a bit too far ( it's just a very personal feel), even in Smalltalk people add to Object things like Object>>>respondsToArithmetic ^false. Object>>>isSequenceable ^false. ... and then redefine them to return true somewhere down the hierarchy, and use these predicates to select behavior. It is used sparingly, but it is used. "We" could'nt do that because in Python there was no modifiable ur-object, but both with a registry or multi-methods one can enable essentially this.] for-better-or-worse-Alex-Martelli-wasn't-listening- and-we-haven't-even-scratched-the-interactions-or-non-interactions- with-PEP-246'ly y'rs. -*- "In my experience, much of language design is like this. We think we know how it will all come out, but we don't always. Usage patterns are often surprising, as one learns if one is around long enough to design a language or two and then watch how expectations play out in reality over a course of years. So it's a gamble. But the only way not to gamble is not to move ahead." -- Kent M. Pitman From martin@v.loewis.de Fri Aug 16 19:58:03 2002 From: martin@v.loewis.de (Martin v. 
Loewis) Date: 16 Aug 2002 20:58:03 +0200 Subject: [Python-Dev] mimetypes patch #554192 In-Reply-To: <3D5CE38D.9080905@livinglogic.de> References: <3D5BEBB8.7080904@livinglogic.de> <15707.61612.844119.819432@anthem.wooz.org> <3D5CE38D.9080905@livinglogic.de> Message-ID: Walter Dörwald writes: > OK, so we probably need a reverse mapping for common_types too, but > shouldn't we consider common_types to be fixed? If anything, types_map should be fixed: Those are the official IANA-supported types (including the official x- extension mechanism). The common types are those that violate IANA specs, yet are found in real life. If you wanted to support strictness in add_type, then you would require that the type starts with x-; since mimetypes.py should have all registered types incorporated (if it misses some, that's a bug). > Even better would be if we could assign priorities to the mappings, > so that for e.g. image/jpeg the preferred extension is .jpeg. > Then guess_type() and guess_extension() would return the preferred > mimetype/extension. Do you have a specific application for that in mind? It sounds like overkill. Regards, Martin From guido@python.org Fri Aug 16 19:58:36 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 16 Aug 2002 14:58:36 -0400 Subject: [Python-Dev] Last call: mortal interned strings Message-ID: <200208161858.g7GIwaM19389@pcp02138704pcs.reston01.va.comcast.net> (python.org/sf/576101) I'm beginning to be convinced that mortal interned strings are a good idea. I've uploaded a patch that defaults interned strings to mortal status unless explicitly requested with PyString_InternImmortal(). There are no calls to that function in the core. I'm very tempted to check this in and see how it goes. It's not that hard to change our mind about the default closer to the 2.3 release date. Any objections?
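For readers who haven't followed the thread, the mechanics of interning can be sketched at the Python level (a hedged sketch: this uses today's sys.intern(); in the 2.2 era the same operation was the intern() builtin, backed by PyString_InternInPlace() in C):

```python
import sys

# Build two equal strings at runtime so the compiler doesn't pre-intern them:
a = "".join(["python", "_dev"])
b = "".join(["python", "_dev"])
assert a == b and a is not b      # equal values, but distinct objects

# Interning maps equal strings onto a single shared object:
ia = sys.intern(a)
ib = sys.intern(b)
assert ia is ib

# With *mortal* interned strings (the behavior the patch makes the default),
# the interned object can be garbage-collected once no outside references
# remain; an *immortal* one is kept alive by the intern table forever.
```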
--Guido van Rossum (home page: http://www.python.org/~guido/) From ark@research.att.com Fri Aug 16 20:03:41 2002 From: ark@research.att.com (Andrew Koenig) Date: Fri, 16 Aug 2002 15:03:41 -0400 (EDT) Subject: [Python-Dev] Python build trouble with the new gcc/binutils In-Reply-To: <200208161851.g7GIpIE19367@pcp02138704pcs.reston01.va.comcast.net> (message from Guido van Rossum on Fri, 16 Aug 2002 14:51:18 -0400) References: <200208161448.g7GEmAm19971@europa.research.att.com> <15709.5232.302092.575564@anthem.wooz.org> <20020816172858.GA30840@codesourcery.com> <200208161815.g7GIFaw22059@europa.research.att.com> <200208161851.g7GIpIE19367@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200208161903.g7GJ3f122409@europa.research.att.com> Guido> I'm sure the 'struct' module is implicated only because it happens to Guido> be the first module that setup.py tries to build. Guido> This points to a problem with the dynamic linker. Yes, it does. Indeed, if I look in Python/dynload_shlib.c, near the end, I see code like this: if (Py_VerboseFlag) printf("dlopen(\"%s\", %x);\n", pathname, dlopenflags); handle = dlopen(pathname, dlopenflags); if (handle == NULL) { PyErr_SetString(PyExc_ImportError, dlerror()); return NULL; } If I insert, immediately after the call to dlopen, the following: if (Py_VerboseFlag) printf("after dlopen(\"%s\", %x);\n", pathname, dlopenflags); fflush(stdout); and then run "./python -v" and try to import struct, it does not print the second set of output: $ ./python -v [GCC 3.1.1] on sunos5 Type "help", "copyright", "credits" or "license" for more information. import readline # builtin >>> import struct dlopen("/export/spurr1/homes1/ark/test-python/Python-2.2.1/build/lib.solaris-2.7-sun4u-2.2/struct.so", 2); Segmentation Fault - core dumped This behavior strongly suggests to me that it is crashing in dlopen. 
However, when I write a little C program that just calls dlopen with the file in question: #include <stdio.h> void *dlopen(const char *, int); main() { void *handle = dlopen("/export/spurr1/homes1/ark/test-python/Python-2.2.1/build/lib.solaris-2.7-sun4u-2.2/struct.so", 2); printf ("Handle = %x\n", handle); } it quietly succeeds and prints "Handle = 0" At this point I am way out of my depth. From martin@v.loewis.de Fri Aug 16 20:04:29 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 16 Aug 2002 21:04:29 +0200 Subject: [Python-Dev] Python build trouble with the new gcc/binutils In-Reply-To: <200208161752.g7GHqJs21999@europa.research.att.com> References: <200208161448.g7GEmAm19971@europa.research.att.com> <15709.5232.302092.575564@anthem.wooz.org> <20020816172858.GA30840@codesourcery.com> <200208161734.g7GHY5421747@europa.research.att.com> <200208161736.g7GHaee16089@pcp02138704pcs.reston01.va.comcast.net> <200208161752.g7GHqJs21999@europa.research.att.com> Message-ID: Andrew Koenig writes: > Guido> I'm guessing that gcc 3.2 somehow also installs binutils 2.13 and that > Guido> the bug is in the latter. > > I see no evidence that gcc 3.2 installs binutils 2.13. > In particular, if I install gcc 3.2, *then* install > binutils 2.12.1, it still fails. You can't do that (if installing 2.12.1 means to downgrade from 2.13). gcc configuration analyses features of binutils at configure time, and relies on those features to be present at run-time. Are you sure that gcc picks up the binutils you had installed when you configured gcc? In particular, what happens if you do gcc --print-prog-name=as gcc --print-prog-name=ld Are those the ones that you had in PATH when configuring?
Regards, Martin From ark@research.att.com Fri Aug 16 20:04:27 2002 From: ark@research.att.com (Andrew Koenig) Date: Fri, 16 Aug 2002 15:04:27 -0400 (EDT) Subject: [Python-Dev] Python build trouble with the new gcc/binutils In-Reply-To: <20020816185646.GC30840@codesourcery.com> (message from Zack Weinberg on Fri, 16 Aug 2002 11:56:46 -0700) References: <200208161448.g7GEmAm19971@europa.research.att.com> <15709.5232.302092.575564@anthem.wooz.org> <20020816172858.GA30840@codesourcery.com> <200208161815.g7GIFaw22059@europa.research.att.com> <20020816185646.GC30840@codesourcery.com> Message-ID: <200208161904.g7GJ4Rm22413@europa.research.att.com> Zack> Another thing to try is gcc 3.2 with the Sun assembler and Zack> linker. It could be that 3.2 triggers a bug in all extant Zack> versions of GNU ld on Solaris. It could be. However, I seem to remember that gcc 3.x does not work well with the Sun assembler and linker at all on Solaris. From martin@v.loewis.de Fri Aug 16 20:09:13 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 16 Aug 2002 21:09:13 +0200 Subject: [Python-Dev] Python build trouble with the new gcc/binutils In-Reply-To: <15709.14362.115758.559982@anthem.wooz.org> References: <200208161448.g7GEmAm19971@europa.research.att.com> <15709.5232.302092.575564@anthem.wooz.org> <20020816172858.GA30840@codesourcery.com> <15709.14362.115758.559982@anthem.wooz.org> Message-ID: barry@python.org (Barry A. Warsaw) writes: > checking whether the C++ compiler works... configure: error: cannot run C++ compiled programs. > If you meant to cross compile, use `--host'. > -------------------- snip snip -------------------- > > Even though: > > % c++ --version > c++ (GCC) 3.2 That means that the c++ that you have installed fails to build working binaries. This, in turn, most likely means that libgcc_s.so.1 was not found. 
To correct this, either - install libgcc_s.so.1 into /usr/local/lib, and re-run ldconfig, or - add the path that has libgcc_s.so.1 to /etc/ld.so.conf, and re-run ldconfig. Alternatively, configure --without-cxx. Regards, Martin From guido@python.org Fri Aug 16 20:08:58 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 16 Aug 2002 15:08:58 -0400 Subject: [Python-Dev] Alternative implementation of interning In-Reply-To: Your message of "Fri, 16 Aug 2002 12:44:22 EDT." Message-ID: <200208161908.g7GJ8xU20237@pcp02138704pcs.reston01.va.comcast.net> [I wrote] > Maybe we can even detect the abusing cases by putting a test in > PyString_InternInPlace() like this: > > if (s->ob_refcnt == 1) { > PyErr_Warn(PyExc_DeprecationWarning, > "interning won't keep your string alive"); > PyErr_Clear(); /* In case the warning was an error, ignore it */ > Py_INCREF(s); /* Make s immortal */ > } I tried this, and alas it doesn't work; there are many legit places where there's only one reference. So we'll have to use more traditional ways of tracking down C code that makes assumptions of immortality so it can drop its own reference. (Apart from getclassname() I've seen none.) --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy@alum.mit.edu Fri Aug 16 20:09:17 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Fri, 16 Aug 2002 15:09:17 -0400 Subject: [Python-Dev] pystone(object) Message-ID: <15709.19933.149356.178760@slothrop.zope.com> Anyone interested in making pystone use a new-style class? I just tried it and it slowed pystone down by 12%. Using __slots__ bought back 5%. On the one hand, we end up comparing the new-style class implementation of one Python with the classic class version of older Pythons. On the other hand, we seem to think that new-style classes are preferred. I think we ought to measure them.
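The kind of measurement Jeremy describes can be reproduced in spirit with timeit (a sketch only: classic classes no longer exist in current Python, so this merely contrasts an ordinary dict-based new-style class against the __slots__ variant that bought back part of the slowdown; the class names are invented):

```python
import timeit

class Plain:                    # new-style class: instances carry a __dict__
    def __init__(self):
        self.x = 0

class Slotted:                  # __slots__ replaces the per-instance dict
    __slots__ = ("x",)
    def __init__(self):
        self.x = 0

def bench(cls, n=100000):
    """Time n attribute reads on one instance of cls."""
    obj = cls()
    return timeit.timeit(lambda: obj.x, number=n)

plain_t = bench(Plain)
slotted_t = bench(Slotted)
```

Exact numbers depend on the interpreter and machine, which is precisely why measuring (rather than assuming) is the point of the message above.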
Jeremy Index: pystone.py =================================================================== RCS file: /cvsroot/python/python/dist/src/Lib/test/pystone.py,v retrieving revision 1.7 diff -c -c -r1.7 pystone.py *** pystone.py 6 Aug 2002 17:21:20 -0000 1.7 --- pystone.py 16 Aug 2002 18:49:56 -0000 *************** *** 40,46 **** [Ident1, Ident2, Ident3, Ident4, Ident5] = range(1, 6) ! class Record: def __init__(self, PtrComp = None, Discr = 0, EnumComp = 0, IntComp = 0, StringComp = 0): --- 40,46 ---- [Ident1, Ident2, Ident3, Ident4, Ident5] = range(1, 6) ! class Record(object): def __init__(self, PtrComp = None, Discr = 0, EnumComp = 0, IntComp = 0, StringComp = 0): From martin@v.loewis.de Fri Aug 16 20:12:07 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 16 Aug 2002 21:12:07 +0200 Subject: [Python-Dev] Python build trouble with the new gcc/binutils In-Reply-To: References: <200208161448.g7GEmAm19971@europa.research.att.com> <15709.5232.302092.575564@anthem.wooz.org> Message-ID: Andrew Koenig writes: > #0 __register_frame_info_bases (begin=0xfed50000, ob=0xfed50000, tbase=0x0, > dbase=0x0) at /tmp/build1165/gcc-3.1.1/gcc/unwind-dw2-fde.c:83 That is the initialization for exception handling regions (which is irrelevant for C, but linked into every shared library just in case C++ objects are also present). My guess is that you have been using the system linker to link this binary (struct.so). 
Regards, Martin From ark@research.att.com Fri Aug 16 20:13:53 2002 From: ark@research.att.com (Andrew Koenig) Date: 16 Aug 2002 15:13:53 -0400 Subject: [Python-Dev] Python build trouble with the new gcc/binutils In-Reply-To: References: <200208161448.g7GEmAm19971@europa.research.att.com> <15709.5232.302092.575564@anthem.wooz.org> <20020816172858.GA30840@codesourcery.com> <200208161734.g7GHY5421747@europa.research.att.com> <200208161736.g7GHaee16089@pcp02138704pcs.reston01.va.comcast.net> <200208161752.g7GHqJs21999@europa.research.att.com> Message-ID: Martin> You can't do that (if installing 2.12.1 means to downgrade from Martin> 2.13). gcc configuration analyses features of binutils at configure Martin> time, and relies on those features to be present at run-time. Martin> Are you sure that gcc picks up the binutils you had installed when you Martin> configured gcc? In particular, what happens if you do Martin> gcc --print-prog-name=as Martin> gcc --print-prog-name=ld Martin> Are those the ones that you had in PATH when configuring? Yes. The way I install stuff on this particular machine is to build each package (gcc, binutils, etc.) in a completely separate directory, then make symbolic links to that directory from a common directory in which everything is actually executed. So gcc always thinks the linker is in a single place, and "installing binutils 2.12.1" means removing all the symlinks to the version of binutils that was previously in place and making new symlinks to the binutils 2.12.1 binaries. -- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark From guido@python.org Fri Aug 16 20:13:57 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 16 Aug 2002 15:13:57 -0400 Subject: [Python-Dev] Python build trouble with the new gcc/binutils In-Reply-To: Your message of "Fri, 16 Aug 2002 14:43:05 EDT."
<200208161843.g7GIh5X22326@europa.research.att.com> References: <200208161448.g7GEmAm19971@europa.research.att.com> <15709.5232.302092.575564@anthem.wooz.org> <20020816172858.GA30840@codesourcery.com> <200208161843.g7GIh5X22326@europa.research.att.com> Message-ID: <200208161913.g7GJDvQ20808@pcp02138704pcs.reston01.va.comcast.net> > As I sort of expected, this makes the crash go away. However, it is now > replaced by lots of messages like > > building 'grp' extension > gcc -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC -I. -I/export/spurr1/homes1/ark/Python-2.2.1/./Include -I/usr/local/include -IInclude/ -c /export/spurr1/homes1/ark/Python-2.2.1/Modules/grpmodule.c -o build/temp.solaris-2.7-sun4u-2.2/grpmodule.o > gcc -shared build/temp.solaris-2.7-sun4u-2.2/grpmodule.o -L/usr/local/lib -o build/lib.solaris-2.7-sun4u-2.2/grp.so > WARNING: removing "grp" since importing it failed Yeah, you'd have to enable all the modules you're interested in by editing Modules/Setup. That's a pain, which is why we generally use dynamic loading. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From ark@research.att.com Fri Aug 16 20:15:35 2002 From: ark@research.att.com (Andrew Koenig) Date: Fri, 16 Aug 2002 15:15:35 -0400 (EDT) Subject: [Python-Dev] Python build trouble with the new gcc/binutils In-Reply-To: (martin@v.loewis.de) References: <200208161448.g7GEmAm19971@europa.research.att.com> <15709.5232.302092.575564@anthem.wooz.org> Message-ID: <200208161915.g7GJFZb22506@europa.research.att.com> Martin> Andrew Koenig writes: >> #0 __register_frame_info_bases (begin=0xfed50000, ob=0xfed50000, tbase=0x0, >> dbase=0x0) at /tmp/build1165/gcc-3.1.1/gcc/unwind-dw2-fde.c:83 Martin> That is the initialization for exception handling regions Martin> (which is irrelevant for C, but linked into every shared Martin> library just in case C++ objects are also present). Martin> My guess is that you have been using the system linker to link Martin> this binary (struct.so). 
Seems unlikely -- the system linker isn't in my search path and "ls -ltu" shows that it hasn't been executed in 10 days. From martin@v.loewis.de Fri Aug 16 20:17:19 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 16 Aug 2002 21:17:19 +0200 Subject: [Python-Dev] Alternative implementation of interning In-Reply-To: <200208161644.g7GGiMm13995@pcp02138704pcs.reston01.va.comcast.net> References: <200208151620.g7FGKb216411@odiug.zope.com> <20020816152258.GA99140@hishome.net> <200208161644.g7GGiMm13995@pcp02138704pcs.reston01.va.comcast.net> Message-ID: Guido van Rossum writes: > Maybe we can even detect the abusing cases by putting a test in > PyString_InternInPlace() like this: > > if (s->ob_refcnt == 1) { > PyErr_Warn(PyExc_DeprecationWarning, > "interning won't keep your string alive"); > PyErr_Clear(); /* In case the warning was an error, ignore it */ > Py_INCREF(s); /* Make s immortal */ > } I believe this will trigger very often; the first usage of InternFromString (for a certain string) will likely trigger it. If people do PyObject *__foo__; int init(){ __foo__ = PyString_InternFromString("__foo__"); } then this is perfectly safe: you get an extra reference back (on top of the ones that the intern dictionary just stops holding); you can keep this reference as long as you want. So I would agree with your analysis that this will cause only a few problems. Unfortunately, those will be hard to track down.
Regards, Martin From zack@codesourcery.com Fri Aug 16 20:47:39 2002 From: zack@codesourcery.com (Zack Weinberg) Date: Fri, 16 Aug 2002 12:47:39 -0700 Subject: [Python-Dev] Python build trouble with the new gcc/binutils In-Reply-To: <200208161843.g7GIh5X22326@europa.research.att.com> References: <200208161448.g7GEmAm19971@europa.research.att.com> <15709.5232.302092.575564@anthem.wooz.org> <20020816172858.GA30840@codesourcery.com> <200208161843.g7GIh5X22326@europa.research.att.com> Message-ID: <20020816194739.GD30840@codesourcery.com> --2fHTh5uZTiUOsy+g Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Fri, Aug 16, 2002 at 02:43:05PM -0400, Andrew Koenig wrote: > Zack> Can you disable all use of dynamic loading and try the build > Zack> again? Unfortunately, the only practical way to do this seems > Zack> to be to edit configure.in and force DYNLOADFILE to be > Zack> dynload_stub.o (right before the line saying > Zack> AC_MSG_RESULT($DYNLOADFILE)), then regenerate configure. (Might > Zack> be a good idea to add an --enable switch.) > > As I sort of expected, this makes the crash go away. Okay, so that pretty much guarantees it's a bug in the toolchain, or in Solaris ld.so. > If I put back the dynamic loading stuff and rebuild everything from scratch, > I again get a python that crashes when I try to import struct. > It occurs to me that the traceback from that might be useful. > Needless to say, it is much shorter than the earlier one. > > I must say that the "Address 0x2 out of bounds" note makes me suspicious. That's almost certainly GDB screwing up. In any case, dynload_shlib.c's version of _PyImport_GetDynLoadFunc does not use that argument, so that can't be the cause of the problem. > #0 __register_frame_info_bases (begin=0xfed40000, ob=0xfed40000, tbase=0x0, > dbase=0x0) at /tmp/build1165/gcc-3.1.1/gcc/unwind-dw2-fde.c:83 Having tbase and dbase be 0x0 is not a problem. The begin and ob pointers should _not_ be the same. 
ob should point to a fairly large data object in the .bss section, and begin should point to the beginning of the .eh_frame section. This could be GDB screwing up again, but unwind-dw2-fde.c is compiled with less aggressive optimization than dynload_shlib.c, so it's more likely to be accurate. Also, this particular screw-up is a plausible linker or dynamic linker bug. I suspect struct.so was loaded at address 0xfed40000, which leaves both these pointers aimed at the beginning of the (unwritable) .text section -- __r_f_i_b tries to modify the object pointed to by ob, crash. Please execute the attached shell script with CC set to your test gcc 3.2 and/or binutils 2.13.x installation and see what happens. If we do have a toolchain bug, it ought to be provoked by this test. zw --2fHTh5uZTiUOsy+g Content-Type: application/x-sh Content-Disposition: attachment; filename="test.sh"

#! /bin/sh

mkdir /tmp/t.$$ || exit 3
cd /tmp/t.$$ || exit 3

cat >main.c <<'EOF'
#include <stdio.h>
#include <dlfcn.h>

int main(void)
{
    void *handle, *sym;
    char *error;

    puts("calling dlopen");
    handle = dlopen("./dyn.so", RTLD_NOW);
    if (!handle) {
        printf("%s\n", dlerror());
        return 1;
    }

    puts("calling dlsym");
    sym = dlsym(handle, "sym");
    if ((error = dlerror()) != 0) {
        printf("%s\n", error);
        return 1;
    }

    puts("calling sym");
    ((void (*)(void))sym)();
    puts("done");
    return 0;
}
EOF

cat >dyn.c <<'EOF'
#include <stdio.h>

void sym(void)
{
    puts("in sym");
}
EOF

[ -n "$SHFLAGS" ] || SHFLAGS="-fPIC -shared"
[ -n "$CC" ] || CC=gcc

set -x

$CC $CFLAGS $SHFLAGS dyn.c -o dyn.so
$CC $CFLAGS main.c -o main -ldl

./main || exit $?

cd /tmp
rm -rf t.$$

--2fHTh5uZTiUOsy+g-- From zack@codesourcery.com Fri Aug 16 20:51:15 2002 From: zack@codesourcery.com (Zack Weinberg) Date: Fri, 16 Aug 2002 12:51:15 -0700 Subject: [Python-Dev] Python
build trouble with the new gcc/binutils In-Reply-To: <200208161903.g7GJ3f122409@europa.research.att.com> References: <200208161448.g7GEmAm19971@europa.research.att.com> <15709.5232.302092.575564@anthem.wooz.org> <20020816172858.GA30840@codesourcery.com> <200208161815.g7GIFaw22059@europa.research.att.com> <200208161851.g7GIpIE19367@pcp02138704pcs.reston01.va.comcast.net> <200208161903.g7GJ3f122409@europa.research.att.com> Message-ID: <20020816195115.GE30840@codesourcery.com> On Fri, Aug 16, 2002 at 03:03:41PM -0400, Andrew Koenig wrote: > However, when I write a little C program that just calls dlopen with the > file in question: > > #include <stdio.h> > void *dlopen(const char *, int); > > main() > { > void *handle = dlopen("/export/spurr1/homes1/ark/test-python/Python-2.2.1/build/lib.solaris-2.7-sun4u-2.2/struct.so", 2); > printf ("Handle = %x\n", handle); > } > > it quietly succeeds and prints "Handle = 0" Handle = 0 indicates a *failure*. Either try the test script I sent you, or change your test program to read #include <stdio.h> #include <dlfcn.h> main() { void *handle = dlopen("/export/spurr1/homes1/ark/test-python/" "Python-2.2.1/build/lib.solaris-2.7-sun4u-2.2/" "struct.so", 2); printf ("Handle = %x\n", handle); if (handle == 0) printf("Error: %s\n", dlerror()); } and try it again, or, better, both. zw From zack@codesourcery.com Fri Aug 16 20:54:34 2002 From: zack@codesourcery.com (Zack Weinberg) Date: Fri, 16 Aug 2002 12:54:34 -0700 Subject: [Python-Dev] Re: tempfile.py In-Reply-To: <200208141259.g7ECxiL00996@pcp02138704pcs.reston01.va.comcast.net> References: <200208141259.g7ECxiL00996@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020816195434.GG30840@codesourcery.com> On Wed, Aug 14, 2002 at 08:59:44AM -0400, Guido van Rossum wrote: > The mkstemp() function in the rewritten tempfile has an argument with > a curious name and default: binary=True. This caused confusion (even > the docstring in the original patch was confused :-).
It would be > much easier to explain if this was changed to text=False. That is, to > deviate from the default mode, i.e. use text mode, you'll have to > write mkstemp(text=True) rather than mkstemp(binary=False). I see you've already done this, but in any case I do think it's a good idea. zw From ark@research.att.com Fri Aug 16 20:54:47 2002 From: ark@research.att.com (Andrew Koenig) Date: Fri, 16 Aug 2002 15:54:47 -0400 (EDT) Subject: [Python-Dev] Python build trouble with the new gcc/binutils In-Reply-To: <20020816195115.GE30840@codesourcery.com> (message from Zack Weinberg on Fri, 16 Aug 2002 12:51:15 -0700) References: <200208161448.g7GEmAm19971@europa.research.att.com> <15709.5232.302092.575564@anthem.wooz.org> <20020816172858.GA30840@codesourcery.com> <200208161815.g7GIFaw22059@europa.research.att.com> <200208161851.g7GIpIE19367@pcp02138704pcs.reston01.va.comcast.net> <200208161903.g7GJ3f122409@europa.research.att.com> <20020816195115.GE30840@codesourcery.com> Message-ID: <200208161954.g7GJslF22751@europa.research.att.com> Zack> Handle = 0 indicates a *failure*. Yeah, but it didn't crash. I'll try your other stuff after I'm out of a meeting that I'm in now. From martin@v.loewis.de Fri Aug 16 20:55:14 2002 From: martin@v.loewis.de (Martin v. Loewis) Date: 16 Aug 2002 21:55:14 +0200 Subject: [Python-Dev] Python build trouble with the new gcc/binutils In-Reply-To: <200208161915.g7GJFZb22506@europa.research.att.com> References: <200208161448.g7GEmAm19971@europa.research.att.com> <15709.5232.302092.575564@anthem.wooz.org> <200208161915.g7GJFZb22506@europa.research.att.com> Message-ID: Andrew Koenig writes: > Seems unlikely -- the system linker isn't in my search path > and "ls -ltu" shows that it hasn't been executed in 10 days. Absence in the search path is irrelevant - gcc just knows that the system linker is in /usr/ccs/bin/ld (see gcc/config/svr4.h). 
It would help enormously if you'd focus on a single failing scenario, and, for this scenario, would confirm that the binutils that you had at configuration time of your compiler are also the ones that it uses at run-time. To confirm the latter, it would help if you would report the output that you get from adding "-v" to one of the linker lines (e.g. for struct.so). Regards, Martin From guido@python.org Fri Aug 16 20:55:52 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 16 Aug 2002 15:55:52 -0400 Subject: [Python-Dev] pystone(object) In-Reply-To: Your message of "Fri, 16 Aug 2002 15:09:17 EDT." <15709.19933.149356.178760@slothrop.zope.com> References: <15709.19933.149356.178760@slothrop.zope.com> Message-ID: <200208161955.g7GJtqv21741@pcp02138704pcs.reston01.va.comcast.net> > Anyone interested in making pystone use a new-style class? I just > tried it and it slowed pystone down by 12%. Using __slots__ bought > back 5%. Yeah, I've noticed the same (though I think I got back more than 5% with slots). > On the one hand, we end up comparing the new-style class > implementation of one Python with the classic class version of older > Pythons. On the other hand, we seem to think that new-style classes > are preferred. I think we ought to measure them. It'll be hard to close the gap completely. For new-style classes, instance getattr and setattr operations require at least two dict lookup ops: they must look in the instance dict as well as in the class dict (and in the base classes, in MRO order). This is so that properties (and other descriptors that support the __set__ protocol) can override instance variables: setattr can't just store into the instance dict, it has to check for a property first; and similar for getattr (it shouldn't trust the instance dict unless there's nothing in the class). Slots can get you back most of this, but not all.
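The precedence rule Guido describes — a data descriptor on the class beats an entry in the instance dict — is easy to demonstrate (a small sketch; the class and attribute names are invented for illustration):

```python
class C(object):
    @property
    def x(self):
        return "from the class descriptor"

    @x.setter
    def x(self, value):
        self._x = value

c = C()
# Plant a shadowing entry directly in the instance dict, bypassing setattr:
c.__dict__["x"] = "from the instance dict"
# getattr must consult the class first, so the data descriptor wins:
assert c.x == "from the class descriptor"
# setattr likewise routes through the descriptor instead of storing "x":
c.x = 99
assert c._x == 99
assert c.__dict__["x"] == "from the instance dict"   # untouched by setattr
```

This two-place lookup (class before instance dict) is exactly the extra dict traffic the message above blames for the pystone slowdown.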
Dict lookup is already extremely tight code, and when I profiled this, most of the time was spent there -- twice as many lookup calls using new-style classes as for classic classes. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Aug 16 22:54:53 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 16 Aug 2002 17:54:53 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib Message-ID: <200208162154.g7GLsrZ28972@pcp02138704pcs.reston01.va.comcast.net> (I know there's a sets list at SF, but SF refuses mail from this machine, and that list is essentially dead.) In CVS at python/nondist/sandbox/sets/ you'll find a module set.py (and its unit tests) that implement a versatile Set class roughly according to PEP 218. This code has three authors: Greg V. Wilson wrote the first version; Alex Martelli changed the strategy around (im)mutability and removed inheritance from dict in favor of having a dict attribute; and I cleaned it up and changed some implementation approaches to what I think are faster algorithms (at the cost of somewhat more verbose code), also changing the API a little bit. I'd like to do a little more cleanup and then move it into the standard library. The API is roughly that of PEP 218, although it supports several methods and operations that aren't listed in the PEP, such as comparisons and subset/superset tests. I'm not sure whether I want to go fix the PEP to match this implementation, or whether to skip that step and simply document the module in the standard library docs (that has to be done anyway). The biggest difference with the PEP is a different approach to the (im)mutability issue. The problem is that on the one hand, sets are containers that hold a potentially large number of elements, and as such they ought to be mutable: you should be able to start with an empty set and then add elements to it one at a time.
On the other hand, the only efficient implementation strategy is to represent a set internally as the keys of a dictionary whose values are ignored. This way insertion and removal can be done very efficiently, at the cost of a restriction: set elements must be immutable. The implication is that you can't have sets of sets -- but occasionally those are handy! The PEP proposed a strategy based on "freezing" a set as soon as it is incorporated into another set; but in practice this didn't work out very well, in part because the test 's1 in s2' would cause s1 to be frozen: its hash value has to be computed to implement the membership test, and computing the hash value signals freezing the set. This caused too many surprises. Alex Martelli implemented a different strategy (see python.org/sf/580995): there are two set classes, (mutable) Set and ImmutableSet. They derive from a common base class (BaseSet, an abstract class) that implements most set operations. ImmutableSet adds __hash__, and Set adds mutating operations like update(), clear(), in-place union, and so on. The ImmutableSet constructor accepts a sequence or another set as its argument. While this is an easy enough way to construct an immutable set from a mutable set, for added convenience, if a mutable set is added to another set (except in the constructor), an immutable shallow copy is automatically made under the covers and added instead. Also, when 's1 in s2' is requested and s1 is a mutable set, it is wrapped in a temporarily-immutable wrapper class (an internal class that is not exposed) which compares equal to s1 and has a hash value equal to the hash value that would be computed for an immutable copy of s1. The temporary wrapper does not make a copy; none of the code here is thread-safe anyway, so if multiple threads are going to share a mutable set, they will have to use their own locks. 
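The keys-of-a-dict representation described above can be sketched in a few lines (a hypothetical toy, not the sandbox set.py; TinySet and its methods are invented names):

```python
class TinySet:
    """Toy set: elements are dict keys, values are ignored,
    so membership, insertion, and removal are all O(1)."""

    def __init__(self, iterable=()):
        self._data = {}
        for element in iterable:
            self._data[element] = True   # element must be hashable

    def add(self, element):
        self._data[element] = True

    def remove(self, element):
        del self._data[element]

    def __contains__(self, element):
        return element in self._data

    def __len__(self):
        return len(self._data)

    def union(self, other):
        result = TinySet(self._data)     # iterating a dict yields its keys
        result._data.update(other._data)
        return result

s = TinySet([1, 2]).union(TinySet([2, 3]))
assert 3 in s and len(s) == 3
```

The hashability requirement on dict keys is exactly the restriction that makes sets-of-sets awkward and motivates the Set/ImmutableSet split.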
I deviated from the original API in a few places: - I renamed the method is_subset_of() to issubset(), since I don't like underscores in method names, and don't think leaving the 'of' off will cause ambiguity; I also added issuperset(). - I renamed intersect() to intersection() and sym_difference() to symmetric_difference(); ditto for the corresponding _update() methods. - Alex's code explicitly disallowed in-place operators (e.g. __ior__ meaning in-place union) for immutable sets. I decided against this; instead, when s1 is a variable referencing an immutable set, the statement 's1 |= s2' will compute s1|s2 as an immutable set and store that in the variable s1. On the other hand, if s1 references a mutable set, 's1 |= s2' will update the object referenced by s1 in-place. This is similar to the behavior of lists and tuples under the += operator. - The left operand of a union, intersection or difference operation will decide the type of the result. This means that generally when you've got immutable sets, you'll produce more immutable sets; and when you've got mutable sets, you'll produce mutable sets. When you mix the two kinds, the left argument of binary operations wins. Some minor open questions: - The set constructors have an optional second argument, sort_repr, defaulting to False, which decides whether the elements are sorted when str() or repr() is taken. I'm not sure if there would be negative consequences of removing this argument and always sorting the string representation. It would simplify the interface and make the output nicer (since usually you don't think about setting this argument until it's too late). I don't think that the extra cost of sorting the list of elements will be a burden in practice, since normally one would only print small sets (if a set has thousands of elements, printing it isn't very user friendly). - I'd like to change the module name from set.py to sets.py.
Somehow it makes more sense to write from sets import Set, ImmutableSet than from set import Set, ImmutableSet - I'm aware that this set implementation is not the be-all and end-all of sets. I've seen a set implementation written in C, but I was not very impressed -- it was a massive copy-paste-edit job done on the dict implementation, and we don't need such code duplication. But eventually there may be a C implementation which will change some implementation details. - Like other concrete types in Python, these Set classes aren't really designed to mix well with other set implementations. For example, sets of small cardinal numbers may be represented efficiently by long ints, but the union method currently requires that the other argument uses the same implementation. I'm not sure whether this will eventually require changing. Any comments? Or shall I just check this in? --Guido van Rossum (home page: http://www.python.org/~guido/) From ark@research.att.com Fri Aug 16 22:59:05 2002 From: ark@research.att.com (Andrew Koenig) Date: Fri, 16 Aug 2002 17:59:05 -0400 (EDT) Subject: [Python-Dev] Python build trouble with the new gcc/binutils In-Reply-To: <20020816194739.GD30840@codesourcery.com> (message from Zack Weinberg on Fri, 16 Aug 2002 12:47:39 -0700) References: <200208161448.g7GEmAm19971@europa.research.att.com> <15709.5232.302092.575564@anthem.wooz.org> <20020816172858.GA30840@codesourcery.com> <200208161843.g7GIh5X22326@europa.research.att.com> <20020816194739.GD30840@codesourcery.com> Message-ID: <200208162159.g7GLx5q24129@europa.research.att.com> Zack> Please execute the attached shell script with CC set to your test gcc Zack> 3.2 and/or binutils 2.13.x installation and see what happens. If we Zack> do have a toolchain bug, it ought to be provoked by this test. Excellent! $ sh test-dynload + gcc -fPIC -shared dyn.c -o dyn.so + gcc main.c -o main -ldl + ./main calling dlopen Segmentation Fault - core dumped + exit 139 Now to see if I can figure out why... 
From ark@research.att.com Fri Aug 16 22:49:13 2002
From: ark@research.att.com (Andrew Koenig)
Date: Fri, 16 Aug 2002 17:49:13 -0400 (EDT)
Subject: [Python-Dev] Python build trouble with the new gcc/binutils
In-Reply-To: (martin@v.loewis.de)
References: <200208161448.g7GEmAm19971@europa.research.att.com> <15709.5232.302092.575564@anthem.wooz.org> <200208161915.g7GJFZb22506@europa.research.att.com>
Message-ID: <200208162149.g7GLnDc24060@europa.research.att.com>

Martin> Andrew Koenig writes:

>> Seems unlikely -- the system linker isn't in my search path
>> and "ls -ltu" shows that it hasn't been executed in 10 days.

Martin> Absence in the search path is irrelevant - gcc just knows that the
Martin> system linker is in /usr/ccs/bin/ld (see gcc/config/svr4.h).

But the system linker has not been executed in 10 days.  So whatever gcc might or might not know, that proves that gcc did not execute /usr/ccs/bin/ld during any of this testing.

Martin> It would help enormously if you'd focus on a single failing
Martin> scenario, and, for this scenario, would confirm that the
Martin> binutils that you had at configuration time of your compiler
Martin> are also the ones that it uses at run-time.

Martin> To confirm the latter, it would help if you would report the
Martin> output that you get from adding "-v" to one of the linker
Martin> lines (e.g. for struct.so).

Which I do how, exactly?

From zack@codesourcery.com Fri Aug 16 23:30:40 2002
From: zack@codesourcery.com (Zack Weinberg)
Date: Fri, 16 Aug 2002 15:30:40 -0700
Subject: [Python-Dev] A few lessons from the tempfile.py rewrite
Message-ID: <20020816223040.GL30840@codesourcery.com>

While doing the tempfile.py rewrite I discovered some places where improvements could be made to the rest of the standard library.  I'd like to discuss these here.

1) Dummy threads module.

Currently, a library module that wishes to be thread-safe but still work on platforms where threads are not implemented has to jump through hoops.
In tempfile.py we have

| try:
|     import thread as _thread
|     _allocate_lock = _thread.allocate_lock
| except (ImportError, AttributeError):
|     class _allocate_lock:
|         def acquire(self):
|             pass
|         release = acquire

It would be nice if the thread and threading modules existed on all platforms, providing these sorts of dummy locks on the platforms that don't actually implement threads.  I notice that Queue.py uses 'import thread' unconditionally -- perhaps this is already the case?  I can't find any evidence of it.

2) pthread_once equivalent.

pthread_once is a handy function in the C pthreads library which can be used to guarantee that some data object is initialized exactly once, and no thread sees it in a partially initialized state.  I had to implement a fake version in tempfile.py.

| _once_lock = _allocate_lock()
|
| def _once(var, initializer):
|     """Wrapper to execute an initialization operation just once,
|     even if multiple threads reach the same point at the same time.
|
|     var is the name (as a string) of the variable to be entered into
|     the current global namespace.
|
|     initializer is a callable which will return the appropriate initial
|     value for variable.  It will be called only if variable is not
|     present in the global namespace, or its current value is None.
|
|     Do not call _once from inside an initializer routine, it will deadlock.
|     """
|
|     vars = globals()
|     # Check first outside the lock.
|     if vars.get(var) is not None:
|         return
|     try:
|         _once_lock.acquire()
|         # Check again inside the lock.
|         if vars.get(var) is not None:
|             return
|         vars[var] = initializer()
|     finally:
|         _once_lock.release()

I call it fake for three reasons.  First, it should be using threading.RLock so that recursive calls do not deadlock.  That's a trivial fix (this sort of high-level API probably belongs in threading.py anyway).  Second, it uses globals(), which means that all symbols it initializes live in the namespace of its own module, when what's really wanted is the caller's module.
And most important, I'm certain that this interface is Not The Python Way To Do It. Unfortunately, I've not been able to figure out what the Python Way To Do It is, for this problem. 3) test_support.TestSkipped and unittest.py Simple - you can't use TestSkipped in a unittest.py-based test set. This is a missing feature of unittest, which has no notion of skipping a given test. Any exception thrown from inside one of its test routines is taken to indicate a failure. I think the right fix here is to add a skip() method to unittest.TestCase which works with both a bare unittest.py-based test framework, and Python's own test_support.py. Thoughts? zw From tim.one@comcast.net Fri Aug 16 23:32:15 2002 From: tim.one@comcast.net (Tim Peters) Date: Fri, 16 Aug 2002 18:32:15 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208162154.g7GLsrZ28972@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Guido] > ... > Some minor open questions: > > - The set constructors have an optional second argument, sort_repr, > defaulting to False, which decides whether the elements are sorted > when str() or repr() is taken. I'm not sure if there would be > negative consequences of removing this argument and always sorting > the string representation. I'd rather you left this alone. Being sortable is a much stronger requirement on set elements than just supporting __hash__ and __eq__, and, e.g., it would suck if I could create a set of complex numbers but couldn't print it(!). > It would simplify the interface and make the output nicer (since > usually you don't think about setting this argument until it's too > late). I don't think that the extra cost of sorting the list of > elements will be a burden in practice, since normally one would only > print small sets (if a set has thousands of elements, printing it > isn't very user friendly). I think you could (and should ) get 95% of the benefit here by changing the sort_repr default to True. 
I'm happy to say False when I know I've got unsortable keys. > - I'd like to change the module name from set.py to sets.py. Somehow > it makes more sense to write > > from sets import Set, ImmutableSet > > than > > from set import Set, ImmutableSet Cool. > - I'm aware that this set implementation is not the be-all and end-all > of sets. That's OK -- there is no such set implementation in existence. This one covers all simple uses, and many advanced uses -- go for it! > I've seen a set implementation written in C, but I was not very > impressed -- it was a massive copy-paste-edit job done on the > dict implementation, and we don't need such code duplication. Ditto (& I've said so before about what was most likely the same implementation). > ... > - Like other concrete types in Python, these Set classes aren't really > designed to mix well with other set implementations. For example, > sets of small cardinal numbers may be represented efficiently by > long ints, but the union method currently requires that the other > argument uses the same implementation. I'm not sure whether this > will eventually require changing. We could rehabilitate __coerce__ in a hypergeneralized form . 
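The semantics Guido described earlier in the thread (the left operand decides the result type; |= rebinds a variable holding an immutable set but updates a mutable one in place) match what today's built-in set and frozenset types do.  A quick sketch, using the built-ins purely as an illustration since they postdate this discussion:

```python
# Built-in set/frozenset shown only to illustrate the proposed semantics;
# the thread itself is about the sets.py module, which predates them.

s1 = frozenset([1, 2])
s2 = {2, 3}

# The left operand of a binary operation decides the result type.
assert type(s1 | s2) is frozenset
assert type(s2 | s1) is set

# 's1 |= s2' on an immutable set rebinds the name to a new object...
s1_old = s1
s1 |= s2
assert s1 is not s1_old and s1 == frozenset([1, 2, 3])

# ...while on a mutable set it updates the object in place.
s2_old = s2
s2 |= frozenset([4])
assert s2 is s2_old and s2 == {2, 3, 4}
```

This mirrors the list/tuple behavior under += that Guido cites: augmented assignment mutates when it can and rebinds when it cannot.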
From zack@codesourcery.com Fri Aug 16 23:35:38 2002 From: zack@codesourcery.com (Zack Weinberg) Date: Fri, 16 Aug 2002 15:35:38 -0700 Subject: [Python-Dev] Python build trouble with the new gcc/binutils In-Reply-To: <200208162159.g7GLx5q24129@europa.research.att.com> References: <200208161448.g7GEmAm19971@europa.research.att.com> <15709.5232.302092.575564@anthem.wooz.org> <20020816172858.GA30840@codesourcery.com> <200208161843.g7GIh5X22326@europa.research.att.com> <20020816194739.GD30840@codesourcery.com> <200208162159.g7GLx5q24129@europa.research.att.com> Message-ID: <20020816223538.GM30840@codesourcery.com> On Fri, Aug 16, 2002 at 05:59:05PM -0400, Andrew Koenig wrote: > Zack> Please execute the attached shell script with CC set to your test gcc > Zack> 3.2 and/or binutils 2.13.x installation and see what happens. If we > Zack> do have a toolchain bug, it ought to be provoked by this test. > > Excellent! > > $ sh test-dynload > + gcc -fPIC -shared dyn.c -o dyn.so > + gcc main.c -o main -ldl > + ./main > calling dlopen > Segmentation Fault - core dumped > + exit 139 Bingo. That demonstrates conclusively that this isn't a Python bug. Please repeat the test like so: $ CC="gcc -v" sh test-dynload Send the complete output, the result of "uname -a", and the script itself to both gcc-bugs@gcc.gnu.org and bug-binutils@gnu.org. And I think we can stop bothering python-dev. 
zw From ark@research.att.com Fri Aug 16 23:42:13 2002 From: ark@research.att.com (Andrew Koenig) Date: Fri, 16 Aug 2002 18:42:13 -0400 (EDT) Subject: [Python-Dev] Python build trouble with the new gcc/binutils In-Reply-To: <20020816223538.GM30840@codesourcery.com> (message from Zack Weinberg on Fri, 16 Aug 2002 15:35:38 -0700) References: <200208161448.g7GEmAm19971@europa.research.att.com> <15709.5232.302092.575564@anthem.wooz.org> <20020816172858.GA30840@codesourcery.com> <200208161843.g7GIh5X22326@europa.research.att.com> <20020816194739.GD30840@codesourcery.com> <200208162159.g7GLx5q24129@europa.research.att.com> <20020816223538.GM30840@codesourcery.com> Message-ID: <200208162242.g7GMgDI24489@europa.research.att.com> Zack> Bingo. That demonstrates conclusively that this isn't a Python bug. Zack> Please repeat the test like so: Zack> $ CC="gcc -v" sh test-dynload Zack> Send the complete output, the result of "uname -a", and the script Zack> itself to both gcc-bugs@gcc.gnu.org and bug-binutils@gnu.org. And I Zack> think we can stop bothering python-dev. Will do. Thanks for the help! From greg@cosc.canterbury.ac.nz Sat Aug 17 03:08:34 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Sat, 17 Aug 2002 14:08:34 +1200 (NZST) Subject: [Python-Dev] type categories In-Reply-To: Message-ID: <200208170208.g7H28YQ15597@cosc.canterbury.ac.nz> Andrew Koenig : > if I know that modules x and y overload the same function, > and I want to be sure that x's case is tested first, one would think I > could ensure it by writing > > import x, y > > But in fact I can't, because someone else may have imported y already, > in which case the second import is a no-op. So far no-one has addressed the other importing problem I mentioned, which is how to ensure that the relevant modules get imported *at all*. Currently in Python, a module gets imported because some other module needs to use a name from it. If no other module needs to do so, the module is not needed. 
But with generic functions, this will no longer be true.  It will be possible for a module to be needed by the system as a whole, yet no other module knows that it is needed!

Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+

From md9ms@mdstud.chalmers.se Sat Aug 17 07:57:14 2002
From: md9ms@mdstud.chalmers.se (Martin Sjögren)
Date: 17 Aug 2002 08:57:14 +0200
Subject: [Python-Dev] A few lessons from the tempfile.py rewrite
In-Reply-To: <20020816223040.GL30840@codesourcery.com>
References: <20020816223040.GL30840@codesourcery.com>
Message-ID: <1029567435.597.3.camel@winterfell>

On Sat 2002-08-17 at 00.30, Zack Weinberg wrote:

> 2) pthread_once equivalent.
>
> pthread_once is a handy function in the C pthreads library which
> can be used to guarantee that some data object is initialized exactly
> once, and no thread sees it in a partially initialized state.  I had
> to implement a fake version in tempfile.py.
>
> | _once_lock = _allocate_lock()
> |
> | def _once(var, initializer):
> |     """Wrapper to execute an initialization operation just once,
> |     even if multiple threads reach the same point at the same time.
> |
> |     var is the name (as a string) of the variable to be entered into
> |     the current global namespace.
> |
> |     initializer is a callable which will return the appropriate initial
> |     value for variable.  It will be called only if variable is not
> |     present in the global namespace, or its current value is None.
> |
> |     Do not call _once from inside an initializer routine, it will deadlock.
> |     """
> |
> |     vars = globals()
> |     # Check first outside the lock.
> |     if vars.get(var) is not None:
> |         return

Wouldn't it make more sense to use has_key (or 'in')?
If var is assigned to None before _once is called, that value is overridden...

> |     try:
> |         _once_lock.acquire()
> |         # Check again inside the lock.
> |         if vars.get(var) is not None:
> |             return
> |         vars[var] = initializer()
> |     finally:
> |         _once_lock.release()

I agree that pthread_once is useful, and it would be nice to have something like this in the standard library.

Regards, Martin

From martin@v.loewis.de Sat Aug 17 08:29:37 2002
From: martin@v.loewis.de (Martin v. Loewis)
Date: 17 Aug 2002 09:29:37 +0200
Subject: [Python-Dev] Python build trouble with the new gcc/binutils
In-Reply-To: <200208162149.g7GLnDc24060@europa.research.att.com>
References: <200208161448.g7GEmAm19971@europa.research.att.com> <15709.5232.302092.575564@anthem.wooz.org> <200208161915.g7GJFZb22506@europa.research.att.com> <200208162149.g7GLnDc24060@europa.research.att.com>
Message-ID:

Andrew Koenig writes:

> Martin> To confirm the latter, it would help if you would report the
> Martin> output that you get from adding "-v" to one of the linker
> Martin> lines (e.g. for struct.so).
>
> Which I do how, exactly?

You copy the line that is used to link struct.so, e.g.

    gcc -shared build/temp.linux-i686-2.3/structmodule.o -L/usr/local/lib -o build/lib.linux-i686-2.3/struct.so

and add a -v option to it.  Actually, adding -Wl,-V is even better; on my system, this gives

    GNU ld version 2.11.92.0.10 20011021 (SuSE)
    Supported emulations:
      elf_i386
      i386linux

Regards, Martin

From guido@python.org Sat Aug 17 13:14:47 2002
From: guido@python.org (Guido van Rossum)
Date: Sat, 17 Aug 2002 08:14:47 -0400
Subject: [Python-Dev] A few lessons from the tempfile.py rewrite
In-Reply-To: Your message of "Fri, 16 Aug 2002 15:30:40 PDT." <20020816223040.GL30840@codesourcery.com>
References: <20020816223040.GL30840@codesourcery.com>
Message-ID: <200208171214.g7HCEmW30342@pcp02138704pcs.reston01.va.comcast.net>

> 1) Dummy threads module.
>
> Currently, a library module that wishes to be thread-safe but still
> work on platforms where threads are not implemented, has to jump
> through hoops.  In tempfile.py we have
>
> | try:
> |     import thread as _thread
> |     _allocate_lock = _thread.allocate_lock
> | except (ImportError, AttributeError):
> |     class _allocate_lock:
> |         def acquire(self):
> |             pass
> |         release = acquire
>
> It would be nice if the thread and threading modules existed on all
> platforms, providing these sorts of dummy locks on the platforms that
> don't actually implement threads.  I notice that Queue.py uses 'import
> thread' unconditionally -- perhaps this is already the case?  I can't
> find any evidence of it.

The Queue module was never intended for use in an unthreaded program; it's specifically intended for safe communication between threads.  So if you import Queue on a threadless platform, the import thread will fail for a good reason.

The question with a dummy module is, how far do you want it to go?  I guess one thing we could do would be to always make the thread module available, and implement a dummy lock.  The lock would have acquire and release methods that simply set and test a flag; acquire() raises an exception if the flag is already set (to simulate deadlock), and release() raises an exception if it isn't set.  But it should *not* provide start_new_thread; this can be used as a flag to indicate the real presence of threads.  Then the threading module would import successfully, but calling start() on a Thread object would fail.  Hm, I'm not sure if I like that; maybe instantiating the Thread class should fail already.

There's a backwards incompatibility problem.  There is code that currently tries to import the thread or threading module and, if this succeeds, expects it can use threads (not just locks), and has an alternative implementation if threads do not exist.  Such code would break if the thread module always existed.
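The dummy lock described above is only a few lines of code.  A hypothetical rendering of it (the class name and the choice of exception are invented here; this is not the eventual dummy_thread module):

```python
# Sketch of a dummy lock for threadless platforms, per the description above:
# acquire()/release() just toggle a flag, and misuse raises instead of
# blocking (a second acquire() on a threadless platform *is* a deadlock).

class DummyLock:
    def __init__(self):
        self.locked = False

    def acquire(self):
        if self.locked:
            # A real lock would block forever here; with no other thread
            # to release it, raising is the only useful behavior.
            raise RuntimeError("deadlock: lock already held")
        self.locked = True

    def release(self):
        if not self.locked:
            raise RuntimeError("release of unacquired lock")
        self.locked = False

lock = DummyLock()
lock.acquire()
lock.release()
```

In single-threaded code the flag discipline is exactly equivalent to a real lock, which is why the substitution is safe for modules that only use locks for mutual exclusion.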
How about a compromise: we can put a module dummy_thread.py in the standard library, and at the top of tempfile.py, you can write

    try:
        import thread as _thread
    except ImportError:
        import dummy_thread as _thread
    _allocate_lock = _thread.allocate_lock

If you can provide an implementation of dummy_thread.py I'll gladly check it in.

> 2) pthread_once equivalent.
>
> pthread_once is a handy function in the C pthreads library which
> can be used to guarantee that some data object is initialized exactly
> once, and no thread sees it in a partially initialized state.  I had
> to implement a fake version in tempfile.py.
>
> | _once_lock = _allocate_lock()
> |
> | def _once(var, initializer):
> |     """Wrapper to execute an initialization operation just once,
> |     even if multiple threads reach the same point at the same time.
> |
> |     var is the name (as a string) of the variable to be entered into
> |     the current global namespace.
> |
> |     initializer is a callable which will return the appropriate initial
> |     value for variable.  It will be called only if variable is not
> |     present in the global namespace, or its current value is None.
> |
> |     Do not call _once from inside an initializer routine, it will deadlock.
> |     """
> |
> |     vars = globals()
> |     # Check first outside the lock.
> |     if vars.get(var) is not None:
> |         return

(Martin Sj. commented at this line that this would overwrite var if it was defined as None; from the docstring I gather that that's intentional behavior.)

> |     try:
> |         _once_lock.acquire()

IMO this has a subtle bug: the acquire() should come *before* the try: statement.  If for whatever reason acquire() fails, you'd end up doing a release() on a lock you didn't acquire.
It's true that

    l.acquire()
    try:

is not atomic, since a signal handler could raise an exception between them, but there's a race condition either way, and I don't know how to fix them both at the same time (not without adding a construct to shield signals, which IMO is overkill -- be Pythonic, live dangerously, accept the risk that a ^C can screw you.  It can anyway. :-)

> |         # Check again inside the lock.
> |         if vars.get(var) is not None:
> |             return
> |         vars[var] = initializer()
> |     finally:
> |         _once_lock.release()
>
> I call it fake for three reasons.  First, it should be using
> threading.RLock so that recursive calls do not deadlock.  That's a
> trivial fix (this sort of high-level API probably belongs in
> threading.py anyway).

What's the use case for that?  Surely an initialization function can avoid calling itself.  I'd say YAGNI.

> Second, it uses globals(), which means that all
> symbols it initializes live in the namespace of its own module, when
> what's really wanted is the caller's module.  And most important, I'm
> certain that this interface is Not The Python Way To Do It.
> Unfortunately, I've not been able to figure out what the Python Way To
> Do It is, for this problem.

In the case of tempfile.py, I think the code will improve in clarity if you simply write it out.  I tried this and it saved 10 lines of code (mostly the docstring in _once() -- but that's fair game, since _once embodies more tricks than the expanded code).  In addition, since gettempdir() is called for the default argument value of mkstemp(), it would be much simpler to initialize tempdir at module initialization time; the module initialization is guaranteed to run only once.
If I do this, I save another 8 lines; but I believe you probably wanted to postpone calling gettempdir() until any of the functions that have gettempdir() as their argument get *called*, which means that in fact all those functions have to be changed to have None for their default and insert

    if dir is None:
        dir = gettempdir()

at the top of their bodies.

> 3) test_support.TestSkipped and unittest.py
>
> Simple - you can't use TestSkipped in a unittest.py-based test set.
> This is a missing feature of unittest, which has no notion of skipping
> a given test.  Any exception thrown from inside one of its test
> routines is taken to indicate a failure.
>
> I think the right fix here is to add a skip() method to
> unittest.TestCase which works with both a bare unittest.py-based test
> framework, and Python's own test_support.py.

Maybe you can bring this one up in the PyUnit list?  I don't know where that is, but we're basically tracking Steve Purcell's code.  Maybe he has a good argument against this feature, or a better way to do it; or maybe he thinks it is cool.

Personally, I think the thing to do is put tests that can't always run in a separate test suite class and only add that class to the list of suites when applicable.  I see you *almost* stumbled upon this idiom with the dummy_test_TemporaryFile; but that approach seems overkill: why not simply skip test_TemporaryFile when it's the same as NamedTemporaryFile?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From pedroni@inf.ethz.ch Sat Aug 17 13:17:26 2002
From: pedroni@inf.ethz.ch (Samuele Pedroni)
Date: Sat, 17 Aug 2002 14:17:26 +0200
Subject: [Python-Dev] multimethods: missing parts (was: type categories)
Message-ID: <002801c245e8$0cf1d620$6d94fea9@newmexico>

[Greg Ewing]
> So far no-one has addressed the other importing problem
> I mentioned, which is how to ensure that the relevant
> modules get imported *at all*.

We were busy agreeing on the basic stuff :(.
>
> Currently in Python, a module gets imported because
> some other module needs to use a name from it.  If no
> other module needs to do so, the module is not needed.
>
> But with generic functions, this will no longer be
> true.  It will be possible for a module to be needed
> by the system as a whole, yet no other module knows
> that it is needed!

What a dramatic personification!  Yes, it's a real bookkeeping problem.

Sidenote: Common Lisp systems deliver programs as whole system images, and one often uses an image to store development snapshots.  So it's not really an issue for the program at startup.  Libraries nevertheless should come with "scripts" that ensure that all their relevant parts are loaded.  OTOH, as long as a programmer immediately loads his definitions/redefinitions and uses the image for snapshots, he can forget about the issue.  But even he should still care about being able to reload/reconstruct the system from the source files and from scratch.  So in the end the problem is not only a Python problem.

At the moment the following bookkeeping rules come to my mind to address the issue:

- a module in a library that exposes a generic function defined by the library (defgeneric) should make sure that all the definitions of methods in the library are already added to the generic function once the generic function is exposed.

- a module in a library that exposes classes for which it adds specialized methods to a generic function defined in and imported from another library should ensure that, once the class instances can be obtained, the specialized versions are already added.

It seems to me that this should cover most of the sane cases.
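To make these bookkeeping rules concrete, here is a toy generic-function registry in plain Python.  The names defgeneric and defmethod are invented for illustration, and dispatch is on the type of the first argument only (walking its MRO), so this is a sketch of the idea rather than real multimethod support:

```python
# Toy generic-function registry: a library defines the generic (defgeneric),
# and other modules add specialized methods to it (defmethod).  Names and
# single-argument dispatch are simplifications made up for this sketch.

def defgeneric(default):
    registry = {}

    def generic(arg, *rest):
        # Walk the MRO so subclasses inherit specializations.
        for cls in type(arg).__mro__:
            if cls in registry:
                return registry[cls](arg, *rest)
        return default(arg, *rest)

    def defmethod(cls):
        def register(func):
            registry[cls] = func
            return func
        return register

    generic.defmethod = defmethod
    return generic

@defgeneric
def describe(obj):
    return "some object"

@describe.defmethod(int)
def _(obj):
    return "an int"

@describe.defmethod(str)
def _(obj):
    return "a string"
```

The bookkeeping problem is visible even here: describe(x) only gives the specialized answer if the module containing the defmethod call has actually been imported, which is exactly Greg's point.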
Recap/ Issues' orthogonal and not-so-orthogonal decomposition:

- Support for multimethods can be written in (pure) Python today; at this level the question is whether to have such support in the std lib or not;

- if my understanding is correct, Dave Abrahams wants multimethod dispatch as the moral equivalent of static overloading; for that use, gf-method definitions would in most cases be just in one place (especially if what we are "overloading" is a normal class method) and the generic function is expected to be used just as a normal function;

- nice to have: dispatch on protocols/categories; this intersects a well-known ever-recurrent issue;

- how much should multimethods become part of the language?  are they a useful addition?  what about newbies?  can multimethods be made to play nice with the rest of the language (especially with single dispatch and normal class methods [*])?  will they eventually deserve syntax sugar?  are they a Py3K thing?

and-who-put-that-multimethods'-evangelist-hat-on-my-head-?-[and-I-had not-much-time-(still-true)]-who?-and-now-I-will-hypnotize-you-all-and-everybody -should-buy-the-AMOP-ly y'rs - Samuele.

[*] at the moment personally I see two possibilities:

- have a type of generic function that can live inside a class (namespace) be able to redispatch (maybe only if appropriately configured) to superclasses' methods if no matching gf-method is found.  [after, before, around methods would be difficult (impossible?)
to implement with the right semantics] - redefine the normal dispatch rules (obj.meth(...)) in order to directly take care of generic functions inside classes [larger change] From ark@research.att.com Sat Aug 17 14:29:32 2002 From: ark@research.att.com (Andrew Koenig) Date: Sat, 17 Aug 2002 09:29:32 -0400 (EDT) Subject: [Python-Dev] Python build trouble with the new gcc/binutils -- last word for now (I hope) In-Reply-To: (martin@v.loewis.de) References: <200208161448.g7GEmAm19971@europa.research.att.com> <15709.5232.302092.575564@anthem.wooz.org> <200208161915.g7GJFZb22506@europa.research.att.com> <200208162149.g7GLnDc24060@europa.research.att.com> Message-ID: <200208171329.g7HDTWM05979@europa.research.att.com> After trying various long-running tests on various machines, I have determined the following: 1) The culprit appears to be binutils 2.13, not gcc 3.2. If I use binutils 2.13 with gcc 3.2 or gcc 3.1.1, Python does not install. If I use binutils 2.12.1 with gcc 3.2 or gcc 3.1.1, Python installs. What was misleading me yesterday was some kind of version skew, perhaps caused by compiling gcc 3.2 with binutils 2.13; when I recompiled gcc 3.2 after installing binutils 2.12.1, all was well. 2) The little dynamic-linker test program is a reliable indicator for this problem. I have filed a binutils bug report. I will also file a sourceforge bug report for python, suggesting that the dynamic-linker test program should be included as part of the configuration process, as an early warning against this problem. Thanks for all the help! From tim.one@comcast.net Sat Aug 17 16:12:13 2002 From: tim.one@comcast.net (Tim Peters) Date: Sat, 17 Aug 2002 11:12:13 -0400 Subject: [Python-Dev] A few lessons from the tempfile.py rewrite In-Reply-To: <20020816223040.GL30840@codesourcery.com> Message-ID: [Zack Weinberg] > ... > 2) pthread_once equivalent. 
>
> pthread_once is a handy function in the C pthreads library which
> can be used to guarantee that some data object is initialized exactly
> once, and no thread sees it in a partially initialized state.

I don't know that it comes up enough in Python to bother doing something about it -- as Guido said, there's an import lock under the covers that ensures only one thread executes module init code (== all "top level" code in a module).  So modules that need one-shot initialization can simply do it at module level.  tempfile has traditionally gone overboard in avoiding use of this feature, though.

A more Pythonic approach may be gotten via emulating pthread_once more closely, forgetting the "data object" business in favor of executing arbitrary functions "just once".  Like so, maybe:

    def do_once(func, lock=threading.RLock(), done={}):
        if func not in done:
            lock.acquire()
            try:
                if func not in done:
                    func()
                    done[func] = True
            finally:
                lock.release()

"done" is a set of function objects that have already been run, represented by a dict mapping function objects to bools (although the dict values make no difference, only key presence matters).  Default arguments are abused here to give do_once persistent bindings to objects without polluting the global namespace.  A more purist alternative is

    def do_once(func):
        if func not in do_once.done:
            do_once.lock.acquire()
            try:
                if func not in do_once.done:
                    func()
                    do_once.done[func] = True
            finally:
                do_once.lock.release()

    do_once.lock = threading.RLock()
    do_once.done = {}

This is "more Pythonic", chiefly in not trying to play presumptive games with namespaces.  If some module M wants to set its own attr goob, fine, M can do

    def setgoob():
        global goob
        goob = 42

    do_once(setgoob)

and regardless of which module do_once came from.  Now what setgoob does is utterly obvious, and do_once() doesn't make helpful assumptions that get in the way .
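Tim's first variant runs unchanged on a modern Python.  As a sanity check, hammering it from several threads (plus one late call) shows the initializer executing exactly once:

```python
import threading

def do_once(func, lock=threading.RLock(), done={}):
    # Tim's sketch, as above: a double-checked "done" registry under a lock.
    if func not in done:
        lock.acquire()
        try:
            if func not in done:
                func()
                done[func] = True
        finally:
            lock.release()

calls = []

def init():
    calls.append(1)

# Run do_once(init) concurrently from several threads; init() must fire once.
threads = [threading.Thread(target=do_once, args=(init,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

do_once(init)  # a later, redundant call is a no-op

assert len(calls) == 1
```

The default-argument trick means lock and done are created once at function definition time and shared by every call, which is exactly the persistence pthread_once needs.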
From pedroni@inf.ethz.ch Sat Aug 17 16:36:22 2002 From: pedroni@inf.ethz.ch (Samuele Pedroni) Date: Sat, 17 Aug 2002 17:36:22 +0200 Subject: [Python-Dev] Re: Cecil papers (clarification) Message-ID: <009301c24603$d723f7a0$6d94fea9@newmexico> [me] > [Tim Peters] > >FYI, > > > > http://www.cs.washington.edu/research/projects/cecil/www/pubs/ > > > >has lots of good papers from the Cecil project, a pioneering > >multiple-dispatch language. Or you could save time reading and learn by > >repeating their early mistakes . > > it's prototype based, not class based so not everything is > relevant I forgot to mention that Cecil is meant for static compilation under a closed-world assumption, so the implementation aspects basically do not translate. regards. From oren-py-d@hishome.net Sat Aug 17 19:34:35 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Sat, 17 Aug 2002 14:34:35 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: References: <200208162154.g7GLsrZ28972@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020817183435.GA77677@hishome.net> On Fri, Aug 16, 2002 at 06:32:15PM -0400, Tim Peters wrote: > > - The set constructors have an optional second argument, sort_repr, > > defaulting to False, which decides whether the elements are sorted > > when str() or repr() is taken. I'm not sure if there would be > > negative consequences of removing this argument and always sorting > > the string representation. > > I'd rather you left this alone. Being sortable is a much stronger > requirement on set elements than just supporting __hash__ and __eq__, and, > e.g., it would suck if I could create a set of complex numbers but couldn't > print it(!). How about sorting with this comparison function: errors = (TypeError, ...?) 
    def cmpval(x):
        try:
            cmp(0, x)
        except errors:
            try:
                h = hash(x)
            except errors:
                h = -1
            return (1, h, id(x))
        return (0, x)

    def robust_cmp(a, b):
        try:
            return cmp(a, b)
        except errors:
            try:
                return cmp(cmpval(a), cmpval(b))
            except errors:
                return 0

    >>> l = [3j, 2j, 4, 4j, 1, 2, 1j, 3]
    >>> l.sort(robust_cmp)
    >>> l
    [1, 2, 3, 4, 1j, 2j, 3j, 4j]

It's equivalent to standard cmp if no errors are encountered.  For lists containing uncomparable objects it produces a pretty consistent order.  It's not perfect but should be good enough for aesthetic purposes.

Oren

From David Abrahams" <14c201c244b2$8cc651f0$6501a8c0@boostconsulting.com><2mn0rn9blb.fsf@starship.python.net>
Message-ID: <009a01c2461a$a82a3e70$6501a8c0@boostconsulting.com>

From: "Andrew Koenig"

> Michael> I may be getting lost in subthreads here, but are we still
> Michael> talking about multimethods?
>
> Well, I started by talking about type categories and ways of
> writing programs that tested them.  Dave Abrahams said, in effect,
> that I was really just talking about multimethods.  I'm still
> not convinced.

Huh?  That's certainly not what I thought I was saying.  I was saying that a reason I thought it was important to be able to test type categories (what Guido calls "look before you leap") was for implementing multiple dispatch.  In other words, an idiom which most people agree is usually a bad choice for user code might be a great choice for a generalized library or language facility.

It's pretty hard to see how you could construe my remarks as asserting some interpretation of what you were saying.
-----------------------------------------------------------
David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com

From David Abrahams" <15709.5232.302092.575564@anthem.wooz.org> <20020816172858.GA30840@codesourcery.com>
Message-ID: <012b01c2461e$259992e0$6501a8c0@boostconsulting.com>

From: "Zack Weinberg"

> On Fri, Aug 16, 2002 at 12:38:24PM -0400, Andrew Koenig wrote:
> >
> > #0  __register_frame_info_bases (begin=0xfed50000, ob=0xfed50000, tbase=0x0,
> >     dbase=0x0) at /tmp/build1165/gcc-3.1.1/gcc/unwind-dw2-fde.c:83
>
> Er, is the directory name misleading, or have you picked up
> libgcc_s.so from 3.1.1?  In theory that shouldn't be a problem; in
> practice it could well be the problem.

I'm still ploughing through several days of messages here (so this may have been discussed already) but I have recently learned that despite the existence of a "-V" option, it has long been impossible to correctly install new versions of GCC on systems with existing versions without using --prefix= to select a unique location.  Why GCC's configure doesn't issue a warning about this when you do it wrong, I don't know.  The only clue that this is going to be a problem was buried in a FAQ somewhere as of three months ago.

-----------------------------------------------------------
David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com

From skip@pobox.com Sat Aug 17 20:16:15 2002
From: skip@pobox.com (Skip Montanaro)
Date: Sat, 17 Aug 2002 14:16:15 -0500
Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib
In-Reply-To:
References: <200208162154.g7GLsrZ28972@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <15710.41215.451302.851810@localhost.localdomain>

Tim> I think you could (and should ) get 95% of the benefit here
Tim> by changing the sort_repr default to True.  I'm happy to say False
Tim> when I know I've got unsortable keys.
Why not just get rid of sort_repr, always attempt to sort for printing, and just discard the TypeError resulting from the attempted sort? Skip From ark@research.att.com Sat Aug 17 21:00:12 2002 From: ark@research.att.com (Andrew Koenig) Date: Sat, 17 Aug 2002 16:00:12 -0400 (EDT) Subject: [Python-Dev] type categories In-Reply-To: <009a01c2461a$a82a3e70$6501a8c0@boostconsulting.com> (dave@boost-consulting.com) References: <200208150313.g7F3Dr727504@oma.cosc.canterbury.ac.nz><14c201c244b2$8cc651f0$6501a8c0@boostconsulting.com><2mn0rn9blb.fsf@starship.python.net> <009a01c2461a$a82a3e70$6501a8c0@boostconsulting.com> Message-ID: <200208172000.g7HK0CX16681@europa.research.att.com> ark> Well, I started by talking about type categories and ways of ark> writing programs that tested them. Dave Abrahams said, in ark> effect, that I was really just talking about multimethods. I'm ark> still not convinced. David> Huh? That's certainly not what I thought I was saying. I was David> saying that a reason I thought it was important to be able to David> test type categories (what Guido calls "look before you leap") David> was for implementing multiple dispatch. In other words, an David> idiom which most people agree is usually a bad choice for user David> code might be a great choice for a generalized library or David> language facility. David> It's pretty hard to see how you could construe my remarks as David> asserting some interpretation of what you were saying. It sure looked that way to me. In any event, I can think of other contexts in which LBYL can be useful. To go back to Guido's example, I agree completely that testing whether a file exists, and then opening it in a separate operation, is a bad idea. One reason is that by the time you get around to opening the file, it may no longer exist, so the open has to test anyway. 
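A minimal sketch of the two styles in present-day Python (the function
names and paths here are illustrative, not anyone's proposed API):

```python
import os

def read_first_line(path):
    # EAFP: just attempt the open; handle the failure where it happens.
    try:
        with open(path) as f:
            return f.readline()
    except OSError:
        return None

def read_first_line_lbyl(path):
    # LBYL: check first -- but the file can vanish between the check and
    # the open, so the open still has to be guarded anyway.
    if not os.path.exists(path):
        return None
    try:
        with open(path) as f:
            return f.readline()
    except OSError:
        return None
```

Note that the LBYL version is strictly more code for the same behavior:
the existence check adds a race window without removing the need for the
exception handler.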
On the other hand, it does make sense to test whether what you have is a valid file name as soon as you know that you are going to open the file, even if you aren't going to open it for a while. The principle here is that when failure is certain, failing early is usually better than failing late. From dave@boost-consulting.com Sat Aug 17 20:43:13 2002 From: dave@boost-consulting.com (David Abrahams) Date: Sat, 17 Aug 2002 15:43:13 -0400 Subject: [Python-Dev] type categories References: <200208150313.g7F3Dr727504@oma.cosc.canterbury.ac.nz><14c201c244b2$8cc651f0$6501a8c0@boostconsulting.com><2mn0rn9blb.fsf@starship.python.net> <009a01c2461a$a82a3e70$6501a8c0@boostconsulting.com> <200208172000.g7HK0CX16681@europa.research.att.com> Message-ID: <041501c24626$562063a0$6501a8c0@boostconsulting.com> From: "Andrew Koenig" > David> It's pretty hard to see how you could construe my remarks as > David> asserting some interpretation of what you were saying. > > It sure looked that way to me. Maybe you were just reading fast and confused yourself with Mr. Alex Martelli? (http://aspn.activestate.com/ASPN/Mail/Message/1320579) > In any event, I can think of other contexts in which LBYL can be > useful. Of course; I never meant to imply that multiple dispatch was the only reason to LBYL; it just happens to be the most important one to me. > To go back to Guido's example, I agree completely that > testing whether a file exists, and then opening it in a separate > operation, is a bad idea. One reason is that by the time you get > around to opening the file, it may no longer exist, so the open > has to test anyway. > > On the other hand, it does make sense to test whether what you have is > a valid file name as soon as you know that you are going to open the > file, even if you aren't going to open it for a while. The principle > here is that when failure is certain, failing early is usually better > than failing late. 
Usually this goes to the same question we were discussing about
re-iterability detection. You want to fail early because it's faster, but
also because you don't want to mutate important program state in some
un-recoverable way in systems that are actually supposed to recover from
errors.

-----------------------------------------------------------
David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com

From guido@python.org Sat Aug 17 21:10:57 2002
From: guido@python.org (Guido van Rossum)
Date: Sat, 17 Aug 2002 16:10:57 -0400
Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib
In-Reply-To: Your message of "Sat, 17 Aug 2002 14:34:35 EDT." <20020817183435.GA77677@hishome.net>
References: <200208162154.g7GLsrZ28972@pcp02138704pcs.reston01.va.comcast.net> <20020817183435.GA77677@hishome.net>
Message-ID: <200208172010.g7HKAv131753@pcp02138704pcs.reston01.va.comcast.net>

> How about sorting with this comparison function:
>
> errors = (TypeError, ...?)
>
> def cmpval(x):
>     try:
>         cmp(0, x)
>     except errors:
>         try:
>             h = hash(x)
>         except errors:
>             h = -1
>         return (1, h, id(x))
>     return (0, x)
>
> def robust_cmp(a,b):
>     try:
>         return cmp(a,b)
>     except errors:
>         try:
>             return cmp(cmpval(a), cmpval(b))
>         except errors:
>             return 0
>
> >>> l=[3j, 2j, 4, 4j, 1, 2, 1j, 3]
> >>> l.sort(robust_cmp)
> >>> l
> [1, 2, 3, 4, 1j, 2j, 3j, 4j]
>
> It's equivalent to standard cmp if no errors are encountered. For lists
> containing uncomparable objects it produces a pretty consistent order.
> It's not perfect but should be good enough for aesthetic purposes.

Too convoluted. Explicit is better than implicit.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Sat Aug 17 21:12:17 2002
From: guido@python.org (Guido van Rossum)
Date: Sat, 17 Aug 2002 16:12:17 -0400
Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib
In-Reply-To: Your message of "Sat, 17 Aug 2002 14:16:15 CDT."
<15710.41215.451302.851810@localhost.localdomain>
References: <200208162154.g7GLsrZ28972@pcp02138704pcs.reston01.va.comcast.net> <15710.41215.451302.851810@localhost.localdomain>
Message-ID: <200208172012.g7HKCIQ31764@pcp02138704pcs.reston01.va.comcast.net>

> Why not just get rid of sort_repr, always attempt to sort for
> printing, and just discard the TypeError resulting from the
> attempted sort?

I don't like discarding TypeErrors. Who knows what bug you're hiding then.

I like Tim's suggestion just fine, and checked it in: let sort_repr
default to True. Unsortable values aren't that common in Python.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Sat Aug 17 21:14:47 2002
From: guido@python.org (Guido van Rossum)
Date: Sat, 17 Aug 2002 16:14:47 -0400
Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib
In-Reply-To: Your message of "Sat, 17 Aug 2002 14:16:15 CDT." <15710.41215.451302.851810@localhost.localdomain>
References: <200208162154.g7GLsrZ28972@pcp02138704pcs.reston01.va.comcast.net> <15710.41215.451302.851810@localhost.localdomain>
Message-ID: <200208172014.g7HKElX31829@pcp02138704pcs.reston01.va.comcast.net>

Does anybody have any *other* comments on the proposed sets module
besides convoluted suggestions on the sorted representation?
:-) Here's the source code for your web perusal: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/nondist/sandbox/sets/set.py?rev=HEAD&content-type=text/vnd.viewcvs-markup --Guido van Rossum (home page: http://www.python.org/~guido/) From ark@research.att.com Sat Aug 17 21:17:05 2002 From: ark@research.att.com (Andrew Koenig) Date: 17 Aug 2002 16:17:05 -0400 Subject: [Python-Dev] type categories In-Reply-To: <041501c24626$562063a0$6501a8c0@boostconsulting.com> References: <200208150313.g7F3Dr727504@oma.cosc.canterbury.ac.nz> <14c201c244b2$8cc651f0$6501a8c0@boostconsulting.com> <2mn0rn9blb.fsf@starship.python.net> <009a01c2461a$a82a3e70$6501a8c0@boostconsulting.com> <200208172000.g7HK0CX16681@europa.research.att.com> <041501c24626$562063a0$6501a8c0@boostconsulting.com> Message-ID: David> Usually this goes to the same question we were discussing about David> re-iterability detection. You want to fail early because it's David> faster, but also because you don't want to mutate important David> program state in some un-recoverable way in systems that are David> actually supposed to recover from errors. And also because it is usually much easier to explain what went wrong if the failure is detected early. -- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark From David Abrahams" <14c201c244b2$8cc651f0$6501a8c0@boostconsulting.com><2mn0rn9blb.fsf@starship.python.net> <009a01c2461a$a82a3e70$6501a8c0@boostconsulting.com> <200208172000.g7HK0CX16681@europa.research.att.com> <041501c24626$562063a0$6501a8c0@boostconsulting.com> Message-ID: <043101c24629$9f4ef2f0$6501a8c0@boostconsulting.com> From: "David Abrahams" > From: "Andrew Koenig" > > > > David> It's pretty hard to see how you could construe my remarks as > > David> asserting some interpretation of what you were saying. > > > > It sure looked that way to me. > > Maybe you were just reading fast and confused yourself with Mr. Alex > Martelli? 
> (http://aspn.activestate.com/ASPN/Mail/Message/1320579) Oh, and just for reference to what I was talking about in the above message, here's where Alex explains why he thinks LBYL is sometimes good: http://aspn.activestate.com/ASPN/Mail/Message/1287715 -Dave From dave@boost-consulting.com Sat Aug 17 21:48:39 2002 From: dave@boost-consulting.com (David Abrahams) Date: Sat, 17 Aug 2002 16:48:39 -0400 Subject: [Python-Dev] Re: Multimethods (quel horreur?) References: <003101c24525$6cfcab80$6d94fea9@newmexico> Message-ID: <065b01c2462f$86053290$6501a8c0@boostconsulting.com> From: "Samuele Pedroni" > [f,g,... are functions ; T1,T2,T3 are type tuples a.k.a multi-method > signatures, T1 are "subtypes" ] > > 1. if I see things correctly, you are really fiddling with a multi-method > M = { (f,T1), (g,T2), ... }from a library > > only if you do e.g. > > add(M,(h,T3)) with T3==T2 (*) or T1 > [Things one typically does not do without thinking > twice] Let's see if I can understand this in plain English. Hmm, maybe I shouldn't try. You've mixed hard-core formalism and lax informalism here. What does "fiddling" mean? Against my better judgement, I'll try anyway. I think you're saying: "The only way you're changing the behavior of a multimethod [what I think you mean by "fiddling"] is if 1. the type signature of the implementation you add duplicates one that's already been added, or 2. it can match all the same types as another one (g,T2), and for which there's another implementation (f,T1) which can match all the same types as the new implementation. In other words, it falls between two other signatures in specificity. Well, I still don't get it. I clearly don't know what "fiddling" means, since any added signature can change the behavior of the multimethod. I think I would be inclined to forbid your first case, where you're adding a multimethod implementation whose signature exactly matches another one that's already in the multimethod. 
> [Btw up to (*), missing a module or calling
> a generic function before all modules are loaded,
> load order does not count.]

The above sentence is completely incomprehensible to me.

> If T3 is < or incomparable with all the signatures in M,
> is not fiddling.

Okay, you're homing in on a formal definition of "fiddling" here, but I
still don't know what its significance is.

> Incomparable can mean dispatch ambiguities
> but that's a different issue.

I guess so, by definition (yours ).

> import lib
>
> class Sub(lib.Class1):
>     ...
>
> class New:
>     ...
>
> addmethod lib.gf(x: Sub,y: Sub):
>     ...
>
> addmethod lib.meth(arg: Sub):
>     ...
>
> is no different than defining new classes and subclassing and
> overriding methods. Also the kind of resulting program
> logic scattering is not that different under normal usage.

I agree.

> 2. Dispatch ambiguities: the more predictable the rule
> the better,

Right.

> the best-fit of multimethod does not match
> such a criterion, see my previous postings.
>
> Rules for CLOS:
> http://www.lispworks.com/reference/HyperSpec/Body/07_ffab.htm
> (NB things are eminently configurable in CLOS)
>
> Rules for Dylan:
> http://www.gwydiondylan.org/drm/drm_50.htm

I'm away from the internet at the moment, so I can't look...

> The class precedence list is the same notion
> as Python 2.2 mro.
>
> See my posted code for the idea of redispatching
> on forced types, which seems to me reasonably Pythonic
> and allows OTOH to choose a very strict approach
> in face of ambiguity because there's anyway a user
> controllable escape.

Could you please explain your scheme in plain English?
What is a "forced type"?

> My opinion: left-to-right and refuse ambiguity
> are depending on the generic function both
> reasonable approaches.

I assume that "left-to-right" is some kind of precedence ordering for
ambiguous multimethod implementations. Can you give an example where that
would be appropriate?

> 3.
IMO documentation, doc strings, and introspection
> should be enough to tell generic functions apart.

Sure.

> The proposed notation or whatever should be at most
> just syntax sugar:
>
> (a,b,c).f(d) === f(a,b,c,d) in general.

All notations (except really ugly ones) are syntax sugar. What point are
you trying to make?

> Generic functions should be first-class objects
> that can be passed around and used everywhere
> functions can be. Already today f(a,b)
> can invoke a function, a callable instance
> (maybe implementing some multimethod logic given that
> one can write multimethod support in pure Python),
> a bound method ...

of course.

> 4. AFAIK multimethods were invented in environments
> with code developed and defined in memory incrementally
> and libraries loaded, that means that through introspection
> one could list the methods of a generic function and jump
> to the various definition points, and warnings could
> be issued for redefinitions and such (see 1.).
> For more static approaches to introspection
> syntactic sugar would probably be useful:
>
> defgeneric addmethod vs. f.add(...)

I don't understand why you keep applying the term "generic" here. Ordinary
Python functions are already as generic as you can get. Multimethod
implementations are less so.

> Of course a Python impl could also
> optionally emit warnings etc, this requiring
> the good practice to load library code before
> user code, and being maximally useful in
> an incremental environment.

of course.

> 5. It is true that once you have multimethods you have
> the choice:
>
> class C:
>     def meth(...): ...
>
> vs.
>
> class C: ...
>
> defgeneric meth
>
> addmethod meth(obj: C): ...

It now looks like you were trying to say in 3. that multimethods should be
invokable and definable in the same way as single-methods. Well, I'm not
sure that this elegant idea from Dylan is essential or even necessarily
good for Python. It's cute, but what does it really buy us?

> 6.
It is true that generic functions kind of add > a new degree of freedom to the modularity problem > space. > PS: this is my input together with what I have already > posted, if something is unclear please ask. Done! ----------------------------------------------------------- David Abrahams * Boost Consulting dave@boost-consulting.com * http://www.boost-consulting.com From pedroni@inf.ethz.ch Sat Aug 17 22:47:16 2002 From: pedroni@inf.ethz.ch (Samuele Pedroni) Date: Sat, 17 Aug 2002 23:47:16 +0200 Subject: [Python-Dev] Re: Multimethods (quelle horreur?) Message-ID: <013301c24637$a7d9a740$6d94fea9@newmexico> [generic function (abbreviated gf) is the used terminology (CLOS, Dylan, goo, research papers) for a multidispatching function, and yes normal Common Lisp functions are as generic as you can get, still that's the terminology] the question was whether adding a method to a gf is always the moral equivalent of lib.py: class A: def meth(self,...): ... class B(A): ... class C(B): def meth(self,...): ... abusive_user.py: from lib import A def foo(...): ... A.meth=foo > Well, I still don't get it. I clearly don't know what "fiddling" means, > since any added signature can change the behavior of the multimethod. I > think I would be inclined to forbid your first case, where you're adding a > multimethod implementation whose signature exactly matches another one > > that's already in the multimethod. The point is whether the behavior is changed in an undetected way with respect to sets of arguments for which some matching signature/ method is already defined. So my conditions add(M,(h,T3)) with T3==T2 (*) or T1 > [Btw up to (*), missing a module or calling > > a generic function before all modules are loaded, > > load order does not count.] > The above sentence is completely incomprehensible to me. Dispatching outcomes are invariant wrt the order by which you add gf-methods to a gf. > > import lib > > > > class Sub(lib.Class1): > > ... > > > > class New: > > ... 
> >
> > addmethod lib.gf(x: Sub,y: Sub):
> >     ...
> >
> > addmethod lib.meth(arg: Sub):
> >     ...
> >
> > is no different than defining new classes and subclassing and
> > overriding methods. Also the kind of resulting program
> > logic scattering is not that different under normal usage.
>
> I agree.

this was just an example of the incomprehensible theory above. So we
agree.

> >
> > See my posted code for the idea of redispatching
> > on forced types, which seems to me reasonably Pythonic
> > and allows OTOH to choose a very strict approach
> > in face of ambiguity because there's anyway a user
> > controllable escape.
>
> Could you please explain your scheme in plain English?
> What is a "forced type"?

the idea is to allow optionally to specify together
with an argument a supertype of the argument and to have
the dispatching mechanism use the supertype instead
of the type of the argument for dispatching:

gf(a,b,_redispatch=(None,SuperTypeOf_b))

the dispatch mechanism will consider the
tuple (type(a),SuperTypeOf_b) instead
of (type(a),type(b)) for dispatching.

This is the moral equivalent of
single dispatching:

SuperTypeOf_b.meth(b)

or super(SuperTypeOf_b).meth(b)

and can be used as a kind of "super" mechanism or more
interestingly to disambiguate ambiguous calls on user behalf.

> > My opinion: left-to-right and refuse ambiguity
> > are depending on the generic function both
> > reasonable approaches.
>
> I assume that "left-to-right" is some kind of precedence ordering for
> ambiguous multimethod implementations. Can you give an example where that
> would be appropriate?

is the default used by CLOS, that simply means that
signatures (type tuples) are compared using the lexico-order;
given

class A: pass
class B(A): pass

then (B,A) < (A,B)

you get the same effect as multidispatching
simulated through chained single dispatching.

> > The proposed notation or whatever should be at most
> > just syntax sugar:
> >
> > (a,b,c).f(d) === f(a,b,c,d) in general.
>
> All notations (except really ugly ones) are syntax sugar. What point are
> you trying to make?

what I was trying to convey is that it would
be bad to have multimethod invocation be a special
operation different from the usual function invocation
(which currently is at work also for method invocation).

> > 5. It is true that once you have multimethods you have
> > the choice:
> >
> > class C:
> >     def meth(...): ...
> >
> > vs.
> >
> > class C: ...
> >
> > defgeneric meth
> >
> > addmethod meth(obj: C): ...
>
> It now looks like you were trying to say in 3. that multimethods should be
> invokable and definable in the same way as single-methods.

no, I was saying that at most

(a,).gf(b) should just be equivalent to gf(a,b) (*)

but plain a.meth() should still mean what it means today.

But honestly I find (*) unnecessary and ugly.
I'm not even sure one can really disambiguate
such syntax:

(a,b).__contains__(2)
(a,).__contains__(2)

are valid Python.

> Well, I'm not
> sure that this elegant idea from Dylan is essential or even necessarily
> good for Python. It's cute, but what does it really buy us?

nothing. My point is that once we have multimethods,
one has the choice, one can
define classes without normal methods and just use multimethods instead.
I was not advocating that C().meth() should be equivalent
to meth(C()) in every respect and that the method definitions
inside the class definitions should trigger gf-method
definitions. It would be a disruptive change for Python.

So we agree.

Multidispatching functions should be an extension of the notion
of function in Python, not of class methods. OTOH
class methods are defined by defining functions
inside class namespaces, so it should be possible
to get class methods also from gfs defined
in a class namespace.

regards.
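The scheme discussed in this exchange — a generic function as a registry
of (type-signature, function) pairs, dispatched on the types of all
arguments with CLOS-style left-to-right precedence among applicable
signatures — can be sketched in a few lines of present-day Python. The
class name `Generic` and the crude per-position specificity measure (MRO
length) are illustrative assumptions, not anyone's proposed API:

```python
class Generic:
    """Toy multidispatching function: a registry of (signature, function)
    pairs, dispatched on the types of all arguments."""

    def __init__(self):
        self.methods = []  # list of (type-tuple signature, function)

    def add(self, sig, func):
        self.methods.append((sig, func))

    def __call__(self, *args):
        argtypes = tuple(type(a) for a in args)
        applicable = [
            (sig, func) for sig, func in self.methods
            if len(sig) == len(args)
            and all(issubclass(t, s) for t, s in zip(argtypes, sig))
        ]
        if not applicable:
            raise TypeError("no applicable method for %r" % (argtypes,))
        # Left-to-right (lexicographic) precedence: rank each signature by
        # per-position specificity (here, crudely, MRO length) and compare
        # position by position, so (B, A) beats (A, B) when B subclasses A.
        applicable.sort(key=lambda m: tuple(len(s.__mro__) for s in m[0]),
                        reverse=True)
        return applicable[0][1](*args)


class A: pass
class B(A): pass

gf = Generic()
gf.add((A, B), lambda x, y: "A,B")
gf.add((B, A), lambda x, y: "B,A")

result = gf(B(), B())  # both signatures apply; left-to-right picks (B, A)
```

A strict policy would instead raise on the ambiguous `gf(B(), B())` call;
the `_redispatch` escape mentioned above would correspond to dispatching
on a caller-supplied type tuple instead of `argtypes`.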
From David Abrahams" Message-ID: <06f301c24639$1cee88b0$6501a8c0@boostconsulting.com> From: "Samuele Pedroni" > the question was whether > adding a method to a gf > is always the moral equivalent of > > lib.py: > > class A: > def meth(self,...): ... > > class B(A): ... > > class C(B): > def meth(self,...): ... > > abusive_user.py: > > from lib import A > > def foo(...): ... > > A.meth=foo And the answer is, "clearly not always". > > Well, I still don't get it. I clearly don't know what "fiddling" means, > > since any added signature can change the behavior of the multimethod. I > > think I would be inclined to forbid your first case, where you're adding a > > multimethod implementation whose signature exactly matches another one > > > that's already in the multimethod. > > The point is whether the behavior is changed in an undetected way > with respect to sets of arguments for which some matching signature/ > method is already defined. So my conditions > > add(M,(h,T3)) with T3==T2 (*) or T1 (assuming that T3==T2 triggers substitution) . > > [T3==T2 case corresponds to the above > > A.meth = foo I hope you are covering this case just for generality's sake. It's easy enough to forbid. > T1 > from lib import B > > B.meth=... ] I don't understand why you're using such a complicated condition; you can change the behavior "in an undetected way WRT sets of arguments for which some matching signature/method is already defined" simply by adding a signature T4 s.t. T4 < X for some X in the signatures of the multimethod. > If T3 is < or uncomparable with all the signatures > already in M: > - you are doing the moral equivalent of overriding > in the single dispatch case Sort of. You might not be the same person that supplies the types in T3. > - or you are defining the gf for some unrelated > class hierarchies Yep. 
> - or some case that was unambiguous > will become ambiguous and the outcome > will depend on the rules you choose to > deal with ambiguity (which is a general > problem with multidispatching). Yep. > > > [Btw up to (*), missing a module or calling > > > a generic function before all modules are loaded, > > > load order does not count.] > > The above sentence is completely incomprehensible to me. > > Dispatching outcomes are invariant wrt > the order by which you add gf-methods to a gf. It depends on your dispatching rules, of course. However, I'd like to pick order-independent rules. > > > See my posted code for the idea of redispatching > > > on forced types, which seems to me reasonably Pythonic > > > and allows OTOH to choose a very strict approach > > > in face of ambiguity because there's anyway a user > > > controllable escape. > > > > Could you please explain your scheme in plain English? > > What is a "forced type"? > > the idea is to allow optionally to specify together > with an argument a supertype of the argument and to have > the dispatching mechanism use the supertype instead > of the type of the argument for dispatching: > > gf(a,b,_redispatch=(None,SuperTypeOf_b)) > > the dispatch mechanism will consider the > tuple (type(a),SuperTypeOf_b) instead > of (type(a),type(b)) for dispatching. More "sugarily:" gf(a, dispatch_as(b, SuperTypeOf_b)) Interesting. Not sure how I feel about this. > This is the moral equivalent of > single dispatching: > > SuperTypeOf_b.meth(b) > > or super(SuperTypeOf_b).meth(b) Hmm. OK, I see the analogy. I hardly ever have to do that even in the single case, but I get what you're up to. > > > My opinion: left-to-right and refuse ambiguity > > > are depending on the generic function both > > > reasonable approaches. > > > I assume that "left-to-right" is some kind of precedence ordering for > > ambiguous multimethod implementations. Can you give an example where that > > would be appropriate? 
> > is the default used by CLOS, that simply means that > signature (type tuples) are compared using the lexico-order, > > given > class A: pass > class B(A): pass > > then (B,A)<(A,B) > > you get the same effect as multidispatching > simulated through chained single dispatching. That seems a bit arbitrary, but I guess there are other precedents in Python for an arbitrary ordering (e.g. ordering on type names for heterogeneous object comparison). > > > The proposed notation or whatever should be at most > > > just syntax sugar: > > > > > > (a,b,c).f(d) === f(a,b,c,d) in general. > > > > All notations (except really ugly ones) are syntax sugar. What point are > > you trying to make? > > what I was trying to convoy is that it would > be bad to have multimethod invocation be a special > operation different from the usual function invocation > (which currently is at work also for method invocation). Agreed. > > It now looks like you were trying to say in 3. that multimethods should be > > invokable and definable in the same way as single-methods. > > no, I was saying that at most > > (a,).gf(b) should just be equivalent to gf(a,b) (*) > > but plain a.meth() should still mean what it means today. > > But honestly I find (*) unnecessary and ugly. > I'm not even sure one can really disambiguate > such syntax: > > (a,b).__contains__(2) > (a,).__contains__(2) > > are valid Python. > > >Well, I'm not > > sure that this elegant idea from Dylan is essential or even neccessarily > > good for Python. It's cute, but what does it really buy us? > > nothing. My point is that once we have multimethods, > one has the choice, one can > define classes without normal methods and just use multimethods instead, There's the (minor?) issue of access to "private" members whose names begin with two underscores. 
> I was not advocating that C().meth() should be equivalent > to meth(C()) in every respect and that the method definitions > inside the class definitions should triggers gf-method > definitions. It would be a disruptive change for Python. > > So we agree. Good! > Multidispatching functions should be an extension of the notion > of function in Python not of class methods. OTOH > class methods are defined by defining functions > inside class namespaces, so it should be possible > to get class methods also from gfs defined > in a class namespace. Yes, I strongly agree. -Dave From pedroni@inf.ethz.ch Sun Aug 18 00:01:18 2002 From: pedroni@inf.ethz.ch (Samuele Pedroni) Date: Sun, 18 Aug 2002 01:01:18 +0200 Subject: [Python-Dev] Re: Multimethods (quelle horreur?) References: <013301c24637$a7d9a740$6d94fea9@newmexico> <06f301c24639$1cee88b0$6501a8c0@boostconsulting.com> Message-ID: <014901c24641$ff060b80$6d94fea9@newmexico> From: David Abrahams > > [T3==T2 case corresponds to the above > > > > A.meth = foo > > I hope you are covering this case just for generality's sake. It's easy > enough to forbid. Yes, but Python is a dynamic language, you should allow for redefinition in some way. > > the idea is to allow optionally to specify together > > with an argument a supertype of the argument and to have > > the dispatching mechanism use the supertype instead > > of the type of the argument for dispatching: > > > > gf(a,b,_redispatch=(None,SuperTypeOf_b)) > > > > the dispatch mechanism will consider the > > tuple (type(a),SuperTypeOf_b) instead > > of (type(a),type(b)) for dispatching. > > More "sugarily:" > > gf(a, dispatch_as(b, SuperTypeOf_b)) > > Interesting. Not sure how I feel about this. > > > This is the moral equivalent of > > single dispatching: > > > > SuperTypeOf_b.meth(b) > > > > or super(SuperTypeOf_b).meth(b) > > Hmm. OK, I see the analogy. I hardly ever have to do that even in the > single case, but I get what you're up to. 
more than as a super, it can be useful if you
are picky about ambiguities.

> > you get the same effect as multidispatching
> > simulated through chained single dispatching.
>
> That seems a bit arbitrary, but I guess there are other precedents in
> Python for an arbitrary ordering (e.g. ordering on type names for
> heterogeneous object comparison).

Yes, e.g. the mro used for single dispatch in case of
multiple inheritance.

The other option is to totally refuse ambiguity. Which is
also reasonable. Honestly there is no big agreement about
whether automatically solving ambiguities is a good thing
in general (even for the single dispatch case).

See for a short opinionated survey:

The Cecil Language Specification and Rationale
Craig Chambers
2.7 Method lookup
2.7.1 Philosophy
http://www.cs.washington.edu/research/projects/cecil/www/Refman/wmwork/www/cecil-spec_12.html#HEADING38

The predictability depends not only on the rule but also
on the complexities of the class hierarchies at hand,
especially in the presence of multiple inheritance.

For the stuff I tried with my multimethods impl, it seemed
that the CLOS rule made sense, and getting ambiguities
seemed more an impediment. As I said it corresponds to
chained single dispatch and that was basically what I
needed in a sweeter way.

Anyway this can be made configurable for the single gf.

regards.

From pinard@iro.umontreal.ca Sun Aug 18 00:45:12 2002
From: pinard@iro.umontreal.ca (François Pinard)
Date: 17 Aug 2002 19:45:12 -0400
Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib
In-Reply-To: <200208162154.g7GLsrZ28972@pcp02138704pcs.reston01.va.comcast.net>
References: <200208162154.g7GLsrZ28972@pcp02138704pcs.reston01.va.comcast.net>
Message-ID:

[Guido van Rossum]

I felt comfortable, or at least I think so, with all the contents of the
message. All described compromises, not repeated here, seemed reasonable
to me.
Except maybe for the following: > - The set constructors have an optional second argument, sort_repr, > defaulting to False, which decides whether the elements are sorted > when str() or repr() is taken. I'm not sure if there would be > negative consequences of removing this argument and always sorting > the string representation. Unless there is something deep attached to the properties of the sets themselves, I do not understand why the sorting/non-sorting virtues of `repr' should be tied with the constructor. There is a precedent with dicts. They print non-sorted, but they pretty-print (through the `pprint' module) sorted. Maybe the same could be done for sets: use `pprint' if you want a sorted representation. But otherwise, sets as well as dicts should print using the same order by which elements are to be iterated upon or listed, in various other circumstances. -- François Pinard http://www.iro.umontreal.ca/~pinard From aahz@pythoncraft.com Sun Aug 18 00:49:45 2002 From: aahz@pythoncraft.com (Aahz) Date: Sat, 17 Aug 2002 19:49:45 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: References: <200208162154.g7GLsrZ28972@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020817234945.GA4461@panix.com> On Sat, Aug 17, 2002, François Pinard wrote: > [Guido van Rossum] >> >> - The set constructors have an optional second argument, sort_repr, >> defaulting to False, which decides whether the elements are sorted >> when str() or repr() is taken. I'm not sure if there would be >> negative consequences of removing this argument and always sorting >> the string representation. > > Unless there is something deep attached to the properties of the sets > themselves, I do not understand why the sorting/non-sorting virtues of > `repr' should be tied with the constructor. > > There is a precedent with dicts. They print non-sorted, but they > pretty-print (through the `pprint' module) sorted. 
Maybe the same could be done for sets: use `pprint' if you want a sorted representation. But otherwise, sets as well as dicts should print using the same order by which elements are to be iterated upon or listed, in various other circumstances. +1 -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From zack@codesourcery.com Sun Aug 18 05:11:31 2002 From: zack@codesourcery.com (Zack Weinberg) Date: Sat, 17 Aug 2002 21:11:31 -0700 Subject: [Python-Dev] Python build trouble with the new gcc/binutils In-Reply-To: <012b01c2461e$259992e0$6501a8c0@boostconsulting.com> References: <200208161448.g7GEmAm19971@europa.research.att.com> <15709.5232.302092.575564@anthem.wooz.org> <20020816172858.GA30840@codesourcery.com> <012b01c2461e$259992e0$6501a8c0@boostconsulting.com> Message-ID: <20020818041131.GC14079@codesourcery.com> On Sat, Aug 17, 2002 at 02:43:33PM -0400, David Abrahams wrote: > From: "Zack Weinberg" > > > On Fri, Aug 16, 2002 at 12:38:24PM -0400, Andrew Koenig wrote: > > > > > > #0 __register_frame_info_bases (begin=0xfed50000, ob=0xfed50000, > tbase=0x0, > > > dbase=0x0) at /tmp/build1165/gcc-3.1.1/gcc/unwind-dw2-fde.c:83 > > > > Er, is the directory name misleading, or have you picked up > > libgcc_s.so from 3.1.1? In theory that shouldn't be a problem; in > > practice it could well be the problem. > > I'm still ploughing through several days of messages here (so this may have > been discussed already) but I have recently learned that despite the > existence of a "-V" option, it has long been impossible to correctly > install new versions of GCC on systems with existing versions without > using --prefix= to select a unique location. Why GCC's configure doesn't > issue a warning about this when you do it wrong, I don't know. The only > clue that this is going to be a problem was buried in a FAQ somewhere as of > three months ago.
I believe that this has been addressed in the current documentation -- the INSTALL file clearly states that multiple versions shouldn't be installed in the same prefix, and the -V option (which never worked properly) has been removed. Would you mind reading the docs shipped with 3.2, and reporting any remaining confusion as a bug? (note that we're not going to add a configure check as you suggest, because there are conditions where it's safe, and we don't want to make life harder for people who really do want that.) zw From dave@boost-consulting.com Sun Aug 18 05:11:08 2002 From: dave@boost-consulting.com (David Abrahams) Date: Sun, 18 Aug 2002 00:11:08 -0400 Subject: [Python-Dev] Python build trouble with the new gcc/binutils References: <200208161448.g7GEmAm19971@europa.research.att.com> <15709.5232.302092.575564@anthem.wooz.org> <20020816172858.GA30840@codesourcery.com> <012b01c2461e$259992e0$6501a8c0@boostconsulting.com> <20020818041131.GC14079@codesourcery.com> Message-ID: <075501c2466d$4b2041e0$6501a8c0@boostconsulting.com> From: "Zack Weinberg" To: "David Abrahams" > > I'm still ploughing through several days of messages here (so this may have > > been discussed already) but I have recently learned that despite the > > existinence of a "-V" option, it has long been impossible to correctly > > install new versions of GCC on systems with existing versions without > > using --prefix= to select a unique location. Why GCC's configure doesn't > > issue a warning about this when you do it wrong, I don't know. The only > > clue that this is going to be a problem was buried in a FAQ somewhere as of > > three months ago. > > I believe that this has been addressed in the current documentation -- > the INSTALL file clearly states that multiple versions shouldn't be > installed in the same prefix, and the -V option (which never worked > properly) has been removed. Would you mind reading the docs shipped > with 3.2, and reporting any remaining confusion as a bug? 
The *only* place I have ever looked for documentation about how to install was here: http://gcc.gnu.org/install/configure.html ...and I still see nothing about this issue. There should be a prominent, eye-catching warning about this, for people like me who have gotten used to the procedure and have been doing it wrong for a while without knowing better. > (note that we're not going to add a configure check as you suggest, > because there are conditions where it's safe, and we don't want to > make life harder for people who really do want that.) Optimizing for the 1% case? Seems like a bad choice to me. How much more difficult could it be for those 1%-ers if you added a warning? -Dave ----------------------------------------------------------- David Abrahams * Boost Consulting dave@boost-consulting.com * http://www.boost-consulting.com From oren-py-d@hishome.net Sun Aug 18 05:50:30 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Sun, 18 Aug 2002 00:50:30 -0400 Subject: [Python-Dev] type categories In-Reply-To: <009a01c2461a$a82a3e70$6501a8c0@boostconsulting.com> References: <009a01c2461a$a82a3e70$6501a8c0@boostconsulting.com> Message-ID: <20020818045030.GA31010@hishome.net> On Sat, Aug 17, 2002 at 02:18:56PM -0400, David Abrahams wrote: > Huh? That's certainly not what I thought I was saying. I was saying that a > reason I thought it was important to be able to test type categories (what > Guido calls "look before you leap") was for implementing multiple dispatch. > In other words, an idiom which most people agree is usually a bad choice > for user code might be a great choice for a generalized library or language > facility. Multiple dispatch is one possible use for testing type categories. Another important use is early detection of errors and more informative error messages. 
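The early-detection idiom Oren describes can be sketched as an explicit category check at a library's published entry point (a minimal illustration; the function name and error message are hypothetical, not from any real library):

```python
def scale_all(items, factor):
    """Scale every element of an iterable of numbers (hypothetical API)."""
    # Check the argument's category up front, at the library boundary,
    # so a bad call fails here with a clear message instead of deep
    # inside someone else's code much later.
    if not hasattr(items, "__iter__"):
        raise TypeError(
            "scale_all() expects an iterable of numbers, got %s"
            % type(items).__name__
        )
    return [x * factor for x in items]

print(scale_all([1, 2, 3], 2))  # [2, 4, 6]
```

Passing, say, an integer now fails immediately at the entry point with an informative TypeError rather than somewhere downstream.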
When using a library, the errors resulting from a bad argument to one of its published entry points are often raised deep within someone else's code with an uninformative message, making them hard to trace. If an object of the incorrect category is passed to a method that just stores a reference to it, the actual error may only be raised much later, making it even harder to trace. There is also the issue of trusting someone else's code - maybe it's a bug in the library? With explicit category checks it's easier to tell the source of the problem. The extreme form of early detection is static typing, of course. Forcing category checks on all arguments passed is too much overhead for me. I prefer explicit checks for protocol compliance at some well-defined interface points between different domains of code. When an exception is raised the region of uncertainty about its real source can sometimes be quite large. Category checks can serve as a kind of fire door to try to limit the spread of uncertainty. The problem with putting too many fire doors is that they hinder passage because they must be kept closed at all times. Oren From pedroni@inf.ethz.ch Sun Aug 18 10:59:03 2002 From: pedroni@inf.ethz.ch (Samuele Pedroni) Date: Sun, 18 Aug 2002 11:59:03 +0200 Subject: [Python-Dev] Re: multimethods (quelle horreur?) (clarification) Message-ID: <002401c2469d$e23aa860$6d94fea9@newmexico> [David Abrahams] > >[myself] >> gf(a,b,_redispatch=(None,SuperTypeOf_b)) >> >> the dispatch mechanism will consider the >> tuple (type(a),SuperTypeOf_b) instead >> of (type(a),type(b)) for dispatching. > >More "sugarily:" > > gf(a, dispatch_as(b, SuperTypeOf_b)) > >Interesting. Not sure how I feel about this. > >> This is the moral equivalent of >> single dispatching: >> >> SuperTypeOf_b.meth(b) what I described above corresponds to this.
>> >> or super(SuperTypeOf_b).meth(b) to be equivalent with that gf(a, dispatch_as(b, SuperTypeOf_b)) would have to consider SuperTypeOf_b but together with the order induced by type(b).mro(). regards. From skip@manatee.mojam.com Sun Aug 18 13:00:59 2002 From: skip@manatee.mojam.com (Skip Montanaro) Date: Sun, 18 Aug 2002 07:00:59 -0500 Subject: [Python-Dev] Weekly Python Bug/Patch Summary Message-ID: <200208181200.g7IC0xFR027673@manatee.mojam.com> Bug/Patch Summary ----------------- 267 open / 2767 total bugs (-1) 106 open / 1653 total patches (-12) New Bugs -------- python-mode.el disables raw_input() (2002-08-13) http://python.org/sf/594833 printing email object deletes whitespace (2002-08-13) http://python.org/sf/594893 test_nis test fails on TRU64 5.1 (2002-08-14) http://python.org/sf/594998 inspect.getsource shows incorrect output (2002-08-14) http://python.org/sf/595018 Support for masks in getargs.c (2002-08-14) http://python.org/sf/595026 AESend on Jaguar (2002-08-14) http://python.org/sf/595105 asynchat problems multi-threaded (2002-08-14) http://python.org/sf/595217 string method bugs w/ 8bit, unicode args (2002-08-14) http://python.org/sf/595350 pythonw has a console on Win98 (2002-08-15) http://python.org/sf/595537 file (& socket) I/O are not thread safe (2002-08-15) http://python.org/sf/595601 Get rid of FutureWarnings in Carbon (2002-08-15) http://python.org/sf/595763 IDLE/Command Line Output Differ (2002-08-15) http://python.org/sf/595791 pickle_complex in copy_reg.py (2002-08-15) http://python.org/sf/595837 popenN return only text mode pipes (2002-08-16) http://python.org/sf/595919 build dumps core (binutils 2.13/solaris) (2002-08-17) http://python.org/sf/596422 textwrap has problems wrapping hyphens (2002-08-17) http://python.org/sf/596434 NetBSD 1.4.3, a.out, shared modules (2002-08-17) http://python.org/sf/596576 New Patches ----------- PEP 277: Unicode file name support (2002-08-12) http://python.org/sf/594001 turtle tracer bugfixes and new
functions (2002-08-14) http://python.org/sf/595111 Update environ for CGIHTTPServer.py (2002-08-15) http://python.org/sf/595846 urllib.splituser(): '@' in usrname (2002-08-17) http://python.org/sf/596581 Closed Bugs ----------- Replace strcat, strcpy (2001-11-30) http://python.org/sf/487703 whichdb lies about db type (2001-12-11) http://python.org/sf/491888 pickle interns strings (2002-01-11) http://python.org/sf/502503 pydoc doesn't show C types (2002-04-25) http://python.org/sf/548845 list(xrange(1e9)) --> seg fault (2002-05-14) http://python.org/sf/556025 illegal use of malloc/free (2002-05-16) http://python.org/sf/557028 faqwiz.py could do email obfuscation (2002-05-19) http://python.org/sf/558072 pydoc(.org) does not find file.flush() (2002-06-26) http://python.org/sf/574057 convert_path fails with empty pathname (2002-06-26) http://python.org/sf/574235 multiple inheritance w/ slots dumps core (2002-06-28) http://python.org/sf/575229 System Error with slots and multi-inh (2002-07-05) http://python.org/sf/577777 pickle error message unhelpful (2002-07-16) http://python.org/sf/582297 Empty genindex.html pages (2002-07-26) http://python.org/sf/586926 pydoc -w fails with path specified (2002-07-26) http://python.org/sf/586931 Bug with deepcopy and new style objects (2002-08-08) http://python.org/sf/592567 u'%c' % large value: broken result (2002-08-10) http://python.org/sf/593581 Closed Patches -------------- Changing the preferences mechanism (2001-10-18) http://python.org/sf/472593 Distutils -- set runtime library path (2001-11-26) http://python.org/sf/485572 Make site.py more friendly to PDAs (2001-12-20) http://python.org/sf/495688 Remove eval in pickle and cPickle (2002-01-19) http://python.org/sf/505705 suppress type restrictions on locals() (2002-01-31) http://python.org/sf/511219 bug in pydoc on python 2.2 release (2002-02-07) http://python.org/sf/514628 help asyncore recover from repr() probs (2002-03-25) http://python.org/sf/534862 Set softspace to 0 in 
raw_input() (2002-04-29) http://python.org/sf/550192 Karatsuba multiplication (2002-05-24) http://python.org/sf/560379 Better token-related error messages (2002-07-25) http://python.org/sf/586561 alternative SET_LINENO killer (2002-07-29) http://python.org/sf/587993 _locale library patch (2002-07-30) http://python.org/sf/588564 Mindless editing, DL_EXPORT/IMPORT (2002-07-31) http://python.org/sf/588982 tempfile.py rewrite (2002-08-01) http://python.org/sf/589982 Split-out ntmodule.c (2002-08-08) http://python.org/sf/592529 Static names (2002-08-11) http://python.org/sf/593627 From walter@livinglogic.de Sun Aug 18 14:07:57 2002 From: walter@livinglogic.de (Walter Dörwald) Date: Sun, 18 Aug 2002 15:07:57 +0200 Subject: [Python-Dev] mimetypes patch #554192 References: <3D5BEBB8.7080904@livinglogic.de> <15707.61612.844119.819432@anthem.wooz.org> <3D5CE38D.9080905@livinglogic.de> Message-ID: <3D5F9C2D.8010209@livinglogic.de> Martin v. Loewis wrote: > Walter Dörwald writes: > >>OK, so we probably need a reverse mapping for common_types too, but >>shouldn't we consider common_types to be fixed? > > If anything, types_map should be fixed: Those are the official > IANA-supported types (including the official x- extension mechanism). > > The common types are those that violate IANA specs, yet found in real > life. > > If you wanted to support strictness in add_type, then you would > require that the type starts with x-; since mimetypes.py should have > all registered types incorporated (if it misses some, that's a bug). OK, but then adding the entries to types_map must be done differently. I'd prefer if both can be done by add_type (but then we'd need three modes: initialising types_map, adding further mappings to types_map (checking that only x- types/subtypes are used), and adding mappings to common_types). >>Even better would be, if we could assign priorities to the mappings, >>so that for e.g. image/jpeg the preferred extension is .jpeg.
>>Then guess_type() and guess_extension() would return the preferred >>mimetype/extension. > > > Do you have a specific application for that in mind? It sounds like > overkill. I'm using a web mirror script which uses the extensions from guess_extension to save all downloaded resources, and I hate it when the HTML files are named .htm and JPEG images are named .jpe. Bye, Walter Dörwald From sholden@holdenweb.com Sun Aug 18 15:06:55 2002 From: sholden@holdenweb.com (Steve Holden) Date: Sun, 18 Aug 2002 10:06:55 -0400 Subject: [Python-Dev] Platforms missing both fork and popen2/3 Message-ID: <0a6501c246c0$84209c30$6300000a@holdenweb.com> The "third leg" of the CGIHTTPServer run_cgi() method is only taken if the platform's os module contains neither fork nor popen2 nor popen3 attributes. My platform experience is pretty much limited to the mainstream . Which platforms are we looking at here besides Macintosh OS 9 and prior? regards ----------------------------------------------------------------------- Steve Holden http://www.holdenweb.com/ Python Web Programming http://pydish.holdenweb.com/pwp/ ----------------------------------------------------------------------- From fredrik@pythonware.com Sun Aug 18 16:59:19 2002 From: fredrik@pythonware.com (Fredrik Lundh) Date: Sun, 18 Aug 2002 17:59:19 +0200 Subject: [Python-Dev] Platforms missing both fork and popen2/3 References: <0a6501c246c0$84209c30$6300000a@holdenweb.com> Message-ID: <017a01c246d0$3842a9b0$ced241d5@hagrid> Steve wrote: > The "third leg" of the CGIHTTPServer run_cgi() method is only taken if the > platform's os module contains neither fork nor popen2 nor popen3 attributes. > My platform experience is pretty much limited to the mainstream . Which > platforms are we looking at here besides Macintosh OS 9 and prior? Python, before 2.0, on non-unix platforms. 
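The selection Steve asks about amounts to feature-testing the os module; roughly, as a simplified sketch of the decision (not the actual CGIHTTPServer code, and the returned labels are made up):

```python
import os

def pick_cgi_strategy():
    # CGIHTTPServer prefers fork() on Unix, falls back to popen2/popen3
    # where available, and only takes the "third leg" -- running the CGI
    # script in-process -- when none of these exist.
    if hasattr(os, "fork"):
        return "fork"
    if hasattr(os, "popen2") or hasattr(os, "popen3"):
        return "popen"
    return "in-process"

print(pick_cgi_strategy())
```

On any Unix this returns "fork"; the in-process branch is only reached on platforms whose os module lacks all three attributes.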
From sholden@holdenweb.com Sun Aug 18 19:40:01 2002 From: sholden@holdenweb.com (Steve Holden) Date: Sun, 18 Aug 2002 14:40:01 -0400 Subject: [Python-Dev] Platforms missing both fork and popen2/3 References: <0a6501c246c0$84209c30$6300000a@holdenweb.com> <017a01c246d0$3842a9b0$ced241d5@hagrid> Message-ID: <0c0101c246e6$b0a54a00$6300000a@holdenweb.com> [Fredrik] > Steve wrote: > > > > The "third leg" of the CGIHTTPServer run_cgi() method is only taken if the > > platform's os module contains neither fork nor popen2 nor popen3 attributes. > > My platform experience is pretty much limited to the mainstream . Which > > platforms are we looking at here besides Macintosh OS 9 and prior? > > Python, before 2.0, on non-unix platforms. > Thanks. I'm happy to ignore that one for the purposes of a 2.3 fix. I'm not sure whether I ought to be looking at backporting this one to 2.2, though. The *HTTPServer modules are so patently not production-quality code I suspect it won't matter. regards ----------------------------------------------------------------------- Steve Holden http://www.holdenweb.com/ Python Web Programming http://pydish.holdenweb.com/pwp/ ----------------------------------------------------------------------- From guido@python.org Sun Aug 18 21:04:56 2002 From: guido@python.org (Guido van Rossum) Date: Sun, 18 Aug 2002 16:04:56 -0400 Subject: [Python-Dev] Platforms missing both fork and popen2/3 In-Reply-To: Your message of "Sun, 18 Aug 2002 14:40:01 EDT." <0c0101c246e6$b0a54a00$6300000a@holdenweb.com> References: <0a6501c246c0$84209c30$6300000a@holdenweb.com> <017a01c246d0$3842a9b0$ced241d5@hagrid> <0c0101c246e6$b0a54a00$6300000a@holdenweb.com> Message-ID: <200208182004.g7IK4vH08746@pcp02138704pcs.reston01.va.comcast.net> > Thanks. I'm happy to ignore that one for the purposes of a 2.3 fix. I'm not > sure whether I ought to be looking at backporting this one to 2.2, though. 
> The *HTTPServer modules are so patently not production-quality code I > suspect it won't matter. I imagine a backport would be easy, because not much has changed in that code. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim@zope.com Mon Aug 19 04:15:08 2002 From: tim@zope.com (Tim Peters) Date: Sun, 18 Aug 2002 23:15:08 -0400 Subject: [Python-Dev] Alternative implementation of interning In-Reply-To: <20020816152258.GA99140@hishome.net> Message-ID: [Oren Tirosh] > ... > The problem is in C extension modules. In C there is an incentive to rely > on the immortality of interned strings because it makes the code simpler. Are you sure about that? I haven't seen it. > There was an example of this in the Mac import code. Code *inside* the core can play any dirty tricks it likes, because we guarantee to keep it working as things change across releases. But, AFAICT, we have no evidence that anything outside the core abuses this stuff. > PyString_InternInPlace should probably create immortal interned strings > for backward compatibility (and deprecated, of course) I still doubt it matters to anything outside the core. > Maybe PyString_Intern should be renamed to PyString_InternReference to > make it more obvious that it modifies the pointer "in place". You're talking about a function that doesn't exist now, right (I don't recognize the name PyString_Intern, and neither it nor PyString_InternReference scream anything obvious to me)? From tim.one@comcast.net Mon Aug 19 04:52:11 2002 From: tim.one@comcast.net (Tim Peters) Date: Sun, 18 Aug 2002 23:52:11 -0400 Subject: [Python-Dev] pystone(object) In-Reply-To: <200208161955.g7GJtqv21741@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Guido] > ... > Slots can get you back most of this, but not all. Dict lookup is > already extremely tight code, and when I profiled this, most of the > time was spent there -- twice as many lookup calls using new-style > classes than for classic classes. 
As I've said, and as Oren later demonstrated with code, the cost of a namespace dict lookup now is more in the layers of function call overhead than in the actual lookup. We could whittle that down in Oren-like ways, although I'd rather we spent whatever time we can devote to stuff like this on advancing one of the more-general optimization schemes that were a hot topic before the Python conference. From tim.one@comcast.net Mon Aug 19 05:15:12 2002 From: tim.one@comcast.net (Tim Peters) Date: Mon, 19 Aug 2002 00:15:12 -0400 Subject: [Python-Dev] Re: SET_LINENO killer In-Reply-To: <2m65ybb3pb.fsf@starship.python.net> Message-ID: [Michael Hudson] > ... > This makes no sense; after you've commented out the trace stuff, the > only difference left is that the switch is smaller! When things like this don't make sense, it just means we're naive . The eval loop overwhelms most optimizers via a crushing overload of "too many" variables and "too many" basic blocks connected via a complex topology, and compiler optimization phases are in the business of using (mostly) linear-time heuristics to solve exponential-time optimization problems. IOW, the performance of the eval loop is as touchy as a heterosexual sailor coming off 2 years at sea, and there's no predicting what minor changes will do to speed. This has been observed repeatedly by everyone who has tried to speed it, across many platforms, and across a decade of staring at it: the eval loop is in unstable equilibrium on its best days. In the limit, the eval loop "should be" a little slower now under -O, just because we've added another test + taken-branch to the normal path. From that POV, your > FWIW gcc makes my patch a small win even with -O. is as much "a mystery" as why MSVC 6 hates it. > Actually, there are some other changes, like always updating f->f_lasti, > and allocating 8 more bytes on the stack. Does commenting out the > definition of instr_lb & instr_ub make any difference? 
I'll try that on Tuesday, but don't hold your breath. It could be that I can get back all the loss by declaring tstate volatile -- or doing any other random thing . > ... > Does reading assembly give any clues? Not that I'd really expect > anyone to read all of the main loop... I will if it's important, but a good HW simulator is a better tool for this kind of thing, and in any case I doubt I can make enough time to do what would be needed to address this for real. > I'm baffled. Join the club -- we've held this invitation open for you for years . > Perhaps you can put SET_LINENO back in for the Windows build > <1e-6 wink>. If it's an unfortunate I-cache conflict among heavily-hit code addresses (something a good HW simulator can tell you), that could actually solve it! Then anything that manages to move one of the colliding code chunks to a different address could yield "a mysterious speedup". These mysteries are only irritating when they work against you . relax-be-happy-ly y'rs - tim From oren-py-d@hishome.net Mon Aug 19 05:46:35 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Mon, 19 Aug 2002 00:46:35 -0400 Subject: [Python-Dev] Alternative implementation of interning In-Reply-To: References: <20020816152258.GA99140@hishome.net> Message-ID: <20020819044635.GA87829@hishome.net> On Sun, Aug 18, 2002 at 11:15:08PM -0400, Tim Peters wrote: > [Oren Tirosh] > > ... > > The problem is in C extension modules. In C there is an incentive to rely > > on the immortality of interned strings because it makes the code simpler. > > Are you sure about that? I haven't seen it. > > > There was an example of this in the Mac import code. > > Code *inside* the core can play any dirty tricks it likes, because we > guarantee to keep it working as things change across releases. But, AFAICT, > we have no evidence that anything outside the core abuses this stuff. I doubt that whoever wrote that code was thinking "hey, this is part of the core so I can do this". 
More likely he was just following the *documented* promise that interned strings are immortal. There may be extension modules outside the core that also rely on this promise. > > PyString_InternInPlace should probably create immortal interned strings > > for backward compatibility (and deprecated, of course) > > I still doubt it matters to anything outside the core. Perhaps I'm being overly cautious about breaking promises. If both you and Guido say it's OK, who am I to argue... > > Maybe PyString_Intern should be renamed to PyString_InternReference to > > make it more obvious that it modifies the pointer "in place". > > You're talking about a function that doesn't exist now, right (I don't > recognize the name PyString_Intern, and neither it nor > PyString_InternReference scream anything obvious to me)? PyString_Intern is the name of the function in my patch that creates a mortal interned string. PyString_InternInPlace creates immortal interned strings for compatibility. Oren From mwh@python.net Mon Aug 19 10:39:05 2002 From: mwh@python.net (Michael Hudson) Date: 19 Aug 2002 10:39:05 +0100 Subject: [Python-Dev] Re: SET_LINENO killer In-Reply-To: Tim Peters's message of "Mon, 19 Aug 2002 00:15:12 -0400" References: Message-ID: <2m65y7xk5i.fsf@starship.python.net> Tim Peters writes: > [Michael Hudson] > > ... > > This makes no sense; after you've commented out the trace stuff, the > > only difference left is that the switch is smaller! > > When things like this don't make sense, it just means we're naive . > The eval loop overwhelms most optimizers via a crushing overload of "too > many" variables and "too many" basic blocks connected via a complex > topology, and compiler optimization phases are in the business of using > (mostly) linear-time heuristics to solve exponential-time optimization > problems.
IOW, the performance of the eval loop is as touchy as a > heterosexual sailor coming off 2 years at sea, and there's no predicting > what minor changes will do to speed. This has been observed repeatedly by > everyone who has tried to speed it, across many platforms, and across a > decade of staring at it: the eval loop is in unstable equilibrium on its > best days. I knew all this, but was still surprised by the magnitude of the slowdown. > In the limit, the eval loop "should be" a little slower now under -O, just > because we've added another test + taken-branch to the normal path. From > that POV, your > > > FWIW gcc makes my patch a small win even with -O. > > is as much "a mystery" as why MSVC 6 hates it. No kidding. I wonder if some of the slow comes from repeatedly hauling the threadstate into the cache. I guess wonderings like this are almost exactly valueless. > > Actually, there are some other changes, like always updating f->f_lasti, > > and allocating 8 more bytes on the stack. Does commenting out the > > definition of instr_lb & instr_ub make any difference? > > I'll try that on Tuesday, but don't hold your breath. It could be that I > can get back all the loss by declaring tstate volatile -- or doing any other > random thing . > > > ... > > Does reading assembly give any clues? Not that I'd really expect > > anyone to read all of the main loop... > > I will if it's important, but a good HW simulator is a better tool for this > kind of thing, and in any case I doubt I can make enough time to do what > would be needed to address this for real. On linux there's cachegrind which comes with valgrind and might prove helpful. But that only runs on linux, and I'm not sure I want to explain the linux mystery, as it might go away :) > > I'm baffled. > > Join the club -- we've held this invitation open for you for years . Attempting PhD in mathematics is providing enough bafflement for this schmuck, but thanks for the offer. 
> > Perhaps you can put SET_LINENO back in for the Windows build > > <1e-6 wink>. > > If it's an unfortunate I-cache conflict among heavily-hit code addresses > (something a good HW simulator can tell you), that could actually solve it! > Then anything that manages to move one of the colliding code chunks to a > different address could yield "a mysterious speedup". These mysteries are > only irritating when they work against you . Well, quite. Let's send Julian Seward an email asking him if he wants to port valgrind to Windows . Cheers, M. -- surely, somewhere, somehow, in the history of computing, at least one manual has been written that you could at least remotely attempt to consider possibly glancing at. -- Adam Rixey From walter@livinglogic.de Mon Aug 19 13:53:15 2002 From: walter@livinglogic.de (Walter Dörwald) Date: Mon, 19 Aug 2002 14:53:15 +0200 Subject: [Python-Dev] PyString_DecodeEscape and PEP293 Message-ID: <3D60EA3B.7030008@livinglogic.de> A recent checkin added a function PyString_DecodeEscape() to stringobject.c. To make this function PEP293 compatible it would need access to unicode_decode_call_errorhandler which is defined static in unicodeobject.c. Does PyString_DecodeEscape() really need an errors argument? If yes, we could either move it to unicodeobject.c or make unicode_decode_call_errorhandler externally visible. Another problem that I noticed is that string-escape can't be used for encoding Unicode objects: >>> u"\u0100".encode("string-escape") Traceback (most recent call last): File "", line 1, in ? TypeError: escape_encode() argument 1 must be str, not unicode Bye, Walter Dörwald From smurf@noris.de Mon Aug 19 14:36:41 2002 From: smurf@noris.de (Matthias Urlichs) Date: Mon, 19 Aug 2002 15:36:41 +0200 Subject: [Python-Dev] PEP 277 (unicode filenames): please review Message-ID: Guido: > > My guess it's not his listdir() or filesystem, but the keyboard > driver. > No, it's MacOSX. It always uses the decomposed form.
That is very noticeable via NFS volumes where files with combined character names are unopenable from the GUI. I've filed a bug report about that - I don't know whether OSX 10.2 will allow NFC filenames, at least read-only. -- Matthias Urlichs From smurf@noris.de Mon Aug 19 14:44:08 2002 From: smurf@noris.de (Matthias Urlichs) Date: Mon, 19 Aug 2002 15:44:08 +0200 Subject: [Python-Dev] PEP 277 (unicode filenames): please review Message-ID: Martin: > Indeed, that would be consistent. I deliberately want to leave this > out of PEP 277. On Unix, things are not that clear - as Jack points > out, readlink() and getcwd() also need consideration. > Linux and MacOSX use UTF-8 and should probably be treated as such, i.e. I want to open("äöü"), not open("äöü".encode("utf-8")). One interesting tidbit is that MacOSX requires Unicode filenames to be in NFD. I don't know whether anybody agreed on a standard normal form for Linux. > In this terrain, Windows has the cleaner API (they consider file names > as character strings, not as byte strings), so doing the right thing > is easier. > Byte strings are perfectly OK if they have a common encoding (meaning UTF-8, in some accepted normal form). Character strings are bad if their interpretation, or indeed their usability, changes with the presence of some random environment variable / registry entry / whatever. Under these constraints, calling it a character string vs. a byte string, and/or using it as such, is a matter of programmers' convenience. -- Matthias Urlichs From guido@python.org Mon Aug 19 15:06:47 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 19 Aug 2002 10:06:47 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "17 Aug 2002 19:45:12 EDT."
References: <200208162154.g7GLsrZ28972@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200208191406.g7JE6lK10592@pcp02138704pcs.reston01.va.comcast.net> [Guido] > > - The set constructors have an optional second argument, sort_repr, > > defaulting to False, which decides whether the elements are sorted > > when str() or repr() is taken. I'm not sure if there would be > > negative consequences of removing this argument and always sorting > > the string representation. [François] > Unless there is something deep attached to the properties of the sets > themselves, I do not understand why the sorting/non-sorting virtues of > `repr' should be tied with the constructor. > > There is a precedent with dicts. They print non-sorted, but they > pretty-print (through the `pprint' module) sorted. Maybe the same could > be done for sets: use `pprint' if you want a sorted representation. > But otherwise, sets as well as dicts should print using the same order > by which elements are to be iterated upon or listed, in various other > circumstances. This is a pretty convincing argument. If dicts can survive being rendered unsorted, then so can Sets. Maybe I should remove the sort_repr argument altogether; it's easy enough for the test suite to use some other trick. But for now, I'll just leave sort_repr=False in. I'm gonna check this in now, but that doesn't mean we can't tweak the API or implementation, so keep those comments coming! 
--Guido van Rossum (home page: http://www.python.org/~guido/) From pinard@iro.umontreal.ca Mon Aug 19 15:51:29 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: 19 Aug 2002 10:51:29 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208191406.g7JE6lK10592@pcp02138704pcs.reston01.va.comcast.net> References: <200208162154.g7GLsrZ28972@pcp02138704pcs.reston01.va.comcast.net> <200208191406.g7JE6lK10592@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Guido van Rossum] > [François] > > [...] why the sorting/non-sorting virtues of `repr' should be tied > > with the constructor. [...] > If dicts can survive being rendered unsorted, then so can Sets. Maybe I > should remove the sort_repr argument altogether; it's easy enough for > the test suite to use some other trick. I presume you already have a solution when testing dicts? > But for now, I'll just leave sort_repr=False in. As long as users do not discover it, they will not use it! :-) By the way (this was discussed on Python list a while ago), it might be worth stressing in the official documentation that dicts, and maybe Sets as well, all have a "natural" iteration order which remains fixed at least while the dict or Set does not lose or acquire keys, and that this same fixed order is used for .items(), .keys(), .values(), and all three .iter* flavours. It is sometimes useful being able to rely on this fact, especially if Python clearly commits itself through the documentation. The printing order for dicts and Sets could be documented as a simple way to reveal the current "natural" fixed order. -- François Pinard http://www.iro.umontreal.ca/~pinard From guido@python.org Mon Aug 19 16:18:35 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 19 Aug 2002 11:18:35 -0400 Subject: [Python-Dev] pystone(object) In-Reply-To: Your message of "Sun, 18 Aug 2002 23:52:11 EDT."
References: Message-ID: <200208191518.g7JFIam11101@pcp02138704pcs.reston01.va.comcast.net>

> > Slots can get you back most of this, but not all. Dict lookup is
> > already extremely tight code, and when I profiled this, most of the
> > time was spent there -- twice as many lookup calls using new-style
> > classes as for classic classes.
>
> As I've said, and as Oren later demonstrated with code, the cost of a
> namespace dict lookup now is more in the layers of function call overhead
> than in the actual lookup. We could whittle that down in Oren-like ways,
> although I'd rather we spent whatever time we can devote to stuff like this
> on advancing one of the more-general optimization schemes that were a hot
> topic before the Python conference.

Here's some info taken from a profile of a program that requests an instance attribute of a new-style class without slots or properties ten million times (using a for-loop over xrange(100000) and then 100 attribute lookups (a.foo) in the for-loop body). The following functions are called for each attribute lookup:

#calls seconds name
     3    1.72 lookdict_string
     1    1.17 PyObject_GenericGetAttr
     1    1.10 _PyType_Lookup
     3    1.00 PyDict_GetItem
     1    0.45 _PyObject_GetDictPtr
     1    0.38 PyObject_GetAttr
    10    5.82 Subtotal
          3.28 eval_frame (one call!)
          9.10 Total

Here, "seconds" is the total time spent in 10 million times the number of calls. In addition, the program spent 3.28 seconds in 500 calls to eval_frame, I assume nearly all of it in the one call that corresponds to the body of the test function, so I've added that.
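[Editor's note: the per-lookup cost being profiled here can be approximated from pure Python with the `timeit` module; this is a rough sketch, and the absolute numbers depend entirely on the interpreter build and machine.]

```python
import timeit

# Rough sketch: time attribute lookup on a plain new-style class instance,
# analogous to the a.foo loop profiled above.
class C(object):
    pass

a = C()
a.foo = 42

# One million lookups of a.foo; timeit's 'globals' parameter lets the
# statement see our instance.
t = timeit.timeit("a.foo", globals={"a": a}, number=1000000)
print("1M attribute lookups: %.3f s" % t)
```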
The call graph is as follows:

eval_frame -> (10 million times)
  PyObject_GetAttr ->
    PyObject_GenericGetAttr ->
      _PyObject_GetDictPtr
      _PyType_Lookup ->
        PyDict_GetItem -> lookdict_string
      PyDict_GetItem -> lookdict_string
      PyDict_GetItem -> lookdict_string

If we want to be really aggressive about this, I suppose we could inline all of that in PyObject_GenericGetAttr, for the case that the name passes the PyString_CheckExact test and has a pre-calculated hash. In particular, PyDict_GetItem then pretty much boils down to "mp->ma_lookup(mp, key, hash)->me_value". That should cut out 5 function calls. A quick small gain would be to inline just the call to _PyObject_GetDictPtr. (I tried this; it saves about 2% on the total running time of this particular test when not using the profiler.) An intermediate gain would be to inline the call to _PyType_Lookup. Here's the code I profiled:

============================================================================
class C(object):
    pass

def main():
    a = C()
    a.foo = 42
    for i in xrange(100000):
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo
        a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo; a.foo

main()
============================================================================

If I add __slots__ = ['foo'] to the class definition, here's the call graph I get (prefixed with the total seconds for each function; each function is called exactly once per
attribute lookup in this case): 3.22 eval_frame -> (10 million times) 0.33 PyObject_GetAttr -> 1.05 PyObject_GenericGetAttr -> 0.35 PyDescr_IsData 0.36 member_get -> 0.15 descr_check -> 0.27 PyObject_IsInstance 0.44 PyMember_GetOne 0.49 _PyType_Lookup -> 0.35 PyDict_GetItem -> 1.17 lookdict_string 8.18 Total This profile points out a bug in descr_check! It calls PyObject_IsInstance, which is a very general routine and hence relatively expensive. But descr_check's call to it always passes a genuine PyTypeObject as the second argument, and we can in-line this by writing PyObject_TypeCheck(obj, descr->d_type); that's a macro that may call PyType_IsSubtype but in this case never needs to, saving about 6% on the total running time of this particular test when not using the profiler. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Aug 19 16:25:38 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 19 Aug 2002 11:25:38 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Mon, 19 Aug 2002 10:51:29 EDT." References: <200208162154.g7GLsrZ28972@pcp02138704pcs.reston01.va.comcast.net> <200208191406.g7JE6lK10592@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200208191525.g7JFPcF11152@pcp02138704pcs.reston01.va.comcast.net> > > [François] > > > [...] why the sorting/non-sorting virtues of `repr' should be tied > > > with the constructor. [...] > > > If dicts can survive being rendered unsorted, then so can Sets. Maybe I > > should remove the sort_repr argument altogether; it's easy enough for > > the test suite to use some other trick. > > I presume you already have a solution when testing dicts? These tests just require an equality test between the actual outcome and the expected outcome. Since sets support equality testing, there's no reason not to use that. (I guess the original test was being paranoid, or was written before __eq__ was implemented.) 
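[Editor's note: the equality-based testing Guido describes can be sketched with the builtin set type, which later superseded the `sets` module under discussion; the idea carries over unchanged.]

```python
# Element order is irrelevant to set equality, so a test suite can compare
# an actual result against an expected one directly, with no need to sort
# a string representation first.
actual = set([3, 1, 2])
expected = set([1, 2, 3])

assert actual == expected        # True regardless of iteration order
assert set("abc") != set("abd")  # differing elements still compare unequal
print("equality checks passed")
```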
> > But for now, I'll just leave sort_repr=False in. > > As long as users do not discover it, they will not use it! :-) We can mull that over until the first beta release. > By the way (this was discussed on Python list a while ago), it might > be worth stressing in the official documentation that dicts, and maybe > Sets as well, all have a "natural" iteration order which remains fixed at > least while the dict or Set does not lose or acquire keys, and that this > same fixed order is used for .items(), .keys(), .values(), and all three > .iter* flavours. It is sometimes useful being able to rely on this fact, > especially if Python clearly commits itself through the documentation. AFAIK that's well documented. --Guido van Rossum (home page: http://www.python.org/~guido/) From python@rcn.com Mon Aug 19 16:27:16 2002 From: python@rcn.com (Raymond Hettinger) Date: Mon, 19 Aug 2002 11:27:16 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib References: <200208162154.g7GLsrZ28972@pcp02138704pcs.reston01.va.comcast.net><200208191406.g7JE6lK10592@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <001201c24794$e709b460$98f8a4d8@othello> [GvR] > > But for now, I'll just leave sort_repr=False in. [FP] > As long as users do not discover it, they will not use it! :-) Just to make sure, why not move sort_repr=False out of the parameter list and into the code body. [FP] > By the way (this was discussed on Python list a while ago), it might > be worth stressing in the official documentation that dicts, and maybe > Sets as well, all have a "natural" iteration order which remains fixed at > least while the dict or Set does not lose or acquire keys, and that this > same fixed order is used for .items(), .keys(), .values(), and all three > .iter* flavours. It is sometimes useful being able to rely on this fact, > especially if Python clearly commits itself through the documentation.
Just like stability for the new list.sort(), this promise ought to remain a hidden, undocumented implementation detail. Because of collision resolution, the "natural" order can vary depending on the order that the keys are inserted. While the ordering stays constant until there is a change, it is fragile and could be changed by a resize operation even if the keys remain the same. Let's keep the options open here in case someday we want GC or a memory manager to rebuild the dictionary at an arbitrary time. Raymond Hettinger From guido@python.org Mon Aug 19 16:38:01 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 19 Aug 2002 11:38:01 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Mon, 19 Aug 2002 11:27:16 EDT." <001201c24794$e709b460$98f8a4d8@othello> References: <200208162154.g7GLsrZ28972@pcp02138704pcs.reston01.va.comcast.net> <200208191406.g7JE6lK10592@pcp02138704pcs.reston01.va.comcast.net> <001201c24794$e709b460$98f8a4d8@othello> Message-ID: <200208191538.g7JFc1n11244@pcp02138704pcs.reston01.va.comcast.net> > Just to make sure, why not move sort_repr=False out of the parameter > list and into the code body. Good idea. It can be a non-public class variable. > [FP] > > By the way (this was discussed on Python list a while ago), it might > > be worth stressing in the official documentation that dicts, and maybe > > Sets as well, all have a "natural" iteration order which remains fixed at > > least while the dict or Set does not lose or acquire keys, and that this > > same fixed order is used for .items(), .keys(), .values(), and all three > > .iter* flavours. It is sometimes useful being able to rely on this fact, > > especially if Python clearly commits itself through the documentation. > > Just like stability for the new list.sort(), this promise ought to remain > a hidden, undocumented implementation detail.
Because of collision > resolution, the "natural" order can vary depending on the order that > the keys are inserted. While the ordering stays constant until there > is a change, it is fragile and could be changed by a resize operation > even if the keys remain the same. Let's keep the options open here > in case someday we want GC or a memory manager to rebuild the > dictionary at an arbitrary time. I don't think François was stating that the order was only dependent on the inserted keys. I believe he was merely referring to the fact that the order doesn't change as long as you don't mutate a dict, and that it's the same for items(), keys(), values(), iterators, and display order. There's no reason to keep that hidden, and I believe it's documented (though François didn't find it). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Aug 19 16:57:43 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 19 Aug 2002 11:57:43 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Mon, 19 Aug 2002 11:38:18 EDT." <200208191138.18398.mclay@nist.gov> References: <200208162154.g7GLsrZ28972@pcp02138704pcs.reston01.va.comcast.net> <15710.41215.451302.851810@localhost.localdomain> <200208172014.g7HKElX31829@pcp02138704pcs.reston01.va.comcast.net> <200208191138.18398.mclay@nist.gov> Message-ID: <200208191557.g7JFvhK11331@pcp02138704pcs.reston01.va.comcast.net> [Michael McLay] > > http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/nondist/sandbo > >x/sets/set.py?rev=HEAD&content-type=text/vnd.viewcvs-markup > > Did you consider making BaseSet._data a slot? Hm, maybe I should. If this is a proposed standard data type, we might as well get people used to the fact that they can't add random new instance variables without subclassing first. OTOH what then to do with _sort_repr -- make it a class var or an instance var? 
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Mon Aug 19 17:36:09 2002 From: tim.one@comcast.net (Tim Peters) Date: Mon, 19 Aug 2002 12:36:09 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208191538.g7JFc1n11244@pcp02138704pcs.reston01.va.comcast.net> Message-ID: From the Library Reference manual, section "Mapping Types": Keys and values are listed in random order. If keys() and values() are called with no intervening modifications to the dictionary, the two lists will directly correspond. This allows the creation of (value, key) pairs using zip(): "pairs = zip(a.values(), a.keys())". The same footnote should be reworked to cover, and be referenced from, the .iter{keys, values, items} methods too. From guido@python.org Mon Aug 19 20:35:48 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 19 Aug 2002 15:35:48 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Mon, 19 Aug 2002 11:57:43 EDT." Message-ID: <200208191935.g7JJZnk19511@pcp02138704pcs.reston01.va.comcast.net> By the way, I've checked in the "sets" module in Lib. The unit tests in test/test_sets.py need work (no tests for ImmutableSet for example) and there's no latex documentation; however "import sets; help(sets)" shows a wealth of information derived from docstrings. I plan to fix the unit tests but could use help with the docs. --Guido van Rossum (home page: http://www.python.org/~guido/) From jacobs@penguin.theopalgroup.com Mon Aug 19 20:44:22 2002 From: jacobs@penguin.theopalgroup.com (Kevin Jacobs) Date: Mon, 19 Aug 2002 15:44:22 -0400 (EDT) Subject: [Python-Dev] Standard datetime objects? Message-ID: I know it has been asked before, but I was wondering where we are with our new standard datetime objects? I'm re-working some of my date/time code, and will be in a position to also work on whatever is keeping the prototype from being completed.
Thanks, -Kevin -- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com From guido@python.org Mon Aug 19 20:50:00 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 19 Aug 2002 15:50:00 -0400 Subject: [Python-Dev] Standard datetime objects? In-Reply-To: Your message of "Mon, 19 Aug 2002 15:44:22 EDT." References: Message-ID: <200208191950.g7JJo0019610@pcp02138704pcs.reston01.va.comcast.net> > I know it has been asked before, but I was wondering where we are with our > new standard datetime objects? I'm re-working some of my date/time code, > and will be in a position to also work on whatever is keeping the prototype > from being completed. Please have a look at the prototype in python/nondist/sandbox/datetime/. Note that there are comments pointing to a Wiki with design discussions too. Fred's working on completing the C reimplementation (also there); in fact, I'm expecting a checkpoint checkin from him any moment now. --Guido van Rossum (home page: http://www.python.org/~guido/) From python@rcn.com Mon Aug 19 20:55:31 2002 From: python@rcn.com (Raymond Hettinger) Date: Mon, 19 Aug 2002 15:55:31 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib References: <200208191935.g7JJZnk19511@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <001b01c247ba$60865260$af66accf@othello> From: "Guido van Rossum" > By the way, I've checked in the "sets" module in Lib. The unit tests > in test/test_sets.py need work (no tests for ImmutableSet for example) > and there's no latex documentation; however "import sets; help(sets)" > shows a wealth of information derived from docstrings. I plan to fix > the unit tests but could use help with the docs. I'll do the docs.
Raymond Hettinger From drifty@bigfoot.com Mon Aug 19 21:23:27 2002 From: drifty@bigfoot.com (Brett Cannon) Date: Mon, 19 Aug 2002 13:23:27 -0700 (PDT) Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208191557.g7JFvhK11331@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Guido van Rossum] > OTOH what then to do with _sort_repr -- make it a class var or an > instance var? Well, how often can you imagine someone printing out a single set sorted, but having other sets that they didn't want printed out sorted? I would suspect that it is going to be a very rare case when someone wants just part of their sets printing sorted and the rest not. I say make it a class var. -Brett C. From mcherm@destiny.com Mon Aug 19 21:31:18 2002 From: mcherm@destiny.com (Michael Chermside) Date: Mon, 19 Aug 2002 16:31:18 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib Message-ID: <3D615596.9090908@destiny.com> [Guido van Rossum] > OTOH what then to do with _sort_repr -- make it a class var or an > instance var? Setting a class var in a standard library class is like playing with a global variable with all the attendant problems. Scenario... I want my sets sorted, but I import some library that uses sets of complex numbers for internal purposes. Or (slightly more plausible) I want my sets UNsorted, but I use some library whose author counted on the string output being sorted (ok... the author shouldn't have depended on it because of the existence of the rarely used class variable, but even non-experts write libraries using the standard library). -- Michael Chermside From guido@python.org Mon Aug 19 21:38:41 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 19 Aug 2002 16:38:41 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Mon, 19 Aug 2002 13:23:27 PDT."
References: Message-ID: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> > [Guido van Rossum] > > OTOH what then to do with _sort_repr -- make it a class var or an > > instance var? [Brett C] > Well, how often can you imagine someone printing out a single set sorted, > but having other sets that they didn't want printed out sorted? I would > suspect that it is going to be a very rare case when someone wants just > part of their sets printing sorted and the rest not. > > I say make it a class var. Hm, but what if two different library modules have conflicting requirements? E.g. module A creates sets of complex numbers and must have sort_repr=False, while module B needs sort_repr=True for user-friendliness (or because it relies on this). My current approach (now in CVS!) is to remove the sort_repr flag to the constructor, but to provide a method that can produce a sorted or an unsorted representation. __repr__ will always return the items unsorted, which matches what repr(dict) does. After all, I think it could be confusing to a user when 'print s' shows Set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) but for i in s: print i, prints 9 8 7 6 5 4 3 2 1 0 --Guido van Rossum (home page: http://www.python.org/~guido/) From nas@python.ca Mon Aug 19 22:02:52 2002 From: nas@python.ca (Neil Schemenauer) Date: Mon, 19 Aug 2002 14:02:52 -0700 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: <3D615596.9090908@destiny.com>; from mcherm@destiny.com on Mon, Aug 19, 2002 at 04:31:18PM -0400 References: <3D615596.9090908@destiny.com> Message-ID: <20020819140252.A22991@glacier.arctrix.com> Michael Chermside wrote: > Setting a class var in a standard library class is like playing with a > global variable with all the attendant problems. Scenario... I want my > sets sorted, but I import some library that uses sets of complex numbers > for internal purposes. I think the intention is that you would subclass to override the class variable.
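[Editor's note: the subclass-to-override approach might look like the following toy sketch; `SimpleSet` and its `_sort_repr` class variable are illustrative stand-ins, not the actual sets.Set implementation under discussion.]

```python
class SimpleSet:
    """Toy stand-in for the Set class being debated."""
    _sort_repr = False  # class variable: unsorted repr by default

    def __init__(self, iterable):
        # dict keys give uniqueness; a real set would do more
        self._data = dict.fromkeys(iterable)

    def __repr__(self):
        items = list(self._data)
        if self._sort_repr:
            items = sorted(items)
        return "%s(%r)" % (type(self).__name__, items)

class SortedReprSet(SimpleSet):
    # Overriding in a subclass changes only this class's behavior,
    # avoiding the "global variable" problem of mutating the library class.
    _sort_repr = True

print(SimpleSet([3, 1, 2]))      # insertion order under modern dicts
print(SortedReprSet([3, 1, 2]))  # SortedReprSet([1, 2, 3])
```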
As you point out, modifying a class variable in a library is asking for trouble. Neil From pinard@iro.umontreal.ca Mon Aug 19 21:56:01 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: 19 Aug 2002 16:56:01 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Guido van Rossum] > My current approach (now in CVS!) is [...] to provide a method that > can produce a sorted or an unsorted representation. Could a method with a similar name be available for dicts as well? -- François Pinard http://www.iro.umontreal.ca/~pinard From skip@pobox.com Mon Aug 19 22:18:13 2002 From: skip@pobox.com (Skip Montanaro) Date: Mon, 19 Aug 2002 16:18:13 -0500 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Objects bufferobject.c,2.19,2.20 complexobject.c,2.62,2.63 floatobject.c,2.114,2.115 intobject.c,2.91,2.92 stringobject.c,2.180,2.181 tupleobject.c,2.71,2.72 In-Reply-To: References: Message-ID: <15713.24725.984557.521006@gargle.gargle.HOWL> guido> Call me anal, but there was a particular phrase that was spreading guido> to comments everywhere that bugged me: /* Foo is inlined */ guido> instead of /* Inline Foo */. Somehow the "is inlined" phrase guido> always confused me for half a second (thinking, "No it isn't" guido> until I added the missing "here"). The new phrase is hopefully guido> unambiguous. Perhaps a comment at the definition of Foo that says "this code has been inlined elsewhere" makes sense so that if people fix bugs or enhance them they will be prompted to scout around for other places that need fixing. (I hesitate to suggest that all the places a piece of code is inlined should be recorded, but perhaps that's another option.)
Skip From ark@research.att.com Mon Aug 19 22:20:40 2002 From: ark@research.att.com (Andrew Koenig) Date: 19 Aug 2002 17:20:40 -0400 Subject: [Python-Dev] type categories -- an example In-Reply-To: <200208131802.g7DI2Ro27807@europa.research.att.com> References: <200208131802.g7DI2Ro27807@europa.research.att.com> Message-ID: In the message that started all of this type-category discussion, I said: As far as I know, there is no uniform method of determining into which category or categories a particular object falls. Of course, there are non-uniform ways of doing so, but in general, those ways are, um, nonuniform. Therefore, if you want to check whether an object is in one of these categories, you haven't necessarily learned much about how to check if it is in a different one of these categories. As it happens, I'm presently working on a program in which I would like to be able to determine whether a given value is: -- a member of a particular class hierarchy that I've defined; -- a callable object; -- a compiled regular expression; or -- anything else. and do something different in each of these four cases. Testing for the first category is easy: I evaluate isinstance(x, B), where B is the base class of my hierarchy. Testing for the second is also easy: I evaluate callable(x). How do I test for the third? I guess I need to know the name of the type of a compiled regular expression object. Hmmm... A quick scan through the documentation doesn't reveal it. So I do an experiment: >>> import re >>> re.compile("foo") <_sre.SRE_Pattern object at 0x111018> Hmmm... This doesn't look good -- Can I really count on a compiled regular expression being an instance of _sre.SRE_Pattern for the future? From guido@python.org Mon Aug 19 22:27:35 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 19 Aug 2002 17:27:35 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Mon, 19 Aug 2002 16:56:01 EDT." 
References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200208192127.g7JLRZk25200@pcp02138704pcs.reston01.va.comcast.net> > Could a method with a similar name be available for dicts as well? Well, it wouldn't have any advantage over doing this "by hand", extracting the keys into a list and sorting that. The same reasoning applies to the sets class, which is why I've made it a non-public method (named '_repr'). It may go if I find a solution to the one use there is in the test suite. --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy@alum.mit.edu Mon Aug 19 22:43:01 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Mon, 19 Aug 2002 17:43:01 -0400 Subject: [Python-Dev] type categories -- an example In-Reply-To: References: <200208131802.g7DI2Ro27807@europa.research.att.com> Message-ID: <15713.26213.91880.761312@slothrop.zope.com> >>>>> "AK" == Andrew Koenig writes: AK> How do I test for the third? I guess I need to know the name of AK> the type of a compiled regular expression object. Hmmm... A AK> quick scan through the documentation doesn't reveal it. So I do AK> an experiment: >>>> import re re.compile("foo") AK> <_sre.SRE_Pattern object at 0x111018> AK> Hmmm... This doesn't look good -- Can I really count on a AK> compiled regular expression being an instance of AK> _sre.SRE_Pattern for the future? I'd put this at the module level: compiled_re_type = type(re.compile("")) Then you can use isinstance() to test: isinstance(re.compile("spam+"), compiled_re_type) Jeremy From guido@python.org Mon Aug 19 22:44:30 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 19 Aug 2002 17:44:30 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Objects bufferobject.c,2.19,2.20 complexobject.c,2.62,2.63 floatobject.c,2.114,2.115 intobject.c,2.91,2.92 stringobject.c,2.180,2.181 tupleobject.c,2.71,2.72 In-Reply-To: Your message of "Mon, 19 Aug 2002 16:18:13 CDT."
<15713.24725.984557.521006@gargle.gargle.HOWL> References: <15713.24725.984557.521006@gargle.gargle.HOWL> Message-ID: <200208192144.g7JLiUK27244@pcp02138704pcs.reston01.va.comcast.net> From guido@python.org Mon Aug 19 22:45:02 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 19 Aug 2002 17:45:02 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Objects bufferobject.c,2.19,2.20 complexobject.c,2.62,2.63 floatobject.c,2.114,2.115 intobject.c,2.91,2.92 stringobject.c,2.180,2.181 tupleobject.c,2.71,2.72 In-Reply-To: Your message of "Mon, 19 Aug 2002 16:18:13 CDT." <15713.24725.984557.521006@gargle.gargle.HOWL> References: <15713.24725.984557.521006@gargle.gargle.HOWL> Message-ID: <200208192145.g7JLj2p27404@pcp02138704pcs.reston01.va.comcast.net> > Perhaps a comment at the definition of Foo that says "this code has > been inlined elsewhere" makes sense so that if people fix bugs or > enhance them they will be prompted to scout around for other places > that need fixing. (I hesitate to suggest that all the places a > piece of code is inlined should be recorded, but perhaps that's > another option.) That's a good idea. Maybe one of the "code janitors" can help with this? --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Mon Aug 19 22:48:15 2002 From: tim.one@comcast.net (Tim Peters) Date: Mon, 19 Aug 2002 17:48:15 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Guido] > ... > My current approach (now in CVS!) is to remove the sort_repr flag to > the constructor, but to provide a method that can produce a sorted or > an unsorted representation. +1. That's the best way to go. > __repr__ will always return the items unsorted, which matches what repr > (dict) does. 
After all, I think it could be confusing to a user when > 'print s' shows > > Set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > but > > for i in s: print i, > > prints > > 9 8 7 6 5 4 3 2 1 0 >>> from sets import Set >>> print Set(range(10)) Set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >>> When I optimized a useless ~ out of the dict code for 2.2, it became much more likely that the traversal order for an int-keyed dict would match numeric order. I have evidence that this has fooled newbies into believing that dicts are ordered maps! If it wouldn't cost an extra cycle, I'd be tempted to slop the ~ back in again <0.9 wink>. From drifty@bigfoot.com Mon Aug 19 23:23:02 2002 From: drifty@bigfoot.com (Brett Cannon) Date: Mon, 19 Aug 2002 15:23:02 -0700 (PDT) Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Guido van Rossum] > > My current approach (now in CVS!) is to remove the sort_repr flag to > the constructor, but to provide a method that can produce a sorted or > an unsorted representation. __repr__ will always return the items > unsorted, which matches what repr(dict) does. After all, I think it I just updated my CVS copy and I like your implementation. +1 from me for how you are handling it. > could be confusing to a user when 'print s' shows > > Set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > but > > for i in s: print i, > > prints > > 9 8 7 6 5 4 3 2 1 0 > Good point. As François pointed out, newbies do expect dicts to always come out in the same order (and I must say that I, like François, have never seen any docs saying that it does always come out the same order as long as nothing has mutated) and I would expect that expectation to carry over to sets. -Brett C.
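[Editor's note: the "fixed natural order" the thread keeps returning to can be demonstrated with a plain dict. A sketch; note that since Python 3.7 dicts actually guarantee insertion order, which is stronger than anything promised in 2002.]

```python
d = {"b": 2, "a": 1, "c": 3}

# The point François raised: between mutations, keys(), values(),
# items(), and plain iteration all expose the same fixed order.
assert list(d.keys()) == [k for k, v in d.items()]
assert list(d.values()) == [d[k] for k in d]

# And the safe way to get a *sorted* view is to sort explicitly,
# rather than relying on any repr or iteration behavior.
print(sorted(d))  # ['a', 'b', 'c']
```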
From ark@research.att.com Mon Aug 19 23:25:45 2002 From: ark@research.att.com (Andrew Koenig) Date: Mon, 19 Aug 2002 18:25:45 -0400 (EDT) Subject: [Python-Dev] type categories -- an example In-Reply-To: <15713.26213.91880.761312@slothrop.zope.com> (message from Jeremy Hylton on Mon, 19 Aug 2002 17:43:01 -0400) References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15713.26213.91880.761312@slothrop.zope.com> Message-ID: <200208192225.g7JMPj023386@europa.research.att.com> Jeremy> I'd put this at the module level: Jeremy> compiled_re_type = type(re.compile("")) Jeremy> Then you can use isinstance() to test: Jeremy> isinstance(re.compile("spam+"), compiled_re_type) But is it guaranteed that re.compile will always yield an object of the same type? From jeremy@alum.mit.edu Mon Aug 19 23:32:11 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Mon, 19 Aug 2002 18:32:11 -0400 Subject: [Python-Dev] type categories -- an example In-Reply-To: <200208192225.g7JMPj023386@europa.research.att.com> References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15713.26213.91880.761312@slothrop.zope.com> <200208192225.g7JMPj023386@europa.research.att.com> Message-ID: <15713.29163.526655.111678@slothrop.zope.com> >>>>> "AK" == Andrew Koenig writes: Jeremy> I'd put this at the module level: compiled_re_type = Jeremy> type(re.compile("")) Jeremy> Then you can use isinstance() to test: Jeremy> isinstance(re.compile("spam+"), compiled_re_type) AK> But is it guaranteed that re.compile will always yield an object AK> of the same type? Hard to say. I can read the code and see that the current implementation will always return objects of the same type.
Jeremy From ark@research.att.com Mon Aug 19 23:45:39 2002 From: ark@research.att.com (Andrew Koenig) Date: 19 Aug 2002 18:45:39 -0400 Subject: [Python-Dev] type categories -- an example In-Reply-To: <15713.29163.526655.111678@slothrop.zope.com> References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15713.26213.91880.761312@slothrop.zope.com> <200208192225.g7JMPj023386@europa.research.att.com> <15713.29163.526655.111678@slothrop.zope.com> Message-ID: Jeremy> Hard to say. I can read the code and see that the current Jeremy> implementation will always return objects of the same type. Jeremy> In fact, it's using type(sre_compile.compile("", 0)) Jeremy> internally to represent that type. Jeremy> That's not a guarantee. Perhaps Fredrik wants to reserve the Jeremy> right to change this in the future. It's not unusual for Jeremy> Python modules to be under-specified in this way. The real point is that this is an example of why a uniform way of checking for such types would be nice. I shouldn't have to read the source to figure out how to tell if something is a compiled regular expression. -- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark From drifty@bigfoot.com Mon Aug 19 23:49:44 2002 From: drifty@bigfoot.com (Brett Cannon) Date: Mon, 19 Aug 2002 15:49:44 -0700 (PDT) Subject: [Python-Dev] type categories -- an example In-Reply-To: <15713.29163.526655.111678@slothrop.zope.com> Message-ID: [Jeremy Hylton] > >>>>> "AK" == Andrew Koenig writes: > > Jeremy> I'd put this at the module level: compiled_re_type = > Jeremy> type(re.compile("")) > > Jeremy> Then you can use isinstance() to test: > > Jeremy> isinstance(re.compile("spam+"), compiled_re_type) > > AK> But is it guaranteed that re.compile will always yield an object > AK> of the same type? > > Hard to say. I can read the code and see that the current > implementation will always return objects of the same type.
In fact, > it's using type(sre_compile.compile("", 0)) internally to represent > that type. > > That's not a guarantee. This might be a stupid question, but why wouldn't isinstance(re.compile("spam+"), type(re.compile(''))) always work (this is Jeremy's code, just inlined)? Unless the instance being tested was marshalled (I think), the test should always work. Even using an unpickled instance (I think, again) should work since it would use the current implementation of a pattern object. So as long as the instance being tested is not somehow being stored and then brought back using a newer version of Python it should always work. If not true, then I have been lied to. =) -Brett C. From jeremy@alum.mit.edu Mon Aug 19 23:59:14 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Mon, 19 Aug 2002 18:59:14 -0400 Subject: [Python-Dev] type categories -- an example In-Reply-To: References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15713.26213.91880.761312@slothrop.zope.com> <200208192225.g7JMPj023386@europa.research.att.com> <15713.29163.526655.111678@slothrop.zope.com> Message-ID: <15713.30786.767176.781663@slothrop.zope.com> >>>>> "ARK" == Andrew Koenig writes: Jeremy> Hard to say. I can read the code and see that the current Jeremy> implementation will always return objects of the same type. Jeremy> In fact, it's using type(sre_compile.compile("", 0)) Jeremy> internally to represent that type. Jeremy> That's not a guarantee. Perhaps Fredrik wants to reserve Jeremy> the right to change this in the future. It's not unusual Jeremy> for Python modules to be under-specified in this way. ARK> The real point is that this is an example of why a uniform way ARK> of checking for such types would be nice. I shouldn't have to ARK> read the source to figure out how to tell if something is a ARK> compiled regular expression. Let's assume for the moment that the re module wants to define an explicit type of compiled regular expression objects.
This seems a sensible thing to do, and it already has such a type internally. I'm not sure how this relates to your real point. You didn't have to read the source code to figure out if something is a compiled regular expression. Instead, I recommended that you use type(obj) where obj was a compiled regular expression. It might have been convenient if there was a module constant, such that re.CompiledRegexType == type(re.compile("")). Then you asked if re.compile() was guaranteed to return an object of the same type. That question is all about the contract of the re module. The answer might have been: "No. In version X, it happens to always return objects of the same type, but in version Z, I may want to change this." I suppose we could get at the general question of checking types by assuming that re.compile() returned instances of two apparently unrelated classes and that we wanted a way to declare their relationship. I'm thinking of something like Haskell typeclasses here. Jeremy From guido@python.org Tue Aug 20 01:33:56 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 19 Aug 2002 20:33:56 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Mon, 19 Aug 2002 17:48:15 EDT." References: Message-ID: <200208200033.g7K0XuR28707@pcp02138704pcs.reston01.va.comcast.net> > When I optimized a useless ~ out of the dict code for 2.2, it became much > more likely that the traversal order for an int-keyed dict would match > numeric order. I have evidence that this has fooled newbies into believing > that dicts are ordered maps! If it wouldn't cost an extra cycle, I'd be > tempted to slop the ~ back in again <0.9 wink>. Maybe add a ~ to the int hash code? That means it's not on the critical path for dicts with string keys. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From dave@boost-consulting.com Tue Aug 20 01:35:53 2002 From: dave@boost-consulting.com (David Abrahams) Date: Mon, 19 Aug 2002 20:35:53 -0400 Subject: [Python-Dev] nested extension modules? Message-ID: <012401c247e1$8c4570d0$6501a8c0@boostconsulting.com> Hi, Using the source (Luke), I was trying to figure out the best way to add a nested submodule from within an extension module. I noticed that the module initialization code will set the module name from the package context (if set), altogether discarding any name passed explicitly: [modsupport.c: Py_InitModule4()] ... if (_Py_PackageContext != NULL) { char *p = strrchr(_Py_PackageContext, '.'); if (p != NULL && strcmp(name, p+1) == 0) { name = _Py_PackageContext; _Py_PackageContext = NULL; } } This _Py_PackageContext is set up from within _PyImport_LoadDynamicModule [importdl.c:] ... oldcontext = _Py_PackageContext; _Py_PackageContext = packagecontext; (*p)(); _Py_PackageContext = oldcontext; IIUC, this means that when an extension module is loaded as part of a package, any submodules I create by calling Py_InitModule will come out with the same name. Questions: a. Have I got the analysis right? b. Is there a more-sanctioned way around this other than touching _Py_PackageContext (which seems to be intended to be private) TIA, Dave ----------------------------------------------------------- David Abrahams * Boost Consulting dave@boost-consulting.com * http://www.boost-consulting.com From guido@python.org Tue Aug 20 01:42:50 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 19 Aug 2002 20:42:50 -0400 Subject: [Python-Dev] type categories -- an example In-Reply-To: Your message of "Mon, 19 Aug 2002 18:25:45 EDT."
<200208192225.g7JMPj023386@europa.research.att.com> References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15713.26213.91880.761312@slothrop.zope.com> <200208192225.g7JMPj023386@europa.research.att.com> Message-ID: <200208200042.g7K0goH28754@pcp02138704pcs.reston01.va.comcast.net> > But is it guaranteed that re.compile will always yield > an object of the same type? There are no guarantees in life, but I expect that that is something that plenty of code depends on, so it will likely stay that way. --Guido van Rossum (home page: http://www.python.org/~guido/) From jriehl@spaceship.com Tue Aug 20 02:12:09 2002 From: jriehl@spaceship.com (Jonathan Riehl) Date: Mon, 19 Aug 2002 20:12:09 -0500 (CDT) Subject: [Python-Dev] PEP 269 versus 283. Message-ID: I was looking over some of the PEP's and I saw that 269 was considered dead according to PEP 283. This is kind of odd because I was planning to have an implementation by the end of the week. This is subject to the constraints of reality; I am taking a whopping huge vacation starting this next weekend. It is either going to be ready for python-dev to play with this week or in the middle of next month. My posts to the parser-sig are trying to be deferential to the charter of the SIG (starting w/requirements for a general purpose parser generator, not implementation of PEP 269). I am certainly going to try to wrangle the parser-sig onwards, but a pgen module is way overdue. -Jon From greg@cosc.canterbury.ac.nz Tue Aug 20 02:09:38 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 20 Aug 2002 13:09:38 +1200 (NZST) Subject: [Python-Dev] Another command line parser Message-ID: <200208200109.g7K19cD27428@oma.cosc.canterbury.ac.nz> In view of the recent discussion on command line parsers, you may be interested in the attached module which I wrote in response to a c.l.py posting. The return values are designed so that they can be used as the *args and/or **kwds arguments to a function if desired.
#----------------------------------------------------------------- # # A Pythonically minimalistic command line parser # Inspired by ideas from Huaiyu Zhu # and Robert Biddle # . # # Author: Greg Ewing # #----------------------------------------------------------------- class CommandLineError(Exception): pass def clparse(switches, flags, argv = None): """clparse(switches, flags, argv = None) Parse command line arguments. switches = string of option characters not taking arguments flags = string of option characters taking an argument argv = command line to parse (including program name), defaults to sys.argv Returns (args, options) where: args = list of non-option arguments options = dictionary mapping switch character to number of occurrences of the switch, and flag character to list of arguments specified with that flag Arguments following "--" are regarded as non-option arguments even if they start with a hyphen. """ if not argv: import sys argv = sys.argv argv = argv[1:] opts = {} args = [] for c in switches: opts[c] = 0 for c in flags: if c in switches: raise ValueError("'%c' both switch and flag" % c) opts[c] = [] seen_dashdash = 0 while argv: arg = argv.pop(0) if arg == "--": seen_dashdash = 1 elif not seen_dashdash and arg.startswith("-"): for c in arg[1:]: if c in switches: opts[c] += 1 elif c in flags: try: val = argv.pop(0) except IndexError: raise CommandLineError("Missing argument for option -%c" % c) opts[c].append(val) else: raise CommandLineError("Unknown option -%c" % c) else: args.append(arg) return args, opts if __name__ == "__main__": def spam(args, a, b, c, x, y, z): print "a =", a print "b =", b print "c =", c print "x =", x print "y =", y print "z =", z print "args =", args args, kwds = clparse("abc", "xyz") spam(args, **kwds) From jeremy@alum.mit.edu Tue Aug 20 04:41:58 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Mon, 19 Aug 2002 23:41:58 -0400 Subject: [Python-Dev] PEP 269 versus 283. 
In-Reply-To: References: Message-ID: <15713.47750.25495.802269@slothrop.zope.com> I lobbied in favor of PEP 269 earlier, because it is mostly exposing functionality that already exists but is difficult to reuse. That seems like a good thing even if some people want to use other parser generators. I've only seen pgen used for one application, and that one application has been modestly successful. So a language can do worse than starting with pgen. Jeremy From aahz@pythoncraft.com Tue Aug 20 04:51:15 2002 From: aahz@pythoncraft.com (Aahz) Date: Mon, 19 Aug 2002 23:51:15 -0400 Subject: [Python-Dev] Names again (was Re: type categories) Message-ID: <20020820035115.GA25575@panix.com> On Thu, Aug 15, 2002, Guido van Rossum wrote: >Oren: >> >> In a dynamically typed language there is no such thing as an 'integer >> variable' but it can be simulated by a reference that may only point to >> objects in the 'integer' category. > > This seems a game with words. I don't see the difference between an > integer variable and a reference that must point to an integer. > (Well, I see a difference, in the sharing semantics, but that's just > the difference between a value and a pointer in C. They're both > variables.) Going off on a tangent (and riding one of my favorite hobby horses), Python doesn't have variables. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From greg@cosc.canterbury.ac.nz Tue Aug 20 05:29:18 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 20 Aug 2002 16:29:18 +1200 (NZST) Subject: [Python-Dev] PEP 269 versus 283. In-Reply-To: <15713.47750.25495.802269@slothrop.zope.com> Message-ID: <200208200429.g7K4TIC28027@oma.cosc.canterbury.ac.nz> Jeremy Hylton : > I lobbied in favor of PEP 269 earlier, because it is mostly exposing > functionality that already exists but is difficult to reuse. That > seems like a good thing even if some people want to use other parser > generators.
There's a downside that you'd then be committed to supporting it, even if Python stopped using pgen itself some time in the future. If that's not a worry, then fine -- just pointing out that exposing previously unexposed functionality isn't necessarily without cost. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Tue Aug 20 05:31:17 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 20 Aug 2002 16:31:17 +1200 (NZST) Subject: [Python-Dev] Names again (was Re: type categories) In-Reply-To: <20020820035115.GA25575@panix.com> Message-ID: <200208200431.g7K4VH128034@oma.cosc.canterbury.ac.nz> Aahz : > Going off on a tangent (and riding one of my favorite hobby horses), > Python doesn't have variables. Only for some definitions of the word "variable". And not the definition we have in mind when we use the word "variable" in a Python context (if we do at all). Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From oren@hishome.net Tue Aug 20 05:55:50 2002 From: oren@hishome.net (Oren Tirosh) Date: Tue, 20 Aug 2002 07:55:50 +0300 Subject: [Python-Dev] Re: Last call: mortal interned strings In-Reply-To: <200208161858.g7GIwaM19389@pcp02138704pcs.reston01.va.comcast.net>; from guido@python.org on Fri, Aug 16, 2002 at 02:58:36PM -0400 References: <200208161858.g7GIwaM19389@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020820075550.A10609@hishome.net> I see that this has been checked in. 
My version: PyString_InternInPlace - immortal PyString_Intern - mortal Your version: PyString_InternInPlace - mortal PyString_InternImmortal - immortal My version favors backward compatibility - existing modules will not break if they rely on the immortality of interned strings. Your version appears to maximize the benefit of interned strings - existing modules automatically get the new mortal semantics without requiring any changes. I was wondering what was the rationale behind this decision. If the only reason was that the name PyString_Intern is not descriptive enough it can be renamed to something like PyString_InternReference to make it clear that it operates on a reference to a string and modifies it "in place". Oren From tim.one@comcast.net Tue Aug 20 06:31:51 2002 From: tim.one@comcast.net (Tim Peters) Date: Tue, 20 Aug 2002 01:31:51 -0400 Subject: [Python-Dev] Re: Last call: mortal interned strings In-Reply-To: <20020820075550.A10609@hishome.net> Message-ID: [Oren Tirosh, to Guido] > I see that this has been checked in. > > My version: > > PyString_InternInPlace - immortal > PyString_Intern - mortal > > Your version: > > PyString_InternInPlace - mortal > PyString_InternImmortal - immortal > > My version favors backward compatibility - existing modules will > not break if they rely on the immortality of interned strings. > > Your version appears to maximize the benefit of interned strings > - existing modules automatically get the new mortal semantics without > requiring any changes. > > I was wondering what was the rationale behind this decision. My guess is that it was so existing modules automatically get the benefit of mortal semantics without requiring any changes -- coupled with that nobody believes any module outside the core relies on immortality (Jack's Mac support code is part of the core, and Jack knows that). 
> If the only reason was that the name PyString_Intern is not descriptive > enough it can be renamed to something like PyString_InternReference to > make it clear that it operates on a reference to a string and modifies > it "in place". If that's all there were to it, I expect Guido would have renamed PyString_Intern to PyString_InternMortal (a "reference" suffix still doesn't mean anything to me -- and you've explained it twice ). If we're wrong that extension modules don't rely on immortality, the alpha and beta releases should shake that out for all major extensions, including any that work their way under the PBF umbrella. From fredrik@pythonware.com Tue Aug 20 11:17:42 2002 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 20 Aug 2002 12:17:42 +0200 Subject: [Python-Dev] type categories -- an example References: Message-ID: <028401c24832$d3f48fa0$0900a8c0@spiff> brett wrote: > This might be a stupid question, but why wouldn't > isinstance(re.compile("spam+"), type(re.compile(''))) > always work. re.compile is a factory function, and it might (in theory) return different types for different kinds of patterns. From casey@zope.com Tue Aug 20 15:00:20 2002 From: casey@zope.com (Casey Duncan) Date: Tue, 20 Aug 2002 10:00:20 -0400 Subject: [Python-Dev] type categories -- an example In-Reply-To: References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15713.29163.526655.111678@slothrop.zope.com> Message-ID: <200208201000.20650.casey@zope.com> On Monday 19 August 2002 06:45 pm, Andrew Koenig wrote: > Jeremy> Hard to say. I can read the code and see that the current > Jeremy> implementation will always return objects of the same type. > Jeremy> In fact, it's using type(sre_compile.compile("", 0)) > Jeremy> internally to represent that type. > > Jeremy> That's not a guarantee. Perhaps Fredrik wants to reserve the > Jeremy> right to change this in the future. It's not unusual for > Jeremy> Python modules to be under-specified in this way.
> > The real point is that this is an example of why a uniform way > of checking for such types would be nice. I shouldn't have > to read the source to figure out how to tell if something is > a compiled regular expression. In general you wouldn't care whether it was a sre_foo or an sre_bar, just if it acts like a compiled regular expression, and therefore supports that interface. So, the real solution would be to have re assert that interface on whatever the compiler returns so that you can check for it, something like: if ISre.isImplementedBy(unknown_ob): # It's a regex Where ISre is the compiled regular expression interface object. If the implementation varies the test would still work. Even if the interface varied, the test would work (but it might break other stuff). -Casey From guido@python.org Tue Aug 20 14:56:52 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 20 Aug 2002 09:56:52 -0400 Subject: [Python-Dev] PEP 269 versus 283. In-Reply-To: Your message of "Mon, 19 Aug 2002 20:12:09 CDT." References: Message-ID: <200208201356.g7KDuqE12932@odiug.zope.com> > I was looking over some of the PEP's and I saw that 269 was > considered dead according to PEP 283. This is kind of odd because I was > planning to have an implementation by the end of the week. Well, but you could've told me! :-) I'll gladly revive it. > This is subject to the constraints of reality; I am taking a > whopping huge vacation starting this next weekend. It is either > going to be ready for python-dev to play with this week or in the > middle of next month. Are you sure it's safe to expect your interest in this subject to extend beyond the month of August? :-) > My posts to the parser-sig are trying to be deferential to the > charter of the SIG (starting w/requirements for a general purpose parser > generator, not implementation of PEP 269). > I am certainly going to try to wrangle the parser-sig onwards, but > a pgen module is way overdue. Great!
--Guido van Rossum (home page: http://www.python.org/~guido/) From ark@research.att.com Tue Aug 20 15:14:26 2002 From: ark@research.att.com (Andrew Koenig) Date: Tue, 20 Aug 2002 10:14:26 -0400 (EDT) Subject: [Python-Dev] type categories -- an example In-Reply-To: <200208201000.20650.casey@zope.com> (message from Casey Duncan on Tue, 20 Aug 2002 10:00:20 -0400) References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15713.29163.526655.111678@slothrop.zope.com> <200208201000.20650.casey@zope.com> Message-ID: <200208201414.g7KEEQL29126@europa.research.att.com> Casey> In general you wouldn't care whether is was a sre_foo or an sre_bar, just if Casey> it acts like a compiled regular expression, and therefore supports that Casey> interface. So, the real solution would be to have re assert that interface on Casey> whatever the compiler returns so that you can check for it, something like: Casey> if ISre.isImplementedBy(unknown_ob): Casey> # It's a regex Casey> Where ISre is the compiled regular expression interface object. If the Casey> implementation varies the test would still work. Even if the interface Casey> varied, the test would work (but it might break other stuff). Exactly. From guido@python.org Tue Aug 20 15:17:54 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 20 Aug 2002 10:17:54 -0400 Subject: [Python-Dev] PEP 269 versus 283. In-Reply-To: Your message of "Tue, 20 Aug 2002 16:29:18 +1200." <200208200429.g7K4TIC28027@oma.cosc.canterbury.ac.nz> References: <200208200429.g7K4TIC28027@oma.cosc.canterbury.ac.nz> Message-ID: <200208201417.g7KEHsg13056@odiug.zope.com> > There's a downside that you'd then be committed to supporting > it, even if Python stopped using pgen itself some time in the > future. I'm not worried about that in this case. First of all, supporting pgen shouldn't be too much of an effort (I can see translating it into Python at some point :-). It could also be degraded into a 3rd party module. 
And if we switch to something better, the something better will probably act as a better replacement for pgen (though with a different API), coaxing people to upgrade anyway. --Guido van Rossum (home page: http://www.python.org/~guido/) From ark@research.att.com Tue Aug 20 15:19:35 2002 From: ark@research.att.com (Andrew Koenig) Date: Tue, 20 Aug 2002 10:19:35 -0400 (EDT) Subject: [Python-Dev] type categories -- an example In-Reply-To: <200208200042.g7K0goH28754@pcp02138704pcs.reston01.va.comcast.net> (message from Guido van Rossum on Mon, 19 Aug 2002 20:42:50 -0400) References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15713.26213.91880.761312@slothrop.zope.com> <200208192225.g7JMPj023386@europa.research.att.com> <200208200042.g7K0goH28754@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200208201419.g7KEJZB29140@europa.research.att.com> >> But is it guaranteed that re.compile will always yield >> an object of the same type? Guido> There are no guarantees in life, but I expect that that is something Guido> that plenty of code depends on, so it will likely stay that way. The kind of situation I imagine is that a regular expression might be implemented not just as a single type but as a whole hierarchy of them, with the particular type used for a regular expression depending on the value of the regular expression. For example: class Compiled_regexp(object): # ... class Anchored_regexp(Compiled_regexp): # ... class Unanchored_regexp(Compiled_regexp): # ... where whether a regexp is anchored or unanchored depends on whether it begins with "^". (Contrived, but you get the idea). In that case, it is entirely possible that re.compile("") and re.compile("^foo") return types such that neither is an instance of the other.
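Koenig's sketch, filled in just enough to run, shows the failure mode concretely. The factory and class names here are his hypothetical ones; nothing like this exists in the real re module, which is exactly his point:

```python
import re

class Compiled_regexp(object):
    def __init__(self, pattern):
        # Delegate actual matching to a real compiled pattern.
        self._compiled = re.compile(pattern)

class Anchored_regexp(Compiled_regexp):
    pass

class Unanchored_regexp(Compiled_regexp):
    pass

def hypothetical_compile(pattern):
    # The concrete type returned depends on the *value* of the pattern,
    # so type(hypothetical_compile(a)) need not match
    # type(hypothetical_compile(b)).
    if pattern.startswith("^"):
        return Anchored_regexp(pattern)
    return Unanchored_regexp(pattern)
```

Here `type(hypothetical_compile(""))` is `Unanchored_regexp`, so an isinstance() test built from it wrongly rejects `hypothetical_compile("^foo")`; only a check against the common base class works, which is the "uniform way of checking" being asked for.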
I understand that the regexp library doesn't work this way, and will probably never work this way, but I'm using this example to show why the technique of using the type returned by a particular library function call to identify the results of future calls doesn't work in general. From guido@python.org Tue Aug 20 15:21:28 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 20 Aug 2002 10:21:28 -0400 Subject: [Python-Dev] Re: Last call: mortal interned strings In-Reply-To: Your message of "Tue, 20 Aug 2002 07:55:50 +0300." <20020820075550.A10609@hishome.net> References: <200208161858.g7GIwaM19389@pcp02138704pcs.reston01.va.comcast.net> <20020820075550.A10609@hishome.net> Message-ID: <200208201421.g7KELSR13085@odiug.zope.com> > I see that this has been checked in. > > My version: > > PyString_InternInPlace - immortal > PyString_Intern - mortal > > Your version: > > PyString_InternInPlace - mortal > PyString_InternImmortal - immortal > > My version favors backward compatibility - existing modules will not break > if they rely on the immortality of interned strings. > > Your version appears to maximize the benefit of interned strings - existing > modules automatically get the new mortal semantics without requiring any > changes. > > I was wondering what was the rationale behind this decision. I can only repeat what I said before about this: """But the vast majority of C code does *not* depend on this. I'd rather keep PyString_InternInPlace(), so we don't have to change all call locations, only the very rare ones that rely on this.""" > If the only reason was that the name PyString_Intern is not descriptive > enough it can be renamed to something like PyString_InternReference to > make it clear that it operates on a reference to a string and modifies > it "in place". It wasn't that. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From ark@research.att.com Tue Aug 20 15:24:37 2002 From: ark@research.att.com (Andrew Koenig) Date: 20 Aug 2002 10:24:37 -0400 Subject: [Python-Dev] type categories -- an example In-Reply-To: <15713.30786.767176.781663@slothrop.zope.com> References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15713.26213.91880.761312@slothrop.zope.com> <200208192225.g7JMPj023386@europa.research.att.com> <15713.29163.526655.111678@slothrop.zope.com> <15713.30786.767176.781663@slothrop.zope.com> Message-ID: Jeremy> Then you asked if re.compile() was guaranteed to return an Jeremy> object of the same type. That question is all about the Jeremy> contract of the re module. The answer might have been: "No. Jeremy> In version X, it happens to always return objects of the same Jeremy> type, but in version Z, I may want to change this." Jeremy> I suppose we could get at the general question of checking Jeremy> types by assuming that re.compile() returned instances of two Jeremy> apparently unrelated classes and that we wanted a way to Jeremy> declare their relationship. I'm thinking of something like Jeremy> Haskell typeclasses here. Right. And the classes don't even have to be unrelated -- it's enough that neither one is derived from the other (for instance, that they both be derived from a third class). -- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark From guido@python.org Tue Aug 20 17:13:50 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 20 Aug 2002 12:13:50 -0400 Subject: [Python-Dev] nested extension modules? In-Reply-To: Your message of "Mon, 19 Aug 2002 20:35:53 EDT." <012401c247e1$8c4570d0$6501a8c0@boostconsulting.com> References: <012401c247e1$8c4570d0$6501a8c0@boostconsulting.com> Message-ID: <200208201613.g7KGDoM21340@odiug.zope.com> > Using the source (Luke), I was trying to figure out the best way to add a > nested submodule from within an extension module. 
I noticed that the module > initialization code will set the module name from the package context (if > set), altogether discarding any name passed explicitly: > > [modsupport.c: Py_InitModule4()] > > ... > if (_Py_PackageContext != NULL) { > char *p = strrchr(_Py_PackageContext, '.'); > if (p != NULL && strcmp(name, p+1) == 0) { > name = _Py_PackageContext; > _Py_PackageContext = NULL; > } > } > > This _Py_PackageContext is set up from within _PyImport_LoadDynamicModule > [importdl.c:] > > ... > oldcontext = _Py_PackageContext; > _Py_PackageContext = packagecontext; > (*p)(); > _Py_PackageContext = oldcontext; > > IIUC, this means that when an extension module is loaded as part of a > package, any submodules I create my calling Py_InitModule will > come out with the same name. > > Questions: > > a. Have I got the analysis right? Not quite, if I understand what you're saying. The package context, despite its name, is not the package name, but the full name of the *module*, when the shared library is found inside a package. If, e.g., a package directory P contains an extension module file E.so, the package context is set to "P.E". The initE() function is supposed to call Py_InitModule4() with "E" as the module name. Py_InitModule4() then sees that this is the last component of the package context, and changes the module name to "P.E". It also nulls out the package context. The checkin comment I made back in 1997 explains this: Fix importing of shared libraries from inside packages. This is a bit of a hack: when the shared library is loaded, the module name is "package.module", but the module calls Py_InitModule*() with just "module" for the name. The shared library loader squirrels away the true name of the module in _Py_PackageContext, and Py_InitModule*() will substitute this (if the name actually matches). > b. 
Is there a more-sanctioned way around this other than touching > _Py_PackageContext (which seems to be intended to be private) I think using _Py_PackageContext is your only hope. If you contribute some docs for it we'll gladly add them to the API docs. --Guido van Rossum (home page: http://www.python.org/~guido/) From David Abrahams" <200208201613.g7KGDoM21340@odiug.zope.com> Message-ID: <047c01c2486a$db15afc0$6501a8c0@boostconsulting.com> From: "Guido van Rossum" > > Using the source (Luke), I was trying to figure out the best way to add a > > nested submodule from within an extension module. I noticed that the module > > initialization code will set the module name from the package context (if > > set), altogether discarding any name passed explicitly: > > > > [modsupport.c: Py_InitModule4()] > > > > ... > > if (_Py_PackageContext != NULL) { > > char *p = strrchr(_Py_PackageContext, '.'); > > if (p != NULL && strcmp(name, p+1) == 0) { > > name = _Py_PackageContext; > > _Py_PackageContext = NULL; > > } > > } > > > > This _Py_PackageContext is set up from within _PyImport_LoadDynamicModule > > [importdl.c:] > > > > ... > > oldcontext = _Py_PackageContext; > > _Py_PackageContext = packagecontext; > > (*p)(); > > _Py_PackageContext = oldcontext; > > > > IIUC, this means that when an extension module is loaded as part of a > > package, any submodules I create my calling Py_InitModule will > > come out with the same name. > > > > Questions: > > > > a. Have I got the analysis right? > > Not quite, if I understand what you're saying. The package context, > despite its name, is not the package name, but the full name of the > *module*, when the shared library is found inside a package. I think I understood that part. > If, e.g., a package directory P contains an extension module file > E.so, the package context is set to "P.E". The initE() function is > supposed to call Py_InitModule4() with "E" as the module name. 
> Py_InitModule4() then sees that this is the last component of the > package context, and changes the module name to "P.E". Yeah, that's what I expected. > It also nulls out the package context. Oops! I missed that part. Maybe that makes my problem imaginary, except that you go on to say... > The checkin comment I made back in 1997 explains this: > > Fix importing of shared libraries from inside packages. > This is a bit of a hack: when the shared library is loaded, the > module name is "package.module", but the module calls > Py_InitModule*() with just "module" for the name. The shared > library loader squirrels away the true name of the module in > _Py_PackageContext, and Py_InitModule*() will substitute this (if > the name actually matches). > > > b. Is there a more-sanctioned way around this other than touching > > _Py_PackageContext (which seems to be intended to be private) > > I think using _Py_PackageContext is your only hope. If you contribute > some docs for it we'll gladly add them to the API docs. Hmm, my only hope for what? What I was worried about was that if I tried to create a nested sub-extension module from within my extension module by calling Py_InitModuleXXX() directly, its name would be forced to be the same as that of the outer extension module. Since you pointed out that _Py_PackageContext was being nulled out, I don't think that's much of an issue. What issues /do/ I need to be aware of when doing this? Thanks, Dave ----------------------------------------------------------- David Abrahams * Boost Consulting dave@boost-consulting.com * http://www.boost-consulting.com From guido@python.org Tue Aug 20 18:06:24 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 20 Aug 2002 13:06:24 -0400 Subject: [Python-Dev] nested extension modules? In-Reply-To: Your message of "Tue, 20 Aug 2002 12:54:49 EDT." 
<047c01c2486a$db15afc0$6501a8c0@boostconsulting.com> References: <012401c247e1$8c4570d0$6501a8c0@boostconsulting.com> <200208201613.g7KGDoM21340@odiug.zope.com> <047c01c2486a$db15afc0$6501a8c0@boostconsulting.com> Message-ID: <200208201706.g7KH6Ob21733@odiug.zope.com> > > It also nulls out the package context. > > Oops! I missed that part. Maybe that makes my problem imaginary, except > that you go on to say... > > > The checkin comment I made back in 1997 explains this: > > > > Fix importing of shared libraries from inside packages. > > This is a bit of a hack: when the shared library is loaded, the > > module name is "package.module", but the module calls > > Py_InitModule*() with just "module" for the name. The shared > > library loader squirrels away the true name of the module in > > _Py_PackageContext, and Py_InitModule*() will substitute this (if > > the name actually matches). > > > > > b. Is there a more-sanctioned way around this other than touching > > > _Py_PackageContext (which seems to be intended to be private) > > > > I think using _Py_PackageContext is your only hope. If you contribute > > some docs for it we'll gladly add them to the API docs. > > Hmm, my only hope for what? What I was worried about was that if I tried to > create a nested sub-extension module from within my extension module by > calling Py_InitModuleXXX() directly, its name would be forced to be the > same as that of the outer extension module. Since you pointed out that > _Py_PackageContext was being nulled out, I don't think that's much of an > issue. What issues /do/ I need to be aware of when doing this? I guess I misunderstood what you were trying to accomplish; I thought you were asking if there was a more accepted way of doing this besides setting _Py_PackageContext? I don't understand why the nulling out of _Py_PackageContext makes a difference for what you were trying to do -- unless the last component of your submodule's name is the same as its parent's (e.g. 
you're creating a submodule X.X inside a module X), the strcmp() could never succeed. Also, if the name passed to Py_InitModuleXXX() contains a dot, the strcmp() can never succeed (since it's applied to the last component of the package context). --Guido van Rossum (home page: http://www.python.org/~guido/) From mclay@nist.gov Tue Aug 20 18:10:18 2002 From: mclay@nist.gov (Michael McLay) Date: Tue, 20 Aug 2002 13:10:18 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200208201310.18625.mclay@nist.gov> On Monday 19 August 2002 04:38 pm, Guido van Rossum wrote: > > [Guido van Rossum] > > > > > OTOH what then to do with _sort_repr -- make it a class var or an > > > instance var? > > [Brett C] > > > Well, how often can you imagine someone printing out a single set sorted, > > but having other sets that they didn't want printed out sorted? I would > > suspect that it is going to be a very rare case when someone wants just > > part of their sets printing sorted and the rest not. > > > > I say make it a class var. > > Hm, but what if two different library modules have conflicting > requirements? E.g. module A creates sets of complex numbers and must > have sort_repr=False, while module B needs sort_repr=True for > user-friendliness (or because it relies on this). Adding a SortedSet class to the module would partially solve the problem of globally clobbering the usage of Set in other modules. The downside is that the selection of the sort function would be a static decision made when a set instance is created. >>> class Set(object): ... _sort_repr=False ... __slots__ = ["_data"] ... def __init__(self.... >>> class SortedSet(Set): ... _sort_repr=True ...
>>> ss = SortedSet() >>> s = Set() >>> s._sort_repr False >>> ss._sort_repr True From guido@python.org Tue Aug 20 18:22:32 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 20 Aug 2002 13:22:32 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Tue, 20 Aug 2002 13:10:18 EDT." <200208201310.18625.mclay@nist.gov> References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> Message-ID: <200208201722.g7KHMX122261@odiug.zope.com> > Adding a SortedSet class to the module would partially solve the problem of > globally clobbering the usage of Set in other modules. I say YAGNI. I am still perplexed that I received *no* feedback on the sets module except on this issue of sort order (which I consider solved by adding a method _repr() that takes an optional 'sorted' argument). --Guido van Rossum (home page: http://www.python.org/~guido/) From mwh@python.net Tue Aug 20 18:25:13 2002 From: mwh@python.net (Michael Hudson) Date: 20 Aug 2002 18:25:13 +0100 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: Guido van Rossum's message of "Tue, 20 Aug 2002 13:22:32 -0400" References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> Message-ID: <2mhehpe93a.fsf@starship.python.net> Guido van Rossum writes: > I am still perplexed that I received *no* feedback on the sets module > except on this issue of sort order (which I consider solved by adding > a method _repr() that takes an optional 'sorted' argument). This is hardly without precedent, though, is it? (I mean only getting feedback on trivia). I haven't looked at the set implementation in detail, but given that its principal authors are you and Alex, I'm sure it must be wonderful. Is that better? :) Cheers, M.
-- Or here's an even simpler indicator of how much C++ sucks: Print out the C++ Public Review Document. Have someone hold it about three feet above your head and then drop it. Thus you will be enlightened. -- Thant Tessman From fredrik@pythonware.com Tue Aug 20 21:07:52 2002 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 20 Aug 2002 22:07:52 +0200 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib References: <200208162154.g7GLsrZ28972@pcp02138704pcs.reston01.va.comcast.net> <200208191406.g7JE6lK10592@pcp02138704pcs.reston01.va.comcast.net> <200208191525.g7JFPcF11152@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <026c01c24885$460247c0$ced241d5@hagrid> guido wrote: > > As long as users do not discover it, they will not use it! :-) > > We can mull that over until the first beta release. is there a list somewhere of things that should be mulled over? (e.g. set api issues, textfile(filename, mode, encoding) instead of that ugly "U" flag, datetime/basetime stuff, bwidgets additions to tkinter, tk 8.4 updates, etc) From guido@python.org Tue Aug 20 21:28:54 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 20 Aug 2002 16:28:54 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Tue, 20 Aug 2002 22:07:52 +0200." <026c01c24885$460247c0$ced241d5@hagrid> References: <200208162154.g7GLsrZ28972@pcp02138704pcs.reston01.va.comcast.net> <200208191406.g7JE6lK10592@pcp02138704pcs.reston01.va.comcast.net> <200208191525.g7JFPcF11152@pcp02138704pcs.reston01.va.comcast.net> <026c01c24885$460247c0$ced241d5@hagrid> Message-ID: <200208202028.g7KKSsB25884@odiug.zope.com> > is there a list somewhere of things that should be mulled over? > > (e.g. set api issues, textfile(filename, mode, encoding) instead of > that ugly "U" flag, datetime/basetime stuff, bwidgets additions to > tkinter, tk 8.4 updates, etc) I've added these to PEP 283. 
Anybody who has a suggestion please edit that PEP (or mail it to me if you don't have checkin perms). --Guido van Rossum (home page: http://www.python.org/~guido/) From Jack.Jansen@oratrix.com Tue Aug 20 21:36:11 2002 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Tue, 20 Aug 2002 22:36:11 +0200 Subject: [Python-Dev] PEP 277 (unicode filenames): please review In-Reply-To: Message-ID: <76896D65-B47C-11D6-A493-003065517236@oratrix.com> On Monday, August 19, 2002, at 03:36, Matthias Urlichs wrote: > Guido: >> >> My guess it's not his listdir() or filesystem, but the keyboard >> driver. >> > No, it's MacOSX. It always uses the decomposed form. > > That is very noticeable via NFS volumes where files with > combined character names are unopenable from the GUI. > I've filed a bug report about that - I don't know whether OSX > 10.2 will allow NFC filenames, at least read-only. This must be an oversight (or maybe something they didn't implement because of lack of time?). They have all the machinery in place to do on the fly conversion of filenames, it is used for HFS (old-style HFS, not HFS+) and SMB filesystems, where you specify the character set of the filesystem at mount time, and they do NFC-NFD conversion in the system call interface.
-- - Jack Jansen http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From mclay@nist.gov Tue Aug 20 21:31:10 2002 From: mclay@nist.gov (Michael McLay) Date: Tue, 20 Aug 2002 16:31:10 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208201722.g7KHMX122261@odiug.zope.com> References: <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> Message-ID: <200208201631.10300.mclay@nist.gov> On Tuesday 20 August 2002 01:22 pm, Guido van Rossum wrote: > I am still perplexed that I received *no* feedback on the sets module > except on this issue of sort order (which I consider solved by adding > a method _repr() that takes an optional 'sorted' argument). I haven't read the entire thread, but I was puzzled by the implementation approach. Did you consider kjbuckets for the standard Python distribution? While the claim is rather old, the following quote from Aaron's intro [1] to the module suggests it might improve performance: For suitably large compute intensive uses these types should provide up to an order of magnitude speedup versus an implementation that uses analogous operations implemented directly in Python. Adding the gadfly SQL database to the standard library would also be useful, but since it is back under development it would be best for gadfly to live on a separate release cycle. The kjbuckets software, however, doesn't seem to be changing. One more reason for adding kjbuckets, Tim Berners-Lee might find the kjGraphs class useful for the semantic web work. [1] http://starship.python.net/crew/aaron_watters/kjbuckets/kjbuckets.html From guido@python.org Tue Aug 20 21:49:25 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 20 Aug 2002 16:49:25 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Tue, 20 Aug 2002 16:31:10 EDT."
<200208201631.10300.mclay@nist.gov> References: <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <200208201631.10300.mclay@nist.gov> Message-ID: <200208202049.g7KKnPJ26019@odiug.zope.com> > > I am still perplexed that I received *no* feedback on the sets module > > except on this issue of sort order (which I consider solved by adding > > a method _repr() that takes an optional 'sorted' argument). > > I haven't read the entire thread, but I was puzzled by the implementation > approach. Did you consider kjbuckets for the standard Python distribution? No. I think that would be the wrong idea at this point for two reasons: (1) never change two variables at the same time; (2) let's gather some experience with the new set API first, before we start worrying about implementation speed. I also believe that kjbuckets maintains its data in a sorted order, which is unnecessary for sets -- a hash table is much faster. After all we use a very fast hash table implementation to represent sets. (The only improvement would be that we could save maybe 4 bytes per hash table entry because we don't need a value pointer.) > While the claim is rather old, the following quote from Aaron's > intro [1] to the module suggests it might improve performance: > > For suitably large compute intensive uses these types should > provide up to an order of magnitude speedup versus an > implementation that uses analogous operations implemented > directly in Python. The sets module does not implement analogous operations directly in Python. Almost all the implementation work is done by the dict implementation. > Adding the gadfly SQL database to the standard library would also be > useful, but since it is back under development it would be best for > gadfly to live on a separate release cycle. The kjbuckets software, > however, doesn't seem to be changing. Because nobody is maintaining it any more.
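To make the dict point concrete, here is a sketch of a set that delegates everything to a dict -- illustrative only, not the sets module's actual code; the unused value slot is exactly the per-entry waste mentioned above:

```python
# Illustrative sketch only -- not the sets module's actual code.
# Elements are dict keys; the value is a throwaway placeholder.
class DictSet:
    def __init__(self, iterable=()):
        self._data = {}
        for element in iterable:
            self._data[element] = True   # duplicates collapse for free

    def __contains__(self, element):
        # O(1) hash lookup, courtesy of the dict implementation
        return element in self._data

    def __len__(self):
        return len(self._data)

    def __iter__(self):
        return iter(self._data)

    def intersection(self, other):
        # iterate over the smaller operand, probe the larger
        little, big = sorted([self, other], key=len)
        return DictSet(e for e in little if e in big)

s = DictSet([1, 2, 2, 3]).intersection(DictSet([2, 3, 4]))
print(sorted(s))   # -> [2, 3]
```

Every operation bottoms out in a dict lookup or insertion, which is the division of labor described above.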
> One more reason for adding kjbuckets, Tim Berners-Lee might find the > kjGraphs class useful for the semantic web work. > > [1] http://starship.python.net/crew/aaron_watters/kjbuckets/kjbuckets.html kjbuckets may be nice, but adding it to the core would add a serious new maintenance burden for the core developers. I don't see anyone raising their hand to help out here. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Tue Aug 20 22:17:13 2002 From: skip@pobox.com (Skip Montanaro) Date: Tue, 20 Aug 2002 16:17:13 -0500 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: References: Message-ID: <15714.45529.615062.997294@gargle.gargle.HOWL> tim> Straight character n-grams are very appealing because they're the tim> simplest and most language-neutral; I didn't have any luck with tim> them over the weekend, but the size of my training data was tim> trivial. Anybody up for pooling corpi (corpora?)? Skip From python@rcn.com Tue Aug 20 22:27:06 2002 From: python@rcn.com (Raymond Hettinger) Date: Tue, 20 Aug 2002 17:27:06 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> Message-ID: <009801c24890$565b26e0$1bf8a4d8@othello> From: "Guido van Rossum" > I am still perplexed that I received *no* feedback on the sets module > except on this issue of sort order (which I consider solved by adding > a method _repr() that takes an optional 'sorted' argument). I think the __init__() code in BaseSet should be pushed down into Set and ImmutableSet. It should be replaced by code raising a TypeError just like we do for basestring: >>> basestring('abc') Traceback (most recent call last): File "", line 1, in ? TypeError: The basestring type cannot be instantiated Raymond Hettinger P.S.
More comments are on the way as we play with, profile, review, optimize, and document the module ;) From guido@python.org Tue Aug 20 22:27:11 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 20 Aug 2002 17:27:11 -0400 Subject: [Python-Dev] What is a backport candidate? In-Reply-To: Your message of "Tue, 20 Aug 2002 17:15:20 EDT." <008401c2488e$b11e3ce0$1bf8a4d8@othello> References: <008401c2488e$b11e3ce0$1bf8a4d8@othello> Message-ID: <200208202127.g7KLRB926127@odiug.zope.com> > When we say "backport candidate", does that mean we need to think > about it more or that it is waiting for someone like me to pounce on > it and get it done? It means somebody (like you :-) should do triage on the feasibility of it. The triage can have several outcomes: - Trivial yes: the patch applies directly to the 2.2 branch and doesn't cause problems there. In this case, you can apply it right away and be done with it. - Trivial no: the patch doesn't make sense at all -- this should only happen when the patch patches code that was added in 2.3; in this case the backport/bugfix marking was a mistake, but mistakes happen. - Needs work: the idea behind the patch applies to 2.2, but the code there is sufficiently different that patch (or cvs update -j) doesn't quite work. There are gradations of this, depending on what's in the way. In this case, you may put it off. We need a database of these triage decisions; the new RoundUp-based tracker (prototype at python.org:8080) is supposed to have a feature to add this info to the tracker, but I don't know how it works or whether it is adequate yet. I'm cc'ing this to python-dev since others may be interested in this topic. Also note that I believe we've been inconsistent in marking up candidates: some say "bugfix candidate", some say "backport candidate", some may not be marked at all. :-( --Guido van Rossum (home page: http://www.python.org/~guido/) From barry@python.org Tue Aug 20 22:23:12 2002 From: barry@python.org (Barry A. 
Warsaw) Date: Tue, 20 Aug 2002 17:23:12 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 References: <15714.45529.615062.997294@gargle.gargle.HOWL> Message-ID: <15714.45888.971292.204949@anthem.wooz.org> >>>>> "SM" == Skip Montanaro writes: tim> Straight character n-grams are very appealing because they're tim> the simplest and most language-neutral; I didn't have any tim> luck with them over the weekend, but the size of my training tim> data was trivial. SM> Anybody up for pooling corpi (corpora?)? I've got collections from python-dev, python-list, edu-sig, mailman-developers, and zope3-dev, chopped at Feb 2002, which is approximately when Greg installed SpamAssassin. The collections are /not/ all known good, but pretty close (they should be verified by hand). The idea is to take some random subsets of these, cat them together and use them as both training and test data, along with some 'net-available known spam collections. No more time to play with this today though... -Barry From guido@python.org Tue Aug 20 22:41:13 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 20 Aug 2002 17:41:13 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Tue, 20 Aug 2002 17:27:06 EDT." <009801c24890$565b26e0$1bf8a4d8@othello> References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <009801c24890$565b26e0$1bf8a4d8@othello> Message-ID: <200208202141.g7KLfD726227@odiug.zope.com> > From: "Guido van Rossum" > > I am still perplexed that I received *no* feedback on the sets module > > except on this issue of sort order (which I consider solved by adding > > a method _repr() that takes an optional 'sorted' argument). > > I think the __init__() code in BaseSet should be pushed down into Set and ImmutableSet.
It should be replaced by code raising a > TypeError just like we do for basestring: > > >>> basestring('abc') > Traceback (most recent call last): > File "", line 1, in ? > TypeError: The basestring type cannot be instantiated Good idea. I checked this in, raising NotImplementedError. > Raymond Hettinger > > P.S. More comments are on the way as we play with, profile, review, > optimize, and document the module ;) Didn't you submit a SF patch/bug? I think I replied to that. --Guido van Rossum (home page: http://www.python.org/~guido/) From smurf@noris.de Tue Aug 20 22:49:05 2002 From: smurf@noris.de (Matthias Urlichs) Date: Tue, 20 Aug 2002 23:49:05 +0200 Subject: [Python-Dev] PEP 277 (unicode filenames): please review In-Reply-To: <76896D65-B47C-11D6-A493-003065517236@oratrix.com>; from Jack.Jansen@oratrix.com on Tue, Aug 20, 2002 at 10:36:11PM +0200 References: <76896D65-B47C-11D6-A493-003065517236@oratrix.com> Message-ID: <20020820234905.A29078@noris.de> Hi, Jack Jansen: > > That is very noticeable via NFS volumes where files with > > combined character names are unopenable from the GUI. > > This must be an oversight (or maybe something they didn't > implement because of lack of time?). They have all the machinery > in place to do on the fly conversion of filenames, it is used > for HFS (old-style HFS, not HFS+) and SMB filesystems, where you > specify the character set of the filesystem at mount time, and > they do NFC-NFD conversion in the system call interface. Specifying the charset at mount time doesn't work for mount_nfs -- maybe they fix that in Jaguar (10.2). 
-- Matthias Urlichs | noris network AG | http://smurf.noris.de/ From tim.one@comcast.net Tue Aug 20 22:51:02 2002 From: tim.one@comcast.net (Tim Peters) Date: Tue, 20 Aug 2002 17:51:02 -0400 Subject: [Python-Dev] RE: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: <15714.45529.615062.997294@gargle.gargle.HOWL> Message-ID: [Skip Montanaro] > Anybody up for pooling corpi (corpora?)? Barry is collecting clean data from mailing-list archives for lists hosted at python.org. It's unclear that this will be useful for anything other than mailing lists hosted at python.org (which I expect have a lot of topic commonality). There's a lovely spam archive here: http://www.em.ca/~bruceg/spam/ From pinard@iro.umontreal.ca Tue Aug 20 23:39:56 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: 20 Aug 2002 18:39:56 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208201722.g7KHMX122261@odiug.zope.com> References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> Message-ID: [Guido van Rossum] > I am still perplexed that I received *no* feedback on the sets module As I previously said, I feel comfortable with what I read and saw. I'd probably have to use sets for having more circumstantiated comments. Unless you offer the source on c.l.py and ask for more users' opinions? Maybe some people would have preferred to see more usual notation, like `+' for union and `*' for intersection, rather than `or' and `and'? There are tiny pros and cons in each direction. For one, I'll gladly use what is available, I'm not really going to crusade for either notation... Should there be special provisions for Sets to interoperate magically with lists or iterators? Lists and iterators could be considered as ordered sets with duplicates allowed. 
Even if it could be tinily useful, it is surely not difficult to explicitly "cast" lists and iterators using the `Set' constructor. It is already easy to build an iterator or a list out of a set. Criticism? OK! What about supporting infinite sets? :-) Anything else? Hmph! The module doc-string has the word "actually" with three `l'! :-) -- François Pinard http://www.iro.umontreal.ca/~pinard From paul@prescod.net Wed Aug 21 00:06:34 2002 From: paul@prescod.net (Paul Prescod) Date: Tue, 20 Aug 2002 16:06:34 -0700 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 References: <15714.45529.615062.997294@gargle.gargle.HOWL> <15714.45888.971292.204949@anthem.wooz.org> Message-ID: <3D62CB7A.E772EB0D@prescod.net> Some perhaps relevant links (with no off-topic discussion): * http://www.tuxedo.org/~esr/bogofilter/ * http://www.ai.mit.edu/~jrennie/ifile/ * http://groups.google.com/groups?selm=ajk8mj%241c3qah%243%40ID-125932.news.dfncis.de """My finding is that it is _nowhere_ near sufficient to have two populations, "spam" versus "not spam." If you muddle together the Nigerian Pyramid schemes with the "Penis enhancement" ads along with the offers of new credit cards as well as the latest sites where you can talk to "hot, horny girls LIVE!", the statistics don't work out nearly so well. It's hard to tell, on the face of it, why Nigerian scams _should_ be considered textually similar to phone sex ads, and in practice, the result of throwing them all together" There are a few things left to improve about Ifile, and I'd like to redo it in some language fundamentally less painful to work with than C """ "Barry A. Warsaw" wrote: > > >>>>> "SM" == Skip Montanaro writes: > > tim> Straight character n-grams are very appealing because they're > tim> the simplest and most language-neutral; I didn't have any > tim> luck with them over the weekend, but the size of my training > tim> data was trivial.
> > SM> Anybody up for pooling corpi (corpora?)? > > I've got collections from python-dev, python-list, edu-sig, > mailman-developers, and zope3-dev, chopped at Feb 2002, which is > approximately when Greg installed SpamAssassin. The collections are > /not/ all known good, but pretty close (they should be verified by hand). > > The idea is to take some random subsets of these, cat them together > and use them as both training and test data, along with some > 'net-available known spam collections. > > No more time to play with this today though... > -Barry > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev -- Paul Prescod From esr@thyrsus.com Wed Aug 21 00:17:38 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Tue, 20 Aug 2002 19:17:38 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> Message-ID: <20020820231738.GA21011@thyrsus.com> [Guido van Rossum] > I am still perplexed that I received *no* feedback on the sets module It should have powerset and cartesian-product methods. Shall I code them? -- Eric S. Raymond From esr@thyrsus.com Wed Aug 21 00:23:46 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Tue, 20 Aug 2002 19:23:46 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: <3D62CB7A.E772EB0D@prescod.net> References: <15714.45529.615062.997294@gargle.gargle.HOWL> <15714.45888.971292.204949@anthem.wooz.org> <3D62CB7A.E772EB0D@prescod.net> Message-ID: <20020820232346.GA21177@thyrsus.com> Paul Prescod : > Some perhaps relevant links (with no off-topic discussion): > > * http://www.tuxedo.org/~esr/bogofilter/ I'm in the process of speed-tuning this now.
I intend for it to be blazingly fast, usable for sites that process 100K mails a day, and I think I know how to do that. This is not a natural application for Python :-). > """My finding is that it is _nowhere_ near sufficient to have two > populations, "spam" versus "not spam." Well, except it seems to work quite well. The Nigerian trigger-word population is distinct from the penis-enlargement population, but they both show up under Bayesian analysis. -- Eric S. Raymond From python@rcn.com Wed Aug 21 00:23:30 2002 From: python@rcn.com (Raymond Hettinger) Date: Tue, 20 Aug 2002 19:23:30 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <009801c24890$565b26e0$1bf8a4d8@othello> <200208202141.g7KLfD726227@odiug.zope.com> Message-ID: <011001c248a0$99575580$1bf8a4d8@othello> [GvR] > > > I am still perplexed that I received *no* feedback on the sets module > > > except on this issue of sort order (which I consider solved by adding > > > a method _repr() that takes an optional 'sorted' argument). [RH] > > P.S. More comments are on the way as we play with, profile, review, > > optimize, and document the module ;) [GvR] > Didn't you submit a SF patch/bug? I think I replied to that. Yes. I've now revised the patch accordingly. More thoughts: 1. Rename .remove() to __del__(). Its usage is inconsistent with list.remove(element) which can leave other instances of element in the list. It is more consistent with 'del adict[element]'. 2. discard() looks like a useful standard API. Perhaps it should be added to the dictionary API. 3. Should we add .as_temporarily_immutable to dictionaries and lists so that they will also be potential elements of a set? 4. remove(), update(), add(), and __contains__() all work hard to check for .as_temporarily_immutable().
Should this be propagated to other methods that add set members (i.e. replace all instances of data[element] = value with self.add(element) or use self.update() in the code for __init__())? The answer is tough because it causes an enormous slowdown in the common use cases of uniquifying a sequence. OTOH, why check in some places but not others -- why is .add(aSetInstance) okay but not Set([aSetInstance]). If the answer is yes, then the code for update() should be super-optimized by moving the try/except outside the for-loop and wrapping the whole thing in a while 1. Also, we could bypass the slower .add() method when the incoming source of elements is known to be an instance of BaseSet. 5. Add a quick pre-check to issubset() and issuperset() along the lines of: def issubset(self, other): """Report whether another set contains this set.""" self._binary_sanity_check(other) if len(self) > len(other): return False # Fast check for the obvious case for elt in self: if elt not in other: return False return True 6. For clarity and foolish consistency, replace all occurrences of 'elt' with 'element'. Raymond Hettinger From paul-python@svensson.org Wed Aug 21 00:27:53 2002 From: paul-python@svensson.org (Paul Svensson) Date: Tue, 20 Aug 2002 19:27:53 -0400 (EDT) Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208202141.g7KLfD726227@odiug.zope.com> Message-ID: On Tue, 20 Aug 2002, Guido van Rossum wrote: >> From: "Guido van Rossum" >> > I am still perplexed that I received *no* feedback on the sets module >> > except on this issue of sort order (which I consider solved by adding >> > a method _repr() that takes an optional 'sorted' argument). >> >> I think the __init__() code in BaseSet should be pushed down into Set and ImmutableSet. It should be replaced by code raising a >> TypeError just like we do for basestring: >> >> >>> basestring('abc') >> Traceback (most recent call last): >> File "", line 1, in ?
>> TypeError: The basestring type cannot be instantiated > >Good idea. I checked this in, raising NotImplementedError. Is there any particular reason BaseSet and basestring need to raise different exceptions on an attempt at instantiation? /Paul From tim.one@comcast.net Wed Aug 21 00:44:52 2002 From: tim.one@comcast.net (Tim Peters) Date: Tue, 20 Aug 2002 19:44:52 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: <3D62CB7A.E772EB0D@prescod.net> Message-ID: [Paul Prescod] > Some perhaps relevant links (with no off-topic discussion): > > * http://www.tuxedo.org/~esr/bogofilter/ Damn -- wish I'd read that before. Among other things, Eric found a good use for Judy arrays. > * http://www.ai.mit.edu/~jrennie/ifile/ Knew about that. Good stuff. > http://groups.google.com/groups?selm=ajk8mj%241c3qah%243%40ID-125932.news.dfncis.de Seems confused, assuming Graham's approach is a minor variant of ifile's. But Graham's computation is to classic Bayesian classifiers (like ifile) as Python's lambda is to Lisp's <0.7 wink>. Heart of the confusion: Integrating the whole set of statistics together requires adding up statistics for _all_ the words found in a message, not just the words "sex" and "sexy." The rub is that Graham doesn't try to add up the statistics for all the words found in a msg. To the contrary, it ends up ignoring almost all of the words. In particular, if the database indicates that "sex" and "sexy" aren't good spam-vs-non-spam discriminators, Graham's approach ignores them completely (their presence or absence doesn't affect the final outcome at all -- it's like the words don't exist; this isn't what ifile does, and ifile probably couldn't get away with this because it's trying to do N-way classification instead of strictly 2-way -- someone who understands the math and reads Graham's article carefully will likely have a hard time figuring out what Bayes has to do with it at all! I sure did.).
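The wrinkle is easier to see in code than in prose. A sketch, with made-up probabilities and a simplified combining step -- not Graham's exact constants and not the sandbox GBayes code:

```python
def score(msg_tokens, spamprob, n=15, unknown=0.4):
    """Graham-style scoring sketch: combine only the n tokens whose
    spam probability is furthest from the neutral 0.5; every other
    token in the message is ignored entirely."""
    probs = [spamprob.get(tok, unknown) for tok in set(msg_tokens)]
    # keep the n most "extreme" tokens (closest to 0.0 or 1.0)
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    probs = probs[:n]
    # naive-Bayes-ish combination of the survivors
    p = q = 1.0
    for x in probs:
        p *= x
        q *= 1.0 - x
    return p / (p + q)

# hypothetical per-token probabilities, for illustration only
spamprob = {'nigeria': 0.99, 'enlargement': 0.99, 'python': 0.01}
print(score(['python', 'sets', 'module'], spamprob))  # well below 0.5
```

Tokens the database considers neutral sort toward the end of the list and fall off the end, which is why their presence or absence cannot move the final score.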
> """My finding is that it is _nowhere_ near sufficient to have two > populations, "spam" versus "not spam." In ifile I believe that. But the data will speak for itself soon enough, so I'm not going to argue about this. From aahz@pythoncraft.com Wed Aug 21 00:56:53 2002 From: aahz@pythoncraft.com (Aahz) Date: Tue, 20 Aug 2002 19:56:53 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: <011001c248a0$99575580$1bf8a4d8@othello> References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <009801c24890$565b26e0$1bf8a4d8@othello> <200208202141.g7KLfD726227@odiug.zope.com> <011001c248a0$99575580$1bf8a4d8@othello> Message-ID: <20020820235653.GA9789@panix.com> On Tue, Aug 20, 2002, Raymond Hettinger wrote: > > 1. Rename .remove() to __del__(). Its usage is inconsistent with > list.remove(element) which can leave other instances of element in the > list. It is more consistent with 'del adict[element]'. You mean __delitem__, I think. __del__ is only for deleting the object itself when its refcount goes to zero. > 3. Should we add .as_temporarily_immutable to dictionaries and lists > so that they will also be potential elements of a set? There's been some talk in the past about creating lockable dicts and lists (emphasis on dicts because lists have tuple-equivalence). -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From tim.one@comcast.net Wed Aug 21 00:57:04 2002 From: tim.one@comcast.net (Tim Peters) Date: Tue, 20 Aug 2002 19:57:04 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: <20020820232346.GA21177@thyrsus.com> Message-ID: [Eric S. Raymond] > I'm in the process of speed-tuning this now. I intend for it to be > blazingly fast, usable for sites that process 100K mails a day, and I > think I know how to do that. 
> This is not a natural application for > Python :-). I'm not sure about that. The all-Python version I checked in added 20,000 Python-Dev messages to the database in 2 wall-clock minutes. The time for computing the statistics, and for scoring, is simply trivial (this wouldn't be true of a "normal" Bayesian classifier (NBC), but Graham skips most of the work an NBC does, in particular favoring fast classification time over fast model-update time). What we anticipate is that the vast bulk of the time will end up getting spent on better tokenization, such as decoding base64 portions, and giving special care to header fields and URLs. I also *suspect* (based on a previous life in speech recognition) that experiments will show that a mixture of character n-grams and word bigrams is significantly more effective than a "naive" tokenizer that just looks for US ASCII alphanumeric runs. >> """My finding is that it is _nowhere_ near sufficient to have two >> populations, "spam" versus "not spam." > Well, except it seems to work quite well. The Nigerian trigger-word > population is distinct from the penis-enlargement population, but they > both show up under Bayesian analysis. In fact, I'm going to say "Nigerian" and "penis enlargement" one more time each here, just to demonstrate that *this* message won't be a false positive when the smoke settles. Human Growth Hormone too, while I'm at it. From esr@thyrsus.com Wed Aug 21 00:59:33 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Tue, 20 Aug 2002 19:59:33 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: References: <3D62CB7A.E772EB0D@prescod.net> Message-ID: <20020820235933.GA22413@thyrsus.com> Tim Peters : > > * http://www.tuxedo.org/~esr/bogofilter/ > > Damn -- wish I'd read that before. Among other things, Eric found a good > use for Judy arrays. It's a freaking *ideal* use for Judy arrays. Platonically perfect.
They couldn't fit better if they'd been designed for this application. Bogofilter was actually born in the moment that I realized this. -- Eric S. Raymond From python@rcn.com Wed Aug 21 01:02:05 2002 From: python@rcn.com (Raymond Hettinger) Date: Tue, 20 Aug 2002 20:02:05 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <009801c24890$565b26e0$1bf8a4d8@othello> <200208202141.g7KLfD726227@odiug.zope.com> <011001c248a0$99575580$1bf8a4d8@othello> <20020820235653.GA9789@panix.com> Message-ID: <016e01c248a5$fca0de40$1bf8a4d8@othello> From: "Aahz" > > 1. Rename .remove() to __del__(). Its usage is inconsistent with > > list.remove(element) which can leave other instances of element in the > > list. It is more consistent with 'del adict[element]'. > > You mean __delitem__, I think. __del__ is only for deleting the object > itself when its refcount goes to zero. Yes! From tim.one@comcast.net Wed Aug 21 01:09:57 2002 From: tim.one@comcast.net (Tim Peters) Date: Tue, 20 Aug 2002 20:09:57 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: <20020820235933.GA22413@thyrsus.com> Message-ID: [Eric S. Raymond] > It's a freaking *ideal* use for Judy arrays. Platonically perfect. They > couldn't fit better if they'd been designed for this application. > Bogofilter was actually born in the moment that I realized this. I believe that so long as it stays in memory. 
But, as you mention in your manpage,

    startup is too slow for sites handling thousands of mails an hour

That likely makes a Zope OOBTree stored under ZODB a better choice still, as that's designed for efficient update and access in a persistent database (the version of this we've got now does update during scoring, to keep track of when tokens were last used, and how often they've proved useful in discriminating -- there needs to be a way to expire tokens over time, else the database will grow without bound). I've corresponded with Douglas Baskins about "this kind of thing", and he's keen to address it (along with every other problem in the world <0.9 wink>); it would help if HP weren't laying off the people who have worked on this code. From guido@python.org Wed Aug 21 01:37:49 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 20 Aug 2002 20:37:49 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Tue, 20 Aug 2002 18:39:56 EDT." References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> Message-ID: <200208210037.g7L0bnG30953@pcp02138704pcs.reston01.va.comcast.net> > > I am still perplexed that I received *no* feedback on the sets > > module > > As I previously said, I feel comfortable with what I read and saw. > I'd probably have to use sets for having more circumstantiated > comments. Fair enough. > Unless you offer the source on c.l.py and ask for more users' opinions? Last time I tried that it turned out to be a bad idea. I prefer feedback over a flame war. > Maybe some people would have preferred to see more usual notation, > like `+' for union and `*' for intersection, rather than `or' and > `and'? There are tiny pros and cons in each direction. For one, > I'll gladly use what is available, I'm not really going to crusade > for either notation...
Um, the notation is '|' and '&', not 'or' and 'and', and those are what I learned in school. Seems pretty conventional to me (Greg Wilson actually tried this out on unsuspecting newbies and found that while '+' worked okay, '*' did not -- read the PEP). But yes, this is decent feedback (with good enough arguments, Greg's conclusion might even be overturned). > Should there be special provisions for Sets to interoperate > magically with lists or iterators? Lists and iterators could be > considered as ordered sets with duplicates allowed. Even if it > could be tinily useful, it is surely not difficult to explicitly > "cast" lists and iterators using the `Set' constructor. It is > already easy to build an iterator or a list out of a set. You can do an in-place union of a Set and a sequence or iterable with set.update(seq). If you want intersection or a difference, or your set is immutable, you'd have to cast the sequence to a set. What's the use case? Which brings me to another open issue. set.update(seq) and set.add(element) have a provision to transform the inserted element(s) to an ImmutableSet if needed. Should the constructor do the same? > Criticism? OK! What about supporting infinite sets? :-) Anything else? > Hmph! The module doc-string has the word "actually" with three `l'! :-) Not any more, thanks to Raymond Hettinger. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From greg@cosc.canterbury.ac.nz Wed Aug 21 01:46:23 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 21 Aug 2002 12:46:23 +1200 (NZST) Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208210037.g7L0bnG30953@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200208210046.g7L0kN605969@oma.cosc.canterbury.ac.nz> > Um, the notation is '|' and '&', not 'or' and 'and', and those are > what I learned in school. Really? The notation I learned in school was big-rounded-U for union and big-upside-down-rounded-U for intersection. 
Not available in the ASCII character set, unfortunately. But I agree that | and & are fairly intuitive substitutes for these, and they agree with what you use for bit twiddling. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From esr@thyrsus.com Wed Aug 21 01:52:52 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Tue, 20 Aug 2002 20:52:52 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: References: <20020820232346.GA21177@thyrsus.com> Message-ID: <20020821005252.GB22413@thyrsus.com> (Copied to Paul Graham. Paul, this is the mailing list of the Python maintainers. I thought you'd find the bits about lexical analysis in bogofilter interesting. Pythonistas, Paul is one of the smartest and sanest people in the LISP community, as evidenced partly by the fact that he hasn't been too proud to learn some lessons from Python :-). It would be a good thing for some bridges to be built here.) Tim Peters : > What we anticipate is that the vast bulk of the time will end up getting > spent on better tokenization, such as decoding base64 portions, and giving > special care to header fields and URLs. This is one of bogofilter's strengths. It already does this stuff at the lexical level using a speed-tuned flex scanner (I spent a lot of the development time watching token strings go by and tweaking the scanner rules to throw out cruft). In fact, look at this. It's a set of lex rules with interpolated comments:

BASE64          [A-Za-z0-9/+]
IPADDR          [0-9]+\.[0-9]+\.[0-9]+\.[0-9]+
MIME_BOUNDARY   ^--[^[:blank:]\n]*$

%%

# Recognize and discard From_ headers
^From\          {return(FROM);}

# Recognize and discard headers that contain dates and tumblers.
^Date:.*|Delivery-Date:.*       ;
^Message-ID:.*                  ;

# Throw away BASE64 enclosures. No point in using this as a discriminator;
# spam that has these in it always has other triggers in the headers.
# Therefore don't pay the overhead to decode it.
^{BASE64}+$                     ;

# Throw away tumblers generated by MTAs
^\tid\ .*                       ;
SMTP\ id\ .*                    ;

# Toss various meaningless MIME cruft
boundary=.*                     ;
name=\"                         ;
filename=\"                     ;
{MIME_BOUNDARY}                 ;

# Keep IP addresses
{IPADDR}                        {return(TOKEN);}

# Keep wordlike tokens of length at least three
[a-z$][a-z0-9$'.-]+[a-z0-9$]    {return(TOKEN);}

# Everything else, including all one-and-two-character tokens,
# gets tossed.
.                               ;

This small set of rules does a remarkably effective job of tossing out everything that isn't a potential recognition feature. Watching the filtered token stream from a large spam corpus go by is actually quite an enlightening experience. It does a better job than Paul's original in one important respect; IP addresses and hostnames are preserved whole for use as recognition features. I think I know why Paul didn't do this -- he's not a C coder, and if lex/flex isn't part of one's toolkit one's brain doesn't tend to wander down design paths that involve elaborate lexical analysis, because it's just too hard. This is actually the first new program I've coded in C (rather than Python) in a good four years or so. There was a reason for this; I have painful experience with doing lexical analysis in Python that tells me flex-generated C will be a major performance win here. The combination of flex and Judy made it a no-brainer. > I also *suspect* (based on a > previous life in speech recognition) that experiments will show that a > mixture of character n-grams and word bigrams is significantly more > effective than a "naive" tokenizer that just looks for US ASCII alphanumeric > runs. I share part of your suspicions -- I'm thinking about going to bigram analysis for header lines. But I'm working on getting the framework up first.
Feature extraction is orthogonal to both the Bayesian analysis and (mostly) to the data-storage method, and can be a drop-in change if the framework is done right. > In fact, I'm going to say "Nigerian" and "penis enlargement" one more time > each here, just to demonstrate that *this* message won't be a false positive > when the smoke settles . Human Growth Hormone too, while I'm at it. I think I know why, too. It's the top-15 selection -- the highest-variance words don't blur into non-spam English the way statistics on *all* tokens would. It's like an edge filter. -- Eric S. Raymond From esr@thyrsus.com Wed Aug 21 01:56:05 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Tue, 20 Aug 2002 20:56:05 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208210037.g7L0bnG30953@pcp02138704pcs.reston01.va.comcast.net> References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <200208210037.g7L0bnG30953@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020821005605.GC22413@thyrsus.com> Guido van Rossum : > Um, the notation is '|' and '&', not 'or' and 'and', and those are > what I learned in school. Seems pretty conventional to me (Greg > Wilson actually tried this out on unsuspecting newbies and found that > while '+' worked okay, '*' did not -- read the PEP). +1 on preferring | and & to `or' and `and'. To me, `or' and `and' say that what's being composed are predicates, not sets. -- Eric S. Raymond From greg@cosc.canterbury.ac.nz Wed Aug 21 02:00:48 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 21 Aug 2002 13:00:48 +1200 (NZST) Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <20020821005605.GC22413@thyrsus.com> Message-ID: <200208210100.g7L10mY06019@oma.cosc.canterbury.ac.nz> "Eric S. Raymond" : > To me, `or' and `and' say > that what's being composed are predicates, not sets. 
Besides which, they can't be overridden in Python anyway. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From esr@thyrsus.com Wed Aug 21 02:05:17 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Tue, 20 Aug 2002 21:05:17 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208210046.g7L0kN605969@oma.cosc.canterbury.ac.nz> References: <200208210037.g7L0bnG30953@pcp02138704pcs.reston01.va.comcast.net> <200208210046.g7L0kN605969@oma.cosc.canterbury.ac.nz> Message-ID: <20020821010517.GD22413@thyrsus.com> Greg Ewing : > > Um, the notation is '|' and '&', not 'or' and 'and', and those are > > what I learned in school. > > Really? The notation I learned in school was big-rounded-U > for union and big-upside-down-rounded-U for intersection. > Not available in the ASCII character set, unfortunately. For historical reasons, there are three different notations for Boolean algebra in common use. You're describing the one derived from set theory. I personally favor the one derived from lattice algebra; the distinctive feature of that one is the pointy and &/| operators that look like /\ and \/. The third uses | and &. The set-theoretic notation is the oldest. I think Birkhoff & MacLane invented the lattice-theory notation in the 1940s. It is probably *slightly* more popular than the set-theoretic notation. The | & one is distinctly less common than either, at least among mathematicians; I think EEs and suchlike may use it more than we do. > But I agree that | and & are fairlly intuitive substitutes > for these, and they agree with what you use for bit twiddling. Not an insignificant point. -- Eric S. Raymond From esr@thyrsus.com Wed Aug 21 02:14:29 2002 From: esr@thyrsus.com (Eric S. 
Raymond) Date: Tue, 20 Aug 2002 21:14:29 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: References: <20020820235933.GA22413@thyrsus.com> Message-ID: <20020821011429.GE22413@thyrsus.com> Tim Peters : > [Eric S. Raymond] > > It's a freaking *ideal* use for Judy arrays. Platonically perfect. They > > couldn't fit better if they'd been designed for this application. > > Bogofilter was actually born in the moment that I realized this. > > I believe that so long as it stays in memory. VM, dude, VM is your friend. I thought this through carefully. The size of bogofilter's working set isn't limited by core. And because it's a B-tree variant, the access frequency will be proportional to log2 of the wordlist size and the patterns will be spatially bursty. This is a memory access pattern that plays nice with an LRU pager. > But, as you mention in your manpage > > startup is too slow for sites handling thousands of mails an hour > > That likely makes a Zope OOBTree stored under ZODB a better choice still, as > that's designed for efficient update and access in a persistent database I'm working on a simpler solution, one which might have a Pythonic spinoff. Stay tuned. -- Eric S. Raymond From guido@python.org Wed Aug 21 02:46:08 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 20 Aug 2002 21:46:08 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Tue, 20 Aug 2002 19:27:53 EDT." References: Message-ID: <200208210146.g7L1k8X31138@pcp02138704pcs.reston01.va.comcast.net> > Is there any particular reason BaseSet and basestring need to raise > different exceptions on an attempt at instantiation ? Hm, I dunno. NotImplementedError was intended for this kind of use, but TypeError also matches. I'll add an XXX for this. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Aug 21 03:05:03 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 20 Aug 2002 22:05:03 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Tue, 20 Aug 2002 19:23:30 EDT." <011001c248a0$99575580$1bf8a4d8@othello> References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <009801c24890$565b26e0$1bf8a4d8@othello> <200208202141.g7KLfD726227@odiug.zope.com> <011001c248a0$99575580$1bf8a4d8@othello> Message-ID: <200208210205.g7L253l31163@pcp02138704pcs.reston01.va.comcast.net> > 1. Rename .remove() to __del__(). Its usage is inconsistent with > list.remove(element) which can leave other instances of element in > the list. It is more consistent with 'del adict[element]'. (You mean __delitem__.) -1. Sets don't support the x[y] notation for accessing or setting elements, so it would be weird to use that for deleting. You're not deleting the value corresponding to the key or index y (like you are when using del x[y] on a list or dict), you're deleting y itself. That's more like x.remove(y) for lists. > 2. discard() looks like a useful standard API. Perhaps it should > be added to the dictionary API. Perhaps. But the dict API already has so many things. And why not to the list API? I'm -0 on this. > 3. Should we add .as_temporarily_immutable to dictionaries and > lists so that they will also be potential elements of a set? Hm, I think this is premature. I'd like to see a use case for a set of dicts and a set of lists first. Then you can code list and dict subclasses in Python that implement _as_temporarily_immutable (and _as_immutable too, I suppose). Then we'll see how often this ends up being used. For now, I'd say YAGNI.
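[A minimal sketch of the kind of dict subclass Guido mentions. The class name and the frozenset-of-items snapshot are illustrative assumptions -- set.py defines no such class, and frozenset arrived in a later Python; the point is only that _as_immutable must hand back a hashable, value-based stand-in:]

```python
class HashableDict(dict):
    """A dict that can hand out a hashable snapshot of itself."""

    def _as_immutable(self):
        # Hashable, value-based snapshot; the dict's values must
        # themselves be hashable for this to work.
        return frozenset(self.items())

d = HashableDict(a=1)
s = {d._as_immutable()}        # what a Set would store internally
assert d._as_immutable() in s
d["b"] = 2                     # later mutation doesn't affect the stored snapshot
assert d._as_immutable() not in s
```

[_as_temporarily_immutable would instead return a wrapper around the live object, frozen only for the duration of a membership test or removal.]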
(I've said YAGNI to sets for years, but the realization that as a result lots of people independently invented using the keys of a dict to represent a set made me change my mind. Sort of like how I changed my mind on a Boolean type, too, after 12 years of thinking it wasn't needed. :-) > 4. remove(), update(), add(), and __contains__() all work hard to > check for .as_temporarily_immutable(). Should this be propagated to > other methods that add set members (i.e. replace all instances of > data[element] = value with self.add(element) or use self.update() in > the code for __init__())? I've been thinking the same thing. I think that the only case where this could apply is __init__(), by the way. > The answer is tough because it causes an enormous slowdown in the > common use cases of uniquifying a sequence. OTOH, why check in some > places but not others -- why is .add(aSetInstance) okay but not > Set([aSetInstance])? Really? Why the slowdown? I was thinking of simply changing __init__ into

    if seq is not None:
        self.update(seq)

If that's too slow, perhaps update() could be changed to the following:

    it = iter(seq)
    try:
        for elt in it:
            data[elt] = value
    except TypeError:
        transform = getattr(elt, '_as_immutable', None)
        if transform is None:
            raise
        data[transform()] = value
        self.update(it)

That is, if there are no elements that require transformation, the added cost is a single try/except setup (plus an extra call to iter()). If any element requires transformation, the rest of the elements are dealt with as fast as update() can. Hm, maybe this could be applied to update() too (except it shouldn't call itself recursively but simply write the loop out a second time, with a try/except around each element). > If the answer is yes, then the code for update() should be > super-optimized by moving the try/except outside the for-loop > and wrapping the whole thing in a while 1. That's similar to the idea I just sketched. Can you email me a proposed patch?
(Let's skip SF for this.) > Also, we could bypass the slower .add() method when the incoming source > of elements is known to be an instance of BaseSet. Huh? Nobody calls add() internally AFAIK. > 5. Add a quick pre-check to issubset() and issuperset() along the > lines of:
>
> def issubset(self, other):
>     """Report whether another set contains this set."""
>     self._binary_sanity_check(other)
>     if len(self) > len(other): return False # Fast check for the obvious case
>     for elt in self:
>         if elt not in other:
>             return False
>     return True

Sure. Check it in. > 6. For clarity and foolish consistency, replace all occurrences of > 'elt' with 'element'. Hm, no. 'element' for a loop control variable seems too long (I'd be happy with 'x' but Greg Wilson used 'element'). However I like 'element' as the argument name because it can be used as a keyword argument and then it's better spelled out in full. I think Greg Wilson used 'item' most of the time; I prefer to be consistent and say 'element' all the time since that's the accepted set terminology. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Aug 21 03:25:17 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 20 Aug 2002 22:25:17 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Tue, 20 Aug 2002 19:17:38 EDT." <20020820231738.GA21011@thyrsus.com> References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <20020820231738.GA21011@thyrsus.com> Message-ID: <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> > It should have powerset and cartesian-product methods. Shall I code > them?
The Cartesian product of two sets is easy:

    def product(s1, s2):
        cp = Set()
        for x in s1:
            for y in s2:
                cp.add((x, y))
        return cp

But I'm not sure if this is useful enough to add to the set class -- it seems a great way to waste a lot of memory, and typical uses are probably better served by writing out the for-loops. Perhaps this could be coded as a generator though. More fun would be the cartesian product of N sets; I guess it would have to be recursive. Here's a generator version:

    def product(s, *sets):
        if not sets:
            for x in s:
                yield (x,)
        else:
            subproduct = list(product(*sets))
            for x in s:
                for t in subproduct:
                    yield (x,) + t

Note that this doesn't require sets; its arguments can be arbitrary iterables. So maybe this belongs in a module of iterator utilities rather than in the sets module. API choice: product() with a single argument yields a series of 1-tuples. That's slightly awkward, but works better for the recursion. And specifically asking for the Cartesian product of a single set is kind of pointless. If we *do* add this to the Set class, it could be aliased to __mul__ (giving another reason why using * for set intersection is a bad idea). Here's a naive powerset implementation returning a set:

    def power(s):
        ps = Set()
        ps.add(ImmutableSet([]))
        for elt in s:
            s1 = Set([elt])
            ps1 = Set()
            for ss in ps:
                ps1.add(ss | s1)
            ps |= ps1
        return ps

This is even more of a memory hog; however the algorithm is slightly more subtle so it's perhaps more valuable to have this in the library. Here's a generator version:

    def power(s):
        if len(s) == 0:
            yield Set()
        else:
            # Non-destructively choose a random element:
            x = Set([iter(s).next()])
            for ss in power(s - x):
                yield ss
                yield ss | x

I'm not totally happy with this -- it recurses for each element in s, creating a new set at each level that is s minus one element. I'd prefer to build the set up from the other end, like the first version. IOW I'd love to see your version.
:-) The first power() example raises a point about the set API: the Set() constructor can be called without an iterable argument, but ImmutableSet() cannot. Maybe ImmutableSet() should be allowed too? It creates an immutable empty set. (Hm, this could be a singleton. __new__ could take care of that.) While comparing the various versions of power(), I also ran into an interesting bug in the code. While Set([1]) == ImmutableSet([1]), Set([Set([1])]) != Set([ImmutableSet([1])]). I have to investigate this. --Guido van Rossum (home page: http://www.python.org/~guido/) From esr@thyrsus.com Wed Aug 21 03:44:31 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Tue, 20 Aug 2002 22:44:31 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208210205.g7L253l31163@pcp02138704pcs.reston01.va.comcast.net> References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <009801c24890$565b26e0$1bf8a4d8@othello> <200208202141.g7KLfD726227@odiug.zope.com> <011001c248a0$99575580$1bf8a4d8@othello> <200208210205.g7L253l31163@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020821024431.GA28198@thyrsus.com> Guido van Rossum : > Hm, no. 'element' for a loop control variable seems too long (I'd be > happy with 'x' but Greg Wilson used 'element'). However I like > 'element' as the argument name because it can be used as a keyword > argument and then it's better spelled out in full. I think Greg > Wilson used 'item' most of the time; I prefer to be consistent and say > 'element' all the time since that's the accepted set terminology. Briefly reverting to type as a logician, Eric applauds. 
Sometimes I tell you not to sweat what my ex-colleagues will think, but this is a case in which using mathematically-correct terminology will *not* obscure the difference between stateless/mathematical reasoning and stateful/programming reasoning, and is therefore a good idea. -- Eric S. Raymond From esr@thyrsus.com Wed Aug 21 03:57:25 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Tue, 20 Aug 2002 22:57:25 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <20020820231738.GA21011@thyrsus.com> <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020821025725.GB28198@thyrsus.com> Guido van Rossum :

> Here's a generator version:
>
>     def power(s):
>         if len(s) == 0:
>             yield Set()
>         else:
>             # Non-destructively choose a random element:
>             x = Set([iter(s).next()])
>             for ss in power(s - x):
>                 yield ss
>                 yield ss | x
>
> I'm not totally happy with this -- it recurses for each element in s, > creating a new set at each level that is s minus one element. I'd > prefer to build the set up from the other end, like the first > version.

You're right, that is an ugly and opaque piece of code. Guido me lad, you have been led down the garden path by a dubious lust for recursive elegance. One might almost think you were a LISP hacker or something. > IOW I'd love to see your version. :-) Here's the pre-generator version I wrote using lists as the underlying representation. Should be trivially transformable into a generator version. I'd do it myself but I'm heads-down on bogofilter just now.

def powerset(base):
    "Compute the set of all subsets of a set."
    powerset = []
    for n in xrange(2 ** len(base)):
        subset = []
        for e in xrange(len(base)):
            if n & 2 ** e:
                subset.append(base[e])
        powerset.append(subset)
    return powerset

Are you slapping your forehead yet? :-) -- Eric S. Raymond From skip@pobox.com Wed Aug 21 04:06:21 2002 From: skip@pobox.com (Skip Montanaro) Date: Tue, 20 Aug 2002 22:06:21 -0500 Subject: [Python-Dev] Automatic flex interface for Python? In-Reply-To: <20020821005252.GB22413@thyrsus.com> References: <20020820232346.GA21177@thyrsus.com> <20020821005252.GB22413@thyrsus.com> Message-ID: <15715.941.27029.778363@gargle.gargle.HOWL> Eric> This is one of bogofilter's strengths. It already does this stuff Eric> at the lexical level using a speed-tuned flex scanner (I spent a Eric> lot of the development time watching token strings go by and Eric> tweaking the scanner rules to throw out cruft). This reminds me of something which tickled my interesting bone the other day. The SpamAssassin folks are starting to look at Flex for much faster regular expression matching in situations where large numbers of static re's must be matched. I wonder if using something like SciPy's weave tool would make it (relatively) painless to incorporate fairly high-speed scanners into Python programs. It seems like it would just be an extra layer of compilation for something like weave. Instead of inserting C code into a string, wrapping it with module sticky stuff and compiling it, you'd insert Flex rules into the string, call a slightly higher level function which calls flex to generate the scanner code and use a slightly different bit of module sticky stuff to make it callable from Python. Skip From esr@thyrsus.com Wed Aug 21 04:20:18 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Tue, 20 Aug 2002 23:20:18 -0400 Subject: [Python-Dev] Re: Automatic flex interface for Python?
In-Reply-To: <15715.941.27029.778363@gargle.gargle.HOWL> References: <20020820232346.GA21177@thyrsus.com> <20020821005252.GB22413@thyrsus.com> <15715.941.27029.778363@gargle.gargle.HOWL> Message-ID: <20020821032018.GA29112@thyrsus.com> Skip Montanaro : > This reminds me of something which tickled my interesting bone the other > day. The SpamAssassin folks are starting to look at Flex for much faster > regular expression matching in situations where large numbers of static re's > must be matched. *snort* Took 'em long enough. No, I shouldn't be snarky. Flex is only obvious to Unix old-timers -- the traditions that gave rise to it have fallen into desuetude in the last decade. > ...insert Flex rules into the string, call a slightly higher level > function which calls flex to generate the scanner code and use a > slightly different bit of module sticky stuff to make it callable > from Python. Lexers are painful in Python. They hit the language in a weak spot created by the immutability of strings. I've found this an obstacle more than once, but then I'm a battle-scarred old compiler jock who attacks *everything* with lexers and parsers. -- Eric S. Raymond From guido@python.org Wed Aug 21 04:19:54 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 20 Aug 2002 23:19:54 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Tue, 20 Aug 2002 21:46:08 EDT." <200208210146.g7L1k8X31138@pcp02138704pcs.reston01.va.comcast.net> References: <200208210146.g7L1k8X31138@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200208210319.g7L3Js131755@pcp02138704pcs.reston01.va.comcast.net> > > Is there any particular reason BaseSet and basestring need to raise > > different exceptions on an attempt at instantiation ? > > Hm, I dunno. NotImplementedError was intended for this kind of use, > but TypeError also matches. I'll add an XXX for this. I found a good reason why it should be TypeError, so TypeError it is.
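[Editor's note: the convention being settled on here -- an abstract base class whose direct instantiation raises TypeError, like basestring() -- can be sketched as below. The class bodies are illustrative only, not the actual 2002 sets.py implementation.]

```python
class BaseSet:
    """Abstract base class; only concrete subclasses may be instantiated."""
    def __init__(self):
        if self.__class__ is BaseSet:
            # Same choice as basestring(): abusing the abstract base
            # is a TypeError, not a NotImplementedError.
            raise TypeError("BaseSet is an abstract class")

class Set(BaseSet):
    """Concrete set backed by a dict with dummy values."""
    def __init__(self, iterable=()):
        BaseSet.__init__(self)
        self._data = {x: True for x in iterable}
```

With this sketch, Set([1, 2]) succeeds while BaseSet() raises TypeError, consistent with basestring().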
--Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Wed Aug 21 04:25:52 2002 From: skip@pobox.com (Skip Montanaro) Date: Tue, 20 Aug 2002 22:25:52 -0500 Subject: [Python-Dev] Re: Automatic flex interface for Python? In-Reply-To: <20020821032018.GA29112@thyrsus.com> References: <20020820232346.GA21177@thyrsus.com> <20020821005252.GB22413@thyrsus.com> <15715.941.27029.778363@gargle.gargle.HOWL> <20020821032018.GA29112@thyrsus.com> Message-ID: <15715.2112.952920.736993@gargle.gargle.HOWL> >> ...insert Flex rules into the string, call a slightly higher level >> function which calls flex to generate the scanner code and use a >> slightly different bit of module sticky stuff to make it callable >> from Python. Eric> Lexers are painful in Python. They hit the language in a weak Eric> spot created by the immutability of strings. Yeah, that's why you inline what is essentially a .l file into your Python code. ;-) I'm actually here in Austin for a couple days visiting Eric Jones and the SciPy gang. Perhaps Eric and I can bat something out over lunch tomorrow... Skip From pinard@iro.umontreal.ca Wed Aug 21 03:59:46 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: 20 Aug 2002 22:59:46 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208210037.g7L0bnG30953@pcp02138704pcs.reston01.va.comcast.net> References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <200208210037.g7L0bnG30953@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Guido van Rossum] > Um, the notation is '|' and '&', not 'or' and 'and', and those are > what I learned in school. Seems pretty conventional to me (Greg > Wilson actually tried this out on unsuspecting newbies and found that > while '+' worked okay, '*' did not -- read the PEP). 
The very usual notation for me has been the big `U' for union and the same, upside-down, for intersection, but even now that Python supports Unicode, these are not Python operators _yet_. :-) I never saw `|' nor `&' in literature, except `|' which means "such that" in set comprehensions, as Pythoneers would be tempted to say! On the other hand, for programmers, `|' and `&' are rather natural and easy. Eric has offered the idea of adding Cartesian product, and although the usual notation is a tall thin `X', maybe it would be nice reserving `*' for that? It might not be explicit enough, and besides, there are other circumstances in Algebra, not so far from sets, when one might need many multiplicative operators, so `*' would easily get over-used. -- François Pinard http://www.iro.umontreal.ca/~pinard From guido@python.org Wed Aug 21 04:47:11 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 20 Aug 2002 23:47:11 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Tue, 20 Aug 2002 22:57:25 EDT." <20020821025725.GB28198@thyrsus.com> References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <20020820231738.GA21011@thyrsus.com> <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> <20020821025725.GB28198@thyrsus.com> Message-ID: <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> > Here's the pre-generator version I wrote using lists as the underlying > representation. Should be trivially transformable into a generator > version. I'd do it myself but I'm heads-down on bogofilter just now
>
> def powerset(base):
>     "Compute the set of all subsets of a set."
>     powerset = []
>     for n in xrange(2 ** len(base)):
>         subset = []
>         for e in xrange(len(base)):
>             if n & 2 ** e:
>                 subset.append(base[e])
>         powerset.append(subset)
>     return powerset
>
> Are you slapping your forehead yet? :-)

Yes!
I didn't actually know that algorithm. Here's the generator version for sets (still requires a real set as input):

def powerset(base):
    size = len(base)
    for n in xrange(2**size):
        subset = []
        for e, x in enumerate(base):
            if n & 2**e:
                subset.append(x)
        yield Set(subset)

I would like to write n & (1<<e) here, but that breaks when size > 31. Now, for a set with that many elements, there's no hope that this will ever complete in finite time, but does that mean it shouldn't start? I could write 1L<<e, but… References: <20020820232346.GA21177@thyrsus.com> <20020821005252.GB22413@thyrsus.com> <15715.941.27029.778363@gargle.gargle.HOWL> Message-ID: <200208210349.g7L3nLO31826@pcp02138704pcs.reston01.va.comcast.net> > I wonder if using something like SciPy's weave tool would make it > (relatively) painless to incorporate fairly high-speed scanners into > Python programs. I haven't given up on the re module for fast scanners (see Tim's note on the speed of tokenizing 20,000 messages in mere minutes). Note that the Bayes approach doesn't *need* a trick to apply many regexes in parallel to the text. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Aug 21 04:57:51 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 20 Aug 2002 23:57:51 -0400 Subject: [Python-Dev] Re: Automatic flex interface for Python? In-Reply-To: Your message of "Tue, 20 Aug 2002 23:20:18 EDT." <20020821032018.GA29112@thyrsus.com> References: <20020820232346.GA21177@thyrsus.com> <20020821005252.GB22413@thyrsus.com> <15715.941.27029.778363@gargle.gargle.HOWL> <20020821032018.GA29112@thyrsus.com> Message-ID: <200208210357.g7L3vpl31876@pcp02138704pcs.reston01.va.comcast.net> > Lexers are painful in Python. They hit the language in a weak spot > created by the immutability of strings. I've found this an obstacle > more than once, but then I'm a battle-scarred old compiler jock who > attacks *everything* with lexers and parsers. I think you're exaggerating the problem, or at least underestimating the re module.
The re module is pretty fast! Reading a file line-by-line is very fast in Python 2.3 with the new "for line in open(filename)" idiom. I just scanned nearly a megabyte of ugly data (a Linux kernel) in 0.6 seconds using the regex '\w+', finding 177,000 words. The regex (?:\d+|[a-zA-Z_]+) took 1 second, finding 190,000 words. I expect that the list creation (one hit at a time) took more time than the matching. --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz@pythoncraft.com Wed Aug 21 05:00:21 2002 From: aahz@pythoncraft.com (Aahz) Date: Wed, 21 Aug 2002 00:00:21 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208210319.g7L3Js131755@pcp02138704pcs.reston01.va.comcast.net> References: <200208210146.g7L1k8X31138@pcp02138704pcs.reston01.va.comcast.net> <200208210319.g7L3Js131755@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020821040021.GA22655@panix.com> On Tue, Aug 20, 2002, Guido van Rossum wrote: > >>> Is there any particular reason BaseSet and basestring need to raise >>> different exceptions on an attempt at instantiation ? >> >> Hm, I dunno. NotImplementedError was intended for this kind of use, >> but TypeError also matches. I'll add an XXX for this. > > I found a good reason why it should be TypeError, so TypeError it is. Mind telling us? (I've always used NotImplementedError, so I'm curious.) -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From guido@python.org Wed Aug 21 05:02:09 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 21 Aug 2002 00:02:09 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Wed, 21 Aug 2002 00:00:21 EDT."
<20020821040021.GA22655@panix.com> References: <200208210146.g7L1k8X31138@pcp02138704pcs.reston01.va.comcast.net> <200208210319.g7L3Js131755@pcp02138704pcs.reston01.va.comcast.net> <20020821040021.GA22655@panix.com> Message-ID: <200208210402.g7L429S31962@pcp02138704pcs.reston01.va.comcast.net> > Mind telling us? (I've always used NotImplementedError, so I'm > curious.) OK, from the checkins: - Changed the NotImplementedError in BaseSet.__init__ to TypeError, both for consistency with basestring() and because we have to use TypeError when denying Set.__hash__. Together those provide sufficient evidence that an unimplemented method needs to raise TypeError. --Guido van Rossum (home page: http://www.python.org/~guido/) From esr@thyrsus.com Wed Aug 21 05:12:52 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 21 Aug 2002 00:12:52 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <20020820231738.GA21011@thyrsus.com> <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> <20020821025725.GB28198@thyrsus.com> <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020821041252.GA30489@thyrsus.com> Guido van Rossum : > > Are you slapping your forehead yet? :-) > > Yes! I didn't actually know that algorithm. I thought it up myself. Which is funny, since to get there you have to think like a hardware engineer rather than a logician. My brain was definitely out of character that day. > This is a generator that yields a series of lists whose values are the > items of base. And again, like cartesian product, it's now more a > generator thing than a set thing. I don't care where it lives, really. I just like the concision of being able to say foo.powerset(). 
Not that I've used this yet, but I know one algorithm for which it would be helpful. Another one I invented, actually, back when I really was a mathematician -- a closed form for sums of certain categories of probability distributions. I called it the Dungeon Dice Theorem. Never published it. > BTW, the correctness of all my versions trivially derives from the > correctness of your version -- each is a very simple transformation of > the previous one. My mentor Lambert Meertens calls this process > Algorithmics (and has developed a mathematical notation and theory for > program transformations). Web pointer? -- Eric S. Raymond From aahz@pythoncraft.com Wed Aug 21 05:13:29 2002 From: aahz@pythoncraft.com (Aahz) Date: Wed, 21 Aug 2002 00:13:29 -0400 Subject: [Python-Dev] Re: Automatic flex interface for Python? In-Reply-To: <200208210357.g7L3vpl31876@pcp02138704pcs.reston01.va.comcast.net> References: <20020820232346.GA21177@thyrsus.com> <20020821005252.GB22413@thyrsus.com> <15715.941.27029.778363@gargle.gargle.HOWL> <20020821032018.GA29112@thyrsus.com> <200208210357.g7L3vpl31876@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020821041329.GA25548@panix.com> I'm mildly curious why nobody has suggested mxTextTools or anything like that. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From esr@thyrsus.com Wed Aug 21 05:16:47 2002 From: esr@thyrsus.com (Eric S. 
Raymond) Date: Wed, 21 Aug 2002 00:16:47 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <200208210037.g7L0bnG30953@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020821041647.GB30489@thyrsus.com> François Pinard : > I never saw `|' nor `&' in literature, except `|' which means "such that" > in set comprehensions, as Pythoneers would be tempted to say! On the > other hand, for programmers, `|' and `&' are rather natural and easy. Now that I think of it, I've never seen a mathematician use this at all. But I agree that it's a good choice for programmers. > Eric has offered the idea of adding Cartesian product, and although the > usual notation is a tall thin `X', maybe it would be nice reserving `*' > for that? Mildly in favor, but I wouldn't cry if it didn't happen. -- Eric S. Raymond From tim_one@email.msn.com Wed Aug 21 05:22:20 2002 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 21 Aug 2002 00:22:20 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208202049.g7KKnPJ26019@odiug.zope.com> Message-ID: [Guido] > ... > I also believe that kjbuckets maintains its data in a sorted order, > which is unnecessary for sets -- a hash table is much faster. It's Zope's BTrees that maintain sorted order. kjbuckets consists of 3 variations ("set", "dict", "graph") of a rather complicated hash table, driven by the need for the graph flavor to associate multiple values with a single key. The kj hash table slots each contain a small contiguous vector, and then it starts to get complicated . > After all we use a very fast hash table implementation to represent sets. > (The only improvement would be that we could save maybe 4 bytes per > hash table entry because we don't need a value pointer.)
The set flavor of kjbuckets does skip the value pointer. I suppose it could have supported multisets too ("bags" -- duplicate keys allowed), but it doesn't (they can be faked via a kjgraph using dummy values, though -- much like faking a set in Python via a dict with dummy values!). > ... > The sets module does not implement analogous operations directly in > Python. Almost all the implementation work is done by the dict > implementation. kjbuckets has a rich set of operations coded in C, including intersection, graph composition, and transitive closure. If the sets module were to sprout those operations too, they would have to look like the sets intersection implementation (nests of Python loops and ifs), as Python dicts don't support primitives able to polish off large chunks of the necessary work at C speed. Aaron's claim that kjbuckets can do those kinds of things 10x faster than Python code seems quite safe . > ... > kjbuckets may be nice, but adding it to the core would add a serious > new maintenance burden for the core developers. I don't see anyone > raising their hand to help out here. Not me. It's about 3500 lines of hairy code, and you wouldn't like some of the interface decisions it made. For example, a_kjset[3] = 5 adds 3 to the set and ignores 5. Even Aaron blushes at that one , but some of the others would be much harder to sort out. It would be a major undertaking to do so. From jafo@tummy.com Wed Aug 21 05:26:01 2002 From: jafo@tummy.com (Sean Reifschneider) Date: Tue, 20 Aug 2002 22:26:01 -0600 Subject: [Python-Dev] Sort() returning sorted list Message-ID: <20020821042601.GA9076@tummy.com> It would be nice to have, at times, sort() return the sorted list, but doing so could lead people to believe that it returns a sorted copy of the list. What about if we could do "list.sort(returnAfterSorting = 1)" or something? It'd be nice if it were short and sweet, but it should be clear what it's doing too...
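[Editor's note: the behaviour Sean asks for here eventually arrived as the builtin sorted(), added in Python 2.4 -- it returns a new sorted list, while list.sort() stayed in-place and None-returning. A minimal demonstration:]

```python
data = [3, 1, 2]

copy = sorted(data)      # new sorted list; data is untouched
assert copy == [1, 2, 3]
assert data == [3, 1, 2]

result = data.sort()     # sorts in place and returns None
assert result is None
assert data == [1, 2, 3]
```

The naming split (sort vs. sorted) resolves exactly the confusion Sean worries about: the name alone says whether a copy is returned.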
Sean -- Passionate hatred can give meaning and purpose to an empty life. -- Eric Hoffer Sean Reifschneider, Inimitably Superfluous tummy.com - Linux Consulting since 1995. Qmail, KRUD, Firewalls, Python From tim_one@email.msn.com Wed Aug 21 05:50:39 2002 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 21 Aug 2002 00:50:39 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: <20020821011429.GE22413@thyrsus.com> Message-ID: [Eric S. Raymond] > VM, dude, VM is your friend. I thought this through carefully. The > size of bogofilter's working set isn't limited by core. Do you expect that to be an issue? When I built a database from 20,000 messages, the whole thing fit in a Python dict consuming about 10MB. That's all. As is always the case in this kind of thing, about half the tokens are utterly useless since they appear only once in only one message (think of things like misspellings and message ids here -- about half the tokens generated will be "obviously useless", although the useless won't become obvious until, e.g., a month passes and you never see the token again). In addition, this is a statistical-sampling method, and 20,000 messages is a very large sample. I concluded that, in practice, and since we do have ways to identify and purge useless tokens, 5MB is a reasonable upper bound on the size of this thing. It doesn't fit in my L2 cache, but I'd need a magnifying glass to find it in my RAM. > And because it's a B-tree variant, the access frequency will be proportional > to log2 of the wordlist size I don't believe Judy is faster than Python string-keyed dicts at the sizes I'm expecting (I've corresponded with Douglas about that too, and his timing data has a hard time disagreeing ). > and the patterns will be spatially bursty. Why? Do you sort tokens before looking them up?
Else I don't see a reason to expect that, from one lookup to the next, the paths from root to leaf will enjoy spatial overlap beyond the root node. > This is a memory access pattern that plays nice with an LRU pager. Well, as I said before, all the evidence I've seen says the scoring time for a message is near-trivial (including the lookup times), even in pure Python. It's only the parsing that I'm still worried about, and I may yet confess a bad case of Flex envy. > I'm working on a simpler solution, one which might have a Pythonic spinoff. > Stay tuned. I figure this means something simpler than a BTree under ZODB. If so, you should set yourself a tougher challenge . From tim_one@email.msn.com Wed Aug 21 06:11:03 2002 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 21 Aug 2002 01:11:03 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Guido, eventually arrives at ...]
> def powerset(base):
>     pairs = list(enumerate(base))
>     size = len(pairs)
>     for n in xrange(2**size):
>         subset = []
>         for e, x in pairs:
>             if n & 2**e:
>                 subset.append(x)
>         yield subset

Now let's rewrite that in modern Python :

def powerset(base):
    pairs = [(2**i, x) for i, x in enumerate(base)]
    for i in xrange(2**len(pairs)):
        yield [x for (mask, x) in pairs if i & mask]
> BTW, the correctness of all my versions trivially derives from the > correctness of your version -- each is a very simple transformation of > the previous one. My mentor Lambert Meertens calls this process > Algorithmics (and has developed a mathematical notation and theory for > program transformations). Was he interested in that before his sabbatical with the SETL folks? The SETL project produced lots of great research in that area, largely driven by the desire to help ultra-high-level SETL programs finish in their lifetimes. From guido@python.org Wed Aug 21 06:14:27 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 21 Aug 2002 01:14:27 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Tue, 20 Aug 2002 23:47:11 EDT." Message-ID: <200208210514.g7L5ER432127@pcp02138704pcs.reston01.va.comcast.net> A few transformations down the road, here's a 4-line powerset() generator: def powerset(base): pairs = [(2**i, x) for i, x in enumerate(base)] for n in xrange(2**len(pairs)): yield [x for m, x in pairs if m&n] --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Aug 21 06:23:15 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 21 Aug 2002 01:23:15 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Wed, 21 Aug 2002 01:11:03 EDT." References: Message-ID: <200208210523.g7L5NFg32195@pcp02138704pcs.reston01.va.comcast.net> > Now let's rewrite that in modern Python : > > def powerset(base): > pairs = [(2**i, x) for i, x in enumerate(base)] > for i in xrange(2**len(pairs)): > yield [x for (mask, x) in pairs if i & mask] Honest, I didn't see this before I posted mine. :-) > > This is a generator that yields a series of lists whose values are the > > items of base. And again, like cartesian product, it's now more a > > generator thing than a set thing. > > Generators are great for constructing families of combinatorial > objects, BTW. 
They can be huge, and in algorithms that use them as > inputs for searches, you can often expect not to need more than a > miniscule fraction of the entire family, but can't predict how many > you'll need. Generators are perfect then. Yup. That's why I strove for these to be generators. > > BTW, the correctness of all my versions trivially derives from the > > correctness of your version -- each is a very simple > > transformation of the previous one. My mentor Lambert Meertens > > calls this process Algorithmics (and has developed a mathematical > > notation and theory for program transformations). > > Was he interested in that before his sabbatical with the SETL folks? Dunno. Maybe it sparked his interest. > The SETL project produced lots of great research in that area, > largely driven by the desire to help ultra-high-level SETL programs > finish in their lifetimes. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one@email.msn.com Wed Aug 21 06:25:42 2002 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 21 Aug 2002 01:25:42 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208210514.g7L5ER432127@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Guido, retroactively channeling Tim] > A few transformations down the road, here's a 4-line powerset() generator: > > def powerset(base): > pairs = [(2**i, x) for i, x in enumerate(base)] > for n in xrange(2**len(pairs)): > yield [x for m, x in pairs if m&n] Mine used one less local variable, so is more cache-friendly . From guido@python.org Wed Aug 21 06:26:17 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 21 Aug 2002 01:26:17 -0400 Subject: [Python-Dev] Sort() returning sorted list In-Reply-To: Your message of "Tue, 20 Aug 2002 22:26:01 MDT." 
<20020821042601.GA9076@tummy.com> References: <20020821042601.GA9076@tummy.com> Message-ID: <200208210526.g7L5QH332234@pcp02138704pcs.reston01.va.comcast.net> > It would be nice to have, at times, sort() return the sorted list, > but doing so could lead people to believe that it returns a sorted copy of > the list. What about if we could do "list.sort(returnAfterSorting = 1)" > or something? It'd be nice if it were short and sweet, but it should > be clear what it's doing too... Nah, this is not provided precisely for the reason you give. You can write your own utility easily enough:

def sort(L):
    L.sort()
    return L

--Guido van Rossum (home page: http://www.python.org/~guido/) From esr@thyrsus.com Wed Aug 21 06:35:56 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 21 Aug 2002 01:35:56 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: References: <20020821011429.GE22413@thyrsus.com> Message-ID: <20020821053556.GA700@thyrsus.com> Tim Peters : > [Eric S. Raymond] > > VM, dude, VM is your friend. I thought this through carefully. The > > size of bogofilter's working set isn't limited by core. > > Do you expect that to be an issue? When I built a database from 20,000 > messages, the whole thing fit in a Python dict consuming about 10MB. Hm, that's a bit smaller than I would have thought, but the order of magnitude I was expecting. > That's > all. As is always the case in this kind of thing, about half the tokens are > utterly useless since they appear only once in only one message (think of > things like misspellings and message ids here -- about half the tokens > generated will be "obviously useless", although the useless won't become > obvious until, e.g., a month passes and you never see the token again). Recognition features should age! Wow! That's a good point! With the age counter being reset when they're recognized. > > and the patterns will be spatially bursty. > > Why?
Do you sort tokens before looking them up? I thought part of the point of the method was that you get sorting for free because of the way elements are inserted. > Else I don't see a reason > to expect that, from one lookup to the next, the paths from root to leaf > will enjoy spatial overlap beyond the root node. No, but think about how the pointer in a binary search moves. It's spatially bursty. Memory access frequencies for repeated binary searches will be a sum of bursty signals, analogous to the way network traffic volumes look in the time domain. In fact the graph of memory address vs. number of accesses is gonna wind up looking an awful lot like 1/f noise, I think. *Not* evenly distributed; something there for LRU to work with. > > I'm working on a simpler solution, one which might have a Pythonic spinoff. > > Stay tuned. > > I figure this means something simpler than a BTree under ZODB. If so, you > should set yourself a tougher challenge . What I'm starting to test now is a refactoring of the program where it spawns a daemon version of itself the first time it's called. The daemon eats the wordlists and stays in core fielding requests from subsequent program runs. Basically an answer to the "how do you call bogofilter 1K times a day from procmail without bringing your disks to their knees" problem -- persistence on the cheap. Thing is that the solution to this problem is very generic. Might turn into a Python framework. -- Eric S. Raymond From greg@cosc.canterbury.ac.nz Wed Aug 21 06:36:26 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 21 Aug 2002 17:36:26 +1200 (NZST) Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <20020821010517.GD22413@thyrsus.com> Message-ID: <200208210536.g7L5aQY07215@oma.cosc.canterbury.ac.nz> "Eric S. Raymond" : > The | & one is distinctly less common than either, at least among > mathematicians; I think EEs and suchlike may use it more than we do.
I'm surprised that mathematicians use | and & at all. I had always assumed that these were invented by the programming community, being available ASCII characters used in programming languages, and that mathematicians wouldn't ever use them if they had a choice. But maybe I'm wrong! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Wed Aug 21 06:41:11 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 21 Aug 2002 17:41:11 +1200 (NZST) Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200208210541.g7L5fBS07245@oma.cosc.canterbury.ac.nz> Guido van Rossum :

> def product(s1, s2):
>     cp = Set()
>     for x in s1:
>         for y in s2:
>             cp.add((x, y))
>     return cp

Oh, no. Someone is bound to want set comprehensions, now... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Wed Aug 21 06:46:31 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 21 Aug 2002 17:46:31 +1200 (NZST) Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: Message-ID: <200208210546.g7L5kVP07306@oma.cosc.canterbury.ac.nz> pinard@iro.umontreal.ca (François Pinard): > Eric has offered the idea of adding Cartesian product, and although the > usual notation is a tall thin `X', maybe it would be nice reserving > `*' for that? Maybe '%'? It looks a bit X-ish. Or if Python ever gets a dedicated matrix-multiplication operator, maybe that could be reused.
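[Editor's note: history's answer to this subthread -- Python's builtin set kept | and & as union and intersection, Cartesian product landed not as an operator but as itertools.product, and the dedicated matrix-multiplication operator Greg imagines arrived much later as @ (PEP 465), still unused by sets:]

```python
from itertools import product

a, b = {1, 2}, {2, 3}
assert a | b == {1, 2, 3}    # union
assert a & b == {2}          # intersection

# Cartesian product: a lazy iterator of pairs, not a set operator.
assert set(product(a, b)) == {(1, 2), (1, 3), (2, 2), (2, 3)}
```

Like the powerset generators earlier in the thread, product() is a generator-style iterator, so large products cost nothing until consumed.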
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From zack@codesourcery.com Wed Aug 21 06:58:25 2002 From: zack@codesourcery.com (Zack Weinberg) Date: Tue, 20 Aug 2002 22:58:25 -0700 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: <20020821053556.GA700@thyrsus.com> References: <20020821011429.GE22413@thyrsus.com> <20020821053556.GA700@thyrsus.com> Message-ID: <20020821055825.GP29858@codesourcery.com> On Wed, Aug 21, 2002 at 01:35:56AM -0400, Eric S. Raymond wrote: > > What I'm starting to test now is a refactoring of the program where it > spawn a daemon version of itself first time it's called. The daemon > eats the wordlists and stays in core fielding requests from subsequent > program runs. Basically an answer to "how you call bogofilter 1K > times a day from procmail without bringing your disks to their knees" > problem" -- persistence on the cheap. For use at ISPs, the daemon should be able to field requests from lots of different users, maintaining one unified word list. Without needing any access whatsoever to user home directories. zw From esr@thyrsus.com Wed Aug 21 07:00:26 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 21 Aug 2002 02:00:26 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208210536.g7L5aQY07215@oma.cosc.canterbury.ac.nz> References: <20020821010517.GD22413@thyrsus.com> <200208210536.g7L5aQY07215@oma.cosc.canterbury.ac.nz> Message-ID: <20020821060026.GA1771@thyrsus.com> Greg Ewing : > > The | & one is distinctly less common than either, at least among > > mathematicians; I think EEs and suchlike may use it more than we do. > > I'm surprised that mathematicians use | and & at all. 
I had always > assumed that these were invented by the programming community, being > available ASCII characters used in programming languages, and that > mathematicians wouldn't ever use them if they had a choice. But maybe > I'm wrong! Your post crossed one of mine in which, on reflection, I said I'd never seen a mathematician use these. Not even me, not when I'm doing math anyway. I still think in Birkhoff's lattice-theory notation. Nevertheless I'm quite comfortable with | & when programming. -- Eric S. Raymond From esr@thyrsus.com Wed Aug 21 07:03:33 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 21 Aug 2002 02:03:33 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208210546.g7L5kVP07306@oma.cosc.canterbury.ac.nz> References: <200208210546.g7L5kVP07306@oma.cosc.canterbury.ac.nz> Message-ID: <20020821060333.GB1771@thyrsus.com> Greg Ewing : > pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard): > > > Eric has offered the idea of adding Cartesian product, and despite the > > usual notation is a tall thin `X', maybe it would be nice reserving > > `*' for that? > > Maybe '%'? It looks a bit X-ish. Ugh. *No.* -1 on that, I'd look at % and see a string format operator. Suddenly I like Francois's suggestion a lot better. -- Eric S. Raymond From jafo@tummy.com Wed Aug 21 07:13:16 2002 From: jafo@tummy.com (Sean Reifschneider) Date: Wed, 21 Aug 2002 00:13:16 -0600 Subject: [Python-Dev] Sort() returning sorted list In-Reply-To: <200208210526.g7L5QH332234@pcp02138704pcs.reston01.va.comcast.net> References: <20020821042601.GA9076@tummy.com> <200208210526.g7L5QH332234@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020821061316.GA9418@tummy.com> On Wed, Aug 21, 2002 at 01:26:17AM -0400, Guido van Rossum wrote: >def sort(L): > L.sort() > return L Yeah, that's not QUITE as bad as having to write my own string-handling routines. ;-/ So, there's really no place in Python itself for this? 
Like, list.sort_inplace() and list.sort_copy(), both of which return something because it's obvious what they do? Sean -- A ship in port is safe, but that is not what ships are for. -- Rear Admiral Grace Murray Hopper Sean Reifschneider, Inimitably Superfluous tummy.com - Linux Consulting since 1995. Qmail, KRUD, Firewalls, Python From esr@thyrsus.com Wed Aug 21 07:22:26 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 21 Aug 2002 02:22:26 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: <20020821055825.GP29858@codesourcery.com> References: <20020821011429.GE22413@thyrsus.com> <20020821053556.GA700@thyrsus.com> <20020821055825.GP29858@codesourcery.com> Message-ID: <20020821062226.GC1771@thyrsus.com> Zack Weinberg : > On Wed, Aug 21, 2002 at 01:35:56AM -0400, Eric S. Raymond wrote: > > > > What I'm starting to test now is a refactoring of the program where it > > spawn a daemon version of itself first time it's called. The daemon > > eats the wordlists and stays in core fielding requests from subsequent > > program runs. Basically an answer to "how you call bogofilter 1K > > times a day from procmail without bringing your disks to their knees" > > problem" -- persistence on the cheap. > > For use at ISPs, the daemon should be able to field requests from lots > of different users, maintaining one unified word list. Without > needing any access whatsoever to user home directories. I'm on it. The following is not yet working, but it's a straight road to get there.... There is a public spam-checker port. Your client program sends it packets consisting of a list of header token counts. You can send lots of these blocks; each one has to be under the maximum atomic-message size for sockets (I think that's 32K). The server accumulates the frequency counts you ship it until you say "OK, what is it?" Does the Bayes test. Ships you back a result. -- Eric S. 
Raymond From tdelaney@avaya.com Wed Aug 21 07:26:24 2002 From: tdelaney@avaya.com (Delaney, Timothy) Date: Wed, 21 Aug 2002 16:26:24 +1000 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib Message-ID: > From: Guido van Rossum [mailto:guido@python.org] > > OK, from the checkins: > > - Changed the NotImplementedError in BaseSet.__init__ to TypeError, > both for consistency with basestring() and because we have to use > TypeError when denying Set.__hash__. Together those provide > sufficient evidence that an unimplemented method needs to raise > TypeError. Hmm ... is there a case that NotImplementedError should be a subclass of TypeError? Conceptually it would make sense (this *type* does not implement this method). Of course, it would probably also break code ... Tim Delaney From md9ms@mdstud.chalmers.se Wed Aug 21 09:16:55 2002 From: md9ms@mdstud.chalmers.se (Martin Sjögren) Date: 21 Aug 2002 10:16:55 +0200 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <20020821010517.GD22413@thyrsus.com> References: <200208210037.g7L0bnG30953@pcp02138704pcs.reston01.va.comcast.net> <200208210046.g7L0kN605969@oma.cosc.canterbury.ac.nz> <20020821010517.GD22413@thyrsus.com> Message-ID: <1029917815.581.3.camel@winterfell> On Wed, 2002-08-21 at 03:05, Eric S. Raymond wrote: > For historical reasons, there are three different notations for Boolean > algebra in common use. You're describing the one derived from set theory. > I personally favor the one derived from lattice algebra; the distinctive > feature of that one is the pointy and &/| operators that look like /\ and > \/. The third uses | and &. Uhm, what about + and juxtaposition? They are quite common at least here in Sweden, for boolean algebra.
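A quick illustration (not from the thread) of the convention Martin describes, treating juxtaposition as multiplication and checking it over the two-element Boolean domain against Python's own | and &:

```python
# + (capped at 1) behaves as OR, and * (juxtaposition on paper)
# behaves as AND -- matching Python's | and & on the integers 0 and 1.
for a in (0, 1):
    for b in (0, 1):
        assert min(a + b, 1) == (a | b)  # disjunction
        assert a * b == (a & b)          # conjunction
```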
Martin From fredrik@pythonware.com Wed Aug 21 09:17:44 2002 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 21 Aug 2002 10:17:44 +0200 Subject: [Python-Dev] Re: Automatic flex interface for Python? References: <20020820232346.GA21177@thyrsus.com> <20020821005252.GB22413@thyrsus.com> <15715.941.27029.778363@gargle.gargle.HOWL> <20020821032018.GA29112@thyrsus.com> <200208210357.g7L3vpl31876@pcp02138704pcs.reston01.va.comcast.net> <20020821041329.GA25548@panix.com> Message-ID: <015901c248eb$97772ec0$0900a8c0@spiff> aahz wrote: > I'm mildly curious why nobody has suggested mxTextTools or anything like > that. I'm mildly curious why mxTextTools proponents From esr@thyrsus.com Wed Aug 21 09:28:32 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 21 Aug 2002 04:28:32 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <1029917815.581.3.camel@winterfell> References: <200208210037.g7L0bnG30953@pcp02138704pcs.reston01.va.comcast.net> <200208210046.g7L0kN605969@oma.cosc.canterbury.ac.nz> <20020821010517.GD22413@thyrsus.com> <1029917815.581.3.camel@winterfell> Message-ID: <20020821082832.GA7256@thyrsus.com> Martin Sjögren : > Uhm, what about + and juxtaposition? They are quite common at least here > in Sweden, for boolean algebra. Is it + for disjunction and juxtaposition for conjunction, or the other way around? Not that I've ever seen either variant. -- Eric S. Raymond From fredrik@pythonware.com Wed Aug 21 09:34:49 2002 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 21 Aug 2002 10:34:49 +0200 Subject: [Python-Dev] Re: Automatic flex interface for Python? References: <20020820232346.GA21177@thyrsus.com> <20020821005252.GB22413@thyrsus.com> <15715.941.27029.778363@gargle.gargle.HOWL> <20020821032018.GA29112@thyrsus.com> <200208210357.g7L3vpl31876@pcp02138704pcs.reston01.va.comcast.net> <20020821041329.GA25548@panix.com> <015901c248eb$97772ec0$0900a8c0@spiff> Message-ID: <018e01c248ed$9ffc0d20$0900a8c0@spiff> > I'm mildly curious why mxTextTools proponents eh? why did my mailer send that mail? what I was trying to say is that I'm mildly curious why people tend to treat mxTextTools like some kind of silver bullet, without actually comparing it to a well-written regular expression. I've heard from people who've spent days rewriting their application, only to find that the resulting program was slower. (as Guido noted, for problems like this, the overhead isn't so much in the engine itself, as in the effort needed to create Python data structures...)
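The "well-written regular expression" Fredrik has in mind is typically a single compiled alternation doing a scanner's whole job. A hypothetical sketch (the token set and names are invented for illustration, not code from the thread):

```python
import re

# One compiled alternation doing the work of a hand-rolled scanner.
# Named groups mark the token types.
TOKEN = re.compile(r"""
    (?P<number>\d+)
  | (?P<name>[A-Za-z_]\w*)
  | (?P<op>[+\-*/=])
  | (?P<ws>\s+)
""", re.VERBOSE)

def tokenize(text):
    tokens = []
    for m in TOKEN.finditer(text):
        kind = m.lastgroup          # name of the group that matched
        if kind != "ws":            # skip whitespace tokens
            tokens.append((kind, m.group()))
    return tokens
```

For example, tokenize("x = 42 + y") yields name/op/number tuples in one pass, with no per-character Python-level loop.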
From md9ms@mdstud.chalmers.se Wed Aug 21 09:36:07 2002 From: md9ms@mdstud.chalmers.se (Martin Sjögren) Date: 21 Aug 2002 10:36:07 +0200 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <20020821082832.GA7256@thyrsus.com> References: <200208210037.g7L0bnG30953@pcp02138704pcs.reston01.va.comcast.net> <200208210046.g7L0kN605969@oma.cosc.canterbury.ac.nz> <20020821010517.GD22413@thyrsus.com> <1029917815.581.3.camel@winterfell> <20020821082832.GA7256@thyrsus.com> Message-ID: <1029918967.582.13.camel@winterfell> On Wed, 2002-08-21 at 10:28, Eric S. Raymond wrote: > Martin Sjögren : > > Uhm, what about + and juxtaposition? They are quite common at least here > > in Sweden, for boolean algebra. > > Is it + for disjunction and juxtaposition for conjunction, or the other > way around? Not that I've ever seen either variant. I've often seen it in the context of electronics. a+1 = 1, a0 = 0 and so on. That is, + is disjunction and juxtaposition (or a multiplication dot) is conjunction. Hmm, I just realized that I've also seen it in an American book on discrete maths, so it's not just us Swedes ;) Martin From esr@thyrsus.com Wed Aug 21 10:05:27 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 21 Aug 2002 05:05:27 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <1029918967.582.13.camel@winterfell> References: <200208210037.g7L0bnG30953@pcp02138704pcs.reston01.va.comcast.net> <200208210046.g7L0kN605969@oma.cosc.canterbury.ac.nz> <20020821010517.GD22413@thyrsus.com> <1029917815.581.3.camel@winterfell> <20020821082832.GA7256@thyrsus.com> <1029918967.582.13.camel@winterfell> Message-ID: <20020821090527.GA8346@thyrsus.com> Martin Sjögren : > > Is it + for disjunction and juxtaposition for conjunction, or the other > > way around? Not that I've ever seen either variant. > > I've often seen it in the context of electronics. a+1 = 1, a0 = 0 and so > on. That is, + is disjunction and juxtaposition (or a multiplication > dot) is conjunction. Makes sense. Hardware designers care a lot about reduction to disjunctive normal form. Much more than logicians do, actually. > Hmm, I just realized that I've also seen it in an American book on > discrete maths, so it's not just us Swedes ;) Odd that I haven't encountered it. -- Eric S. Raymond From esr@snark.thyrsus.com Wed Aug 21 11:30:35 2002 From: esr@snark.thyrsus.com (Eric S. Raymond) Date: Wed, 21 Aug 2002 06:30:35 -0400 Subject: [Python-Dev] Embarassed in Malvern Message-ID: <200208211030.g7LAUZv13032@snark.thyrsus.com> Apparently I was hallucinating when I thought these has been released in open source. Aaarrgghh. Well, they had a butt-ugly API anyway.
I'll replace them with Damian Ivereigh's libredblack in 0.3. -- Eric S. Raymond Government should be weak, amateurish and ridiculous. At present, it fulfills only a third of the role. -- Edward Abbey From barry@python.org Wed Aug 21 12:57:45 2002 From: barry@python.org (Barry A. Warsaw) Date: Wed, 21 Aug 2002 07:57:45 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 References: <20020821011429.GE22413@thyrsus.com> <20020821053556.GA700@thyrsus.com> <20020821055825.GP29858@codesourcery.com> Message-ID: <15715.32825.611885.57351@anthem.wooz.org> >>>>> "ZW" == Zack Weinberg writes: ZW> For use at ISPs, the daemon should be able to field requests ZW> from lots of different users, maintaining one unified word ZW> list. Without needing any access whatsoever to user home ZW> directories. An approach like this certainly makes sense for a mailing list server, especially when all the lists are roughly about the same topic. Even without that, I suspect that spam across lists all looks the same, while non-spam will differ so there may be list/site organizations you can exploit here. -Barry From aahz@pythoncraft.com Wed Aug 21 13:28:24 2002 From: aahz@pythoncraft.com (Aahz) Date: Wed, 21 Aug 2002 08:28:24 -0400 Subject: [Python-Dev] Re: Automatic flex interface for Python? In-Reply-To: <018e01c248ed$9ffc0d20$0900a8c0@spiff> References: <20020820232346.GA21177@thyrsus.com> <20020821005252.GB22413@thyrsus.com> <15715.941.27029.778363@gargle.gargle.HOWL> <20020821032018.GA29112@thyrsus.com> <200208210357.g7L3vpl31876@pcp02138704pcs.reston01.va.comcast.net> <20020821041329.GA25548@panix.com> <015901c248eb$97772ec0$0900a8c0@spiff> <018e01c248ed$9ffc0d20$0900a8c0@spiff> Message-ID: <20020821122824.GA16740@panix.com> On Wed, Aug 21, 2002, Fredrik Lundh wrote: > > > I'm mildly curious why mxTextTools proponents > > eh? why did my mailer send that mail?
what I was trying to say > is that I'm mildly curious why people tend to treat mxTextTools like > some kind of silver bullet, without actually comparing it to a well- > written regular expression. > > I've heard from people who've spent days rewriting their application, > only to find that the resulting program was slower. Okay, so that's one datapoint. I've never actually used mxTextTools; I'm mostly going by comments Tim Peters has made in the past suggesting that regex tools are poor for parsing. Since he's the one saying that regex is fast enough this time, I figured it'd be an appropriate time to throw up a question. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From guido@python.org Wed Aug 21 13:39:44 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 21 Aug 2002 08:39:44 -0400 Subject: [Python-Dev] Sort() returning sorted list In-Reply-To: Your message of "Wed, 21 Aug 2002 00:13:16 MDT." <20020821061316.GA9418@tummy.com> References: <20020821042601.GA9076@tummy.com> <200208210526.g7L5QH332234@pcp02138704pcs.reston01.va.comcast.net> <20020821061316.GA9418@tummy.com> Message-ID: <200208211239.g7LCdjF00588@pcp02138704pcs.reston01.va.comcast.net> > So, there's really no place in Python itself for this? Like, > list.sort_inplace() and list.sort_copy(), both of which return something > because it's obvious what they do? No, TOOWTDI. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Aug 21 13:41:05 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 21 Aug 2002 08:41:05 -0400 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Wed, 21 Aug 2002 16:26:24 +1000." References: Message-ID: <200208211241.g7LCf5000601@pcp02138704pcs.reston01.va.comcast.net> > Hmm ... is there a case that NotImplementedError should be a > subclass of TypeError? Conceptually it would make sense (this *type* > does not implement this method). 
I think you're overthinking this. NotImplementedError is fine for code that wants to send that particular message to the user. We're playing with TypeError here because we're trying to be close to the metal. --Guido van Rossum (home page: http://www.python.org/~guido/) From gmcm@hypernet.com Wed Aug 21 14:31:36 2002 From: gmcm@hypernet.com (Gordon McMillan) Date: Wed, 21 Aug 2002 09:31:36 -0400 Subject: [Python-Dev] Re: Automatic flex interface for Python? In-Reply-To: <018e01c248ed$9ffc0d20$0900a8c0@spiff> Message-ID: <3D635DF8.9480.90CBCDDD@localhost> On 21 Aug 2002 at 10:34, Fredrik Lundh wrote: > eh? why did my mailer send that mail? what I was > trying to say is that I'm mildly curious why people > tend to treat mxTextTools like some kind of silver > bullet, without actually comparing it to a well- > written regular expression. > > I've heard from people who've spent days rewriting > their application, only to find that the resulting > program was slower. > > (as Guido noted, for problems like this, the > overhead isn't so much in the engine itself, as in > the effort needed to create Python data > structures...) mxTextTools lets (encourages?) you to break all the rules about lex -> parse. If you can (& want to) put a good deal of the "parse" stuff into the scanning rules, you can get a speed advantage. You're also not constrained by the rules of BNF, if you choose to see that as an advantage :-). My one successful use of mxTextTools came after using SPARK to figure out what I actually needed in my AST, and realizing that the ambiguities in the grammar didn't matter in practice, so I could produce an almost-AST directly. -- Gordon http://www.mcmillan-inc.com/ From gmcm@hypernet.com Wed Aug 21 14:36:44 2002 From: gmcm@hypernet.com (Gordon McMillan) Date: Wed, 21 Aug 2002 09:36:44 -0400 Subject: [Python-Dev] Re: Automatic flex interface for Python? 
In-Reply-To: <20020821122824.GA16740@panix.com> References: <018e01c248ed$9ffc0d20$0900a8c0@spiff> Message-ID: <3D635F2C.4839.90D07DA3@localhost> On 21 Aug 2002 at 8:28, Aahz wrote: > ... I've never actually used mxTextTools; I'm > mostly going by comments Tim Peters has made in the > past suggesting that regex tools are poor for > parsing. They suck for parsing. They excel for lexing, however. -- Gordon http://www.mcmillan-inc.com/ From skip@pobox.com Wed Aug 21 14:47:11 2002 From: skip@pobox.com (Skip Montanaro) Date: Wed, 21 Aug 2002 08:47:11 -0500 Subject: [Python-Dev] Automatic flex interface for Python? In-Reply-To: <200208210349.g7L3nLO31826@pcp02138704pcs.reston01.va.comcast.net> References: <20020820232346.GA21177@thyrsus.com> <20020821005252.GB22413@thyrsus.com> <15715.941.27029.778363@gargle.gargle.HOWL> <200208210349.g7L3nLO31826@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <15715.39391.629762.256170@gargle.gargle.HOWL> Guido> I haven't given up on the re module for fast scanners (see Tim's Guido> note on the speed of tokenizing 20,000 messages in mere minutes). Guido> Note that the Bayes approach doesn't *need* a trick to apply many Guido> regexes in parallel to the text. Right. I'm thinking of it in situations where you do need such tricks. SpamAssassin is one such place. I think Eric has an application (quickly tokenizing the data produced by an external program, where the data can run into several hundreds of thousands of lines) where this might be beneficial as well. Skip From skip@pobox.com Wed Aug 21 16:00:45 2002 From: skip@pobox.com (Skip Montanaro) Date: Wed, 21 Aug 2002 10:00:45 -0500 Subject: [Python-Dev] Re: Automatic flex interface for Python? 
In-Reply-To: <20020821122824.GA16740@panix.com> References: <20020820232346.GA21177@thyrsus.com> <20020821005252.GB22413@thyrsus.com> <15715.941.27029.778363@gargle.gargle.HOWL> <20020821032018.GA29112@thyrsus.com> <200208210357.g7L3vpl31876@pcp02138704pcs.reston01.va.comcast.net> <20020821041329.GA25548@panix.com> <015901c248eb$97772ec0$0900a8c0@spiff> <018e01c248ed$9ffc0d20$0900a8c0@spiff> <20020821122824.GA16740@panix.com> Message-ID: <15715.43805.195606.442523@gargle.gargle.HOWL> aahz> I'm mostly going by comments Tim Peters has made in the past aahz> suggesting that regex tools are poor for parsing. parsing != tokenizing. ;-) Regular expressions are great for tokenizing (most of the time). Skip From tim@zope.com Wed Aug 21 16:15:54 2002 From: tim@zope.com (Tim Peters) Date: Wed, 21 Aug 2002 11:15:54 -0400 Subject: [Python-Dev] Embarassed in Malvern In-Reply-To: <200208211030.g7LAUZv13032@snark.thyrsus.com> Message-ID: [Eric S. Raymond] > Apparently I was hallucinating when I thought these has been released in > open source. Are you talking about Judy? If so, the LGPL'ed source isn't at HP, it's at SourceForge: http://sf.net/projects/judy/ From python@rcn.com Wed Aug 21 16:31:54 2002 From: python@rcn.com (Raymond Hettinger) Date: Wed, 21 Aug 2002 11:31:54 -0400 Subject: [Python-Dev] Backwards compatiblity References: <200208211241.g7LCf5000601@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <001701c24927$e162dca0$9feb7ad1@othello> For 2.2.2, if we add False,True=0,1 to __builtins__, then code written for 2.3 will more likely run without modification. For instance, that is all the sets module need to run under 2.2. Since 2.2 may end-up being the Py-in-a-Tie and because we want 2.3 book examples to be likely to run in an older environment, I propose we add the two builtins. Further, since we don't want to encourage further propagation of custom dictionary based sets, we should consider adding the sets module also. 
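The reason the backport Raymond proposes is cheap: the constants behave exactly like the integers they replace. A quick check on a modern interpreter (where PEP 285 later made bool an int subclass) — an illustration, not code from the thread:

```python
# True/False interoperate with the 1/0 they replaced: they compare
# equal, and since PEP 285 (Python 2.3) bool is a subclass of int,
# so old integer-flag code keeps working either way.
assert True == 1 and False == 0
assert isinstance(True, int)
assert True + True == 2  # bools behave arithmetically as ints
```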
In both cases, it can't hurt to add the extra functions and it can certainly help some of the time. Raymond Hettinger From aleax@aleax.it Wed Aug 21 16:36:18 2002 From: aleax@aleax.it (Alex Martelli) Date: Wed, 21 Aug 2002 17:36:18 +0200 Subject: [Python-Dev] Backwards compatiblity In-Reply-To: <001701c24927$e162dca0$9feb7ad1@othello> References: <200208211241.g7LCf5000601@pcp02138704pcs.reston01.va.comcast.net> <001701c24927$e162dca0$9feb7ad1@othello> Message-ID: On Wednesday 21 August 2002 05:31 pm, Raymond Hettinger wrote: > For 2.2.2, if we add False,True=0,1 to __builtins__, then code written for > 2.3 will more likely run without modification. For instance, that is all > the sets module need to run under 2.2. False and True with those values (as well as function bool) are already in 2.2.1's __builtins__. Alex From tim@zope.com Wed Aug 21 16:44:09 2002 From: tim@zope.com (Tim Peters) Date: Wed, 21 Aug 2002 11:44:09 -0400 Subject: [Python-Dev] Backwards compatiblity In-Reply-To: <001701c24927$e162dca0$9feb7ad1@othello> Message-ID: [Raymond Hettinger] > For 2.2.2, if we add False,True=0,1 to __builtins__, then code > written for 2.3 will more likely run without modification. For > instance, that is all the sets module need to run under 2.2. Good idea! Before spending *too* much time on it, though, note that Guido already did it for 2.2.1: Python 2.2.1 (#34, Apr 9 2002, 19:34:33) [MSC 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> True 1 >>> False 0 >>> > ... > Further, since we don't want to encourage further propagation of custom > dictionary based sets, we should consider adding the sets module also. Strongly doubt that one will happen. > In both cases, it can't hurt to add the extra functions and it can certainly > help some of the time. The sets module is still pre-alpha, and adding pre-alpha anything to a "stability release" is highly dubious. At best, it would create artificial compatibility problems if 2.3 alpha and beta tests show a need to change the sets API. If people want new features, that's what new releases are for. From python@rcn.com Wed Aug 21 16:42:22 2002 From: python@rcn.com (Raymond Hettinger) Date: Wed, 21 Aug 2002 11:42:22 -0400 Subject: [Python-Dev] Backwards compatiblity References: <200208211241.g7LCf5000601@pcp02138704pcs.reston01.va.comcast.net> <001701c24927$e162dca0$9feb7ad1@othello> Message-ID: <000701c24929$57d8ada0$9feb7ad1@othello> > False and True with those values (as well as function bool) are already > in 2.2.1's __builtins__. Doh! From python@rcn.com Wed Aug 21 16:51:04 2002 From: python@rcn.com (Raymond Hettinger) Date: Wed, 21 Aug 2002 11:51:04 -0400 Subject: [Python-Dev] Backwards compatiblity References: Message-ID: <001a01c2492a$8f27cec0$9feb7ad1@othello> > The sets module is still pre-alpha, and adding pre-alpha anything to a > "stability release" is highly dubious. At best, it would create > artificial compatibility problems if 2.3 alpha and beta tests show a need to > change the sets API. If people want new features, that's what new releases > are for. Once it is firmed up a bit, how about putting sets.py on python.org or in the Vaults of Parnassus? From barry@zope.com Wed Aug 21 17:00:43 2002 From: barry@zope.com (Barry A. Warsaw) Date: Wed, 21 Aug 2002 12:00:43 -0400 Subject: [Python-Dev] Backwards compatiblity References: <001a01c2492a$8f27cec0$9feb7ad1@othello> Message-ID: <15715.47403.731237.499977@anthem.wooz.org> >>>>> "RH" == Raymond Hettinger writes: RH> Once it is firmed up a bit, how about putting sets.py on RH> python.org or in the Vaults of Parnassus? Or making a nice little distutils package available on SF? -Barry From aahz@pythoncraft.com Wed Aug 21 17:24:02 2002 From: aahz@pythoncraft.com (Aahz) Date: Wed, 21 Aug 2002 12:24:02 -0400 Subject: [Python-Dev] Re: Automatic flex interface for Python?
In-Reply-To: <15715.43805.195606.442523@gargle.gargle.HOWL> References: <20020821005252.GB22413@thyrsus.com> <15715.941.27029.778363@gargle.gargle.HOWL> <20020821032018.GA29112@thyrsus.com> <200208210357.g7L3vpl31876@pcp02138704pcs.reston01.va.comcast.net> <20020821041329.GA25548@panix.com> <015901c248eb$97772ec0$0900a8c0@spiff> <018e01c248ed$9ffc0d20$0900a8c0@spiff> <20020821122824.GA16740@panix.com> <15715.43805.195606.442523@gargle.gargle.HOWL> Message-ID: <20020821162402.GA10933@panix.com> On Wed, Aug 21, 2002, Skip Montanaro wrote: > > aahz> I'm mostly going by comments Tim Peters has made in the past > aahz> suggesting that regex tools are poor for parsing. > > parsing != tokenizing. ;-) > Regular expressions are great for tokenizing (most of the time). Ah. Here we see one of the little drawbacks of not finishing my CS degree. ;-) Can someone suggest a good simple reference on the distinctions between parsing / lexing / tokenizing, particularly in the context of general string processing (e.g. XML) rather than the arcane art of compiler technology? -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From fredrik@pythonware.com Wed Aug 21 17:46:17 2002 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 21 Aug 2002 18:46:17 +0200 Subject: [Python-Dev] Re: Automatic flex interface for Python? References: <20020821005252.GB22413@thyrsus.com> <15715.941.27029.778363@gargle.gargle.HOWL> <20020821032018.GA29112@thyrsus.com> <200208210357.g7L3vpl31876@pcp02138704pcs.reston01.va.comcast.net> <20020821041329.GA25548@panix.com> <015901c248eb$97772ec0$0900a8c0@spiff> <018e01c248ed$9ffc0d20$0900a8c0@spiff> <20020821122824.GA16740@panix.com> <15715.43805.195606.442523@gargle.gargle.HOWL> <20020821162402.GA10933@panix.com> Message-ID: <01dd01c24932$476307a0$ced241d5@hagrid> aahz wrote: > Ah. Here we see one of the little drawbacks of not finishing my CS > degree. 
;-) Can someone suggest a good simple reference on the > distinctions between parsing / lexing / tokenizing start here: http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?query=parser > particularly in the context of general string processing (e.g. XML) > rather than the arcane art of compiler technology? words tend to mean slightly different things in the XML universe, so I'll leave that to the XML experts. From esr@thyrsus.com Wed Aug 21 17:54:10 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 21 Aug 2002 12:54:10 -0400 Subject: [Python-Dev] Parsing vs. lexing. In-Reply-To: <20020821162402.GA10933@panix.com> References: <20020821005252.GB22413@thyrsus.com> <15715.941.27029.778363@gargle.gargle.HOWL> <20020821032018.GA29112@thyrsus.com> <200208210357.g7L3vpl31876@pcp02138704pcs.reston01.va.comcast.net> <20020821041329.GA25548@panix.com> <015901c248eb$97772ec0$0900a8c0@spiff> <018e01c248ed$9ffc0d20$0900a8c0@spiff> <20020821122824.GA16740@panix.com> <15715.43805.195606.442523@gargle.gargle.HOWL> <20020821162402.GA10933@panix.com> Message-ID: <20020821165410.GA18493@thyrsus.com> Aahz : > Ah. Here we see one of the little drawbacks of not finishing my CS > degree. ;-) Can someone suggest a good simple reference on the > distinctions between parsing / lexing / tokenizing, particularly in the > context of general string processing (e.g. XML) rather than the arcane > art of compiler technology? It's pretty simple, actually. Lexing *is* tokenizing; it's breaking the input stream into appropriate lexical units. When you say "lexing" it implies that your tokenizer may be doing other things as well -- handling comment syntax, or gathering low-level semantic information like "this is a typedef". Parsing, on the other hand, consists of attempting to match your input to a grammar. The result of a parse is typically either "this matches" or to throw an error. There are two kinds of parsers -- event generators and structure builders.
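The lexer/parser split Eric describes can be made concrete in a few lines. A toy sketch (not from the thread): a regexp does the lexing, and a minimal hand-rolled recursive-descent routine does the parsing, for the tiny grammar `expr := term ('+' term)*`, `term := NUMBER`:

```python
import re

# Lexing: break the input stream into lexical units (here: numbers
# and '+' operators; whitespace is simply never matched).
def lex(text):
    return re.findall(r"\d+|\+", text)

# Parsing: match the token stream against the grammar, evaluating
# as we go. Hand-rolled for compactness; a real grammar would use a
# parser generator.
def parse_expr(tokens):
    total = parse_term(tokens)
    while tokens and tokens[0] == "+":
        tokens.pop(0)
        total += parse_term(tokens)
    return total

def parse_term(tokens):
    tok = tokens.pop(0)
    if not tok.isdigit():
        raise SyntaxError("expected a number, got %r" % tok)
    return int(tok)
```

Lexing "1 + 2 + 39" yields the units ['1', '+', '2', '+', '39']; parsing that stream either matches (and here evaluates to 42) or raises an error, exactly the division of labor described above.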
Event generators call designated hook functions when they recognize a piece of syntax you're interested in. In XML-land, SAX is like this. Structure builders return some data structure (typically a tree) representing the syntax of your input. In XML-land, DOM is like this. There is a vast literature on parsing. You don't need to know most of it. The key thing to remember is that, except for very simple cases, writing parsers by hand is usually stupid. Even when it's simple to do, machine-generated parsers have better hooks for error recovery. There are several `parser generator' tools that will compile a grammar specification to a parser; the best-known one is Bison, an open-source implementation of the classic Unix tool YACC (Yet Another Compiler Compiler). Python has its own parser generator, SPARK. Unfortunately, while SPARK is quite powerful (that is, good for handling ambiguities in the spec), the Earley algorithm it uses gives O(n**3) performance in the generated parser. It's not usable for production on larger than toy grammars. The Python standard library includes a lexer class suitable for a large class of shell-like syntaxes. As Guido has pointed out, regexps provide another attack on the problem. -- Eric S. Raymond From esr@thyrsus.com Wed Aug 21 17:58:55 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 21 Aug 2002 12:58:55 -0400 Subject: [Python-Dev] Embarassed in Malvern In-Reply-To: References: <200208211030.g7LAUZv13032@snark.thyrsus.com> Message-ID: <20020821165855.GB18493@thyrsus.com> Tim Peters : > [Eric S. Raymond] > > Apparently I was hallucinating when I thought these has been released in > > open source. > > Are you talking about Judy? If so, the LGPL'ed source isn't at HP, it's at > SourceForge: > > http://sf.net/projects/judy/ It's been hiding its existence effectively. 
I got three pieces of email from people wondering what I was going to do about the lack of source, then couldn't find any pointers to the source either on the HP site or via Google. Phew. Well, the interface is still butt-ugly, but that performance... -- Eric S. Raymond

From zack@codesourcery.com Wed Aug 21 18:03:53 2002
From: zack@codesourcery.com (Zack Weinberg)
Date: Wed, 21 Aug 2002 10:03:53 -0700
Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8
In-Reply-To: <20020821062226.GC1771@thyrsus.com>
References: <20020821011429.GE22413@thyrsus.com> <20020821053556.GA700@thyrsus.com> <20020821055825.GP29858@codesourcery.com> <20020821062226.GC1771@thyrsus.com>
Message-ID: <20020821170353.GE2803@codesourcery.com>

On Wed, Aug 21, 2002 at 02:22:26AM -0400, Eric S. Raymond wrote: > > I'm on it. The following is not yet working, but it's a straight road to get > there.... > > There is a public spam-checker port. Your client program sends it > packets consisting of a list of header token counts. You > can send lots of these blocks; each one has to be under the maximum > atomic-message size for sockets (I think that's 32K). > > The server accumulates the frequency counts you ship it until you say > "OK, what is it?" Does the Bayes test. Ships you back a result.

My ISP-postmaster friend's reaction to that: | As far as it goes, yes. How would it learn? | | On a more mundane note, I'd like to see decoding of base64 in it. | | (Oh, and on a blue-sky note, has anyone taken up Graham's suggestion | of having one of these things that looks at word pairs instead of | words?) | | It's neat that ESR saw immediately that the daemon should be | self-contained, no access to home directories. SpamAssassin doesn't | have a simple way of doing that, and [ISP] is modifying it to have | one -- and you wouldn't believe the resistance to the proposed | changes from some of the SA developers.
Some of them really seem | to think that it's better and simpler to store user configuration | in a database than to have the client send its config file to the | server along with each message.

I remember you said you didn't want to do base64 decode because it was too slow?

zw

From esr@thyrsus.com Wed Aug 21 18:13:11 2002
From: esr@thyrsus.com (Eric S. Raymond)
Date: Wed, 21 Aug 2002 13:13:11 -0400
Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8
In-Reply-To: <20020821170353.GE2803@codesourcery.com>
References: <20020821011429.GE22413@thyrsus.com> <20020821053556.GA700@thyrsus.com> <20020821055825.GP29858@codesourcery.com> <20020821062226.GC1771@thyrsus.com> <20020821170353.GE2803@codesourcery.com>
Message-ID: <20020821171311.GA19427@thyrsus.com>

Zack Weinberg : > My ISP-postmaster friend's reaction to that: > > | As far as it goes, yes. How would it learn?

Your users' mailers would have two delete buttons -- spam and nonspam. On each delete the message would be shipped to bogofilter, which would merge the content into its token lists.

> I remember you said you didn't want to do base64 decode because it was > too slow?

And not necessary. Base64 spam invariably has telltales that Bayesian analysis will pick up in the headers and MIME cruft. A rather large percentage of it is either big5 or images. -- Eric S. Raymond

From dave@boost-consulting.com Wed Aug 21 18:15:22 2002
From: dave@boost-consulting.com (David Abrahams)
Date: Wed, 21 Aug 2002 13:15:22 -0400
Subject: [Python-Dev] More pydoc questions
Message-ID: <0d6101c24936$57d6c5f0$6501a8c0@boostconsulting.com>

I recently added an invocation to help(my_extension_module) to the Boost.Python test suite, to prove that I can give reasonable help output. Worked great for me, since I was always running the test from within emacs. However, some other developer complained that the test required user intervention to run, since it would prompt at each screenful.
So, I changed it to: print pydoc.TextDoc().docmodule(my_extension_module) Now I get (well, I'm not sure how this will show up in your mailer, but for me it's full of control characters): NNAAMMEE docstring_ext FFIILLEE c:\build\libs\python\test\bin\docstring_ext.pyd\vc7.1\debug\runtime-link-dy namic\docstring_ext.pyd DDEESSCCRRIIPPTTIIOONN A simple test module for documentation strings Exercised by docstring.py CCLLAASSSSEESS Boost.Python.instance(__builtin__.object) X class XX(Boost.Python.instance) | A simple class wrapper around a C++ int ... So my question is, is there a way to dump the text help for a module without prompting and without any extra control characters? TIA, Dave P.S. Another question: the docmodule() function takes two optional arguments whose role is undocumented. What are they for? ----------------------------------------------------------- David Abrahams * Boost Consulting dave@boost-consulting.com * http://www.boost-consulting.com From nas@python.ca Wed Aug 21 18:27:26 2002 From: nas@python.ca (Neil Schemenauer) Date: Wed, 21 Aug 2002 10:27:26 -0700 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: References: <20020820235933.GA22413@thyrsus.com> Message-ID: <20020821172726.GA12272@glacier.arctrix.com> Tim Peters wrote: > the version of this we've got now does update during scoring Are you planning to check this into the sandbox? Neil From aahz@pythoncraft.com Wed Aug 21 18:31:57 2002 From: aahz@pythoncraft.com (Aahz) Date: Wed, 21 Aug 2002 13:31:57 -0400 Subject: [Python-Dev] PEP 277 (unicode filenames): please review In-Reply-To: <3D5973C9.5070309@lemburg.com> References: <001e01c242e5$49697ff0$bd5d4540@Dell2> <200208131637.g7DGbLA08429@odiug.zope.com> <3D5973C9.5070309@lemburg.com> Message-ID: <20020821173157.GA469@panix.com> [doing an archeological dig through e-mail] On Tue, Aug 13, 2002, M.-A. Lemburg wrote: > > At least is good :-) NFC is NFD + canonical composition. 
Decomposition > isn't all that hard (using unicodedata.decomposition()). For > composition the situation is different: not all information is > available in the unicodedata database (the exclusion list) and > the database also doesn't provide the reverse mapping from > decomposed code points to composed one. See the Annexes to the > tech report to get an impression of just how hard combining is...

In a message just prior to this one, you wrote: The recommended way of doing normalization is to go by Normalization Form C: Canonical Decomposition, followed by Canonical Composition.

So, um, which way is it? -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/

From yozh@mx1.ru Wed Aug 21 19:24:51 2002
From: yozh@mx1.ru (Stepan Koltsov)
Date: Wed, 21 Aug 2002 22:24:51 +0400
Subject: [Python-Dev] q about default args
Message-ID: <20020821182451.GA31454@banana.mx1.ru>

Hi, Guido, other Python developers and other subscribers. First of all, if this question was discussed here or somewhere else 8086 times, please direct me to discussion archives. I couldn't guess the keywords to search for in the python-dev archives as I haven't found the search page where to enter these keywords :-)

The question is: To be or^H^H^H^H^H^H^H^H^H Why not evaluate default parameters of a function at THE function call, not at function def (as is done currently)? For example, C++ (a nice language, isn't it? ;-) ) evaluates default parameters at function call. Example below illustrates the point in mind:

In Python: ---------- def func(l = []): do_something(l) ========== to make it clearer (though this is not exactly the same): --- def func(l = list()): do_something(l) === or, in C++: --- void func(list l = list()) { do_something(l) } ===

Implementation details: Simple... Add a flag to the code object, that means "evaluate default args".
Compile default args to small code objects and store them where values for default args are stored in current Python (i.e. co_consts). When a function is called, evaluate the default args (if the above flag is set) in the context of that function. So, for inst., this code is now possible: --- class Tree: # this iter walks through a Tree level-wise (i.e. left to right then down). def levelIter(self, nodes=[self]): # ^^^^^^ look here # the following is not "mission critical" :-) if len(nodes) == 0: return for node in nodes: yield node nodes = reduce(operator.add, [n.children() for n in nodes]) for node in self.levelIter(nodes): yield node ===

About compatibility: compiled python files stay backward compatible as long as they do not define the mentioned flag.

An alternative way to go (a little example... LOOK ON, PERSONALLY, I LIKE IT A LOT): --- def f(x=12+[]): stmts === compiled into something like: 0: LOAD_CONST 1 (12) 1: BUILD_LIST 0 2: BINARY_ADD 3: STORE_FAST 0 (x) 4: # here the code of stmts begins In the case 'x' was specified, the code is executed from instruction 4 onward. This should work perfectly, is ideologically correct, and I think even faster than the current interpreter implementation.

Motivation (he-he, the most difficult part of this letter):

1. Try to import this module: ---xx.py--- import math def func(a = map(lambda x: math.sqrt(x))): pass # there is no call to func === This code does nothing but define a single function, but look at the execution time...

2. Currently, default arguments are like static function variables, defined in the function parameter list! That is wrong.

4. Again: I dislike code like --- def f(l=None): if l is None: l = [] ... ===

5. I asked my friend (also big Python fan): why the current behaviour is correct? his answer was: "the current behaviour is correct, because that is the way it was done in the first place :-) ..."
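The semantics being argued against here (points 2 and 4 above) are easy to demonstrate; this snippet is an illustration, not part of the original mail:

```python
def append_to(item, seq=[]):
    # The default list is built once, at `def` time, and then shared
    # by every call that omits the argument.
    seq.append(item)
    return seq

print(append_to(1))   # [1]
print(append_to(2))   # [1, 2] -- the same list again

def append_to_fixed(item, seq=None):
    # The idiom from point 4: make a fresh list on every call.
    if seq is None:
        seq = []
    seq.append(item)
    return seq

print(append_to_fixed(1))  # [1]
print(append_to_fixed(2))  # [2]
```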
I don't see any advantages of the current style, and lack of advantages is advantage of new style :-) I hope that the current state of things is a result of laziness (or is it "business"), not sabotage :-) , and not an ideological decision. It isn't too late to fix Python yet :-) , as when Cpt. J. L. Picard once again saves the galaxy (this time, not from the evil Borg), it will be difficult to change self-modifying Python compilers, reconstruct hardware Python bytecode interpreters and verify terabytes of source code, written in Python (NOTE: I speak of the not so distant future :-) ) -- mailto: Stepan Koltsov

From jepler@unpythonic.net Wed Aug 21 19:31:46 2002
From: jepler@unpythonic.net (Jeff Epler)
Date: Wed, 21 Aug 2002 13:31:46 -0500
Subject: [Python-Dev] More pydoc questions
In-Reply-To: <0d6101c24936$57d6c5f0$6501a8c0@boostconsulting.com>
References: <0d6101c24936$57d6c5f0$6501a8c0@boostconsulting.com>
Message-ID: <20020821133145.G1218@unpythonic.net>

On Wed, Aug 21, 2002 at 01:15:22PM -0400, David Abrahams wrote: > I recently added an invocation to help(my_extension_module) to the > Boost.Python test suite, to prove that I can give reasonable help output. > Worked great for me, since I was always running the test from within emacs. > However, some other developer complained that the test required user > intervention to run, since it would prompt at each screenful. So, I changed > it to: > > print pydoc.TextDoc().docmodule(my_extension_module) > > Now I get (well, I'm not sure how this will show up in your mailer, but for > me it's full of control characters):

In my mailer, X^HX is displayed as a bold X. It's an old trick of impact printers and interpreted by fine unix screen pagers such as "less". I'm not sure how to disable it. However, re.sub("\10.", "", s) should remove it from "s" without hurting anything else. I don't know if pydoc produces underlines. If underlines are expressed as X^H_, then it'll convert those to regular text too.
But if underlines are _^HX, you'll want to use re.sub(".\10", "", s) instead. That'll work for both bold and underline. Jeff From dave@boost-consulting.com Wed Aug 21 19:45:35 2002 From: dave@boost-consulting.com (David Abrahams) Date: Wed, 21 Aug 2002 14:45:35 -0400 Subject: [Python-Dev] More pydoc questions References: <0d6101c24936$57d6c5f0$6501a8c0@boostconsulting.com> <20020821133145.G1218@unpythonic.net> Message-ID: <0f8501c24942$f0e67720$6501a8c0@boostconsulting.com> From: "Jeff Epler" > On Wed, Aug 21, 2002 at 01:15:22PM -0400, David Abrahams wrote: > > I recently added an invocation to help(my_extension_module) to the > > Boost.Python test suite, to prove that I can give reasonable help output. > > Worked great for me, since I was always running the test from within emacs. > > However, some other developer complained that the test required user > > intervention to run, since it would prompt at each screenful. So, I changed > > it to: > > > > print pydoc.TextDoc().docmodule(my_extension_module) > > > > Now I get (well, I'm not sure how this will show up in your mailer, but for > > me it's full of control characters): > > In my mailer, X^HX is displayed as a bold X. It's an old trick of > impact printers and interpreted by fine unix screen pagers such as > "less". Figured it was something like that. > I'm not sure how to disable it. However, > re.sub("\10.", "", s) > should remove it from "s" without hurting anything else. I don't know > if pydoc produces underlines. If underlines > are expressed as X^H_, then it'll convert those to regular text too. > But if underlines are _^HX, you'll want to use > re.sub(".\10", "", s) > instead. That'll work for both bold and underline. I didn't want to resort to that, but then I also thought it would be uglier than it turned out to be. Thanks! 
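Jeff's two substitutions amount to one small helper ("\10" is octal notation for the backspace character, \x08). The overstruck sample string below is fabricated, but it uses the same X^HX / _^HX convention the pydoc pager output shows:

```python
import re

def strip_overstrike(text):
    # Drop "char + backspace" pairs, leaving the character that follows:
    # "N\x08N" (bold) and "_\x08N" (underline) both collapse to "N".
    return re.sub(".\x08", "", text)

bold = "N\x08NA\x08AM\x08ME\x08E"   # an overstruck "NAME", as a pager would bold it
print(strip_overstrike(bold))       # NAME
```

Later versions of pydoc ship essentially this as pydoc.plain(), which, where available, is an even shorter answer to David's question.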
-----------------------------------------------------------
David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com

From pinard@iro.umontreal.ca Wed Aug 21 19:59:37 2002
From: pinard@iro.umontreal.ca (François Pinard)
Date: 21 Aug 2002 14:59:37 -0400
Subject: [Python-Dev] Re: More pydoc questions
In-Reply-To: <0d6101c24936$57d6c5f0$6501a8c0@boostconsulting.com>
References: <0d6101c24936$57d6c5f0$6501a8c0@boostconsulting.com>
Message-ID:

[David Abrahams] > Now I get (well, I'm not sure how this will show up in your mailer, but for > me it's full of control characters):

It shows nicely here, using Gnus for a mail reader. See:

[inline image/png attachment xx.png omitted]
-- François Pinard http://www.iro.umontreal.ca/~pinard

From jeremy@alum.mit.edu Wed Aug 21 20:03:50 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Wed, 21 Aug 2002 15:03:50 -0400
Subject: [Python-Dev] q about default args
In-Reply-To: <20020821182451.GA31454@banana.mx1.ru>
References: <20020821182451.GA31454@banana.mx1.ru>
Message-ID: <15715.58390.417872.99598@slothrop.zope.com>

>>>>> "SK" == Stepan Koltsov writes:

SK> 2. Currently, default arguments are like static function SK> variables, SK> defined in the function parameter list! That is wrong.

No, it's not. The Python language definition is completely clear on the semantics of default arguments. They are evaluated at function definition time and stored like static function variables.

SK> 4. Again: I dislike code like SK> --- SK> def f(l=None): SK> if l is None: SK> l = [] SK> ... SK> ===

I don't see anything wrong with this code.

SK> 5. I asked my friend (also big Python fan): why the current SK> behaviour is correct? his answer was: "the current behaviour is SK> correct, because that is the way it was done in the first place SK> :-) ..." I don't see any advantages of the current style, and SK> lack of advantages is advantage of new style :-)

Even if I liked the semantics you propose, it would create enormous pain to change the language semantics here. You'll have to work a lot harder on motivation if you want us to fix something that isn't broken :-).

Jeremy

From guido@python.org Wed Aug 21 20:19:27 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 21 Aug 2002 15:19:27 -0400
Subject: [Python-Dev] Backwards compatiblity
In-Reply-To: Your message of "Wed, 21 Aug 2002 12:00:43 EDT."
<15715.47403.731237.499977@anthem.wooz.org> References: <001a01c2492a$8f27cec0$9feb7ad1@othello> <15715.47403.731237.499977@anthem.wooz.org> Message-ID: <200208211919.g7LJJRr01664@pcp02138704pcs.reston01.va.comcast.net> > RH> Once, it is firmed-up a bit, how about putting sets.py on > RH> python.org or in the Vaults of Parnassus? > > Or making a nice little distutils package available on SF? Sorry, I'm not interested. It's a standard library module for Python 2.3. Everything else is a distraction from my POV. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Aug 21 20:28:51 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 21 Aug 2002 15:28:51 -0400 Subject: [Python-Dev] Parsing vs. lexing. In-Reply-To: Your message of "Wed, 21 Aug 2002 12:54:10 EDT." <20020821165410.GA18493@thyrsus.com> References: <20020821005252.GB22413@thyrsus.com> <15715.941.27029.778363@gargle.gargle.HOWL> <20020821032018.GA29112@thyrsus.com> <200208210357.g7L3vpl31876@pcp02138704pcs.reston01.va.comcast.net> <20020821041329.GA25548@panix.com> <015901c248eb$97772ec0$0900a8c0@spiff> <018e01c248ed$9ffc0d20$0900a8c0@spiff> <20020821122824.GA16740@panix.com> <15715.43805.195606.442523@gargle.gargle.HOWL> <20020821162402.GA10933@panix.com> <20020821165410.GA18493@thyrsus.com> Message-ID: <200208211928.g7LJSpT01688@pcp02138704pcs.reston01.va.comcast.net> > There are several `parser generator' tools that will compile a grammar > specification to a parser; the best-known one is Bison, an open-source > implementation of the classic Unix tool YACC (Yet Another Compiler > Compiler). > > Python has its own parser generator, SPARK. Unfortunately, while > SPARK is quite powerful (that is, good for handling ambiguities in the > spec), the Earley algorithm it uses gives O(n**3) performance in the > generated parser. It's not usable for production on larger than toy > grammars. 
> > The Python standard library includes a lexer class suitable for a large > class of shell-like syntaxes. As Guido has pointed out, regexps provide > another attack on the problem.

SPARK can hardly lay claim to the fame of being "Python's own parser generator". While it's a parser generator for Python programs and itself written in Python, it is not distributed with Python. "Python's own" would be pgen, which lives in the Parser subdirectory of the Python source tree. Pgen is used to parse the Python source code and construct a parse tree out of it. As parser generators go, pgen is appropriately (and pythonically) stupid -- its power is restricted to that of LL(1) languages, equivalent to recursive-descent parsers. Its only interesting feature may be that it uses a regex-like notation to feed it the grammar for which to generate a parser. (That's what the *, ?, [], | and () meta-symbols in the file Grammar/Grammar are for.)

I would note that for small languages (much smaller than Python), writing a recursive-descent parser by hand is actually one of the most effective ways of creating a parser. I recently had the pleasure to write a recursive-descent parser for a simple Boolean query language; there was absolutely no need to involve a big gun like a parser generator. OTOH I would not consider writing a recursive-descent parser by hand for Python's Grammar -- that's why I created pgen in the first place. :-)

Another note for Aahz: when it comes to scanning data that's not really a programming language, e.g. email messages, the words parsing, scanning, lexing and tokenizing are often used pretty much interchangeably. --Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Wed Aug 21 20:32:08 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 21 Aug 2002 15:32:08 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: Your message of "Wed, 21 Aug 2002 13:13:11 EDT."
<20020821171311.GA19427@thyrsus.com> References: <20020821011429.GE22413@thyrsus.com> <20020821053556.GA700@thyrsus.com> <20020821055825.GP29858@codesourcery.com> <20020821062226.GC1771@thyrsus.com> <20020821170353.GE2803@codesourcery.com> <20020821171311.GA19427@thyrsus.com> Message-ID: <200208211932.g7LJW8S01717@pcp02138704pcs.reston01.va.comcast.net>

> > I remember you said you didn't want to do base64 decode because it was > > too slow?
>
> And not necessary. Base64 spam invariably has telltales that Bayesian > analysis will pick up in the headers and MIME cruft. A rather large > percentage of it is either big5 or images.

I'd be curious to know if that will continue to be true in the future. At least one of my non-tech friends sends email that's exclusively HTML (even though the content is very lightly marked-up plain text), from a hotmail account. Spam could easily have the same origin, but the HTML contents would be very different. --Guido van Rossum (home page: http://www.python.org/~guido/)

From zack@codesourcery.com Wed Aug 21 20:50:29 2002 From: zack@codesourcery.com (Zack Weinberg) Date: Wed, 21 Aug 2002 12:50:29 -0700 Subject: [Python-Dev] Parsing vs. lexing.
In-Reply-To: <200208211928.g7LJSpT01688@pcp02138704pcs.reston01.va.comcast.net> References: <20020821032018.GA29112@thyrsus.com> <200208210357.g7L3vpl31876@pcp02138704pcs.reston01.va.comcast.net> <20020821041329.GA25548@panix.com> <015901c248eb$97772ec0$0900a8c0@spiff> <018e01c248ed$9ffc0d20$0900a8c0@spiff> <20020821122824.GA16740@panix.com> <15715.43805.195606.442523@gargle.gargle.HOWL> <20020821162402.GA10933@panix.com> <20020821165410.GA18493@thyrsus.com> <200208211928.g7LJSpT01688@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020821195029.GK2803@codesourcery.com> On Wed, Aug 21, 2002 at 03:28:51PM -0400, Guido van Rossum wrote: > > I would note that for small languages (much smaller than Python), > writing a recursive-descent parser by hand is actually one of the most > effective ways of creating a parser. I recently had the pleasure to > write a recursive-descent parser for a simple Boolean query language; > there was absolutely no need to involve a big gun like a parser > generator. OTOH I would not consider writing a recursive-descent > parser by hand for Python's Grammar -- that's why I created pgen in > the first place. :-) You might be interested to know that over in GCC land we're changing the C++ front end to use a hand-written recursive descent parser. It's not done yet, but we expect it to be easier to maintain, faster, and better at generating diagnostics than the existing yacc-based parser. zw From barry@zope.com Wed Aug 21 20:53:42 2002 From: barry@zope.com (Barry A. Warsaw) Date: Wed, 21 Aug 2002 15:53:42 -0400 Subject: [Python-Dev] Backwards compatiblity References: <001a01c2492a$8f27cec0$9feb7ad1@othello> <15715.47403.731237.499977@anthem.wooz.org> <200208211919.g7LJJRr01664@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <15715.61382.57288.352405@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: >> RH> Once, it is firmed-up a bit, how about putting sets.py on >> RH> python.org or in the Vaults of Parnassus? 
>> Or making a nice little distutils package available on SF? GvR> Sorry, I'm not interested. It's a standard library module GvR> for Python 2.3. Everything else is a distraction from my GvR> POV. I didn't mean to imply you should do it. But it would be easy enough to do for anybody who was sufficiently motivated. -Barry From barry@python.org Wed Aug 21 20:59:42 2002 From: barry@python.org (Barry A. Warsaw) Date: Wed, 21 Aug 2002 15:59:42 -0400 Subject: [Python-Dev] Parsing vs. lexing. References: <20020821005252.GB22413@thyrsus.com> <15715.941.27029.778363@gargle.gargle.HOWL> <20020821032018.GA29112@thyrsus.com> <200208210357.g7L3vpl31876@pcp02138704pcs.reston01.va.comcast.net> <20020821041329.GA25548@panix.com> <015901c248eb$97772ec0$0900a8c0@spiff> <018e01c248ed$9ffc0d20$0900a8c0@spiff> <20020821122824.GA16740@panix.com> <15715.43805.195606.442523@gargle.gargle.HOWL> <20020821162402.GA10933@panix.com> <20020821165410.GA18493@thyrsus.com> <200208211928.g7LJSpT01688@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <15715.61742.187517.882603@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Another note for Aahz: when it comes to scanning data that's GvR> not really a programming language, e.g. email messages, the GvR> words parsing, scanning, lexing and tokenizing are often used GvR> pretty much interchangeably. True, although even stuff like email messages are defined by a formal grammar, i.e. RFC 2822. email.Generator of course doesn't strictly use that grammar because it's trying to allow a much greater leniency in its input than a language compiler would. But note that approaches like Emacs's mail-extr.el package do in fact try to do more strict parsing based on the grammar. -Barry From barry@python.org Wed Aug 21 21:23:04 2002 From: barry@python.org (Barry A. 
Warsaw) Date: Wed, 21 Aug 2002 16:23:04 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 References: <20020821011429.GE22413@thyrsus.com> <20020821053556.GA700@thyrsus.com> <20020821055825.GP29858@codesourcery.com> <20020821062226.GC1771@thyrsus.com> <20020821170353.GE2803@codesourcery.com> Message-ID: <15715.63144.762044.591822@anthem.wooz.org>

| As far as it goes, yes. How would it learn?

I have some ideas about how you could hook this into Mailman to do community/membership assisted learning. Understanding that people will be highly motivated to inform you about spam but not about good messages, you essentially queue a copy of a random sampling of messages for a few days. Members can let the list admin know about leaked spam (via a url or -spam address, or whatever) and after the list admin verifies it, this trains the system on that spam. If no feedback on a message happens after a few days, you train the system on that known good message. You need list admin verification to avoid attack vectors (I get mad at Guido so I -- a normal user -- label all his messages as spam).

| On a more mundane note, I'd like to see decoding of base64 in it. | | (Oh, and on a blue-sky note, has anyone taken up Graham's suggestion | of having one of these things that looks at word pairs instead of | words?) | | It's neat that ESR saw immediately that the daemon should be | self-contained, no access to home directories. SpamAssassin doesn't | have a simple way of doing that, and [ISP] is modifying it to have | one -- and you wouldn't believe the resistance to the proposed | changes from some of the SA developers. Some of them really seem | to think that it's better and simpler to store user configuration | in a database than to have the client send its config file to the | server along with each message.

>>>>> "ZW" == Zack Weinberg writes: ZW> I remember you said you didn't want to do base64 decode ZW> because it was too slow?
But there might be some interesting, integrated ways around that. Say for example, you take a Python-enabled mail server, parse the message into its decoded form early (but not before low level SMTP-based rejections) and then pass that parsed and decoded message object tree around to all the other subsystems that are interested, e.g. the Bayes filter, and Mailman. You can at least amortize the cost of parsing and decoding once for the rest of the lifetime of that message on your system. I think we have all the pieces in place to play with this approach on python.org. -Barry From jeremy@alum.mit.edu Wed Aug 21 21:21:18 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Wed, 21 Aug 2002 16:21:18 -0400 Subject: [Python-Dev] Parsing vs. lexing. In-Reply-To: <20020821195029.GK2803@codesourcery.com> References: <20020821032018.GA29112@thyrsus.com> <200208210357.g7L3vpl31876@pcp02138704pcs.reston01.va.comcast.net> <20020821041329.GA25548@panix.com> <015901c248eb$97772ec0$0900a8c0@spiff> <018e01c248ed$9ffc0d20$0900a8c0@spiff> <20020821122824.GA16740@panix.com> <15715.43805.195606.442523@gargle.gargle.HOWL> <20020821162402.GA10933@panix.com> <20020821165410.GA18493@thyrsus.com> <200208211928.g7LJSpT01688@pcp02138704pcs.reston01.va.comcast.net> <20020821195029.GK2803@codesourcery.com> Message-ID: <15715.63038.724262.356240@slothrop.zope.com> >>>>> "ZW" == Zack Weinberg writes: ZW> You might be interested to know that over in GCC land we're ZW> changing the C++ front end to use a hand-written recursive ZW> descent parser. It's not done yet, but we expect it to be ZW> easier to maintain, faster, and better at generating diagnostics ZW> than the existing yacc-based parser. LCC also uses a hand-written recursive descent parser, for exactly the reasons you mention. Thought I'd also mention a neat new paper about an old algorithm for recursive descent parsers with backtracking and unlimited lookahead. Packrat Parsing: Simple, Powerful, Lazy, Linear Time, Bryan Ford. 
ICFP 2002 http://www.brynosaurus.com/pub.html Jeremy

From barry@python.org Wed Aug 21 21:24:34 2002 From: barry@python.org (Barry A. Warsaw) Date: Wed, 21 Aug 2002 16:24:34 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 References: <20020821011429.GE22413@thyrsus.com> <20020821053556.GA700@thyrsus.com> <20020821055825.GP29858@codesourcery.com> <20020821062226.GC1771@thyrsus.com> <20020821170353.GE2803@codesourcery.com> <20020821171311.GA19427@thyrsus.com> Message-ID: <15715.63234.234910.602300@anthem.wooz.org>

>>>>> "ESR" == Eric S Raymond writes: >> My ISP-postmaster friend's reaction to that: | As far as it >> goes, yes. How would it learn? ESR> Your users' mailers would have two delete buttons -- spam and ESR> nonspam. On each delete the message would be shipped to ESR> bogofilter, which would merge the content into its ESR> token lists.

You need some kind of list admin oversight or your system is open to attack vectors on individual posters. -Barry

From aleax@aleax.it Wed Aug 21 21:25:27 2002 From: aleax@aleax.it (Alex Martelli) Date: Wed, 21 Aug 2002 22:25:27 +0200 Subject: [Python-Dev] Parsing vs. lexing. In-Reply-To: <20020821195029.GK2803@codesourcery.com> References: <20020821032018.GA29112@thyrsus.com> <200208211928.g7LJSpT01688@pcp02138704pcs.reston01.va.comcast.net> <20020821195029.GK2803@codesourcery.com> Message-ID: On Wednesday 21 August 2002 09:50 pm, Zack Weinberg wrote: ... > You might be interested to know that over in GCC land we're changing > the C++ front end to use a hand-written recursive descent parser. > It's not done yet, but we expect it to be easier to maintain, faster, > and better at generating diagnostics than the existing yacc-based > parser.

Interesting!
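To make "hand-written recursive descent" concrete, here is a minimal sketch for a tiny Boolean query language of the kind Guido mentioned earlier in the thread. The grammar, names and whitespace tokenizer are hypothetical illustrations, not GCC's parser or pgen:

```python
# Sketch of a hand-written recursive-descent parser for a made-up
# Boolean query language.  One function per grammar rule:
#   expr   -> term ('or' term)*
#   term   -> factor ('and' factor)*
#   factor -> 'not' factor | '(' expr ')' | WORD
def query_match(query, doc_words):
    toks = query.replace('(', ' ( ').replace(')', ' ) ').split()
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def eat(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError('expected %r, got %r' % (expected, tok))
        pos += 1
        return tok

    def expr():
        val = term()
        while peek() == 'or':
            eat('or')
            rhs = term()       # always consume the right-hand side,
            val = val or rhs   # then combine the truth values
        return val

    def term():
        val = factor()
        while peek() == 'and':
            eat('and')
            rhs = factor()
            val = val and rhs
        return val

    def factor():
        if peek() == 'not':
            eat('not')
            return not factor()
        if peek() == '(':
            eat('(')
            val = expr()
            eat(')')
            return val
        return eat() in doc_words   # bare word: present in the document?

    result = expr()
    if pos != len(toks):
        raise SyntaxError('trailing tokens: %r' % toks[pos:])
    return result
```

Each rule is one function and a single token of lookahead (`peek`) decides which branch to take -- the same LL(1) discipline pgen enforces mechanically.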
This reminds me of a long-ago interview with Borland's techies about how they had managed to create Turbo Pascal, which ran well in a 64K (K, not M-) one-floppy PC, when their competitor, Microsoft Pascal, forced one to do a lot of disc-jockeying even with 256K and 2 floppies. Basically, their take was "we just did everything by the Dragon Book -- except that the parser is a hand-written recursive descent parser [Aho &c being adamant defenders of Yacc & the like], which buys us a lot" ... Alex From barry@python.org Wed Aug 21 21:25:41 2002 From: barry@python.org (Barry A. Warsaw) Date: Wed, 21 Aug 2002 16:25:41 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 References: <20020820235933.GA22413@thyrsus.com> <20020821172726.GA12272@glacier.arctrix.com> Message-ID: <15715.63301.574377.513425@anthem.wooz.org> >>>>> "NS" == Neil Schemenauer writes: NS> Are you planning to check this into the sandbox? http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/nondist/sandbox/spambayes/ -Barry From noah@noah.org Wed Aug 21 21:42:10 2002 From: noah@noah.org (Noah) Date: Wed, 21 Aug 2002 13:42:10 -0700 Subject: [Python-Dev] RE: Python-Dev digest, Vol 1 #2574 - 14 msgs In-Reply-To: <20020821190002.30652.46409.Mailman@mail.python.org> Message-ID: On Wed, 21 Aug 2002 Aahz wrote: > On Wed, Aug 21, 2002, Skip Montanaro wrote: > > parsing != tokenizing. ;-) > > Regular expressions are great for tokenizing (most of the time). > Ah. Here we see one of the little drawbacks of not finishing my CS > degree. ;-) Can someone suggest a good simple reference on the > distinctions between parsing / lexing / tokenizing, particularly in the > context of general string processing (e.g. XML) rather than the arcane > art of compiler technology? 
> Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

It's been 8 or 9 years since I took a compiler design class, so this info is probably WRONG, but as luck would have it I've been reviewing some of this stuff lately so I can feel some of the old neuron paths warming up.

Basically the distinction between a lexer and a parser refers to the complexity of symbol inputs that they can recognize. A lexer (AKA tokenizer) is modeled by a finite state machine (FSM). These don't have a stack or memory, just a state. They are not good for things that require nested structure.

A parser recognizes more complex structures. They are good for things like XML and source code where you have NESTED tree structures (familiarly known as SYNTAX). If you have to remember how many levels deep you are in something then it means you need a parser. Technically a parser is something that can be defined by a context free grammar and can be recognized by a Push Down Automaton (PDA). A PDA is an FSM with memory. A PDA has at least one stack. The "context free" on the grammar means that you can unambiguously recognize any section of the input stream no matter what came earlier in the stream. ... Sometimes real grammars are a little dirty and context does matter, but only within a small window. That means you might have to "look ahead" or behind a few symbols to eliminate some ambiguity. The amount that you have to look ahead sets the complexity of the grammar. This is called LALR (look-ahead LR). So a simple grammar with no look ahead is called LALR(0). A slightly more complex grammar that requires 1 symbol look-ahead is called LALR(1). We like most parsing to be simple. I think languages like C and Python are LALR(0). I think FORTRAN does require a look-ahead, so it's LALR(1). I have no idea what it must require to parse Perl. [Again: I'm sure I probably got some details wrong here.]
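The FSM-versus-PDA point is easy to see in a few lines of Python (an illustrative sketch, not the PDA class Noah mentions): no finite-state recognizer can check arbitrarily deep nesting, but an explicit counter -- a degenerate one-symbol stack -- can:

```python
def balanced(s):
    """True if the parentheses in s nest correctly (a PDA-style check)."""
    depth = 0                  # the 'stack': only one symbol type gets
    for ch in s:               # pushed, so a counter is enough
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:      # a ')' with no matching '('
                return False
    return depth == 0          # every '(' must have been closed

assert balanced('(()(()))')
assert not balanced('(()')
```

A plain regular expression can match any *fixed* nesting depth, but not all depths at once -- which is exactly why lexers stop where parsers begin.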
If you go further in complexity and you want to handle evaluating expressions then you need a Turing Machine (TM). These are problems where a context-free grammar cannot be tweaked with a look-ahead. Another way to think about it is if your input is so complex that it must be described algorithmically then you need a TM. For example neither an FSM nor a PDA can recognize an irrational number like SQRT(2) or pi. Nor can they recognize the truthfulness of expressions like "2*5=10" (although a PDA can recognize that it's a valid expression).

RegExs are hybrids of FSM and PDA and are fine for ad hoc lexing. They are not very good for parsing. I think older style RegExs started life off as pure FSM, but newer flavors made popular by Perl added memory and became PDAs... or something like that. But somehow they are limited and not quite as powerful as a real PDA, so they can't parse.

Traditionally C programmers used a combination of LEX and YACC (or the GNU versions, FLEX and Bison) to build parsers. You really only need YACC, but the problem is so much simpler if the input stream is tokenized before you try to parse it, which is why you also use LEX. Hopefully that captures the essence if not the actual facts. But then I'm trying to compress one year of computer science study into this email :-) If you are interested I just wrote a little PDA class for Python. Yours, Noah

From tim@zope.com Wed Aug 21 22:05:58 2002 From: tim@zope.com (Tim Peters) Date: Wed, 21 Aug 2002 17:05:58 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: <20020821172726.GA12272@glacier.arctrix.com> Message-ID: [Tim] > the version of this we've got now does update during scoring [Neil Schemenauer] > Are you planning to check this into the sandbox? Update-during-scoring was already in the initial version. This works with a Python dict, though (which Barry pickles and unpickles across runs), not with a persistent database (like ZODB).
Changes to use a ZODB BTree would be easy, but not yet most interesting to me. There are many more basic open questions, like which kinds of tokenization ("feature extraction") do and don't work. BTW, that's why the WordInfo records have a .killcount attribute -- the data will tell us which ways work best. From pinard@iro.umontreal.ca Wed Aug 21 22:14:42 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: 21 Aug 2002 17:14:42 -0400 Subject: [Python-Dev] Re: Automatic flex interface for Python? In-Reply-To: <15715.941.27029.778363@gargle.gargle.HOWL> References: <20020820232346.GA21177@thyrsus.com> <20020821005252.GB22413@thyrsus.com> <15715.941.27029.778363@gargle.gargle.HOWL> Message-ID: [Skip Montanaro] > The SpamAssassin folks are starting to look at Flex for much faster > regular expression matching in situations where large numbers of static > re's must be matched. This problem was also vivid in `procmail'-based SPAM filtering, as I observed it many years ago, and I remembered having quite lurked on the side of Flex at the time. I finally solved my own problem by writing a Python program able to sustain thousands of specialised rules rather speedily, proving once again that algorithmic approaches are often much slower than languages :-). > I wonder if using something like SciPy's weave tool would make it > (relatively) painless to incorporate fairly high-speed scanners into > Python programs. For a pure Python solution, PLEX could also be an avenue. It compiles a fast automaton, similar to what Flex does, from a grammar describing all tokens. I tried PLEX recently and was satisfied, even if not astonished by the speed. Also wanting fast parsing, I first used various Python heuristics, but the result was not solid enough to my taste. So I finally opted for Bison, and once on that road, it was just natural to rewrite the PLEX part in Flex. 
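A middle ground between hand-rolled heuristics and a Flex/PLEX automaton is the usual pure-`re` trick: compile the token patterns into a single alternation of named groups, so one pass over the text tries them all. The token set below is hypothetical, just to show the shape -- this is not PLEX's actual API:

```python
import re

# Each pattern becomes a named group in one master regex;
# Match.lastgroup then tells us which alternative matched.
TOKEN_SPEC = [
    ('NUMBER', r'\d+'),
    ('NAME',   r'[A-Za-z_]\w*'),
    ('OP',     r'[+\-*/=]'),
    ('SKIP',   r'\s+'),
]
MASTER = re.compile('|'.join('(?P<%s>%s)' % pair for pair in TOKEN_SPEC))

def tokenize(text):
    for m in MASTER.finditer(text):
        if m.lastgroup != 'SKIP':       # drop whitespace tokens
            yield m.lastgroup, m.group()

assert list(tokenize('x = 40 + 2')) == [
    ('NAME', 'x'), ('OP', '='), ('NUMBER', '40'), ('OP', '+'), ('NUMBER', '2')]
```

Note that `finditer` silently skips characters no pattern matches; a production scanner would add a catch-all error group at the end of the alternation.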
[Guido van Rossum] > I think you're exaggerating the problem, or at least underestimating > the re module. The re module is pretty fast!

There are limits to what a backtracking matcher could do, speed-wise, when there are many hundreds of alternated patterns.

[Eric S. Raymond] > It's pretty simple, actually. Lexing *is* tokenizing; it's breaking the > input stream into appropriate lexical units. [...] Parsing, on the > other hand, consists of attempting to match your input to a grammar. > The result of a parse is typically either "this matches" or to throw > an error. There are two kinds of parsers -- event generators and > structure builders.

Maybe lexing matches a grammar to a point, generates an event according to the match, advances the cursor and repeats until the end-of-file is met. Typically, parsing matches a grammar at point and does not repeat. In some less usual applications, there may be successive lexing stages, each taking its input from the output of the previous one. Parsing may be driven in stages too. I guess that we use the word `lexer' when the output structure is more flat, and `parser' when the output structure is more sexy! There are cases where the distinction is almost fuzzy.

[Skip Montanaro] > Guido> I haven't given up on the re module for fast scanners (see > Guido> Tim's note on the speed of tokenizing 20,000 messages in > Guido> mere minutes). Note that the Bayes approach doesn't *need* > Guido> a trick to apply many regexes in parallel to the text. > Right. I'm thinking of it in situations where you do need such tricks. > SpamAssassin is one such place. I think Eric has an application (quickly > tokenizing the data produced by an external program, where the data can > run into several hundreds of thousands of lines) where this might be > beneficial as well.

Heap queues could be useful here as well. Suppose you have hundreds of regexps to match in parallel on a big text.
When the regexp is not too complex, it is easier, or more likely, that a fast searcher exists for it. Let's build one searcher per regexp. Always beginning from the start of text, find the spot where each regexp first matches. Build a heap queue using the position as key, and both the regexp and match data as value. The top of the heap is the spot of your first match, process it, and while removing it from the heap, search forward from that spot for the same regexp, and add any result back to the heap. Merely repeat until the heap is empty. I'm not fully sure, but I have the intuition the above could be advantageous. -- François Pinard http://www.iro.umontreal.ca/~pinard

From jriehl@spaceship.com Wed Aug 21 22:29:33 2002 From: jriehl@spaceship.com (Jonathan Riehl) Date: Wed, 21 Aug 2002 16:29:33 -0500 (CDT) Subject: [Python-Dev] Parsing vs. lexing. In-Reply-To: <200208211928.g7LJSpT01688@pcp02138704pcs.reston01.va.comcast.net> Message-ID: On Wed, 21 Aug 2002, Guido van Rossum wrote: > > I would note that for small languages (much smaller than Python), > writing a recursive-descent parser by hand is actually one of the most > effective ways of creating a parser. I recently had the pleasure to > write a recursive-descent parser for a simple Boolean query language; > there was absolutely no need to involve a big gun like a parser > generator. OTOH I would not consider writing a recursive-descent > parser by hand for Python's Grammar -- that's why I created pgen in > the first place. :-) >

As per Zach's comments, I think this is pretty funny. I have just spent more time trying to expose pgen to the interpreter than I took to write an R-D parser for Python 1.3 (granted, once Fred's parser module came around, I felt a bit silly). Considering the scope of my parser generator integration wishlist, having GCC move to a hand-coded recursive descent parser is going to make my head explode.
Even TenDRA (http://www.tendra.org/) used a LL(n) parser generator, despite its highly tweaked lookahead code. So now I'm going to have to extract grammars from call trees? As if the 500 languages problem isn't already intractable, there are going to be popular language implementations that don't even bother with an abstract syntax specificaiton!? (Stop me from further hyperbole if I am incorrect.) No wonder there are no killer software engineering apps. Maybe I should just start writing toy languages for kids... *smirk* -Jon From tdelaney@avaya.com Wed Aug 21 23:56:41 2002 From: tdelaney@avaya.com (Delaney, Timothy) Date: Thu, 22 Aug 2002 08:56:41 +1000 Subject: [Python-Dev] PEP 218 (sets); moving set.py to Lib Message-ID: > From: Guido van Rossum [mailto:guido@python.org] > > > Hmm ... is there a case that NotImplementedError should be a > > subclass of TypeError? Conceptually it would make sense (this *type* > > does not implement this method). > > I think you're overthinking this. NotImplementedError is fine for > code that wants to send that particular message to the user. We're > playing with TypeError here because we're trying to be close to the > metal. I'd be willing to concede that I was overthinking, except that it took surprisingly little thought for the connection to be made ;) Tim Delaney From greg@cosc.canterbury.ac.nz Thu Aug 22 00:20:06 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 22 Aug 2002 11:20:06 +1200 (NZST) Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <1029917815.581.3.camel@winterfell> Message-ID: <200208212320.g7LNK6O14137@oma.cosc.canterbury.ac.nz> Martin =?ISO-8859-1?Q?Sj=F6gren?= : > Uhm, what about + and juxtaposition? They are quite common at least > here in Sweden, for boolean algebra. They're not normally used for sets, though, in my experience (despite the fact that set theory is a boolean algebra:-). 
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Thu Aug 22 00:25:07 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 22 Aug 2002 11:25:07 +1200 (NZST) Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <20020821082832.GA7256@thyrsus.com> Message-ID: <200208212325.g7LNP7G14158@oma.cosc.canterbury.ac.nz> "Eric S. Raymond" Subject: Re: [Python-Dev] Re: PEP 218 (sets): > Is it + for disjunction and juxtaposition for conjunction, or the other > way around? + is 'or' and juxtaposition (or sometimes a dot) is 'and' (I prefer those words because they're shorter than 'disjunction' and 'conjunction', and I can remember which is which:-). These are probably where Wirth got the idea of using + and * from in Pascal -- 'or' is considered the 'addition' operator in boolean algebra, and 'and' the 'multiplication' operator. (Although the two are actually completely symmetrical in their algebraic properties, so it's an arbitrary choice.) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From pinard@iro.umontreal.ca Thu Aug 22 00:31:44 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: 21 Aug 2002 19:31:44 -0400 Subject: [Python-Dev] Re: Automatic flex interface for Python? 
In-Reply-To: References: <20020820232346.GA21177@thyrsus.com> <20020821005252.GB22413@thyrsus.com> <15715.941.27029.778363@gargle.gargle.HOWL> Message-ID: [François Pinard] > I finally solved my own problem by writing a Python program able to > sustain thousands of specialised rules rather speedily, proving once > again that algorithmic approaches are often much slower than languages :-). Sorry, my English is so unclear! I meant that people sometimes say Python is slow, yet because it allows clearer algorithms, one ends up with more speed in Python than other solutions in supposedly faster languages. For these other solutions, the slowness comes from the algorithms. People are often tempted to benchmark languages, while they should rather benchmark ideas. :-) -- François Pinard http://www.iro.umontreal.ca/~pinard From greg@cosc.canterbury.ac.nz Thu Aug 22 00:40:44 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 22 Aug 2002 11:40:44 +1200 (NZST) Subject: [Python-Dev] More pydoc questions In-Reply-To: <0d6101c24936$57d6c5f0$6501a8c0@boostconsulting.com> Message-ID: <200208212340.g7LNeiS14216@oma.cosc.canterbury.ac.nz> David Abrahams : > N^HNA^HAM^HME^HE > docstring_ext That looks like it's designed to produce bold characters on a mechanical-impact printing device. Which is surely an anachronism in this day and age -- it's not going to work on most of the printers in use nowadays, is it? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From pinard@iro.umontreal.ca Thu Aug 22 00:54:07 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: 21 Aug 2002 19:54:07 -0400 Subject: [Python-Dev] Re: More pydoc questions In-Reply-To: <200208212340.g7LNeiS14216@oma.cosc.canterbury.ac.nz> References: <200208212340.g7LNeiS14216@oma.cosc.canterbury.ac.nz> Message-ID: [Greg Ewing] > David Abrahams : > > N^HNA^HAM^HME^HE > > docstring_ext > That looks like it's designed to produce bold characters > on a mechanical-impact printing device. Which is surely > an anachronism in this day and age -- it's not going to > work on most of the printers in use nowadays, is it? It works pretty well, as many printer filters know how to interpret such overstrike when meant to bold or underline. Some of these filters even know how to combine diacritics over/under letters. For glass screens, `less' also does the proper thing. A few years ago, I had to write a tool that should underline and bold, and after some looking around, found out that using overstrike was the most versatile and supported way to proceed, however anachronic it may look. Of course, we could resort to bigger hammers, like Docbook or XML, and converters of all sorts, but there is also a place for simple things! :-) -- François Pinard http://www.iro.umontreal.ca/~pinard From greg@cosc.canterbury.ac.nz Thu Aug 22 01:03:05 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 22 Aug 2002 12:03:05 +1200 (NZST) Subject: [Python-Dev] Re: Automatic flex interface for Python? 
In-Reply-To: <20020821162402.GA10933@panix.com> Message-ID: <200208220003.g7M035Z14410@oma.cosc.canterbury.ac.nz> Aahz : > Can someone suggest a good simple reference on the > distinctions between parsing / lexing / tokenizing Lexical analysis, otherwise known as "lexing" or "tokenising", is the process of splitting the input up into a sequence of "tokens", such as (in the case of a programming language) identifiers, operators, string literals, etc. Parsing is the next higher level in the process, which takes the sequence of tokens and recognises language constructs -- statements, expressions, etc. > particularly in the context of general string processing (e.g. XML) > rather than the arcane art of compiler technology? The lexing and parsing part of compiler technology isn't really any more arcane than it is for XML or anything else -- exactly the same principles apply. It's more a matter of how deeply you want to get into the theory. The standard text on this stuff around here seems to be Aho, Hopcroft and Ullman, "The Theory of Parsing, Translation and Compiling", but you might find that a bit much if all you want to do is parse XML. It will, however, give you a good grounding in the theory of REs, various classes of grammar, different parsing techniques, etc., after which writing an XML parser will seem like quite a trivial task. :-) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@comcast.net Thu Aug 22 02:13:02 2002 From: tim.one@comcast.net (Tim Peters) Date: Wed, 21 Aug 2002 21:13:02 -0400 Subject: [Python-Dev] Re: Automatic flex interface for Python? In-Reply-To: <20020821032018.GA29112@thyrsus.com> Message-ID: [Eric S. Raymond] > ... > Lexers are painful in Python. This is so. 
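Greg's division of labour above can be sketched in a few lines of today's Python (a toy illustration, not anything posted in this thread): a regex-based lexer producing a flat token stream, and a tiny recursive-descent parser recognising structure in it.

```python
import re

# Lexing: chop the raw text into a flat stream of (kind, value) tokens.
def lex(text):
    token_re = re.compile(r"\s*(?:(\d+)|([+*()]))")
    pos, tokens = 0, []
    while pos < len(text):
        m = token_re.match(text, pos)
        if not m or m.end() == pos:
            raise SyntaxError("bad character at %d" % pos)
        pos = m.end()
        num, op = m.groups()
        tokens.append(("NUM", int(num)) if num else ("OP", op))
    return tokens

# Parsing: recognise constructs in the token stream.
# Grammar: expr -> term ('+' term)*; term -> atom ('*' atom)*; atom -> NUM | '(' expr ')'
def parse(tokens):
    def expr(i):
        val, i = term(i)
        while i < len(tokens) and tokens[i] == ("OP", "+"):
            rhs, i = term(i + 1)
            val += rhs
        return val, i
    def term(i):
        val, i = atom(i)
        while i < len(tokens) and tokens[i] == ("OP", "*"):
            rhs, i = atom(i + 1)
            val *= rhs
        return val, i
    def atom(i):
        kind, value = tokens[i]
        if kind == "NUM":
            return value, i + 1
        assert value == "("          # only other atom is a parenthesized expr
        val, i = expr(i + 1)
        assert tokens[i] == ("OP", ")")
        return val, i + 1
    return expr(0)[0]

print(parse(lex("2 * (3 + 4)")))  # -> 14
```

The two stages are independent: the lexer knows nothing about grammar, and the parser never looks at raw characters.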
> They hit the language in a weak spot created by the immutability of > strings. But you lost me here -- I don't see a connection between immutability and either ease of writing lexers or speed of lexers. Indeed, lexers are (IME) more-or-less exactly as painful and slow written in Perl, where strings are mutable. Seems more to me that lexing is convenient and fast only when expressed in a language specifically designed for writing lexers, and backed by a specialized execution engine that knows a great deal about fast state-machine implementation. Like, say, Flex. Lexing is also clumsy and slow in SNOBOL4 and Icon, despite that they excel at higher-level pattern-matching tasks. IOW, lexing is in a world by itself, almost nothing is good at it, and the few things that shine at it don't do anything else. lexing-is-the-floating-point-execution-unit-of-the-language-world-ly y'rs - tim From tim.one@comcast.net Thu Aug 22 02:21:19 2002 From: tim.one@comcast.net (Tim Peters) Date: Wed, 21 Aug 2002 21:21:19 -0400 Subject: [Python-Dev] Re: Automatic flex interface for Python? In-Reply-To: <3D635DF8.9480.90CBCDDD@localhost> Message-ID: [Gordon McMillan] > mxTextTools lets (encourages?) you to break all > the rules about lex -> parse. If you can (& want to) > put a good deal of the "parse" stuff into the scanning > rules, you can get a speed advantage. You're also > not constrained by the rules of BNF, if you choose > to see that as an advantage :-). > > My one successful use of mxTextTools came after > using SPARK to figure out what I actually needed > in my AST, and realizing that the ambiguities in the > grammar didn't matter in practice, so I could produce > an almost-AST directly. I don't expect anyone will have much luck writing a fast lexer using mxTextTools *or* Python's regexp package unless they know quite a bit about how each works under the covers, and about how fast lexing is accomplished by DFAs. 
If you know both, you can build a DFA by hand and painfully instruct mxTextTools in the details of its construction, and get a very fast tokenizer (compared to what's possible with re), regardless of the number of token classes or the complexity of their definitions. Writing to mxTextTools directly is a lot like writing in an assembly language for a character-matching machine, with all the pains and potential joys that implies. If I were Eric, I'd use Flex . From tim.one@comcast.net Thu Aug 22 02:35:23 2002 From: tim.one@comcast.net (Tim Peters) Date: Wed, 21 Aug 2002 21:35:23 -0400 Subject: [Python-Dev] More pydoc questions In-Reply-To: <200208212340.g7LNeiS14216@oma.cosc.canterbury.ac.nz> Message-ID: [David Abrahams] > N^HNA^HAM^HME^HE > docstring_ext Note that pydoc contains this handy function:

    def plain(text):
        """Remove boldface formatting from text."""
        return re.sub('.\b', '', text)

Note that it's important that the regexp *not* be a raw-string there (it will do something quite amazingly different if it is). From tim.one@comcast.net Thu Aug 22 02:50:29 2002 From: tim.one@comcast.net (Tim Peters) Date: Wed, 21 Aug 2002 21:50:29 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208210037.g7L0bnG30953@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Guido] > ... > Um, the notation is '|' and '&', not 'or' and 'and', and those are > what I learned in school. Seems pretty conventional to me (Greg > Wilson actually tried this out on unsuspecting newbies and found that > while '+' worked okay, '*' did not -- read the PEP). FYI, kjbuckets uses '+' (union) and '&' (intersection). '*' is used for graph composition. It so happens that graph composition applied to sets views each set as a graph of self-loops {1, 2, 7} -> {(1, 1), (2, 2), (7, 7)} and the composition of two such self-loop graphs is the self-loop graph of the sets' intersection. So you can view '*' as being a set intersection operation there.
It's more useful to compose a graph with a set, in which case you get the subgraph all of whose start-arc nodes are in the set (set * graph), or all of whose end-arc nodes are in the set (graph * set). This is all very handy if you do a lot of it, but getting comfortable with this higher-level of view of things is at the other end of a learning curve. From greg@cosc.canterbury.ac.nz Thu Aug 22 03:00:41 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 22 Aug 2002 14:00:41 +1200 (NZST) Subject: [Python-Dev] Parsing vs. lexing. In-Reply-To: Message-ID: <200208220200.g7M20fD17456@oma.cosc.canterbury.ac.nz> Alex Martelli : > This reminds me of a long-ago interview with Borland's techies about > how they had managed to create Turbo Pascal, which ran well in a 64K > (K, not M-) one-floppy PC Even more impressive was the earlier version of Turbo Pascal which ran on 64K Z80-based CP/M systems! I have great respect for that one, because in a previous life I used it to develop a cross-compiler for a Modula-2-like language targeting the National 32000 architecture. My compiler consisted of 3 overlays (for parsing, declaration analysis and code generation), wasn't very fast, and had so little memory left for a symbol table that it could only compile very small modules. :-( In hindsight, my undoing was probably my insistence that the language not require forward declarations (it was my language, so I could make it how I wanted). If I had relaxed that, I could have used a single- pass design that would have simplified things considerably. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Thu Aug 22 03:18:39 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 22 Aug 2002 14:18:39 +1200 (NZST) Subject: [Python-Dev] Re: More pydoc questions In-Reply-To: Message-ID: <200208220218.g7M2Idv17608@oma.cosc.canterbury.ac.nz> pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard): > It works pretty well, as many printer filters know how to interpret such > overstrike when meant to bold or underline. Interesting - I hadn't known that. I guess it's not quite so silly as it might look, then. Still, there ought to be a way of getting plain output unadorned by such tricks from the help system. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@comcast.net Thu Aug 22 03:47:55 2002 From: tim.one@comcast.net (Tim Peters) Date: Wed, 21 Aug 2002 22:47:55 -0400 Subject: [Python-Dev] Parsing vs. lexing. In-Reply-To: Message-ID: [Jonathan Riehl] > As per Zach's comments, I think this is pretty funny. I have just spent > more time trying to expose pgen to the interpreter than I took to write > a R-D parser for Python 1.3 (granted, once Fred's parser module came > around, I felt a bit silly). It seems a potential lesson went unlearned then . > Considering the scope of my parser generator integration wishlist, > having GCC move to a hand coded recursive descent parser is going to > make my head explode. Even TenDRA (http://www.tendra.org/) used a LL(n) > parser generator, despite its highly tweaked lookahead code. So now I'm > going to have to extract grammars from call trees? 
As if the 500 > languages problem isn't already intractable, there are going to be > popular language implementations that don't even bother with an abstract > syntax specification!? (Stop me from further hyperbole if I am > incorrect.) Anyone writing an R-D parser by hand without a formal grammar to guide them is insane. The formal grammar likely won't capture everything, though -- but then they never do. > No wonder there are no killer software engineering apps. Maybe I should > just start writing toy languages for kids... Parser generators are great for little languages! They're painful for real languages, though, because syntax warts accumulate and then tool rigidity gets harder to live with. Hand-crafted R-D parsers are wonderfully tweakable in intuitive ways (staring at a mountain of parse-table conflicts and divining how to warp the grammar to shut the tool up is a black art nobody should regret not learning ...). 15 years of my previous lives were spent as a compiler jockey, working for HW vendors. The only time we used a parser generator was the time we used one written by a major customer, and for political reasons far more than technical ones. It worked OK in the end, but it didn't really save any time. It did save us from one class of error. I vividly recall a bug report against the previous Fortran compiler, where this program line (an approximation)

    CONTRAST = KNOB ** ROTATION

apparently never got executed. It appeared to be an optimization bug at a fundamental level, as there was simply no code generated for this statement. After too much digging, we found that the guy who wrote the Fortran parser had done the equivalent of

    if not statement.has_label() and statement.startswith('CONT'):
        pass # an unlabelled CONTINUE statement can be ignored

It's just that nobody had started a variable name with those 4 letters before. Yikes! I was afraid to fly for a year after .
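The failure mode in that story is a prefix match where a whole-token match was needed. A tiny reconstruction (hypothetical names, and deliberately ignoring Fortran's real tokenisation rules):

```python
import re

# The buggy test, as sketched above: any unlabelled statement that
# merely *starts* with "CONT" was treated as CONTINUE and dropped.
def is_continue_buggy(statement, has_label=False):
    return not has_label and statement.startswith('CONT')

# A safer check: the keyword must be the entire statement token,
# not the prefix of a longer identifier.
CONTINUE_RE = re.compile(r'\s*CONTINUE\s*$')

def is_continue_fixed(statement, has_label=False):
    return not has_label and CONTINUE_RE.match(statement) is not None

assert is_continue_buggy('CONTRAST = KNOB ** ROTATION')      # the bug: dropped!
assert not is_continue_fixed('CONTRAST = KNOB ** ROTATION')  # assignment kept
assert is_continue_fixed('  CONTINUE  ')
```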
a-class-of-tool-most-appreciated-when-it's-least-needed-ly y'rs - tim From aahz@pythoncraft.com Thu Aug 22 03:59:27 2002 From: aahz@pythoncraft.com (Aahz) Date: Wed, 21 Aug 2002 22:59:27 -0400 Subject: [Python-Dev] Parsing vs. lexing. In-Reply-To: <200208220200.g7M20fD17456@oma.cosc.canterbury.ac.nz> References: <200208220200.g7M20fD17456@oma.cosc.canterbury.ac.nz> Message-ID: <20020822025927.GA7395@panix.com> On Thu, Aug 22, 2002, Greg Ewing wrote: > Alex Martelli : >> >> This reminds me of a long-ago interview with Borland's techies about >> how they had managed to create Turbo Pascal, which ran well in a 64K >> (K, not M-) one-floppy PC > > Even more impressive was the earlier version of Turbo Pascal which ran > on 64K Z80-based CP/M systems! s/64K/48K/ I believe you could actually theoretically start it up with only 32K of memory, but you couldn't do any real work. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From tim.one@comcast.net Thu Aug 22 04:39:34 2002 From: tim.one@comcast.net (Tim Peters) Date: Wed, 21 Aug 2002 23:39:34 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: <20020821053556.GA700@thyrsus.com> Message-ID: [Tim] >> Do you expect that to be an issue? When I built a database from 20,000 >> messages, the whole thing fit in a Python dict consuming about 10MB. [Eric S. Raymond] > Hm, that's a bit smaller than I would have thought, but the order of > magnitude I was expecting. It's even smaller than that . The dict here maps strings to instances of a Python class (WordInfo). The latter uses new-in-2.2 __slots__, and those give major memory efficiencies over old-style classes, but there's still substantial memory overhead compared to what's possible in C.
In addition, there are memory overheads for the Python objects stored in the WordInfo instances, including a Python float object in each record recording the time.time() of last access by the scoring method. IOW, there are tons of memory overheads here, yet the size was still minor. So I have no hesitation leaving this part in Python, and coding this part up was a trivial finger exercise. You know all that, though! It makes your decision to use C from the start hard to fathom. > ... > Recognition features should age! Wow! That's a good point! With the > age counter being reset when they're recognized. For concreteness, here's the comment from the Python code, which I believe is accurate:

    # (*)atime is the last access time, a UTC time.time() value. It's the
    # most recent time this word was used by scoring (i.e., by spamprob(),
    # not by training via learn()); or, if the word has never been used by
    # scoring, the time the word record was created (i.e., by learn()).
    # One good criterion for identifying junk (word records that have no
    # value) is to delete words that haven't been used for a long time.
    # Perhaps they were typos, or unique identifiers, or relevant to a
    # once-hot topic or scam that's fallen out of favor. Whatever, if
    # a word is no longer being used, it's just wasting space.

Besides the space-saving gimmick, there may be practical value in expiring older words that are getting used, but less frequently over time. That would be evidence that the nature of the world is changing, and more aggressively expiring the model for how the world *used* to be may speed adaptation to the new realities. I'm not saving enough info to do that, though, and it's unclear whether it would really help.
Against it, while I see new spam gimmicks pop up regularly, the old ones never seem to go away (e.g., I don't do anything to try to block spam on my email accounts, and the bulk of the spam I get is still easily recognized from the subject line alone). However, because it's all written in Python , it will be very easy to set up experiments to answer such questions. BTW, the ifile FAQ gives a little info about the expiration scheme ifile uses. Rennie's paper gives more:

    Age is defined as the number of e-mail messages which have been
    added to the model since frequency statistics have been kept for
    the word. Old, infrequent words are to be dropped while young words
    and old, frequent words should be kept. One way to quantify this
    is to say that words which occur fewer than log2(age)-1 times
    should be discarded from the model. For example, if "baseball"
    occurred in the 1st document and occurred 5 or fewer times in the
    next 63 documents, the word and its corresponding statistics would
    be eliminated from the model's database. This feature selection
    cutoff is used in ifile and is found to significantly improve
    efficiency without noticeably affecting classification performance.

I'm not sure how that policy would work with Graham's scheme (which has many differences from the more conventional scheme ifile uses). Our Python code also saves a count of the number of times each word makes it into Graham's "best 15" list, and I expect that to be a direct measure of the value we're getting out of keeping a word ("word" means whatever the tokenizer passes to the classifier -- it's really any hashable and (for now) pickleable Python object). [on Judy] > I thought part of the point of the method was that you get > sorting for free because of the way elements are inserted. Sure, if range-search or final sorted order is important, it's a great benefit.
I was only wondering about why you'd expect spatial locality in the input as naturally ordered. > ... > No, but think about how the pointer in a binary search moves. It's > spatially bursty. Memory access frequencies for repeated binary > searches will be a sum of bursty signals, analogous to the way > network traffic volumes look in the time domain. In fact the > graph of memory address vs. number of accesses is gonna wind up > looking an awful lot like 1/f noise, I think. *Not* evenly > distributed; something there for LRU to work with. Judy may or may not be able to exploit something here; I don't know, and I'd need to know a lot more about Judy's implementation to even start to guess. Plain binary search has horrid cache behavior, though. Indeed, most older research papers on B-Trees suggest that binary search doesn't buy anything in even rather large B-Tree nodes, because the cache misses swamp the reduced instruction count over what a simple linear search does. Worse, that linear search can be significantly faster if the HW is smart enough to detect the regular address access pattern of linear search and do some helpful prefetching for you. More recent research on B-Trees is on ways to get away from bad binary search behavior; two current lines are using explicit prefetch instructions to minimize the stalls, and using a more cache-friendly data structure inside a B-Tree node. Out-guessing modern VM implementations is damned hard. With a Python dict you're likely to get a cache miss per lookup. If that's a disaster, time.clock() isn't revealing it . > ... > What I'm starting to test now is a refactoring of the program where it > spawns a daemon version of itself first time it's called. The daemon > eats the wordlists and stays in core fielding requests from subsequent > program runs.
Basically an answer to "how you call bogofilter 1K > times a day from procmail without bringing your disks to their knees" > problem" -- persistence on the cheap. > > Thing is that the solution to this problem is very generic. Might > turn into a Python framework. Sounds good! Just don't call it "compilerlike" . From python@rcn.com Thu Aug 22 04:57:10 2002 From: python@rcn.com (Raymond Hettinger) Date: Wed, 21 Aug 2002 23:57:10 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib References: Message-ID: <008301c24990$12449d00$47b53bd0@othello> > [Guido] > > Um, the notation is '|' and '&', not 'or' and 'and', and those are > > what I learned in school. Seems pretty conventional to me (Greg > > Wilson actually tried this out on unsuspecting newbies and found that > > while '+' worked okay, '*' did not -- read the PEP). [Tim] > FYI, kjbuckets uses '+' (union) and '&' (intersection). '*' is used for FTI, ISETL uses '+' and '*' as synonyms for the spelled-out 'inter' and 'union' operators. Playing with a sample session for possible inclusion in the tutorial, I've found that '|' is not nearly as clear in its intention as '+'.
Raymond Hettinger -------------------------------------------------------------------

    from sets import Set
    engineers = Set(['John', 'Jane', 'Jack', 'Janice'])
    programmers = Set(['Jack', 'Sam', 'Susan', 'Janice'])
    management = Set(['Jane', 'Jack', 'Susan', 'Zack'])
    employees = engineers | programmers | management  # more clear with '+'
    engineering_management = engineers & programmers
    fulltime_management = management - engineers - programmers
    engineers.add('Marvin')
    print engineers, 'Look, Marvin was added'
    print employees.issuperset(engineers), 'There is a problem'
    print employees, 'Hmm, employees needs an update'
    employees.update(engineers)
    print employees, 'Looks fine now'
    for group in [engineers, programmers, management, employees]:
        group.discard('Susan')
        print group

From guido@python.org Thu Aug 22 05:30:09 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 22 Aug 2002 00:30:09 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Wed, 21 Aug 2002 23:57:10 EDT." <008301c24990$12449d00$47b53bd0@othello> References: <008301c24990$12449d00$47b53bd0@othello> Message-ID: <200208220430.g7M4U9f02877@pcp02138704pcs.reston01.va.comcast.net> > Playing with a sample session for possible inclusion in the > tutorial, I've found that '|' is not nearly as clear in its > intention as '+'. It's way too early to say that. I actually like the fact that | and & are a new vocabulary (for containers at least) so they provide an additional hint that we're dealing with a different kind of container. For sequences, a+b != b+a (unless a==b). For sets, a|b == b|a. That's a useful distinction. + is already used for two distinct purposes: for numbers (where it is symmetric) and for sequences (where it is not). But numbers and sequences are unlikely to be confused because they are used so differently. Sets and lists are both containers, and I think it's useful that their vocabularies don't overlap much.
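Guido's symmetry point is easy to check at the interpreter. A quick illustration (an editorial aside, using the modern built-in set type rather than 2002's sets.Set):

```python
a = [1, 2]
b = [2, 3]

# Sequence +: concatenation, so order matters.
assert a + b == [1, 2, 2, 3]
assert a + b != b + a

sa, sb = set(a), set(b)
# Set | and &: symmetric, like their boolean-algebra ancestors.
assert (sa | sb) == (sb | sa) == {1, 2, 3}
assert (sa & sb) == (sb & sa) == {2}
```

The distinct operator vocabulary is doing exactly the signalling work described: `|` promises commutativity, `+` on containers does not.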
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Thu Aug 22 05:49:08 2002 From: tim.one@comcast.net (Tim Peters) Date: Thu, 22 Aug 2002 00:49:08 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: <20020821171311.GA19427@thyrsus.com> Message-ID: [Eric S. Raymond] > Your users' mailers would have two delete buttons -- spam and nonspam. > On each delete the message would be shipped to bogofilter, which would > merge the content into its token lists. I want to raise a caution here. Graham pulled his formulas out of thin air, and one part of the scoring step is quite dubious. This requires detail to understand. Where X means "word x is present" and similarly for Y, and S means "it's spam" and "not-S" means "it's not spam", and sticking to just the two-word case for simplicity:

    P(S | X and Y) = [by Bayes]
                     P(X and Y | S) * P(S) / P(X and Y)
                   = [by the usual expanded form of Bayes]
                     P(X and Y | S) * P(S) /
                         (P(S)*P(X and Y | S) + P(not-S)*P(X and Y | not-S))

All that is rigorously correct so far. Now we make the simplifying assumption that puts the "naive" in "naive Bayes", that the probability of X is independent of the probability of Y, so that the conjoined probabilities can be replaced by multiplication of non-conjoined probabilities. This yields

                      P(X|S)*P(Y|S)*P(S)
    ---------------------------------------------------
    P(S)*P(X|S)*P(Y|S) + P(not-S)*P(X|not-S)*P(Y|not-S)

Then, unlike a "normal" formulation of Bayesian classification, Graham's scheme simply doesn't know anything about P(X|S) and P(Y|S) etc. It only knows about probabilities in the other direction (P(S|X) etc). It takes 3 more applications of Bayes to get what we want from what we know.
That is,

    P(X|S) = [again by Bayes]
             P(S|X) * P(X) / P(S)

Plug that in, mutatis mutandis, in six places, to get

    P(S|X)*P(X)/P(S) * P(S|Y)*P(Y)/P(S) * P(S)
    ----------------------------------------------------------------
    P(S)*P(S|X)*P(X)/P(S)*P(S|Y)*P(Y)/P(S) +
        P(not-S)*P(not-S|X)*P(X)/P(not-S)*P(not-S|Y)*P(Y)/P(not-S)

The factor P(X)*P(Y) cancels out of numerator and denominator, leaving

    P(S|X)/P(S) * P(S|Y)/P(S) * P(S)
    ----------------------------------------------------------------
    P(S)*P(S|X)/P(S)*P(S|Y)/P(S) +
        P(not-S)*P(not-S|X)/P(not-S)*P(not-S|Y)/P(not-S)

and simplifying some P(whatever)/P(whatever) instances away gives

    P(S|X)*P(S|Y)/P(S)
    ---------------------------------------------------
    P(S|X)*P(S|Y)/P(S) + P(not-S|X)*P(not-S|Y)/P(not-S)

This isn't what Graham computes, though: the P(S) and P(not-S) terms are missing in his formulation. Given that P(not-S) = 1-P(S), and P(not-S|whatever) = 1-P(S|whatever), what he actually computes is

    P(S|X)*P(S|Y)
    -------------------------------------
    P(S|X)*P(S|Y) + P(not-S|X)*P(not-S|Y)

This is the same as the Bayesian result only if P(S) = 0.5 (in which case all the instances of P(S) and P(not-S) cancel out). Else it's a distortion of the naive Bayesian result. For this reason, it's best that the number of spam msgs fed into your database be approximately equal to the number of non-spam msgs fed into it: that's the only way to make P(S) ~= P(not-S), so that the distortion doesn't matter. Indeed, it may be that Graham found he had to multiply his "good counts" by 2 in order to make up for that in real life he has twice as many non-spam messages as spam messages in his inbox, but that the final scoring implicitly assumes they're of equal number (and so overly favors the "it's spam" outcome unless the math is fudged elsewhere to make up for that).
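The last two formulas can be spot-checked numerically. A small editorial sketch (not code from the thread, and using made-up probabilities) comparing Graham's combination with the prior-corrected naive-Bayes one:

```python
def graham(p_sx, p_sy):
    # Graham: P(S|X)P(S|Y) / (P(S|X)P(S|Y) + P(not-S|X)P(not-S|Y))
    num = p_sx * p_sy
    return num / (num + (1 - p_sx) * (1 - p_sy))

def naive_bayes(p_sx, p_sy, p_s):
    # Same combination with the P(S) and P(not-S) terms kept.
    num = p_sx * p_sy / p_s
    den = num + (1 - p_sx) * (1 - p_sy) / (1 - p_s)
    return num / den

# With P(S) = 0.5 the two agree exactly (all priors cancel)...
assert abs(graham(0.9, 0.8) - naive_bayes(0.9, 0.8, 0.5)) < 1e-12

# ...but with an unequal prior Graham's score is distorted:
print(graham(0.9, 0.8))           # ~0.9730
print(naive_bayes(0.9, 0.8, 0.2)) # ~0.9931
```

This is exactly the "equal corpus sizes" caveat above: the missing priors only wash out when P(S) and P(not-S) really are about equal.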
It would likely be better still to train the database with a proportion of spam to not-spam messages reflecting what you actually get in your inbox, and change the scoring to use the real-life P(S) and P(not-S) estimates. In that case the "mystery bias" of 2 may actively hurt, overly favoring the "not spam" outcome. Note that Graham said: Here's a sketch of how I do statistical filtering. I start with one corpus of spam and one of nonspam mail. At the moment each one has about 4000 messages in it. That's consistent with all the above, although it's unclear whether Graham intended "about the same" to be a precondition for using this formulation, or whether fudging elsewhere was introduced empirically to make up for the scoring formula neglecting P(S) and P(not-S) by oversight. From tim.one@comcast.net Thu Aug 22 06:38:22 2002 From: tim.one@comcast.net (Tim Peters) Date: Thu, 22 Aug 2002 01:38:22 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <008301c24990$12449d00$47b53bd0@othello> Message-ID: [Raymond Hettinger] > FTI, ISETL uses '+' and '*' as synonyms for the spelled-out > 'inter' and 'union' operators. You realize that reads as if they used '+' for 'inter' and '*' for 'union', right? > Playing with a sample session for possible inclusion in the > tutorial, I've found that '|' is not nearly as clear in its intention > as '+'. > >... > engineers = Set(['John', 'Jane', 'Jack', 'Janice']) > programmers = Set(['Jack', 'Sam', 'Susan', 'Janice']) > management = Set(['Jane', 'Jack', 'Susan', 'Zack']) > > employees = engineers | programmers | management # more clear with '+' I haven't made time to play with the new sets module yet, but it was instantly clear to me just as it was. I think Guido makes a very good point about "+" making it much more confusable with a sequence or numeric operation too. 
OTOH, I'm rarely a fan of overloaded operators, and suspect I'll tend to use whatever .method() names the module supports (the set modules I've written for my own use never overloaded operators, btw). One thing did strike me as odd later! If I were to ask this company's HR director what kinds of employees they had, I bet the answer I'd hear is

    well, mostly we have engineers and programmers and management

It seems far less likely I'd hear

    well, mostly we have engineers or programmers or management

and I read "|" as "or". If I heard

    well, mostly we have engineers vertical-bar programmers vertical-bar management

I'd beg to work there for free . From md9ms@mdstud.chalmers.se Thu Aug 22 09:44:24 2002 From: md9ms@mdstud.chalmers.se (Martin Sjögren) Date: 22 Aug 2002 10:44:24 +0200 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208212320.g7LNK6O14137@oma.cosc.canterbury.ac.nz> References: <200208212320.g7LNK6O14137@oma.cosc.canterbury.ac.nz> Message-ID: <1030005864.561.1.camel@winterfell> On Thu, 2002-08-22 at 01:20, Greg Ewing wrote: > Martin Sjögren : > > > Uhm, what about + and juxtaposition? They are quite common at least > > here in Sweden, for boolean algebra. > > They're not normally used for sets, though, in my > experience (despite the fact that set theory is > a boolean algebra:-). Nope, not for sets. I have rarely seen anything but \cup and \cap for sets. But they are quite often used when working with boolean algebras in general.
Martin From md9ms@mdstud.chalmers.se Thu Aug 22 09:55:37 2002 From: md9ms@mdstud.chalmers.se (Martin Sjögren) Date: 22 Aug 2002 10:55:37 +0200 Subject: [Python-Dev] Re: Automatic flex interface for Python? In-Reply-To: References: Message-ID: <1030006537.561.4.camel@winterfell> On Thu, 2002-08-22 at 03:13, Tim Peters wrote: > Seems more to me that lexing is convenient and fast only when expressed in a > language specifically designed for writing lexers, and backed by a > specialized execution engine that knows a great deal about fast > state-machine implementation. Like, say, Flex. Lexing is also clumsy and > slow in SNOBOL4 and Icon, despite that they excel at higher-level > pattern-matching tasks. IOW, lexing is in a world by itself, almost nothing > is good at it, and the few things that shine at it don't do anything else. I've actually found that Haskell's pattern matching and lazy evaluation makes it pretty easy to write lexers.
Too bad it's too hard to use Haskell together with other languages :( But then, writing a R-D parser in Haskell is piece-of-cake too :-) Martin From mwh@python.net Thu Aug 22 10:15:26 2002 From: mwh@python.net (Michael Hudson) Date: 22 Aug 2002 10:15:26 +0100 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: Greg Ewing's message of "Thu, 22 Aug 2002 11:25:07 +1200 (NZST)" References: <200208212325.g7LNP7G14158@oma.cosc.canterbury.ac.nz> Message-ID: <2mlm6zw8y9.fsf@starship.python.net> Greg Ewing writes: > "Eric S. Raymond" > Subject: Re: [Python-Dev] Re: PEP 218 (sets): > > > Is it + for disjunction and juxtaposition for conjunction, or the other > > way around? > > + is 'or' and juxtaposition (or sometimes a dot) is 'and' > (I prefer those words because they're shorter than > 'disjunction' and 'conjunction', and I can remember which > is which:-). I've always thought "meet" and "join" to be quite cute terms for lattice operations. Guaranteed to confuse the average user of sets.py -- let's go for it! Cheers, M. -- The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a criminal offence. -- Edsger W.
Dijkstra, SIGPLAN Notices, Volume 17, Number 5 From mwh@python.net Thu Aug 22 10:31:07 2002 From: mwh@python.net (Michael Hudson) Date: 22 Aug 2002 10:31:07 +0100 Subject: [Python-Dev] q about default args In-Reply-To: Stepan Koltsov's message of "Wed, 21 Aug 2002 22:24:51 +0400" References: <20020821182451.GA31454@banana.mx1.ru> Message-ID: <2mit23w884.fsf@starship.python.net> Stepan Koltsov writes: > Hi, Guido, other Python developers and other subscribers. > > > First of all, if this question was discussed here or somewhere > else 8086 times, please direct me to discussion archives. I doubt it's ever been discussed on python-dev. Most people here know a non-starter when they see one. Hmm. Well, they know this one's a non-starter :) > I couldn't guess the keywords to search for in the python-dev archives > as I haven't found the search page where to enter these keywords :-) Just use google. If python-dev is the most relevant place for the discussion, the archives will be near the top of the results. > The question is: To be or^H^H^H^H^H^H^H^H^H Why not evaluate default > parameters of a function at THE function call, not at function def > (as is done currenly)? For example, C++ (a nice language, isn't it? ;-) No. > ) evaluates default parameters at function call. > [...] > Implementation details: > > Simple... > Add a flag to the code object, that means "evaluate default args". > Compile default args to small code objects and store them where values > for default args are stored in current Python (i.e. co_consts). That's not where they're stored. > When a function is called, evaluate the default args (if the above > flag is set) in the context of that function. This could break code, you realise: a = 1 def f(a, b=a): print a, b [...] I could go on, but I'm running out of steam... > An alternative way to go (a little example... LOOK ON, PERSONALY, I > LIKE IT ALLOT): Fortunately or unfortunately, that makes little difference to the direction of python development.
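[Editor's note: the def-time evaluation under discussion is easy to demonstrate; a small sketch, with invented function names, written in modern Python 3 syntax rather than the 2.2-era print statement:]

```python
# Default argument expressions are evaluated once, at "def" time,
# not on each call -- the behaviour Stepan proposes changing.
def append_to(item, bucket=[]):   # one list object, created at def time
    bucket.append(item)
    return bucket

first = append_to(1)
second = append_to(2)             # same list object as before!
assert second == [1, 2] and first is second

# The usual workaround: a sentinel default, with the real default
# evaluated at call time inside the body.
def append_to_fresh(item, bucket=None):
    if bucket is None:            # caller didn't supply one; make a new list
        bucket = []
    bucket.append(item)
    return bucket

assert append_to_fresh(1) == [1]
assert append_to_fresh(2) == [2]  # a fresh list each call
```

The sentinel idiom in the second function is the usual spelling of call-time defaults.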
> --- > > def f(x=12+[]): > stmts > > === > > compiled into something like: > > 0: LOAD_CONST 1 (12) > 1: BUILD_LIST 0 > 2: BINARY_ADD > 3: STORE_FAST 0 (x) > 4: # here code of stmts begin > > in the case if 'x' was specfied, the code is executed instruction 4 > onword This should work perfectly, ideologically correct and I think > even faster then current interpreter implementation. You'd have fun with: def f(a=1,b=2): print a, b f(b=1) here, no? > > Motivation (he-he, the most difficult part of this letter): > > 1. Try to import this module: > > ---xx.py--- > > import math > def func(a = map(lambda x: math.sqrt(x)): > pass > # there is no call to func > > === > > This code does nothing but define a single function, > but look at the execution time... So don't do something that thick, then! > 2. Currently, default arguments are like static function variables, > defined in the function parameter list! That is wrong. Says you. > 4. Again: I dislike code like > > --- > > def f(l=None): > if l is None: > l = [] > ... Who elected you style guru of the universe? > 5. I asked my friend (also big Python fan): why the current > behaviour is correct? his answer was: "the curren behaviour is > correct, becausethat is the way it was done in the first place :-) > ..." I don't see any advantages of the current style, and lack of > advantages is advantage of new style :-) For better or for worse, people *do* write code that depends on default function arguments being evaluated once, usually as a lazy way of precomputing things, or as a cache. > I hope, that the current state of things is a result of laziness (or is > it "business"), not sabotage :-) . and not an ideological decision. It > isn't late to fix Python yet :-) Two points: 1) I'm unconvinced this is a "fix" 2) I think it probably is too late. Cheers, M. -- You can lead an idiot to knowledge but you cannot make him think.
You can, however, rectally insert the information, printed on stone tablets, using a sharpened poker. -- Nicolai -- http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html From fredrik@pythonware.com Thu Aug 22 12:03:44 2002 From: fredrik@pythonware.com (Fredrik Lundh) Date: Thu, 22 Aug 2002 13:03:44 +0200 Subject: [Python-Dev] Parsing vs. lexing. References: Message-ID: <001601c249cb$98830fb0$0900a8c0@spiff> tim wrote: > Parser generators are great for little languages! They're painful for real > languages, though, because syntax warts accumulate and then tool rigidity > gets harder to live with. Hand-crafted R-D parsers are wonderfully > tweakable in intuitive ways (staring at a mountain of parse-table conflicts > and divining how to warp the grammar to shut the tool up is a black art > nobody should regret not learning ...). cf. http://compilers.iecc.com/comparch/article/00-12-106 "For me and C++, [using a parser generator] was a bad mistake." From magnus@hetland.org Thu Aug 22 15:24:32 2002 From: magnus@hetland.org (Magnus Lie Hetland) Date: Thu, 22 Aug 2002 16:24:32 +0200 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net>; from guido@python.org on Tue, Aug 20, 2002 at 11:47:11PM -0400 References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <20020820231738.GA21011@thyrsus.com> <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> <20020821025725.GB28198@thyrsus.com> <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020822162432.E9248@idi.ntnu.no> Guido van Rossum : > [snip] > no hope that this will ever complete in finite time, but does that > mean it shouldn't start? I could write 1L< then I'd be paying for long ops that I'll only ever need in a case > that's only of theoretical importance. How about lazy sets? E.g.
a CartesianProduct could delegate to its two underlying (concrete) sets when checking for membership, and a PowerSet could perform individual member checks for each element in a given subset... Etc. I guess this might be too specific for the library -- subclassing ImmutableSet and overriding the accessors shouldn't be too hard... (The nice thing about including it in the library is that you could produce these things as results from operations on Set and ImmutableSet, e.g. 2**some_set could give a power set or whatever...) -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org From magnus@hetland.org Thu Aug 22 15:25:01 2002 From: magnus@hetland.org (Magnus Lie Hetland) Date: Thu, 22 Aug 2002 16:25:01 +0200 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208210541.g7L5fBS07245@oma.cosc.canterbury.ac.nz>; from greg@cosc.canterbury.ac.nz on Wed, Aug 21, 2002 at 05:41:11PM +1200 References: <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> <200208210541.g7L5fBS07245@oma.cosc.canterbury.ac.nz> Message-ID: <20020822162501.F9248@idi.ntnu.no> Greg Ewing : > [snip] > > Oh, no. Someone is bound to want set comprehensions, now... That's in the PEP, isn't it? -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org From guido@python.org Thu Aug 22 16:12:57 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 22 Aug 2002 11:12:57 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Thu, 22 Aug 2002 16:24:32 +0200."
<20020822162432.E9248@idi.ntnu.no> References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <20020820231738.GA21011@thyrsus.com> <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> <20020821025725.GB28198@thyrsus.com> <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> <20020822162432.E9248@idi.ntnu.no> Message-ID: <200208221512.g7MFCvI27671@odiug.zope.com> > [snip] > > no hope that this will ever complete in finite time, but does that > > mean it shouldn't start? I could write 1L< > then I'd be paying for long ops that I'll only ever need in a case > > that's only of theoretical importance. > > How about lazy sets? E.g. a CartesianProduct could delegate to its two > underlying (concrete) sets when checking for membership, and a > PowerSet could perform individual member cheks for each element in a > given subset... Etc. Have you got a use case for membership tests of a cartesian product? > I guess this might be too specific for the library -- subclassing > ImmutableSet and overriding the accessors shouldn't be too hard... > > (The nice thing about including it in the library is that you could > produce these things as results from operations on Set and > ImmutableSet, e.g. 2**some_set could give a power set or whatever...) Use case? 
--Guido van Rossum (home page: http://www.python.org/~guido/) From python@discworld.dyndns.org Thu Aug 22 17:14:32 2002 From: python@discworld.dyndns.org (Charles Cazabon) Date: Thu, 22 Aug 2002 10:14:32 -0600 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: ; from tim.one@comcast.net on Thu, Aug 22, 2002 at 12:49:08AM -0400 References: <20020821171311.GA19427@thyrsus.com> Message-ID: <20020822101432.B25785@twoflower.internal.do> This brings up a couple of questions, one related to the theory behind this Bayesian spam filtering, and one about Python optimization ... apologies in advance for the long post. Tim Peters wrote: > > I want to raise a caution here. Graham pulled his formulas out of thin air, > and one part of the scoring step is quite dubious. This requires detail to > understand. [detail deleted] > P(S|X)*P(S|Y)/P(S) > --------------------------------------------------- > P(S|X)*P(S|Y)/P(S) + P(not-S|X)*P(not-S|Y)/P(not-S) > > This isn't what Graham computes, though: the P(S) and P(not-S) terms are > missing in his formulation. Given that P(not-S) = 1-P(S), and > P(not-S|whatever) = 1-P(S|whatever), what he actually computes is > > P(S|X)*P(S|Y) > ------------------------------------- > P(S|X)*P(S|Y) + P(not-S|X)*P(not-S|Y) > > This is the same as the Bayesian result only if P(S) = 0.5 (in which case > all the instances of P(S) and P(not-S) cancel out). Else it's a distortion > of the naive Bayesian result. Is there an easy fix to this problem? I implemented this in Python after reading about it on the weekend, and it might explain why my results are not quite as fabulous as the author noted (I'm getting more false positives than he claimed he was). Note that I'm not so good with the above notation; I'm more at home with plain algebraic stuff :). But the more interesting Python question: I'm running into some performance problems with my implementation. 
Details: The analysis stage of my implementation (I'll refer to it as "spamalyzer" for now) stores the "mail corpus" and term list on disk. The mail corpus is two dictionaries (one for spam, one for good mail), each of which contains two further dictionaries -- one is the filenames of analyzed messages (one key per filename, values ignored and stored as 0), and the other is a dictionary mapping terms to the number of occurrences. The terms list is a single dictionary mapping terms to a pair of floats (probability of being spam and distance from 0.5). My first try at this used cPickle to store these items, but loading them back in was excruciatingly slow. From a lightly loaded P3-500/128MB running Linux 2.2.x, each of these is a separate run of a benchmarking Python script: Loading corpus -------------- pickle method: good (1014 files, 289182 terms), spam (156 files, 14089 terms) in 65.190000000000 seconds. pickle method: good (1014 files, 289182 terms), spam (156 files, 14089 terms) in 64.790000000000 seconds. pickle method: good (1014 files, 289182 terms), spam (156 files, 14089 terms) in 65.010000000000 seconds. Loading terms ------------- pickle method: got 12986 terms in 3.460000000000 seconds. pickle method: got 12986 terms in 3.470000000000 seconds. pickle method: got 12986 terms in 3.450000000000 seconds. For a lark, I decided to try an alternative way of storing the data (and no, I haven't tried the marshal module directly). I wrote a function to write the contents of the dictionary to a text file in the form of Python source, so that you can re-load the data with a simple "import" command. To my surprise, this was significantly faster! 
The first import, of course, takes a while, as the interpreter compiles the .py file to .pyc format, but subsequent runs are an order of magnitude faster than cPickle.load(): Loading corpus -------------- [charlesc@charon spamalyzer]$ rm mail_corpus.pyc custom method: good (1014 files, 289182 terms), spam (156 files, 14089 terms) in 194.210000000000 seconds. custom method: good (1014 files, 289182 terms), spam (156 files, 14089 terms) in 3.500000000000 seconds. custom method: good (1014 files, 289182 terms), spam (156 files, 14089 terms) in 3.260000000000 seconds. custom method: good (1014 files, 289182 terms), spam (156 files, 14089 terms) in 3.260000000000 seconds. Loading terms ------------- [charlesc@charon spamalyzer]$ rm terms.pyc custom method: got 12986 terms in 3.110000000000 seconds. custom method: got 12986 terms in 0.210000000000 seconds. custom method: got 12986 terms in 0.210000000000 seconds. custom method: got 12986 terms in 0.210000000000 seconds. So the big question is, why is my naive "o = __import__ (f, {}, {}, [])" so much faster than the more obvious "o = cPickle.load (f)"? And what can I do to make it faster :). Charles -- ----------------------------------------------------------------------- Charles Cazabon GPL'ed software available at: http://www.qcc.ca/~charlesc/software/ ----------------------------------------------------------------------- From skip@pobox.com Thu Aug 22 17:33:25 2002 From: skip@pobox.com (Skip Montanaro) Date: Thu, 22 Aug 2002 11:33:25 -0500 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: <20020822101432.B25785@twoflower.internal.do> References: <20020821171311.GA19427@thyrsus.com> <20020822101432.B25785@twoflower.internal.do> Message-ID: <15717.4693.455504.793456@gargle.gargle.HOWL> Charles> So the big question is, why is my naive "o = __import__ (f, {}, Charles> {}, [])" so much faster than the more obvious "o = cPickle.load Charles> (f)"? 
And what can I do to make it faster :). Try dumping in the binary format, e.g.: s = cPickle.dumps(obj, 1) Skip From tim@zope.com Thu Aug 22 20:37:26 2002 From: tim@zope.com (Tim Peters) Date: Thu, 22 Aug 2002 15:37:26 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: <20020822101432.B25785@twoflower.internal.do> Message-ID: [Charles Cazabon] > Is there an easy fix to this problem? I don't know that there is "a problem". The step is dubious, but other steps are also dubious, and naive Bayes itself is dubious (the assumption that word probabilities are independent is simply false in this application). But outside of, perhaps, quantum chromodynamics, all models of reality are more or less dubious, and it's darned hard to say whether the fudging needed to make them appear to work is lucky or principled, robust or brittle. The more gross deviations there are from a model, though, the less one can appeal to the model for guidance. In the limit, you can end up with a pile of arbitrary tricks, with no idea which gimmicks matter anymore (given enough free parameters to fiddle, you can train even a horribly inappropriate model to fit a specific blob of data exactly). > I implemented this in Python after reading about it on the weekend, and it > might explain why my results are not quite as fabulous as the author noted > (I'm getting more false positives than he claimed he was). How many lines of code do you have? That's a gross lower bound on the number of places it might have screwed up. > Note that I'm not so good with the above notation; I'm more at home with > plain algebraic stuff :). It's all plain-- and simple ---algebra, it's just long-winded. You may be confusing yourself, e.g., by reading P(S|X) as if it's a complex expression in its own right. But it's not -- given an incarnation of the universe, it denotes a fixed number. Think of it as "w" instead. Let's get concrete.
You have a spam corpus with 1000 messages. 100 of them contain the word x, and 500 contain the word y. Then P(X|S) = 100/1000 = 1/10 P(Y|S) = 500/1000 = 1/2 You also have a non-spam corpus with 2000 messages. 100 of them contain x too, and 500 contain y. Then P(X|not-S) = 100/2000 = 1/20 P(Y|not-S) = 500/2000 = 1/4 This is the entire universe, and it's all you know. If you pick a message at random, what's P(S) = the probability that it's from the spam corpus? It's trivial: P(S) = 1000/(1000+2000) = 1/3 and P(not-S) = 2/3 Now *given that* you've picked a message at random, and *know* it contains x, but don't know anything else, what's the probability it's spam (== what's P(S|X)?). Well, it has to be one of the 100 spam messages that contains x, or one of the 100 non-spam messages that contains x. They're all equally likely, so P(S|X) = (100+100)/200 = 1/2 and P(S|Y) = (500+500)/500 = 1/2 too by the same reasoning. P(not-S|X) and P(not-S|Y) are also 1/2 each. So far, there's nothing a reasonable person can argue with. Given that this is our universe, these numbers fall directly out of what reasonable people agree "probability" means. When it comes to P(S|X and Y), life is more difficult.
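[Editor's note: the single-word counts in this toy universe can be checked mechanically; a sketch in Python, using exact fractions so no rounding enters:]

```python
from fractions import Fraction as F

# Tim's toy universe: 1000 spam, 2000 non-spam; word x appears in 100
# messages of each corpus, word y in 500 messages of each corpus.
spam, ham = 1000, 2000
x_spam, x_ham = 100, 100
y_spam, y_ham = 500, 500

assert F(x_spam, spam) == F(1, 10)       # P(X|S)
assert F(y_spam, spam) == F(1, 2)        # P(Y|S)
assert F(x_ham, ham) == F(1, 20)         # P(X|not-S)
assert F(y_ham, ham) == F(1, 4)          # P(Y|not-S)
assert F(spam, spam + ham) == F(1, 3)    # P(S)

# P(S|X): of the 200 messages containing x, 100 are spam.
assert F(x_spam, x_spam + x_ham) == F(1, 2)
# P(S|Y): of the 1000 messages containing y, 500 are spam.
assert F(y_spam, y_spam + y_ham) == F(1, 2)
```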
If we *agree* to assume that word probabilities are independent (which is itself dubious, but has the virtue of appearing to work pretty well anyway), then the number of messages in the spam corpus we can expect to contain both X and Y is P(X|S)*P(Y|S)*number_spams = (1/10)*(1/2)*1000 = 50 Similarly the # of non-spam messages we can expect to contain both X and Y is (1/20)*(1/4)*2000 = 25 Since that's all the messages that contain both X and Y, the probability that a message containing both X and Y is spam is P(S | X and Y) = 50/(50 + 25) = 2/3 Note that this agrees with the formula whose derivation I spelled out from first principles: P(S|X)*P(S|Y)/P(S) --------------------------------------------------- = P(S|X)*P(S|Y)/P(S) + P(not-S|X)*P(not-S|Y)/P(not-S) (1/2)*(1/2)/(1/3) 2 -------------------------------------- = - (1/2)*(1/2)/(1/3) + (1/2)*(1/2)/(2/3) 3 It's educational to work through Graham's formulation on the same example. To start with, P(S|X) is approximated by a different means, and fudging the "good count" by a factor of 2, giving P'(S|X) = (100/1000) / (100/1000 + 2*100/2000) = 1/2 and similarly for P'(S|Y). These are the same probabilities I gave above, but the only reason they're the same is because I deliberately picked a spam corpus size exactly half the size of the non-spam corpus, knowing in advance that this factor-of-2 fudge would make the results the same in the end. The only difference in what's computed then is in the scoring step, where Graham's formulation computes P'(S | X and Y) = (1/2)*(1/2)/((1/2)*(1/2)+(1/2)*(1/2)) = 1/2 instead of the 2/3 that's actually true in this universe. If the corpus sizes diverge more, the discrepancies at the end grow too, and the way of computing P(S|X) at the start also diverges. Is that good or bad? I say no more now than that it's dubious . > But the more interesting Python question: I'm running into some > performance problems with my implementation. Details: English never helps with these. 
Whittle it down and post actual code to comp.lang.python for help. Or study the sandbox code in the CVS repository (see the Subject line of this msg) -- it's not having any speed problems. From tim@zope.com Thu Aug 22 22:55:55 2002 From: tim@zope.com (Tim Peters) Date: Thu, 22 Aug 2002 17:55:55 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: Message-ID: [Tim] > ... > P(S|X) = (100+100)/200 = 1/2 > and > P(S|Y) = (500+500)/500 = 1/2 > > too by the same reasoning. P(not-S|X) and P(not-S|Y) are also 1/2 each. > > So far, there's nothing a reasonable person can argue with. And note that only an unreasonable person would argue that (100+100)/200 = 1, so don't even think about it . Of course those should have been P(S|X) = 100/(100+100) = 1/2 and P(S|Y) = 500/(500+500) = 1/2 There are probably other glitches like that. Work out the example yourself -- it's much easier to figure it out than to transcribe all the tedious numbers into a msg. From esr@thyrsus.com Thu Aug 22 23:27:37 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Thu, 22 Aug 2002 18:27:37 -0400 Subject: [Python-Dev] q about default args In-Reply-To: <20020821182451.GA31454@banana.mx1.ru> References: <20020821182451.GA31454@banana.mx1.ru> Message-ID: <20020822222737.GA3044@thyrsus.com> Stepan Koltsov : > The question is: To be or^H^H^H^H^H^H^H^H^H Why not evaluate default > parameters of a function at THE function call, not at function def > (as is done currenly)? For example, C++ (a nice language, isn't it? ;-) > ) evaluates default parameters at function call. Among other things, because that choice (what old LISP hackers like me call `dynamic scoping') turns out to be far more difficult to model mentally than Python's lexical scoping. Forty years of LISP experience says Python does the right thing. -- Eric S. Raymond From esr@thyrsus.com Thu Aug 22 23:46:56 2002 From: esr@thyrsus.com (Eric S. 
Raymond) Date: Thu, 22 Aug 2002 18:46:56 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: <200208211932.g7LJW8S01717@pcp02138704pcs.reston01.va.comcast.net> References: <20020821011429.GE22413@thyrsus.com> <20020821053556.GA700@thyrsus.com> <20020821055825.GP29858@codesourcery.com> <20020821062226.GC1771@thyrsus.com> <20020821170353.GE2803@codesourcery.com> <20020821171311.GA19427@thyrsus.com> <200208211932.g7LJW8S01717@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020822224656.GD3044@thyrsus.com> Guido van Rossum : > > And not necessary. Base64 spam invariably has telltales that Bayesian > > amalysis will pick up in the headers and MIME cruft. A rather large > > percentage of it is either big5 or images. > > I'd be curious to know if that will continue to be true in the future. > At least one of my non-tech friends sends email that's exclusively > HTML (even though the content is very lightly marked-up plain text), > from a hotmail account. Spam could easily have the same origin, but > the HTML contents would be very different. Well, consider. If your friend were to send you base64 mail, it probably would *not* come from one of the spamhaus addresses in bogofilter's wordlists. The presence of base64 content is neutral. That means that about the only way not decoding it could lead to a false positive is if the headers contained spam-correlated tokens which decoding the body would have countered with words having a higher non-spam loading. -- Eric S. Raymond From esr@thyrsus.com Thu Aug 22 23:38:11 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Thu, 22 Aug 2002 18:38:11 -0400 Subject: [Python-Dev] Re: Automatic flex interface for Python? In-Reply-To: References: <20020821032018.GA29112@thyrsus.com> Message-ID: <20020822223811.GC3044@thyrsus.com> Tim Peters : > > They hit the language in a weak spot created by the immutability of > > strings.
> > But you lost me here -- I don't see a connection between immutability and > either ease of writing lexers or speed of lexers. It's an implementation problem. You find yourself doing a lot of string accessing and pasting, creating several new objects per input char. -- Eric S. Raymond From esr@thyrsus.com Thu Aug 22 23:33:34 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Thu, 22 Aug 2002 18:33:34 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: <15715.63234.234910.602300@anthem.wooz.org> References: <20020821011429.GE22413@thyrsus.com> <20020821053556.GA700@thyrsus.com> <20020821055825.GP29858@codesourcery.com> <20020821062226.GC1771@thyrsus.com> <20020821170353.GE2803@codesourcery.com> <20020821171311.GA19427@thyrsus.com> <15715.63234.234910.602300@anthem.wooz.org> Message-ID: <20020822223334.GB3044@thyrsus.com> Barry A. Warsaw : > You need some kind of list admin oversight or your system is open to > attack vectors on individual posters. Interesting point! -- Eric S. Raymond From greg@cosc.canterbury.ac.nz Fri Aug 23 03:38:32 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 23 Aug 2002 14:38:32 +1200 (NZST) Subject: [Python-Dev] Re: Automatic flex interface for Python? In-Reply-To: <20020822223811.GC3044@thyrsus.com> Message-ID: <200208230238.g7N2cWW27387@oma.cosc.canterbury.ac.nz> "Eric S. Raymond" : > Tim Peters : > > But you lost me here -- I don't see a connection between immutability and > > either ease of writing lexers or speed of lexers. > It's an implementation problem. You find yourself doing a lot of > string accessing and pasting, creating several new objects per > input char. Not necessarily! Plex manages to do it without any of that. The trick is to leave all the characters in the input buffer and just *count* how many characters make up the next token. Once you've decided where the token ends, one slice gives it to you. 
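[Editor's note: the counting trick Greg describes can be sketched in a few lines; this toy scanner (not Plex itself -- the token rules are invented) advances an index and slices exactly once per token:]

```python
def scan(text):
    """Yield (kind, lexeme) pairs by counting characters and slicing
    once per token, instead of pasting per-character strings together."""
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1            # skip whitespace; no object created
            continue
        start = pos
        if ch.isdigit():
            while pos < len(text) and text[pos].isdigit():
                pos += 1        # just advance a counter
            kind = "number"
        elif ch.isalpha():
            while pos < len(text) and text[pos].isalnum():
                pos += 1
            kind = "name"
        else:
            pos += 1
            kind = "op"
        yield kind, text[start:pos]   # the one slice per token

tokens = list(scan("x1 + 42"))
assert tokens == [("name", "x1"), ("op", "+"), ("number", "42")]
```

Only the final slice allocates a string per token; the inner loops touch nothing but an integer index.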
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@comcast.net Fri Aug 23 04:17:08 2002 From: tim.one@comcast.net (Tim Peters) Date: Thu, 22 Aug 2002 23:17:08 -0400 Subject: [Python-Dev] Re: Automatic flex interface for Python? In-Reply-To: <200208230238.g7N2cWW27387@oma.cosc.canterbury.ac.nz> Message-ID: [Greg Ewing] > Not necessarily! Plex manages to do it without any > of that. > > The trick is to leave all the characters in the input > buffer and just *count* how many characters make up > the next token. Once you've decided where the token > ends, one slice gives it to you. Plex is very nice! It doesn't pass my "convenient and fast" test only because the DFA at the end still runs at Python speed, and one character at a time is still mounds slower than it could be in C. Hmm. But you can also generate pretty reasonable C code from Python source now too! You're going to solve this yet, Greg. Note that mxTextTools also computes slice indices for "tagging", rather than build up new string objects. Heck, that's also why Guido (from the start) gave the regexp and string match+search gimmicks optional start-index and end-index arguments too, and why one of the "where did this group match?" flavors returns slice indices. I think Eric has spent too much time debugging C lately. From greg@cosc.canterbury.ac.nz Fri Aug 23 04:21:55 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 23 Aug 2002 15:21:55 +1200 (NZST) Subject: [Python-Dev] Re: Automatic flex interface for Python? In-Reply-To: Message-ID: <200208230321.g7N3Ltd27561@oma.cosc.canterbury.ac.nz> > Hmm. But you can also generate pretty reasonable C code from Python > source now too! You're going to solve this yet, Greg.
Yes, probably the first serious use I make of Pyrex will be to re-implement the inner loop of Plex so it runs at C speed. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From bsder@mail.allcaps.org Fri Aug 23 04:25:25 2002 From: bsder@mail.allcaps.org (Andrew P. Lentvorski) Date: Thu, 22 Aug 2002 20:25:25 -0700 (PDT) Subject: [Python-Dev] q about default args In-Reply-To: <20020822222737.GA3044@thyrsus.com> Message-ID: <20020822194902.V37067-100000@mail.allcaps.org> On Thu, 22 Aug 2002, Eric S. Raymond wrote: > Stepan Koltsov : > > The question is: To be or^H^H^H^H^H^H^H^H^H Why not evaluate default > > parameters of a function at THE function call, not at function def > > Among other things, because that choice (what old LISP hackers like me > call `dynamic scoping') turns out to be far more difficult to model > mentally than Python's lexical scoping. That statement sounds like someone spent a lot of time doing research on it. Is there a reference I could go look up? -a From tim.one@comcast.net Fri Aug 23 04:29:59 2002 From: tim.one@comcast.net (Tim Peters) Date: Thu, 22 Aug 2002 23:29:59 -0400 Subject: [Python-Dev] RE: [Python-checkins] python/dist/src/Lib pyclbr.py,1.26,1.27 In-Reply-To: Message-ID: > Update of /cvsroot/python/python/dist/src/Lib > In directory usw-pr-cvs1:/tmp/cvs-serv15469 > > Modified Files: > pyclbr.py > Log Message: > Rewritten using the tokenize module, which gives us a real tokenizer > rather than a number of approximating regular expressions. > Alas, it is 3-4 times slower. Let that be a challenge for the > tokenize module. Was this just for purity, or did it fix a bug? 
The regexps there were close to being heroically careful, and even so it was sometimes uncomfortably slow using the class browser in IDLE (based on pyclbr), and even on a fast machine. A factor of 3 or 4 might make that unbearable. If it was for purity, note that tokenize is also based on mounds of regexp tricks. From esr@thyrsus.com Fri Aug 23 06:06:45 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Fri, 23 Aug 2002 01:06:45 -0400 Subject: [Python-Dev] q about default args In-Reply-To: <20020822194902.V37067-100000@mail.allcaps.org> References: <20020822222737.GA3044@thyrsus.com> <20020822194902.V37067-100000@mail.allcaps.org> Message-ID: <20020823050645.GA5118@thyrsus.com> Andrew P. Lentvorski : > > Among other things, because that choice (what old LISP hackers like me > > call `dynamic scoping') turns out to be far more difficult to model > > mentally than Python's lexical scoping. > > That statement sounds like someone spent a lot of time doing research on > it. Is there a reference I could go look up? It's sort of a folk theorem derived from painful experience. Nobody has proposed a new LISP dialect with dynamic scoping since the mid-1980s. Scheme and Common LISP, both lexically scoped, pretty much settled the controversy. -- Eric S. Raymond From est@hyperreal.org Fri Aug 23 07:37:56 2002 From: est@hyperreal.org (Eric Tiedemann) Date: Thu, 22 Aug 2002 23:37:56 -0700 (PDT) Subject: [Python-Dev] q about default args In-Reply-To: <20020823050645.GA5118@thyrsus.com> Message-ID: <20020823063756.64183.qmail@hyperreal.org> Eric S. Raymond discourseth: > Andrew P. Lentvorski : > > > Among other things, because that choice (what old LISP hackers like me > > > call `dynamic scoping') turns out to be far more difficult to model > > > mentally than Python's lexical scoping. > > > > That statement sounds like someone spent a lot of time doing research on > > it. Is there a reference I could go look up? > > It's sort of a folk theorem derived from painful experience.
Nobody > has proposed a new LISP dialect with dynamic scoping since the mid-1980s. > Scheme and Common LISP, both lexically scoped, pretty much settled the > controversy. http://citeseer.nj.nec.com/steele93evolution.html has some good coverage of this. When it comes to the original topic (the handling of default arguments), I think it's possible to separate time of evaluation and scope of evaluation. Call-time and static-scoping seem like good choices to me. Being able to refer to parameters to the left of the one you're defaulting can be especially handy. E From fredrik@pythonware.com Fri Aug 23 08:56:43 2002 From: fredrik@pythonware.com (Fredrik Lundh) Date: Fri, 23 Aug 2002 09:56:43 +0200 Subject: [Python-Dev] Re: Automatic flex interface for Python? References: <200208230238.g7N2cWW27387@oma.cosc.canterbury.ac.nz> Message-ID: <007b01c24a7a$a081c580$0900a8c0@spiff> greg wrote: > > It's an implementation problem. You find yourself doing a lot of > > string accessing and pasting, creating several new objects per > > input char. > > Not necessarily! Plex manages to do it without any > of that. > > The trick is to leave all the characters in the input > buffer and just *count* how many characters make up > the next token. you can do that without even looking at the characters? From guido@python.org Fri Aug 23 14:28:52 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 23 Aug 2002 09:28:52 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: Your message of "Thu, 22 Aug 2002 18:46:56 EDT."
<20020822224656.GD3044@thyrsus.com> References: <20020821011429.GE22413@thyrsus.com> <20020821053556.GA700@thyrsus.com> <20020821055825.GP29858@codesourcery.com> <20020821062226.GC1771@thyrsus.com> <20020821170353.GE2803@codesourcery.com> <20020821171311.GA19427@thyrsus.com> <200208211932.g7LJW8S01717@pcp02138704pcs.reston01.va.comcast.net> <20020822224656.GD3044@thyrsus.com> Message-ID: <200208231328.g7NDSrX08124@pcp02138704pcs.reston01.va.comcast.net>

> Well, consider. If your friend were to send you base64 mail, it
> probably would *not* come from one of the spamhaus addresses in
> bogofilter's wordlists.

Yeah, but not every spammer sends from a well-known spammer's address.

> The presence of base64 content is neutral. That means that about the only
> way not decoding it could lead to a false positive is if the headers
> contained spam-correlated tokens which decoding the body would have
> countered with words having a higher non-spam loading.

Graham mentions the possibility that spammers can develop ways to make their headers look neutral. When I receive a base64-encoded HTML message from Korea whose subject is "Hi", it could be from a Korean Python hacker (there were 700 of those at a conference Christian Tismer attended in Korea last year, so this is a realistic example), or it could be Korean spam. Decoding the base64 would make it obvious. The headers usually give some clues, but based on what makes it through SpamAssassin (which we've been running for all python.org mail since February or so), base64 encoding scores high on the list of false negatives. --Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Fri Aug 23 14:29:55 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 23 Aug 2002 09:29:55 -0400 Subject: [Python-Dev] q about default args In-Reply-To: Your message of "Thu, 22 Aug 2002 18:27:37 EDT."
<20020822222737.GA3044@thyrsus.com> References: <20020821182451.GA31454@banana.mx1.ru> <20020822222737.GA3044@thyrsus.com> Message-ID: <200208231329.g7NDTtF08152@pcp02138704pcs.reston01.va.comcast.net>

> Stepan Koltsov :
> > The question is: To be or^H^H^H^H^H^H^H^H^H Why not evaluate default
> > parameters of a function at THE function call, not at function def
> > (as is done currently)? For example, C++ (a nice language, isn't it? ;-)
> > ) evaluates default parameters at function call.
>
> Among other things, because that choice (what old LISP hackers like me
> call `dynamic scoping') turns out to be far more difficult to model
> mentally than Python's lexical scoping. Forty years of LISP experience
> says Python does the right thing.

Dynamic scoping has nothing to do with it. Nevertheless, there's no chance in hell this will ever change, so let's drop the subject. --Guido van Rossum (home page: http://www.python.org/~guido/)

From esr@thyrsus.com Fri Aug 23 14:40:47 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Fri, 23 Aug 2002 09:40:47 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: <200208231328.g7NDSrX08124@pcp02138704pcs.reston01.va.comcast.net> References: <20020821011429.GE22413@thyrsus.com> <20020821053556.GA700@thyrsus.com> <20020821055825.GP29858@codesourcery.com> <20020821062226.GC1771@thyrsus.com> <20020821170353.GE2803@codesourcery.com> <20020821171311.GA19427@thyrsus.com> <200208211932.g7LJW8S01717@pcp02138704pcs.reston01.va.comcast.net> <20020822224656.GD3044@thyrsus.com> <200208231328.g7NDSrX08124@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020823134047.GA28978@thyrsus.com>

Guido van Rossum :
> The headers usually give some clues, but based on what makes
> it through SpamAssassin (which we've been running for all python.org
> mail since February or so), base64 encoding scores high on the list of
> false negatives.

Noted. I'll take account of that in my planning.
-- Eric S. Raymond

From ping@zesty.ca Fri Aug 23 15:40:20 2002 From: ping@zesty.ca (Ka-Ping Yee) Date: Fri, 23 Aug 2002 09:40:20 -0500 (CDT) Subject: [Python-Dev] More pydoc questions In-Reply-To: <0d6101c24936$57d6c5f0$6501a8c0@boostconsulting.com> Message-ID:

On Wed, 21 Aug 2002, David Abrahams wrote:
> Now I get (well, I'm not sure how this will show up in your mailer, but for
> me it's full of control characters):
>
> NNAAMMEE
> docstring_ext
>
> FFIILLEE [...]
> So my question is, is there a way to dump the text help for a module
> without prompting and without any extra control characters?

Hi -- sorry it took a couple of days to reply (i'm out of town). The pydoc module contains a function for precisely this purpose -- just run the string through pydoc.plain().

    % pydoc pydoc.plain
    Python Library Documentation: function plain in pydoc

    plain(text)
        Remove boldface formatting from text.

-- ?!ng

From guido@python.org Fri Aug 23 15:39:09 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 23 Aug 2002 10:39:09 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib pyclbr.py,1.26,1.27 In-Reply-To: Your message of "Thu, 22 Aug 2002 23:29:59 EDT." References: Message-ID: <200208231439.g7NEd9v09639@pcp02138704pcs.reston01.va.comcast.net>

> > Rewritten using the tokenize module, which gives us a real tokenizer
> > rather than a number of approximating regular expressions.
> > Alas, it is 3-4 times slower. Let that be a challenge for the
> > tokenize module.
>
> Was this just for purity, or did it fix a bug? The regexps there
> were close to being heroically careful, and even so it was sometimes
> uncomfortably slow using the class browser in IDLE (based on
> pyclbr), even on a fast machine. A factor of 3 or 4 might make
> that unbearable.
>
> If it was for purity, note that tokenize is also based on mounds of
> regexp tricks.

It was for purity, with an eye towards future improvements (I want to teach it more about packages and import-aliasing).
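The sort of thing tokenize makes easy can be sketched in a few lines. This is only an illustration of the approach, not pyclbr's actual code; scan() and the shape of its result are invented for the example:

```python
import io
import tokenize

def scan(source):
    """Collect top-level class/def names and 'import M as N' aliases.

    The tokenizer has already dealt with continuation backslashes,
    comments, and strings by the time we see a NAME token, so none of
    the old regexp heroics are needed here.
    """
    defs, aliases = {}, {}
    toks = list(tokenize.generate_tokens(io.StringIO(source).readline))
    for i, tok in enumerate(toks):
        if tok.type != tokenize.NAME:
            continue
        if tok.string in ('class', 'def') and tok.start[1] == 0:
            # a top-level definition: the next token is its name
            defs[toks[i + 1].string] = tok.string
        elif tok.string == 'as' and toks[i - 1].type == tokenize.NAME:
            # 'import M as N' (or 'from X import Y as N')
            aliases[toks[i + 1].string] = toks[i - 1].string
    return defs, aliases

src = (
    "import string as s\n"
    "class C(object):\n"
    "    def m(self): pass\n"
    "def f(): pass\n"
)
print(scan(src))  # ({'C': 'class', 'f': 'def'}, {'s': 'string'})
```

Continuation backslashes, comments inside superclass lists, and 'import M as N' -- the cases the regexps tripped over -- all fall out for free, because the tokenizer resolves them before the scanner ever looks at a token.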
While tokenize uses regexp tricks, they are much closer to 100% correct than those in pyclbr. E.g. the pyclbr regexps don't cope with continuation backslashes (which often occur in long import statements), or comments or expressions inside the list of superclasses. They also didn't cope well with 'import M as N', which is showing up more and more frequently. I think there are still bugs in that area, but they will be much simpler to fix now. I was going to use this as an excuse to learn how to use the hotshot profiler to find out if there are any bottlenecks in the tokenize module. pyclbr.readmodule_ex('Tkinter') takes under 1.2 seconds on my home machine now. I find that acceptable (it's a lot quicker than IDLE takes to colorize Tkinter.py :-). --Guido van Rossum (home page: http://www.python.org/~guido/)

From nathan@geerbox.com Fri Aug 23 17:33:13 2002 From: nathan@geerbox.com (Nathan Clegg) Date: Fri, 23 Aug 2002 09:33:13 -0700 Subject: [Python-Dev] type categories In-Reply-To: <200208131802.g7DI2Ro27807@europa.research.att.com> References: <200208131802.g7DI2Ro27807@europa.research.att.com> Message-ID: <15718.25545.999300.938049@jin.int.geerbox.com>

This discussion appears about over, but I haven't seen any solutions via inheritance. Other languages that lack true interfaces use abstract base classes. Python supports multiple inheritance. Isn't this enough? If the basic types are turned into abstract base classes and inserted into the builtin name space, and library and user-defined classes are reparented to the appropriate base class, then isinstance becomes the test for category inclusion. A partial example:

    class FileType:
        def __init__(*args, **kwargs):
            raise AbstractClassError, \
                  "You cannot instantiate abstract classes"

        def readline(*args, **kwargs):
            raise NotImplementedError, \
                  "Methods must be overridden by their children"

All "file-like" objects, beginning with file itself and StringIO, can extend FileType (or AbstractFile or File or whatever).
A function expecting a file-like object or a filename can test the parameter to see if it is an instance of FileType rather than seeing if it has a readline method. Type hierarchies could be obvious (or endlessly debated):

    Object --> Collection --> Sequence --> List
        \              \               \--> Tuple
         \              \              \--> String
          \              \--> Set
           \              \--> Mapping --> Dict
            \--> FileLike --> File
             \            \--> StringIO
              \--> Number --> Complex
               \           \--> Real --> Integer --> Long
                \                    \--> Float
                 \--> Iterator
                  \--> Iterable

    etc.

The hierarchy could be further complicated with mutability (MutableSequence (e.g. list), ImmutableSequence (e.g. tuple, string)), or perhaps mutability could be a property of classes or even objects (allowing runtime marking of objects read-only? by contract? enforced?). This seems to be a library (not language) solution to the problem posed. Can the low level types implemented completely in C still descend from a python parent class without any performance hit? Can someone please point out the inferiority or infeasibility of this method? Or is it just "ugly"? -- Nathan Clegg GeerBox nathan@geerbox.com

From tim.one@comcast.net Fri Aug 23 17:53:02 2002 From: tim.one@comcast.net (Tim Peters) Date: Fri, 23 Aug 2002 12:53:02 -0400 Subject: [Python-Dev] Re: Automatic flex interface for Python? In-Reply-To: <007b01c24a7a$a081c580$0900a8c0@spiff> Message-ID:

[attribution lost]
>>> It's an implementation problem. You find yourself doing a lot of
>>> string accessing and pasting, creating several new objects per
>>> input char.

[Greg Ewing]
>> Not necessarily! Plex manages to do it without any
>> of that.
>>
>> The trick is to leave all the characters in the input
>> buffer and just *count* how many characters make up
>> the next token.

[/F]
> you can do that without even looking at the characters?

1-character strings are shared; string[i] doesn't create a new object except for the first time that character is seen.
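The sharing is easy to observe. A quick sketch -- note that the caching of 1-character strings is a CPython implementation detail, not a language guarantee:

```python
x = "abracadabra"
y = "banana"

# Indexing hands back the cached 1-character string object each time,
# so no new object is allocated per character access (CPython detail).
print(x[0] is x[3])   # the two 'a's are the very same object
print(x[0] is y[1])   # the sharing holds across different strings too
```

Both identity tests come out true in CPython, which is why a scanner that indexes its input character by character isn't actually churning out new objects.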
string_item() in particular uses the character found to index into an array of 1-character string objects.

From jriehl@spaceship.com Fri Aug 23 18:15:09 2002 From: jriehl@spaceship.com (Jonathan Riehl) Date: Fri, 23 Aug 2002 12:15:09 -0500 (CDT) Subject: [Python-Dev] PEP 269 Implementation, rev.0 Message-ID:

As per earlier discussions, I am going to take a whopping huge intermission that will run August right out, and end my yearly yearnings to expose pgen to the Python public. Therefore, I've submitted a provisional patch for parser people to play with until next August (*smirk*). Get it while it's hot (ID 599331), and still in sync with CVS (not that there are any radical changes): http://sourceforge.net/tracker/index.php?func=detail&aid=599331&group_id=5470&atid=305470 Comments are requested. Thanks! -Jon

From guido@python.org Fri Aug 23 18:15:27 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 23 Aug 2002 13:15:27 -0400 Subject: [Python-Dev] type categories In-Reply-To: Your message of "Fri, 23 Aug 2002 09:33:13 PDT." <15718.25545.999300.938049@jin.int.geerbox.com> References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15718.25545.999300.938049@jin.int.geerbox.com> Message-ID: <200208231715.g7NHFRl12405@pcp02138704pcs.reston01.va.comcast.net>

> This discussion appears about over, but I haven't seen any solutions
> via inheritance. Other languages that lack true interfaces use
> abstract base classes. Python supports multiple inheritance. Isn't
> this enough?

I haven't given up the hope that inheritance and interfaces could use the same mechanisms. But Jim Fulton, based on years of experience in Zope, claims they really should be different. I wish I understood why he thinks so.

> If the basic types are turned into abstract base classes and inserted
> into the builtin name space, and library and user-defined classes are
> reparented to the appropriate base class, then isinstance becomes the
> test for category inclusion.
>
> A partial example:
>
>     class FileType:
>         def __init__(*args, **kwargs):
>             raise AbstractClassError, \
>                   "You cannot instantiate abstract classes"
>
>         def readline(*args, **kwargs):
>             raise NotImplementedError, \
>                   "Methods must be overridden by their children"

Except that the readline signature should be shown here.

> All "file-like" objects, beginning with file itself and StringIO, can
> extend FileType (or AbstractFile or File or whatever). A function
> expecting a file-like object or a filename can test the parameter to
> see if it is an instance of FileType rather than seeing if it has a
> readline method.
>
> Type hierarchies could be obvious (or endlessly debated):

Endlessly debated is more like it. Do you need separate types for readable files and writable files? For seekable files? For text files? Etc.

>     Object --> Collection --> Sequence --> List
>         \              \               \--> Tuple
>          \              \              \--> String

Is a string really a collection?

>           \              \--> Set
>            \              \--> Mapping --> Dict

How about readonly mappings? Should every mapping support keys()? values()? items()? iterkeys(), itervalues(), iteritems()?

>             \--> FileLike --> File
>              \            \--> StringIO
>               \--> Number --> Complex
>                \           \--> Real --> Integer --> Long

Where does short int go?

>                 \                    \--> Float
>                  \--> Iterator
>                   \--> Iterable
>
>     etc.
>
> The hierarchy could be further complicated with mutability
> (MutableSequence (e.g. list), ImmutableSequence (e.g. tuple, string)),
> or perhaps mutability could be a property of classes or even objects
> (allowing runtime marking of objects read-only? by contract? enforced?).

Exactly. Endless debate will be yours. :-)

> This seems to be a library (not language) solution to the problem
> posed. Can the low level types implemented completely in C still
> descend from a python parent class without any performance hit?

Not easily, no, but it would be possible to put most of the abstract hierarchy in C.

> Can someone please point out the inferiority or infeasibility of
> this method? Or is it just "ugly"?
Agreeing on an ontology seems the hardest part to me. --Guido van Rossum (home page: http://www.python.org/~guido/)

From Jack.Jansen@oratrix.com Fri Aug 23 21:39:04 2002 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Fri, 23 Aug 2002 22:39:04 +0200 Subject: [Python-Dev] [development doc updates] In-Reply-To: <20020823172422.978CB18EC2B@grendel.zope.com> Message-ID: <5CF9EDC6-B6D8-11D6-B228-003065517236@oratrix.com>

On vrijdag, augustus 23, 2002, at 07:24 , Fred L. Drake wrote:
> The development version of the documentation has been updated:
>
> http://www.python.org/dev/doc/devel/
>
> Add documentation for the new "sets" module (thanks, Raymond!).
> Various minor additions and clarifications.

Fred, how much work would it be to make at least the html tarfile available too under www.python.org/doc/2.3? I'm looking at making the documentation friendly to the Mac help viewer (actually, Bill Fancher donated the code), and it would help the build process if there was a fixed URL based on the version number where I could always find the latest docs for the current version. -- - Jack Jansen http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From tim.one@comcast.net Fri Aug 23 21:54:46 2002 From: tim.one@comcast.net (Tim Peters) Date: Fri, 23 Aug 2002 16:54:46 -0400 Subject: [Python-Dev] Questions about sets.py Message-ID:

1. BaseSet contains 3 blobs like this:

    def __or__(self, other):
        """Return the union of two sets as a new set.

        (I.e. all elements that are in either set.)
        """
        if not isinstance(other, BaseSet):
            return NotImplemented
        result = self.__class__(self._data)
        result._data.update(other._data)
        return result

    def union(self, other):
        """Return the union of two sets as a new set.

        (I.e. all elements that are in either set.)
        """
        return self | other

Is there a good reason not to write the latter as just

    union = __or__

?

2.
Is there a particular reason for not coding issuperset as

    def issuperset(self, other):
        """Report whether this set contains another set."""
        self._binary_sanity_check(other)
        return other.issubset(self)

? Given that issubset exists, issuperset is of marginal value anyway.

3. BaseSet._update is a darned cute example of exploiting that the iterator returned by iter() isn't restartable! That isn't a question, it's just a giggle.

4. Who believes that the __le__, __lt__, __ge__, and __gt__ methods are a good idea? If anything, I'd expect s <= t to mean "is subset", and "s < t" to mean "is proper subset". Getting the lexicographic ordering of the underlying dicts instead doesn't seem to be of any use, except perhaps to prevent sorting lists of sets from blowing up. Fine by me if that blows up, though.

5. It's curious enough that we avoid dict.copy() in

    def copy(self):
        """Return a shallow copy of a set."""
        result = self.__class__([])
        result._data.update(self._data)
        return result

that if there's a reason to avoid it a comment would help.

6. It seems that doing self.__class__([]) in various places instead of self.__class__() wastes time without reason (it builds a unique empty list each time, the __init__ function then does a useless iteration dance over that, and finally the list object is torn apart again). If the intent is to communicate that we're creating an empty set, IMO the latter spelling is even a bit clearer about that (I see "[]" and keep wondering what it's trying to accomplish).

7. union_update, intersection_update, symmetric_difference_update, and difference_update return self despite mutating in-place. That makes them unique among mutating container methods (e.g., list.append, list.insert, list.remove, dict.update, list.sort, ..., return None). Is the inconsistency worth it? Chaining mutating set operations isn't common, and with names like "symmetric_difference_update()" it's a challenge to fit more than one on a line anyway.
If it's thought that chaining mutating operations is somehow a good idea for sets when it's not for other containers, then we really have to be consistent about it in the sets module. For example, then Set.add() should return self too; indeed, set.add(elt1).add(elt2) may even be pleasant at times. Or if the point was merely to create "nice names" for __ior__ etc, then, e.g., the existing union_update should be renamed to __ior__, and union_update defined as

    def union_update(self, other):
        """yadda"""
        self |= other

and let it return None. In a sense this is the opposite of question #1, where the extra code block *is* supplied but without an apparent need.

8. If there's something still valuable in _test(), I think it ought to be moved into test_sets.py. "Self-testing modules" can be convenient when developing, but after modules are deployed in the std library the embedded tests are never run again (with the exception of module doctests, which can easily be run via a regrtest-flavor test_xyz test, and which are so invoked).

From Samuele Pedroni: some thoughts of mine

[GvR]
> Agreeing on an ontology seems the hardest part to me.

Why does it seem such a daunting task?

i) much of python code depends concretely on interfaces with the granularity from one to a bunch of methods, especially code expecting base-type-like implementations.

ii) people appreciate being able to implement just the minimal subset of methods that makes things work [Obviously here I'm not talking about large frameworks like Zope]

We are not in a vacuum. There is Python code out there, and programmers with ideas about what it is like to program in Python. [Maybe I just restate the obvious and repeat myself but it seems that for some people not only type checking but in general explicitness about types is a taboo for Python code. OTOH there _exists_ Python code that as input depends/requires subclasses of some _specific_ abstract classes.
And even Smalltalk has - when "reasonable" and "necessary" - a kind of interface notion, in the form of isFoo methods defined on Object and overridden to return "yes sir" down the hierarchy] It seems to me that there is no route to the advantages of type categories without some explicitness. Now to the point. [Here my target is more dispatching and "distinguishing" by type categories than type checking, the PEP 246 issue is far from orthogonal but here I will ignore it because I don't want my head to explode ] Maybe it is obvious but anyway the type categories "problem" should be framed wrt:

- what kind of coding style we want to enable (or maybe respectively deprecate)?
- what problems are we solving?
- what kind of code will not work anymore?
- what "migration path" for such code, or to the new style?

Let's consider an exemplar fragment:

    if hasattr(f, 'write'):
        ... # needs just f.write
    else:
        # f is not file-like
        ...

possible future styles:

1)

    if doesimplement(f, FileLike):
        ...
    else:
        ...

problems: the "exponential" ontology problem, or the problem, if we limit ourselves only to large granularity interfaces and we interpret them strictly[*], that the programmer must implement more methods than strictly necessary.

[*] the point is whether an interface should be interpreted as "all the signatures implemented". Whether this is checked or enforced at class definition time is not relevant here.

2)

    if doesimplement(f, Category('write')):
        ...
    else:
        ...

if we allow for such on-the-fly constructed interfaces (I hope one can extrapolate what I mean with that) we maybe solve the "exponential" ontology problem, but this code is not really an improvement over the code using hasattr; what we are interested in is not whether f has some kind of write method but whether f has a file-like write method. Through interfaces one wants to check and convey commitment more than at the signature level. Can we do better?
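For concreteness, doesimplement() and Category() above are hypothetical names; a minimal sketch of such an on-the-fly category might look like this -- it checks only for callable attributes by name, which is exactly the weakness being discussed:

```python
class Category:
    """An on-the-fly 'interface': just a bag of required method names."""
    def __init__(self, *names):
        self.names = names

def doesimplement(obj, category):
    # True if obj has a callable attribute for every required name.
    # Note this checks names only: it cannot tell a file-like write()
    # from any other method that merely happens to be called write().
    return all(callable(getattr(obj, n, None)) for n in category.names)

import io
print(doesimplement(io.StringIO(), Category('write', 'tell')))  # True
print(doesimplement(42, Category('write')))                     # False
```

As the sketch makes plain, this is just hasattr() with a nicer spelling: it conveys no commitment beyond the method names, which is the gap the rest of the message tries to close.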
It would be nice to find some kind of middle-ground between hasattr(.,'write') and large granularity strict interfaces (LGSI). My humble ideas: these are just two points picked in that entire range (hasattr - LGSI), other variations are maybe useful, necessary, or reasonable.

a) As a "workaround" it should be possible to declare that a class implements an interface partially, that means some subset of it. Then it is an open issue whether there should be ways to check both for strict and non-strict implementation, or some general control to tweak all checkings and/or enable warnings. [Do Zope interfaces already allow for this?]

b) OK with a) but if I want potentially to be strict and still deal with: "b) much of python code concretely depends on interfaces with the granularity from one to a bunch of methods" we should let people be precise about what subset of an interface they are implementing, like - I'm implementing a subset of FileLike so consider the corresponding matching signatures - or, with finer control, I'm implementing FileLike 'write' and - no - my 'tell' has nothing to do with file-like ... and more importantly it should be possible to check for such subsets:

    if doesimplement(f, PartCategory(FileLike, ['write'])):
        # Yup, I know, this is ugly and begs for sugar
        ...
    else:
        ...

[And mildly interestingly such code can degrade to just check for hasattr(.,'write') for "migration" and possibly emit warnings]

regards, Samuele Pedroni.

PS: I know, here I have not dealt with implementation or performance problems and the killing details.

From guido@python.org Fri Aug 23 22:05:27 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 23 Aug 2002 17:05:27 -0400 Subject: [Python-Dev] utf8 issue Message-ID: <200208232105.g7NL5RE16863@pcp02138704pcs.reston01.va.comcast.net>

This might belong on SF, except it's already been solved in Python 2.3, and I need guidance about what to do for Python 2.2.2.
In 2.2.1, a lone surrogate encoded into utf8 gives a utf8 string that cannot be decoded back. In 2.3, this is fixed. Should this be fixed in 2.2.2 as well? I'm asking because it caused problems with reading .pyc files: if there's a Unicode literal containing a lone surrogate, reading the .pyc file causes an exception:

    UnicodeError: UTF-8 decoding error: unexpected code byte

It looks like revision 2.128 fixed this for 2.3, but that patch doesn't cleanly apply to the 2.2 maintenance branch. Can someone help? --Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Fri Aug 23 22:21:16 2002 From: guido@python.org (Guido van Rossum) Date: Fri, 23 Aug 2002 17:21:16 -0400 Subject: [Python-Dev] Questions about sets.py In-Reply-To: Your message of "Fri, 23 Aug 2002 16:54:46 EDT." References: Message-ID: <200208232121.g7NLLHh16924@pcp02138704pcs.reston01.va.comcast.net>

> 1. BaseSet contains 3 blobs like this:
>
>     def __or__(self, other):
>         """Return the union of two sets as a new set.
>
>         (I.e. all elements that are in either set.)
>         """
>         if not isinstance(other, BaseSet):
>             return NotImplemented
>         result = self.__class__(self._data)
>         result._data.update(other._data)
>         return result
>
>     def union(self, other):
>         """Return the union of two sets as a new set.
>
>         (I.e. all elements that are in either set.)
>         """
>         return self | other
>
> Is there a good reason not to write the latter as just
>
>     union = __or__
>
> ?

Yes. It was written like that before! But in order to be a good citizen in the world of binary operators, __or__ should not raise TypeError; it should return NotImplemented if the other argument is not a set (since the other argument might implement __ror__ and know how to or itself with a set). But union(), which is a normal function, should not return NotImplemented, which would confuse the user. So they have to be different.
I thought it would be best for union() to use the | operator so that if the other argument implements __ror__, union() will acquire this ability.

> 2. Is there a particular reason for not coding issuperset as
>
>     def issuperset(self, other):
>         """Report whether this set contains another set."""
>         self._binary_sanity_check(other)
>         return other.issubset(self)
>
> ? Given that issubset exists, issuperset is of marginal value anyway.

The original code didn't have issuperset(); I added it for symmetry. Spelling it out saves two calls: one to _binary_sanity_check(), one to issubset().

> 3. BaseSet._update is a darned cute example of exploiting that the iterator
> returned by iter() isn't restartable! That isn't a question, it's
> just a giggle.

Yes, I like it. :-)

> 4. Who believes that the __le__, __lt__, __ge__, and __gt__ methods are a
> good idea? If anything, I'd expect s <= t to mean "is subset", and
> "s < t" to mean "is proper subset". Getting the lexicographic
> ordering of the underlying dicts instead doesn't seem to be of any
> use, except perhaps to prevent sorting lists of sets from blowing
> up. Fine by me if that blows up, though.

Greg Wilson added these when he made the class inherit from dict, presumably because without any further measures, sets would be comparable to dicts using the default dict comparison. That design choice was later undone by Alex (at my suggestion), but he fixed the comparisons rather than removing them. I think using <=, < etc. to spell issubset and isstrictsubset would be great.

> 5. It's curious enough that we avoid dict.copy() in
>
>     def copy(self):
>         """Return a shallow copy of a set."""
>         result = self.__class__([])
>         result._data.update(self._data)
>         return result
>
> that if there's a reason to avoid it a comment would help.

Raymond asked me whether to use copy() or update(). I looked at the code and found that they execute almost the same code, with about one instruction more per item for update().
But for small sets, copy() allocates a new dict (and the old one is thrown away). I thought that that might be more important than saving an instruction per item.

> 6. It seems that doing
>
>     self.__class__([])
>
> in various places instead of
>
>     self.__class__()
>
> wastes time without reason (it builds a unique empty list each time,
> the __init__ function then does a useless iteration dance over that,
> and finally the list object is torn apart again). If the intent is
> to communicate that we're creating an empty set, IMO the latter
> spelling is even a bit clearer about that (I see "[]" and keep
> wondering what it's trying to accomplish).

That was when ImmutableSet() required an argument. It can be left out now.

> 7. union_update, intersection_update, symmetric_difference_update,
> and difference_update return self despite mutating in-place. That
> makes them unique among mutating container methods (e.g.,
> list.append, list.insert, list.remove, dict.update, list.sort, ...,
> return None). Is the inconsistency worth it? Chaining mutating set
> operations isn't common, and with names like
> "symmetric_difference_update()" it's a challenge to fit more than
> one on a line anyway.
>
> If it's thought that chaining mutating operations is somehow a good
> idea for sets when it's not for other containers, then we really
> have to be consistent about it in the sets module. For example,
> then Set.add() should return self too; indeed,
> set.add(elt1).add(elt2) may even be pleasant at times.
>
> Or if the point was merely to create "nice names" for __ior__ etc,
> then, e.g., the existing union_update should be renamed to __ior__,
> and union_update defined as
>
>     def union_update(self, other):
>         """yadda"""
>         self |= other
>
> and let it return None. In a sense this is the opposite of question #1,
> where the extra code block *is* supplied but without an apparent need.

You guessed right. That's the best solution IMO.

> 8.
If there's something still valuable in _test(), I think it ought
> to be moved into test_sets.py. "Self-testing modules" can be
> convenient when developing, but after modules are deployed in the
> std library the embedded tests are never run again (with the
> exception of module doctests, which can easily be run via a
> regrtest-flavor test_xyz test, and which are so invoked).

Please toss it. --Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one@comcast.net Fri Aug 23 22:31:47 2002 From: tim.one@comcast.net (Tim Peters) Date: Fri, 23 Aug 2002 17:31:47 -0400 Subject: [Python-Dev] Questions about sets.py In-Reply-To: <200208232121.g7NLLHh16924@pcp02138704pcs.reston01.va.comcast.net> Message-ID:

[Guido, answers Tim's set questions] Thanks! It was enlightening, I believe I understood it all without the urge to fight back, and I'll make changes accordingly (maybe not today, but before Monday). I forgot to say this the first time: it's a very nice module! Kudos to Greg, Alex, you and Raymond.

From fdrake@acm.org Fri Aug 23 23:31:23 2002 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 23 Aug 2002 18:31:23 -0400 Subject: [Python-Dev] [development doc updates] In-Reply-To: <5CF9EDC6-B6D8-11D6-B228-003065517236@oratrix.com> References: <20020823172422.978CB18EC2B@grendel.zope.com> <5CF9EDC6-B6D8-11D6-B228-003065517236@oratrix.com> Message-ID: <15718.47035.3545.985252@grendel.zope.com>

Jack Jansen writes:
> how much work would it be to make at least the html tarfile
> available too under www.python.org/doc/2.3?

Are you looking for the tarfile or for an online documentation set?

> I'm looking at making the documentation friendly to the Mac help
> viewer (actually, Bill Fancher donated the code), and it would
> help the build process if there was a fixed URL based on the
> version number where I could always find the latest docs for the
> current version.

Is there online documentation for the Mac OS help viewer? I don't know anything about it.
-Fred -- Fred L. Drake, Jr. PythonLabs at Zope Corporation

From jeremy@alum.mit.edu Sat Aug 24 03:52:53 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Fri, 23 Aug 2002 22:52:53 -0400 Subject: [Python-Dev] type categories In-Reply-To: <200208231715.g7NHFRl12405@pcp02138704pcs.reston01.va.comcast.net> References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15718.25545.999300.938049@jin.int.geerbox.com> <200208231715.g7NHFRl12405@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <15718.62725.643469.789554@slothrop.zope.com>

>>>>> "GvR" == Guido van Rossum writes:

>> This discussion appears about over, but I haven't seen any
>> solutions via inheritance. Other languages that lack true
>> interfaces use abstract base classes. Python supports multiple
>> inheritance. Isn't this enough?

GvR> I haven't given up the hope that inheritance and interfaces
GvR> could use the same mechanisms. But Jim Fulton, based on years
GvR> of experience in Zope, claims they really should be different.
GvR> I wish I understood why he thinks so.

Here's a go at explaining why the mechanisms need to be separate. I'm loath to channel Jim, but I think he'd agree. We'd like to use interfaces to make fairly strong claims. If a class A implements an interface I, then we should be able to use an instance of A anywhere that an I is needed. This is just the straightforward notion of substitutability. I'm saying this is a strong claim because we want an A to behave like an I. By behave, I mean that the interface I can describe behavior beyond just a method name or signature. Why can't we use the current inheritance mechanism to implement the interface concept? Because the inheritance mechanism is too general. If we take the class A, anyone can create a subclass of it, regardless of whether that subclass implements I. Say you wanted to write LBYL code that tests whether an object implements an interface.
If you use a marker class and isinstance() for the test, the inheritance rules make it impossible to express some relationships. In particular, it is impossible to write a class B that inherits from A, but does not implement I. Since our test is isinstance(), any subclass of A will appear to implement I. This is unfortunate, because inheritance is a great implementation trick that shouldn't have anything to do with the interface.

Think about it briefly in terms of types. (Python doesn't have explicit types, but sometimes we reason about programs as if it did.) Strongly typed OO languages have to deal in some way with subclasses that are not subtypes. Some type systems require covariance or contravariance or invariance. In some cases, you can write a class that is a subclass but is not a subtype. The latter is what we're hoping to achieve with interfaces.

If we imagined an interface statement that was explicit and not inherited, then we'd be okay.

    class A(SomeBase):
        implements I

    class B(A):
        implements J

Now we've got a class A that implements I and a subclass B that implements J. The test isinstance(B(), A) is true, but the test implements(B(), I) is not. It's quite helpful to have the implements() predicate which uses a rule different from isinstance(). If we don't have the separate interface concept, the language just isn't as expressive.

We would have to establish a convention to sacrifice one of -- a) being able to inherit from a class just for implementation purposes or b) being able to reason about interfaces using isinstance(). a) is error prone, because the language wouldn't prevent anyone from making the mistake. b) is unfortunate, because we'd have interfaces but no formal way to reason about them.

Jeremy

From esr@thyrsus.com Sat Aug 24 05:44:16 2002
From: esr@thyrsus.com (Eric S.
Raymond)
Date: Sat, 24 Aug 2002 00:44:16 -0400
Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8
In-Reply-To:
References: <20020821171311.GA19427@thyrsus.com>
Message-ID: <20020824044416.GA25299@thyrsus.com>

Tim Peters :

>                  P(S|X)*P(S|Y)/P(S)
> ---------------------------------------------------
> P(S|X)*P(S|Y)/P(S) + P(not-S|X)*P(not-S|Y)/P(not-S)
>
> This isn't what Graham computes, though: the P(S) and P(not-S) terms are
> missing in his formulation. Given that P(not-S) = 1-P(S), and
> P(not-S|whatever) = 1-P(S|whatever), what he actually computes is
>
>             P(S|X)*P(S|Y)
> -------------------------------------
> P(S|X)*P(S|Y) + P(not-S|X)*P(not-S|Y)
>
> This is the same as the Bayesian result only if P(S) = 0.5 (in which case
> all the instances of P(S) and P(not-S) cancel out). Else it's a distortion
> of the naive Bayesian result.

OK. So, maybe I'm just being stupid, but this seems easy to solve. We already *have* estimates of P(S) and P(not-S) -- we have a message count associated with both wordlists. So why not use the running ratios between 'em?

As long as we initialize with "good" and "bad" corpora that are approximately the same size, this should work no worse than the equiprobability assumption. The ratios will correct in time based on incoming traffic.

Oh, and do you mind if I use your algebra as part of bogofilter's documentation?
--
Eric S. Raymond

From tim.one@comcast.net Sat Aug 24 06:26:12 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sat, 24 Aug 2002 01:26:12 -0400
Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8
In-Reply-To: <20020824044416.GA25299@thyrsus.com>
Message-ID:

>                  P(S|X)*P(S|Y)/P(S)
> ---------------------------------------------------
> P(S|X)*P(S|Y)/P(S) + P(not-S|X)*P(not-S|Y)/P(not-S)

[Eric S. Raymond]
> ...
> OK. So, maybe I'm just being stupid, but this seems easy to solve.
> We already *have* estimates of P(S) and P(not-S) -- we have a message
> count associated with both wordlists. So why not use the running
> ratios between 'em?

a. There are other fudges in the code that may rely on this fudge to cancel out, intentionally or unintentionally. I'm loath to type more about this instead of working on the code, because I've already typed about it. See a later msg for a concrete example of how the factor-of-2 "good count" bias acts in part to counter the distortion here. Take one away, and the other(s) may well become "a problem".

b. Unless the proportion of spam to not-spam in the training sets is a good approximation to the real-life ratio of spam to not-spam, it's also dubious to train the system with bogus P(S) and P(not-S) values.

c. I'll get back to this when our testing infrastructure is trustworthy. At the moment I'm hosed because the spam corpus I pulled off the web turns out to be trivial to recognize in contrast to Barry's corpus of good msgs from python.org mailing lists: every msg in the spam corpus has stuff about the fellow who collected the spam in the headers, while nothing in the python.org corpus does; contrarily, every msg in the python.org corpus has python.org header info not in the spam corpus headers. This is an easy way to get 100% precision and 100% recall, but not particularly realistic -- the rules it's learning are of the form "it's spam if and only if it's addressed to bruceg"; "it's not spam if and only if the headers contain 'List-Unsubscribe'"; etc. The learning can't be faulted, but the teacher can .

d. I only exposed the math for the two-word case above, and the generalization to n words may not be clear from the final result (although it's clear enough if you back off a few steps).
If there are n words, w[0] thru w[n-1]:

    prod1  <- product for i in range(n) of P(S|w[i])/P(S)
    prod2  <- product for i in range(n) of (1-P(S|w[i]))/(1-P(S))
    result <- prod1*P(S) / (prod1*P(S) + prod2*(1-P(S)))

That's if you're better set up to experiment now. If you do this, the most interesting thing to see is whether results get better or worse if you *also* get rid of the artificial "good count" boost by the factor of 2.

> As long as we initialize with "good" and "bad" corpora that are
> approximately the same size, this should work no worse than the
> equiprobability assumption.

"not spam" is already being given an artificial boost in a couple of ways. Given that in real life most people still get more not-spam than spam, removing the counter-bias in the scoring math may boost the false negative rate.

> The ratios will correct in time based on incoming traffic.

Depends on how training is done.

> Oh, and do you mind if I use your algebra as part of bogofilter's
> documentation?

Not at all, although if you wait until we get our version of this ironed out, you'll almost certainly be able to pull an anally-proofread version out of a plain-text doc file I'll feel compelled to write .

From guido@python.org Sat Aug 24 07:44:27 2002
From: guido@python.org (Guido van Rossum)
Date: Sat, 24 Aug 2002 02:44:27 -0400
Subject: [Python-Dev] type categories
In-Reply-To: Your message of "Fri, 23 Aug 2002 22:52:53 EDT." <15718.62725.643469.789554@slothrop.zope.com>
References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15718.25545.999300.938049@jin.int.geerbox.com> <200208231715.g7NHFRl12405@pcp02138704pcs.reston01.va.comcast.net> <15718.62725.643469.789554@slothrop.zope.com>
Message-ID: <200208240644.g7O6iRC25237@pcp02138704pcs.reston01.va.comcast.net>

> If we don't have the separate interface concept, the language just
> isn't as expressive.
> We would have to establish a convention to sacrifice one of --
> a) being able to inherit from a class just for implementation
> purposes or b) being able to reason about interfaces using
> isinstance(). a) is error prone, because the language wouldn't
> prevent anyone from making the mistake. b) is unfortunate, because
> we'd have interfaces but no formal way to reason about them.

So the point is that it's possible to have a class D that picks up interface I somewhere in its inheritance chain, by inheriting from a class C that implements I, where D doesn't actually satisfy the invariants of I (or of C, probably). I can see that that is a useful feature.

But it shouldn't have to preclude us from using inheritance for interfaces, if there was a way to "shut off" inheritance as far as isinstance() (or issubclass()) testing is concerned. C++ does this using private inheritance. Maybe we can add a similar convention to Python for denying inheritance from a given class or interface.

Why do I keep arguing for inheritance? (a) the need to deny inheritance from an interface, while essential, is relatively rare IMO, and in *most* cases the inheritance rules work just fine; (b) having two separate but similar mechanisms makes the language larger.

For example, if we ever are going to add argument type declarations to Python, it will probably look like this:

    def foo(a: classA, b: classB):
        ...body...

It would be convenient if this could be *defined* as

    assert isinstance(a, classA) and isinstance(b, classB)

so that programs that have a simple class hierarchy can use their classes directly as argument types, without having to go through the trouble of declaring a parallel set of interfaces.

I also think that it should be possible to come up with a set of standard "abstract" classes representing concepts like number, sequence, etc., in which the standard built-in types are nicely embedded.
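Guido's proposed *definition* of the argument declarations can already be emulated in the Python of the day with a wrapper function. The following is only a sketch under that reading — `typechecked`, `classA` and `classB` are invented names, and the pre-decorator rebinding spelling is used since decorator syntax does not exist:

```python
# Sketch: "def foo(a: classA, b: classB)" emulated as the isinstance
# assertion given above.  All names here are hypothetical.

def typechecked(*types):
    def wrap(func):
        def checked(*args):
            for arg, t in zip(args, types):
                assert isinstance(arg, t), \
                    "%r is not a %s" % (arg, t.__name__)
            return func(*args)
        return checked
    return wrap

class classA(object): pass
class classB(object): pass

def foo(a, b):
    return (a, b)
foo = typechecked(classA, classB)(foo)   # pre-decorator spelling

foo(classA(), classB())   # passes both isinstance checks
# foo(1, 2) would raise AssertionError
```

Because the check is just isinstance(), any subclass of classA is accepted — which is precisely the behavior the interface-vs-inheritance debate above is about.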
--Guido van Rossum (home page: http://www.python.org/~guido/) From esr@thyrsus.com Sat Aug 24 10:03:51 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Sat, 24 Aug 2002 05:03:51 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: References: <20020824044416.GA25299@thyrsus.com> Message-ID: <20020824090351.GA28644@thyrsus.com> Tim Peters : > a. There are other fudges in the code that may rely on this fudge > to cancel out, intentionally or unintentionally. I'm loathe to > type more about this instead of working on the code, because I've > already typed about it. See a later msg for a concrete example of > how the factor-of-2 "good count" bias acts in part to counter the > distortion here. Take one away, and the other(s) may well become > "a problem". I was thinking of shooting that "goodness bias" through the head and seeing what happens, actually. I've been unhappy with that fudge in Paul's original formula from the beginning. > b. Unless the proportion of spam to not-spam in the training sets > is a good approximation to the real-life ratio of spam to not- > spam, it's also dubious to train the system with bogus P(S) and > P(not-S) values. Right -- which is why I want to experiment with actually *using* the real life running ratio. > c. I'll get back to this when our testing infrastructure is trustworthy. > At the moment I'm hosed because the spam corpus I pulled off the > web turns out to be trivial to recognize in contrast to Barry's > corpus of good msgs from python.org mailing lists: Ouch. That's a trap I'll have to watch out for in handling other peoples' corpora. -- Eric S. 
Raymond From aleax@aleax.it Sat Aug 24 10:31:45 2002 From: aleax@aleax.it (Alex Martelli) Date: Sat, 24 Aug 2002 11:31:45 +0200 Subject: [Python-Dev] type categories In-Reply-To: <200208240644.g7O6iRC25237@pcp02138704pcs.reston01.va.comcast.net> References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15718.62725.643469.789554@slothrop.zope.com> <200208240644.g7O6iRC25237@pcp02138704pcs.reston01.va.comcast.net> Message-ID: On Saturday 24 August 2002 08:44 am, Guido van Rossum wrote: ... > For example, if we ever are going to add argument type declarations to > Python, it will probably look like this: > > def foo(a: classA, b: classB): > ...body... > > It would be convenient if this could be *defined* as > > assert isinstance(a, classA) and isinstance(b, classB) I was hoping this could be defined as: a = adapt(a, classA) b = adapt(b, classB) but I fully agree that, if we have an elegant way to say "I'm inheriting JUST for implementation, don't let that be generally known" (C++'s private inheritance is not a perfect mechanism, because in C++ 'private' affects _accessibility_, not _visibility_, sigh), it can indeed be handier and more productive to have interfaces and classes merge into each other rather than be completely separate. Substantial experience with C++ (merged) and Java (separate) suggests that to me. From the point of view of the hypothetical 'adapt', that suggests the 'protocol' argument should be allowed to be a type (class), rather than an 'interface' that is a different entity than a type, and also the useful implication: isinstance(i, T) ==> adapt(i, T) is i > so that programs that have a simple class hierarchy can use their > classes directly as argument types, without having to go through the > trouble of declaring a parallel set of interfaces. The "parallel set of interfaces" (which I had to do in Java) *was* indeed somewhat of a bother. 
Any time you need to develop and maintain two separate but strongly parallel trees (here, one of interfaces, and a separate parallel one of typical/suggested partial or total implementations to be used e.g. in inner classes that supply those interfaces), you're in for a spot of trouble. I even did some of that with a hand-kludged "code generator" which read a single description file and generated both the interface AND the class from it (but then of course I ended up editing the generated code and was back in trouble again when maintenance was needed -- seems to happen regularly to me with code generators). Surely making the target language directly able to digest a unified description would be nicer.

> I also think that it should be possible to come up with a set of
> standard "abstract" classes representing concepts like number,
> sequence, etc., in which the standard built-in types are nicely
> embedded.

If you manage to pull that off, it will be a WONDERFUL trick indeed.

Alex

From dave@boost-consulting.com (David Abrahams)
Subject: [Python-Dev] type categories
References: <15718.25545.999300.938049@jin.int.geerbox.com> <200208231715.g7NHFRl12405@pcp02138704pcs.reston01.va.comcast.net> <15718.62725.643469.789554@slothrop.zope.com> <200208240644.g7O6iRC25237@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <007501c24b62$1ae61640$6501a8c0@boostconsulting.com>

From: "Guido van Rossum"
> > If we don't have the separate interface concept, the language just
> > isn't as expressive. We would have to establish a convention to
> > sacrifice one of -- a) being able to inherit from a class just for
> > implementation purposes or b) being able to reason about interfaces
> > using isinstance(). a) is error prone, because the language
> > wouldn't prevent anyone from making the mistake. b) is unfortunate,
> > because we'd have interfaces but no formal way to reason about them.
> > So the point is that it's possible to have a class D that picks up
> > interface I somewhere in its inheritance chain, by inheriting from a
> > class C that implements I, where D doesn't actually satisfy the
> > invariants of I (or of C, probably).

> def foo(a: classA, b: classB):
>     ...body...
>
> It would be convenient if this could be *defined* as
>
>     assert isinstance(a, classA) and isinstance(b, classB)
>
> so that programs that have a simple class hierarchy can use their
> classes directly as argument types, without having to go through the
> trouble of declaring a parallel set of interfaces.
>
> I also think that it should be possible to come up with a set of
> standard "abstract" classes representing concepts like number,
> sequence, etc., in which the standard built-in types are nicely
> embedded.

Ah, but not all numbers are created equal! Can I write:

    x << 1

? Not if x is a float. Somebody will eventually want to categorize numeric types more finely, e.g. Monoid, Euclidean Ring, ...

It sounds to my C++ ear like you're trying to make this analogous to runtime polymorphism in C++. I think Python's polymorphism is a lot closer to what we do at compile-time in C++, and it should stay that way: no inheritance relationship needed... at least, not on the surface. Here's why: people inevitably discover new type categories in the objects and types they're already using. In C++ this happened when Stepanov et al discovered that built-in pointers matched his mental model of random-access iterators. A similar thing will happen in Python when you make all numbers inherit from Number but someone wants to impose the real mathematical categories (or heck: Integer vs. Fractional) on them. What Stepanov's crew did was to invent iterator traits, which decouple the type's category from the type itself. Each category is represented by a class, and those classes do have an inheritance relationship (i.e. every random_access_iterator IS-A bidirectional_iterator).
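The traits idea David describes — categories forming their own class hierarchy, attached to existing types from the outside — could be sketched in Python roughly like this. Everything below is an invented illustration, not an existing API:

```python
# Non-intrusive "traits"-style categories, loosely modeled on the
# C++ iterator_traits idea described above.  All names hypothetical.

# Categories form their own inheritance hierarchy...
class Number(object): pass
class Fractional(Number): pass
class Integer(Number): pass

# ...but are attached to concrete types via an external registry,
# so no existing type needs to inherit from anything new.
_categories = {}

def declare_category(typ, cat):
    _categories.setdefault(typ, []).append(cat)

def in_category(obj, cat):
    # Walk the object's MRO, then consult the external registry.
    for t in type(obj).__mro__:
        for c in _categories.get(t, []):
            if issubclass(c, cat):
                return True
    return False

declare_category(int, Integer)
declare_category(float, Fractional)

print(in_category(3, Number))      # True: Integer IS-A Number
print(in_category(3.14, Integer))  # False: floats aren't Integers
```

Note how the check still honors both hierarchies: the object's MRO (Guido's point) and the category hierarchy (David's point), without int or float inheriting from Number.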
Actually, I have no problem with collecting type category info from an object's MRO: as Guido implies, that will often be the simplest way to do it. However, I think there ought to be a parallel mechanism which allows additional categorization non-intrusively, and it was my understanding that the PEP Alex has been promoting does that.

-Dave

-----------------------------------------------------------
David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com

From dave@boost-consulting.com Sat Aug 24 14:14:22 2002
From: dave@boost-consulting.com (David Abrahams)
Date: Sat, 24 Aug 2002 09:14:22 -0400
Subject: [Python-Dev] convertibility and "Pythonicity"
Message-ID: <00ba01c24b70$304035d0$6501a8c0@boostconsulting.com>

[Is "Pythonicity" the right word?]

I'm interested in getting some qualitative feedback about something I'm doing in Boost.Python. The questions are,

1. How well does this behavior match up with what Python users have probably come to expect?

2. (related, I hope!) How close is it to the intended design of Python?

When wrapping a C++ function that expects a float argument, I thought it would be bizarre if people couldn't pass a Python int. Well, Python ints have a lovely __float__ function which can be used to convert them to floats. Following that idea to its "logical" conclusion led me to where I am today: when matching a formal argument corresponding to one of the built-in Python types, first use the corresponding conversion slot. That could lead to some surprising behaviors:

    char index(const char* s, int n); // wrapped using Boost.Python

    >>> index('foobar', 2)      # ok
    'o'
    >>> index(3.14, 1.2)        # Weird (floats have __str__)
    '.'
    >>> index([1, 3, 5], 0.0)   # Super weird (everything has __str__)
    '['

So I went back and tried some "obvious" test in Python 2.2.1:

    >>> 'foobar'[3.0]
    Traceback (most recent call last):
      File "", line 1, in ?
    TypeError: sequence index must be integer

Well, I had expected this to work, so I'm beginning to re-think my "liberal conversion" policy. It seems like Python itself isn't using these slots to do "implicit conversion". But then:

    >>> 'foobar'[3L]
    'b'

[The int/long unification I've heard about hasn't happened yet, has it?]

Furthermore:

    >>> range(3.3, 10.3)
    [3, 4, 5, 6, 7, 8, 9]

But:

    >>> range('1', '5')
    Traceback (most recent call last):
      File "", line 1, in ?
    TypeError: an integer is required

Now I note that strings don't have __int__, so I guess the int type handles int('42') itself using special knowledge about strings. I suppose that's to keep strings from seeming to be numbers, since the nb_int slot fills in the number_methods. And:

    >>> class zero(object):
    ...     def __int__(self): return 0
    ...
    >>> range(zero(), 5)
    [0, 1, 2, 3, 4]

So, is there any general practice, (even if it's not universal)? Do Python functions usually tend to coerce their arguments into the types they're expecting? I'm guessing the answer is no...

-----------------------------------------------------------
David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com

From pedroni@inf.ethz.ch (Samuele Pedroni)
Subject: [Python-Dev] convertibility
Message-ID: <00b901c24b76$9aa4a220$6d94fea9@newmexico>

FYI, in _Jython_ given the java class

    public class A {
        public void fi(int i) {}
        public void fd(double d) {}
        public void fs(String s) {}
    }

    import A
    a = A()

one can call

    a.fi with a Python integer or long
    a.fd with a Python integer or long or float
    a.fs only with a Python string

[yes with type categories or adapt we could do better, but the design prefers to minimize unexpected behaviour, and in practice is not too much constraining]

regards.
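The matching table Samuele quotes can be approximated in pure Python with an explicit accepts-map, just to make the policy concrete. This is a loose sketch only — the real Jython overload resolver is far more involved, and the names here are invented (with 2002's separate int/long collapsed into one int type):

```python
# Rough sketch of the Jython argument-matching rules quoted above,
# as an explicit dispatch table.  Names and policy are illustrative.

ACCEPTS = {
    'fi': (int,),        # Java int:    Python integers (int/long in 2002)
    'fd': (int, float),  # Java double: integers or floats
    'fs': (str,),        # Java String: strings only -- no str() coercion
}

def call(method, arg):
    """Accept the call only if arg matches the method's allowed types."""
    if not isinstance(arg, ACCEPTS[method]):
        raise TypeError("%s() can't accept %r" % (method, arg))
    return "ok"

print(call('fd', 3))     # ok: int widens to double
print(call('fs', 'x'))   # ok: exact match
# call('fs', 3) raises TypeError: no implicit str() conversion
```

The deliberate asymmetry — widening int to double is allowed, but nothing is ever stringified — is exactly the "minimize unexpected behaviour" choice Samuele describes, in contrast to David's __str__-based conversions above.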
From magnus@hetland.org Sat Aug 24 15:30:26 2002 From: magnus@hetland.org (Magnus Lie Hetland) Date: Sat, 24 Aug 2002 16:30:26 +0200 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <20020821090527.GA8346@thyrsus.com>; from esr@thyrsus.com on Wed, Aug 21, 2002 at 05:05:27AM -0400 References: <200208210037.g7L0bnG30953@pcp02138704pcs.reston01.va.comcast.net> <200208210046.g7L0kN605969@oma.cosc.canterbury.ac.nz> <20020821010517.GD22413@thyrsus.com> <1029917815.581.3.camel@winterfell> <20020821082832.GA7256@thyrsus.com> <1029918967.582.13.camel@winterfell> <20020821090527.GA8346@thyrsus.com> Message-ID: <20020824163026.A10202@idi.ntnu.no> Eric S. Raymond : > [snip] > Makes sense. Hardware designers care a lot about reduction to disjunctive > normal form. Much more than logicians do, actually. > > > Hmm, I just realized that I've also seen it in an American book on > > discrete maths, so it's not just us Swedes ;) > > Odd that I haven't encountered it. Indeed. I thought this was quite standard when working with digital circuits etc... And -- I don't quite see why we're talking about Boolean algebra in general here, when we're specifically looking for set operators... Oh, well. -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org From magnus@hetland.org Sat Aug 24 15:33:08 2002 From: magnus@hetland.org (Magnus Lie Hetland) Date: Sat, 24 Aug 2002 16:33:08 +0200 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <20020821005605.GC22413@thyrsus.com>; from esr@thyrsus.com on Tue, Aug 20, 2002 at 08:56:05PM -0400 References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <200208210037.g7L0bnG30953@pcp02138704pcs.reston01.va.comcast.net> <20020821005605.GC22413@thyrsus.com> Message-ID: <20020824163308.B10202@idi.ntnu.no> Eric S. 
Raymond :

> > Guido van Rossum :
> > Um, the notation is '|' and '&', not 'or' and 'and', and those are
> > what I learned in school. Seems pretty conventional to me (Greg
> > Wilson actually tried this out on unsuspecting newbies and found that
> > while '+' worked okay, '*' did not -- read the PEP).
>
> +1 on preferring | and & to `or' and `and'. To me, `or' and `and' say
> that what's being composed are predicates, not sets.

I concur completely. Using 'or' and 'and' seems close to overriding 'is' (although that's impossible, of course) to me. To me, the expression set1 and set2 should return the first set, if empty, or the second set, if the first one is non-empty. Suddenly having their intersection would be very surprising, I think.

For set1 & set2 to return their intersection, however, is very consistent with int1 & int2

--
Magnus Lie Hetland                 The Anygui Project
http://hetland.org                 http://anygui.org

From magnus@hetland.org Sat Aug 24 15:38:48 2002
From: magnus@hetland.org (Magnus Lie Hetland)
Date: Sat, 24 Aug 2002 16:38:48 +0200
Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib
In-Reply-To: <200208221512.g7MFCvI27671@odiug.zope.com>; from guido@python.org on Thu, Aug 22, 2002 at 11:12:57AM -0400
References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <20020820231738.GA21011@thyrsus.com> <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> <20020821025725.GB28198@thyrsus.com> <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com>
Message-ID: <20020824163848.C10202@idi.ntnu.no>

Guido van Rossum :
> [snip]
> Have you got a use case for membership tests of a cartesian product?

Not that I can think of at the moment, no :-) I guess the idea was to use lazy sets for some such operations.
Then you could build complex expressions through cartesian products, unions, intersections, set differences, set comprehensions etc. without actually constructing the full set. Checking for membership or iterating over (or even constructing, after all the operations have been applied) such a set might be useful, I'm sure... You could implement joins with cartesian products without terrible performance penalties etc... But I guess this sort of thing might as well go into some other module somewhere (probably outside the libs). It was just a thought. -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org From oren-py-d@hishome.net Sat Aug 24 15:44:36 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Sat, 24 Aug 2002 10:44:36 -0400 Subject: [Python-Dev] type categories In-Reply-To: <200208240644.g7O6iRC25237@pcp02138704pcs.reston01.va.comcast.net> References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15718.25545.999300.938049@jin.int.geerbox.com> <200208231715.g7NHFRl12405@pcp02138704pcs.reston01.va.comcast.net> <15718.62725.643469.789554@slothrop.zope.com> <200208240644.g7O6iRC25237@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020824144436.GA53344@hishome.net> On Sat, Aug 24, 2002 at 02:44:27AM -0400, Guido van Rossum wrote: > Why do keep arguing for inheritance? (a) the need to deny inheritance > from an interface, while essential, is relatively rare IMO, and in > *most* cases the inheritance rules work just fine; (b) having two > separate but similar mechanisms makes the language larger. Inheriting the implementation without implementing the same interfaces is only one reason to want an interface mechanism that is not 100% tied to inheritance. Objects written by different authors on two sides of the globe often implement the same protocol without actually inheriting the definition from a common module. I can pass these objects to a method expecting this protocol and it will work just fine (most of the time...) 
I would like to be able to declare that I need an object with a specific interface even if the object was written long before and I don't want to modify an existing library just to make it conform to my interface names. Strictly defined named interfaces like Zope interfaces are also important but most of the interfaces I use in everyday programming are more ad-hoc in nature and are often defined retroactively. > For example, if we ever are going to add argument type declarations to > Python, it will probably look like this: > > def foo(a: classA, b: classB): > ...body... > > It would be convenient if this could be *defined* as > > assert isinstance(a, classA) and isinstance(b, classB) In your Optional Static Typing presentation slides you define "type expressions". If the isinstance method accepted a type expression object as its second argument this assertion would work for interfaces that are not defined by strict hierarchical inheritance. > so that programs that have a simple class hierarchy can use their > classes directly as argument types, without having to go through the > trouble of declaring a parallel set of interfaces. ...and classes could be used too. They are just type expressions that match a single class. BTW, isinstance already supports a simple form of this: a tuple is interpreted as an "OR" type expression. You can say that isinstance returns True if the object is an instance of one of the types matched by the type expression. Oren From magnus@hetland.org Sat Aug 24 16:00:52 2002 From: magnus@hetland.org (Magnus Lie Hetland) Date: Sat, 24 Aug 2002 17:00:52 +0200 Subject: [Python-Dev] Set naming Message-ID: <20020824170052.A18177@idi.ntnu.no> By naming the new set module sets and the class Set the parallel to array module is broken. I guess that's not a problem -- I just thought I'd mention it. 
Naming the module "set" would be more analogous to "array", and having "set" as an alias for "Set" would let people switch to a possible future type with the same name by commenting out their import statements... But then again, I guess my little mind is infested with hobgoblins ;) -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org From esr@thyrsus.com Sat Aug 24 16:14:12 2002 From: esr@thyrsus.com (Eric S. Raymond) Date: Sat, 24 Aug 2002 11:14:12 -0400 Subject: [Python-Dev] Set naming In-Reply-To: <20020824170052.A18177@idi.ntnu.no> References: <20020824170052.A18177@idi.ntnu.no> Message-ID: <20020824151412.GA5538@thyrsus.com> Magnus Lie Hetland : > By naming the new set module sets and the class Set the parallel to > array module is broken. I guess that's not a problem -- I just thought > I'd mention it. Naming the module "set" would be more analogous to > "array", and having "set" as an alias for "Set" would let people > switch to a possible future type with the same name by commenting out > their import statements... Hmmm...I think I agree with this objection, and I have another. It's not consistently so, but usually the classes that are simpler and closer to the system core aren't capitalized. The name "Set" has a misleading hint in it. -- Eric S. Raymond From jeremy@alum.mit.edu Sat Aug 24 16:15:56 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Sat, 24 Aug 2002 11:15:56 -0400 Subject: [Python-Dev] type categories In-Reply-To: <20020824144436.GA53344@hishome.net> References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15718.25545.999300.938049@jin.int.geerbox.com> <200208231715.g7NHFRl12405@pcp02138704pcs.reston01.va.comcast.net> <15718.62725.643469.789554@slothrop.zope.com> <200208240644.g7O6iRC25237@pcp02138704pcs.reston01.va.comcast.net> <20020824144436.GA53344@hishome.net> Message-ID: <15719.41772.336072.815938@slothrop.zope.com> Good point, Oren. 
We now have two requirements for interfaces that are different than the standard inheritance mechanism. It should be possible to: - inherit from a class without implementing that class's interfaces - declare that a class implements an interface outside the class statement It's harder to support the second requirement using the current inheritance mechanism. Jeremy From aleax@aleax.it Sat Aug 24 16:32:09 2002 From: aleax@aleax.it (Alex Martelli) Date: Sat, 24 Aug 2002 17:32:09 +0200 Subject: [Python-Dev] convertibility In-Reply-To: <00b901c24b76$9aa4a220$6d94fea9@newmexico> References: <00b901c24b76$9aa4a220$6d94fea9@newmexico> Message-ID: On Saturday 24 August 2002 04:00 pm, Samuele Pedroni wrote: ... > [yes with type categories or adapt we could do better, > but the design prefers to minimize unexpected behaviour, > and in practice is not too much constraining] As a happy user of Jython (albeit, so far, in modest amounts, and not yet in production-code), I want to add an unsolicited testimonial -- most of the time, the rules Jython applies "do what feels right" and prove (to me) unsurprising and unconstraining. After studying the rules in detail, particularly with overload resolution in mind, I was afraid of many possible mishaps. In practice, I find that it seems the rules don't get in my way and don't trip me up either. Whatever it is, there IS something right in those rules (perhaps just in conjunction with typical Java libraries, or perhaps more generally). Alex From aleax@aleax.it Sat Aug 24 16:37:36 2002 From: aleax@aleax.it (Alex Martelli) Date: Sat, 24 Aug 2002 17:37:36 +0200 Subject: [Python-Dev] type categories In-Reply-To: <15719.41772.336072.815938@slothrop.zope.com> References: <200208131802.g7DI2Ro27807@europa.research.att.com> <20020824144436.GA53344@hishome.net> <15719.41772.336072.815938@slothrop.zope.com> Message-ID: On Saturday 24 August 2002 05:15 pm, Jeremy Hylton wrote: > Good point, Oren. 
We now have two requirements for interfaces that > are different than the standard inheritance mechanism. It should be > possible to: > > - inherit from a class without implementing that class's interfaces > > - declare that a class implements an interface outside the class > statement > > It's harder to support the second requirement using the current > inheritance mechanism. The second requirement is a good part of what adaptation is meant to do. As I understand, that's exactly what Zope3 already provides for its interfaces. You don't just "declare" the fact -- you register an adapter that can provide whatever is needed to make it so. I.e., if object X does already implement interface Y without ANY need for tweaking/renaming/whatever, I guess the registered adapter can just return the object X it receives as an argument. More often, the adapter will return some (hopefully thin) wrapper over X that deals with renaming, signature-adaptation, and the like. That's how it works in Zope3 (at least as I understood from several discussions with Jim Fulton and Guido -- haven't studied Zope3 yet), and I think that such "external adaptation" functionality, however dressed up, should definitely be a part of whatever Python ends up with. Alex From ark@research.att.com Sat Aug 24 16:53:44 2002 From: ark@research.att.com (Andrew Koenig) Date: 24 Aug 2002 11:53:44 -0400 Subject: [Python-Dev] type categories In-Reply-To: <200208240644.g7O6iRC25237@pcp02138704pcs.reston01.va.comcast.net> References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15718.25545.999300.938049@jin.int.geerbox.com> <200208231715.g7NHFRl12405@pcp02138704pcs.reston01.va.comcast.net> <15718.62725.643469.789554@slothrop.zope.com> <200208240644.g7O6iRC25237@pcp02138704pcs.reston01.va.comcast.net> Message-ID: Guido> For example, if we ever are going to add argument type declarations to Guido> Python, it will probably look like this: Guido> def foo(a: classA, b: classB): Guido> ...body... 
Guido> It would be convenient if this could be *defined* as Guido> assert isinstance(a, classA) and isinstance(b, classB) Guido> so that programs that have a simple class hierarchy can use their Guido> classes directly as argument types, without having to go through the Guido> trouble of declaring a parallel set of interfaces. Guido> I also think that it should be possible to come up with a set of Guido> standard "abstract" classes representing concepts like number, Guido> sequence, etc., in which the standard built-in types are nicely Guido> embedded. I agree completely. Any use of inheritance that satisfies Liskov substitutability will satisfy interface inheritance too, and although it is possible to think of uses of inheritance that aren't substitutable, they're unusual enough that they should probably require (syntactic) special pleading, if only to alert the reader. -- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark From pedroni@inf.ethz.ch Sat Aug 24 16:53:46 2002 From: pedroni@inf.ethz.ch (Samuele Pedroni) Date: Sat, 24 Aug 2002 17:53:46 +0200 Subject: [Python-Dev] type categories Message-ID: <00f901c24b86$6e58a800$6d94fea9@newmexico> [Jeremy Hylton] > - inherit from a class without implementing that class's interfaces > > - declare that a class implements an interface outside the class > statement I would like to add and restate my proposal to allow for referring to anonymous super-interfaces of an interface in terms of the interface plus a subset of its signatures -- e.g., FileLike and just 'write'. [that means an interface can be thought to correspond to a set of (tag,signature) tuples, where tag identifies the interface, and one can also just consider subsets of it] I really think that such a feature would allow interfaces to better mix and match with how Python code is currently written. Or at least ease the transition from an interfaces-less world.
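To make the subset idea concrete, here is a minimal sketch of what checking against such an anonymous sub-interface might look like (the helper and class names are invented for illustration, not a proposed API):

```python
def supports_subset(obj, interface, names):
    """Does obj provide callable attributes for just this subset of an
    interface's method names?  A toy version of the 'interface plus a
    subset of its signatures' idea."""
    for name in names:
        if not hasattr(interface, name):
            raise ValueError("%s is not part of %s" % (name, interface.__name__))
        if not callable(getattr(obj, name, None)):
            return False
    return True

class FileLike:
    def read(self): ...
    def write(self, data): ...

class LogSink:
    # Implements only the 'write' half of FileLike.
    def write(self, data):
        pass

assert supports_subset(LogSink(), FileLike, ['write'])
assert not supports_subset(LogSink(), FileLike, ['read', 'write'])
```

Checking call signatures as well as mere presence would be the obvious refinement.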
This may seem YAGNI, but I clearly remember people stating (on types-sig) the need to refer to an interface with just the granularity of file-like 'read' or just __getitem__. Having to name them is overkill; so is having to implement all the methods of an interface corresponding to a base Python type. It is a burden to implement and may seem complex, but I feel it matches how we code in Python -- implementing, e.g., just subsets of the interfaces corresponding to a base Python type -- while still allowing precise interface checking. regards. From ark@research.att.com Sat Aug 24 16:59:07 2002 From: ark@research.att.com (Andrew Koenig) Date: 24 Aug 2002 11:59:07 -0400 Subject: [Python-Dev] type categories In-Reply-To: <007501c24b62$1ae61640$6501a8c0@boostconsulting.com> References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15718.25545.999300.938049@jin.int.geerbox.com> <200208231715.g7NHFRl12405@pcp02138704pcs.reston01.va.comcast.net> <15718.62725.643469.789554@slothrop.zope.com> <200208240644.g7O6iRC25237@pcp02138704pcs.reston01.va.comcast.net> <007501c24b62$1ae61640$6501a8c0@boostconsulting.com> Message-ID: David> It sounds to my C++ ear like you're trying to make this David> analogous to runtime polymorphism in C++. I think Python's David> polymorphism is a lot closer to what we do at compile-time in David> C++, and it should stay that way: no inheritance relationship David> needed... at least, not on the surface. Here's why: people David> inevitably discover new type categories in the objects and David> types they're already using. In C++ this happened when Stepanov David> et al discovered that built-in pointers matched his mental David> model of random-access iterators. A similar thing will happen David> in Python when you make all numbers inherit from Number but David> someone wants to impose the real mathematical categories (or David> heck: Integer vs. Fractional) on them.
David> What Stepanov's crew did was to invent iterator traits, David> which decouple the type's category from the type itself. Each David> category is represented by a class, and those classes do have David> an inheritance relationship (i.e. every random_access_iterator David> IS-A bidirectional_iterator). In other words, there *is* an inheritance relationship in C++'s compile-time polymorphism, and iterator traits are one way of expressing that polymorphism. So we have two desirable properties: 1) Guido's suggestion that interface specifications are close enough to classes that they should be classes, and should be inherited like classes, possibly with a way of hiding that inheritance for special cases; 2) Dave's suggestion that people other than a class author might wish to make claims about the interface that the class supports. I now remember that in one of my earlier messages, I said something related to (2) as well. Is there a way of merging these two ideas? -- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark From dave@boost-consulting.com Sat Aug 24 17:09:28 2002 From: dave@boost-consulting.com (David Abrahams) Date: Sat, 24 Aug 2002 12:09:28 -0400 Subject: [Python-Dev] convertibility References: <00b901c24b76$9aa4a220$6d94fea9@newmexico> Message-ID: <013b01c24b89$4c1d8140$6501a8c0@boostconsulting.com> From: "Alex Martelli" > On Saturday 24 August 2002 04:00 pm, Samuele Pedroni wrote: > ... > > [yes with type categories or adapt we could do better, > > but the design prefers to minimize unexpected behaviour, > > and in practice is not too much constraining] > > As a happy user of Jython (albeit, so far, in modest amounts, > and not yet in production-code), I want to add an unsolicited > testimonial -- most of the time, the rules Jython applies "do > what feels right" and prove (to me) unsurprising and unconstraining.
> > After studying the rules in detail, particularly with overload resolution in > mind, I was afraid of many possible mishaps. In practice, I find that > it seems the rules don't get in my way and don't trip me up either. > Whatever it is, there IS something right in those rules (perhaps just > in conjunction with typical Java libraries, or perhaps more generally). Hmm. When did Java acquire overload resolution? I was surprised to see it here: http://developer.java.sun.com/developer/TechTips/2000/tt0314.html I was thinking of taking advantage of these rules for Boost.Python (and Python itself), but I'm a little worried about the applicability of the final part of the rules: if any method still under consideration has parameter types that are assignable to another method that's also still in play, then the other method is removed from consideration. This process is repeated until no other method can be eliminated. If the result is a single "most specific" method, then that method is called. If there's more than one method left, the call is ambiguous. This rule is similar to the one used in C++ for partial ordering of function templates. The problem is that my convertibility criteria examine the actual objects involved in a conversion, not just their types. This allows us to overload on sequence-of-float vs. sequence-of-string, for example. Substitutability of argument types can't be tested without exemplars of those types to work with. 
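A toy illustration of the distinction Dave is drawing -- dispatch that examines the actual argument object, not just its type (the class and helper names here are invented, not Boost.Python's API):

```python
def is_seq_of(kind):
    """Predicate: is obj an iterable whose elements are all instances of kind?"""
    def pred(obj):
        try:
            return all(isinstance(x, kind) for x in obj)
        except TypeError:       # not iterable at all
            return False
    return pred

class Overloaded:
    """Pick an implementation by testing the actual argument object --
    e.g. sequence-of-float vs. sequence-of-string, which mere type
    inspection cannot distinguish."""
    def __init__(self):
        self.cases = []
    def register(self, pred, func):
        self.cases.append((pred, func))
    def __call__(self, arg):
        for pred, func in self.cases:
            if pred(arg):
                return func(arg)
        raise TypeError("no overload matches %r" % (arg,))

process = Overloaded()
process.register(is_seq_of(float), lambda xs: sum(xs))
process.register(is_seq_of(str), lambda xs: ", ".join(xs))

assert process([1.0, 2.5]) == 3.5
assert process(["a", "b"]) == "a, b"
```

Note that registration order decides ties (an empty list matches every element-predicate), which is exactly the kind of ambiguity the Java partial-ordering rule tries to resolve by types alone.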
-Dave ----------------------------------------------------------- David Abrahams * Boost Consulting dave@boost-consulting.com * http://www.boost-consulting.com From nathan@geerbox.com Sat Aug 24 17:30:39 2002 From: nathan@geerbox.com (Nathan Clegg) Date: Sat, 24 Aug 2002 09:30:39 -0700 Subject: [Python-Dev] type categories In-Reply-To: <20020824144436.GA53344@hishome.net> Message-ID: <15719.46255.109609.240036@jin.int.geerbox.com> >>>>> "Oren" == Oren Tirosh writes: Oren> I would like to be able to declare that I need an object Oren> with a specific interface even if the object was written Oren> long before and I don't want to modify an existing library Oren> just to make it conform to my interface names. class InterfaceWrapper(ExistingClass, AbstractInterfaceClass): pass I'm not saying this is a good idea :), but I believe this problem is already solvable in the current language. The wrapper class should pass the test of isinstance for the interface class, but the existing class as the first parent should implement all of the calls. Note that most other languages that actually support proper interfaces (i.e. Java) would have similar trouble adding an interface to a prior existing class without modifying its definition. Python actually provides a much simpler solution than others might, it seems to me. -- Nathan Clegg GeerBox nathan@geerbox.com From David Abrahams" Message-ID: <016001c24b8d$984536e0$6501a8c0@boostconsulting.com> From: "Nathan Clegg" > >>>>> "Oren" == Oren Tirosh writes: > > Oren> I would like to be able to declare that I need an object > Oren> with a specific interface even if the object was written > Oren> long before and I don't want to modify an existing library > Oren> just to make it conform to my interface names. > > class InterfaceWrapper(ExistingClass, AbstractInterfaceClass): > pass > > I'm not saying this is a good idea :), but I believe this problem is > already solvable in the current language. 
The wrapper class should > pass the test of isinstance for the interface class, but the existing > class as the first parent should implement all of the calls. > > Note that most other languages that actually support proper interfaces > (i.e. Java) would have similar trouble adding an interface to a prior > existing class without modifying its definition. Python actually > provides a much simpler solution than others might, it seems to me. The problem is that we want to use ExistingClass *objects* where AbstractInterfaceClass is required. If someone else has written a module containing:

def some_fantastic_function(x: AbstractInterfaceClass):
    ...

And I have written a function:

def my_func(generator):
    for x in generator:
        some_fantastic_function(x)

If there's a generator lying about which produces ExistingClass, I ought to be able to pass it to my_func. ----------------------------------------------------------- David Abrahams * Boost Consulting dave@boost-consulting.com * http://www.boost-consulting.com From oren-py-d@hishome.net Sat Aug 24 17:59:07 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Sat, 24 Aug 2002 19:59:07 +0300 Subject: [Python-Dev] type categories In-Reply-To: <15719.41772.336072.815938@slothrop.zope.com>; from jeremy@alum.mit.edu on Sat, Aug 24, 2002 at 11:15:56AM -0400 References: <200208131802.g7DI2Ro27807@europa.research.att.com> <15718.25545.999300.938049@jin.int.geerbox.com> <200208231715.g7NHFRl12405@pcp02138704pcs.reston01.va.comcast.net> <15718.62725.643469.789554@slothrop.zope.com> <200208240644.g7O6iRC25237@pcp02138704pcs.reston01.va.comcast.net> <20020824144436.GA53344@hishome.net> <15719.41772.336072.815938@slothrop.zope.com> Message-ID: <20020824195907.A8498@hishome.net> On Sat, Aug 24, 2002 at 11:15:56AM -0400, Jeremy Hylton wrote: > Good point, Oren. We now have two requirements for interfaces that > are different than the standard inheritance mechanism.
It should be > possible to: > > - inherit from a class without implementing that class's interfaces > > - declare that a class implements an interface outside the class > statement > > It's harder to support the second requirement using the current > inheritance mechanism. I want to go a step further. I don't want to declare that a class implements an interface outside the class statement. I don't want to declare *anything* about classes. My approach centers on the user of the class rather than the provider. The user can declare what he *expects* from the class and the interface checking will verify that the class meets these requirements. In a way this is what you already do in Python - you use the object and if it doesn't meet your expectations it raises an exception. Exceptions are raised for both bad form and bad content. Bad content will still trigger an exception when you try to use it but bad form can be detected much earlier. See http://www.tothink.com/python/predicates I originally developed this for rulebases in security applications. I am now porting it to Python and cleaning it up. I think it should be an effective way to write assertions about the form of class objects based on methods, call signatures, etc. If/when type checking is added to Python it should also be possible to specify specific types for arguments and return values.
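A rough sketch of this consumer-side checking -- declaring what you *expect* and verifying form up front (the `expect` helper is invented for illustration; the actual predicate library may look quite different):

```python
import inspect

def expect(obj, **sigs):
    """Verify that obj has callable methods with the given names and
    positional-argument counts.  Bad form fails here, early, rather
    than at some later call site."""
    for name, nargs in sigs.items():
        meth = getattr(obj, name, None)
        if not callable(meth):
            raise TypeError("missing method %r" % name)
        params = inspect.signature(meth).parameters
        if len(params) != nargs:
            raise TypeError("%s takes %d args, expected %d"
                            % (name, len(params), nargs))

class Writer:
    def write(self, data):
        pass

expect(Writer(), write=1)           # form is fine: no exception
try:
    expect(Writer(), read=0)        # bad form detected immediately
except TypeError:
    pass
else:
    raise AssertionError("expected a TypeError")
```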
Oren From oren-py-d@hishome.net Sat Aug 24 18:06:31 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Sat, 24 Aug 2002 20:06:31 +0300 Subject: [Python-Dev] type categories In-Reply-To: <20020824161522.B1DC12044B@gandalf.hishome.net>; from aleax@aleax.it on Sat, Aug 24, 2002 at 05:37:36PM +0200 References: <200208131802.g7DI2Ro27807@europa.research.att.com> <20020824144436.GA53344@hishome.net> <15719.41772.336072.815938@slothrop.zope.com> <20020824161522.B1DC12044B@gandalf.hishome.net> Message-ID: <20020824200631.B8498@hishome.net> On Sat, Aug 24, 2002 at 05:37:36PM +0200, Alex Martelli wrote: > On Saturday 24 August 2002 05:15 pm, Jeremy Hylton wrote: > > Good point, Oren. We now have two requirements for interfaces that > > are different than the standard inheritance mechanism. It should be > > possible to: > > > > - inherit from a class without implementing that class's interfaces > > > > - declare that a class implements an interface outside the class > > statement > > > > It's harder to support the second requirement using the current > > inheritance mechanism. > > The second requirement is a good part of what adaptation is meant > to do. I am not talking about situations where the object does not meet your expectations and needs to be adapted - I'm talking about situations where it actually does and the only problem is how to describe that fact properly. Adaptation is cool, but I don't see it as a replacement for anything that interfaces are supposed to achieve. Effective adaptation requires some kind of interface definition mechanism to work on top of. 
Oren From ark@research.att.com Sat Aug 24 18:33:01 2002 From: ark@research.att.com (Andrew Koenig) Date: 24 Aug 2002 13:33:01 -0400 Subject: [Python-Dev] type categories In-Reply-To: <15719.46255.109609.240036@jin.int.geerbox.com> References: <15719.46255.109609.240036@jin.int.geerbox.com> Message-ID: >>>>>> "Oren" == Oren Tirosh writes: Oren> I would like to be able to declare that I need an object Oren> with a specific interface even if the object was written Oren> long before and I don't want to modify an existing library Oren> just to make it conform to my interface names. Nathan> class InterfaceWrapper(ExistingClass, AbstractInterfaceClass): Nathan> pass Nathan> I'm not saying this is a good idea :), but I believe this problem is Nathan> already solvable in the current language. Not quite. You are creating a new class with the desired property, but it can sometimes be desirable to assert properties about types that already exist. For example, suppose I invent a GroupUnderPlus property for types for which the + operator has group properties. I would like to be able to say that int has that property, and not have to derive a new class from int in order to do so. -- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark From haering_python@gmx.de Sat Aug 24 19:30:56 2002 From: haering_python@gmx.de (Gerhard =?iso-8859-1?Q?H=E4ring?=) Date: Sat, 24 Aug 2002 20:30:56 +0200 Subject: [Python-Dev] Why no math.fac? Message-ID: <20020824183056.GA1859@lilith.ghaering.test> Any reason why there isn't any factorial function in the math module? I could easily implement one in C (for ints and longs only, right?) Gerhard -- This sig powered by Python! 
Outside temperature in Munich: 22.3 °C Wind: 1.2 m/s From pinard@iro.umontreal.ca Sat Aug 24 19:41:16 2002 From: pinard@iro.umontreal.ca (François Pinard) Date: 24 Aug 2002 14:41:16 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <20020824163848.C10202@idi.ntnu.no> References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <20020820231738.GA21011@thyrsus.com> <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> <20020821025725.GB28198@thyrsus.com> <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com> <20020824163848.C10202@idi.ntnu.no> Message-ID: [Magnus Lie Hetland] > I guess the idea was to use lazy sets for some such operations. Then > you could build complex expressions through cartesian products, unions, > intersections, set differences, set comprehensions etc. without actually > constructing the full set. Allow me some random thoughts. (Aren't they always random, anyway? :-) When I saw some of the suggestions on this list for "generating" elements of a cartesian product, though sometimes elegant, I thought "Too much done, too soon.". But the truth is that I did not give the thing a serious try, and I'm not sure I would be able to offer anything better. One nice thing, with a dict or a set, is that we can quickly access how many entries there are in there. Is there some internal way to efficiently fetch the N'th element, in the order in which the keys would be naturally listed? If not, one could always pay some extra temporary memory to build a list of these keys first.
If you have to "generate" a cartesian product for N sets, you could set up a compound counter as a list of N indices, the K'th meant to run from 0 up to the cardinality C[K] of the K'th set, and devise simple recipes to yield the element of the product represented by the counter, and to bump it. Moreover, it would be trivial to equip this generator with a `__len__' function able to predict the cardinality CCC of the whole result, and quite easy to transform any index KKK between 0 and CCC into an equivalent compound counter, and from there, access any member of the cartesian product at constant speed, without generating it all. All the above is pretty simple, and meant to introduce a few suggestions that might solve once and for all, if we could do it well enough, a re-occurring request on the Python list about how to produce permutations et al. We might try to rewrite the recipes behind a "generating" cartesian product of many sets, illustrated above, into a similar generating function able to produce all permutations of a single set. So let's say: Set([1, 2, 3]).permutations() would lazily produce the equivalent of: Set([(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]) That generator could offer a `__len__' function predicting the cardinality CCC of the result, and some trickery could be used to map integers from 0 to CCC into various permutations. Once on the road, it would be worth offering combinations and arrangements just as well, and maybe also offering the "power" set, I mean here the set of all subsets of a given set, all with predictable cardinality and constant speed access to any member by index. Yet, many questions or objections may arise.
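The compound-counter idea can be sketched directly; a mixed-radix decomposition of the index gives constant-time access to any member (the class name is invented for illustration, not a proposed set API):

```python
class LazyProduct:
    """Cartesian product of several sequences with a predictable
    __len__ and constant-time indexed access, via the compound
    counter described above."""
    def __init__(self, *seqs):
        self.seqs = [list(s) for s in seqs]
    def __len__(self):
        n = 1
        for s in self.seqs:
            n *= len(s)
        return n
    def __getitem__(self, i):
        # Decompose i into one "digit" per set, rightmost fastest.
        result = []
        for s in reversed(self.seqs):
            i, digit = divmod(i, len(s))
            result.append(s[digit])
        result.reverse()
        return tuple(result)

p = LazyProduct([1, 2, 3], 'ab')
assert len(p) == 6
assert p[0] == (1, 'a')
assert p[5] == (3, 'b')
```

Nothing is materialized beyond the input sequences themselves; iterating just walks the indices 0..len(p)-1.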
Using tuples to represent each element of a cartesian product of many sets is pretty natural, but it is slightly less clear that a tuple is the best fit for representing an ordered set as in permutations and arrangements, as tuples may allow elements to be repeated, while an ordered set does not. I think that sets are to be preferred over tuples for returning combinations or subsets. While it is natural to speak and think of subsets of a set, or permutations, arrangements and combinations of a set, some people might prefer to stay closer to an underlying implementation with lists (sublists of a list, permutations, arrangements or combinations of a list), and would feel that going through sets is an unwelcome detour for their applications. Indeed, what's the real use and justification for hashing keys and such things, when one wants nothing else than arrangements from a list? Another aspect worth some thinking is that permutations, in particular, are mathematical objects in themselves: we can notably multiply permutations or take the inverse of a permutation. Arrangements are in fact permutations over the elements of combinations. Some thought is surely needed for properly reflecting mathematical elegance into how the set API is extended for the above, and not merely burying that elegance under practical concerns.
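The "permutations as mathematical objects" point is easy to illustrate with tuples of indices (a sketch, not a proposed API):

```python
def compose(p, q):
    """(p * q)[i] = p[q[i]]: apply q first, then p.
    Permutations are represented as tuples of the indices 0..n-1."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """The permutation undoing p."""
    inv = [None] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

identity = (0, 1, 2)
p = (2, 0, 1)          # sends position 0 -> 2, 1 -> 0, 2 -> 1
assert compose(p, inverse(p)) == identity
assert inverse(inverse(p)) == p
```

A permutation type supporting `*` and `~` would carry exactly this group structure.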
-- François Pinard http://www.iro.umontreal.ca/~pinard From tim.one@comcast.net Sat Aug 24 20:29:03 2002 From: tim.one@comcast.net (Tim Peters) Date: Sat, 24 Aug 2002 15:29:03 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: <20020821005252.GB22413@thyrsus.com> Message-ID: [Eric S. Raymond] > (Copied to Paul Graham. Paul, this is the mailing list of the Python > maintainers. I thought you'd find the bits about lexical analysis in > bogofilter interesting. Pythonistas, Paul is one of the smartest and > sanest people in the LISP community, as evidenced partly by the fact > that he hasn't been too proud to learn some lessons from Python :-). > It would be a good thing for some bridges to be built here.) Hi, Paul! I believe Eric copied you on some concerns I had about the underpinnings of the algorithm, specifically about the final "is it spam?" computation. Looking at your links, I bet you got the formula from here: http://www.mathpages.com/home/kmath267.htm If so, the cause of the difficulty is that you inherited a subtle (because unstated) assumption from that writeup: I would suggest that we assume symmetry between "y" and "n". In other words, assume that probability of predicting correctly is the same regardless of whether the correct answer is "y" or "n". That part's fine. This implies p0=p7, p1=p6, p2=p5, and p3=p4, But that part doesn't follow from *just* the stated assumptions: note that those four equalities imply that p0+p2+p4+p6 = p7+p5+p3+p1 But the left-hand side of that is the probability that event X does not occur (it's all the rows with 'n' in the 'R' column), and the right-hand side is the probability that event X does occur (it's all the rows with 'y' in the 'R' column). In other words, this derivation also makes the stronger-- and unstated --assumption that X occurs with probability 1/2. 
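Tim's point about the hidden P(X)=0.5 assumption can be checked numerically; the sketch below compares the naive two-clue combination against the exact Bayes posterior for a synthetic model (all numbers are made up for illustration):

```python
def clue_prob(prior, a, b):
    # p_i = P(spam | clue_i), for a clue with P(clue|spam)=a, P(clue|ham)=b
    return prior * a / (prior * a + (1 - prior) * b)

def naive_combine(p1, p2):
    # Graham-style combination; silently assumes P(spam) = 0.5
    return p1 * p2 / (p1 * p2 + (1 - p1) * (1 - p2))

def exact_posterior(prior, a1, b1, a2, b2):
    # True Bayes posterior, clues assumed independent given the class
    return prior * a1 * a2 / (prior * a1 * a2 + (1 - prior) * b1 * b2)

a1, b1, a2, b2 = 0.8, 0.1, 0.7, 0.2

# With a 50/50 prior the naive combination matches Bayes exactly:
p1, p2 = clue_prob(0.5, a1, b1), clue_prob(0.5, a2, b2)
assert abs(naive_combine(p1, p2) - exact_posterior(0.5, a1, b1, a2, b2)) < 1e-9

# With any other prior it is wrong, because each p_i already folds in one
# copy of the prior odds, so the product double-counts them:
p1, p2 = clue_prob(0.2, a1, b1), clue_prob(0.2, a2, b2)
assert abs(naive_combine(p1, p2) - exact_posterior(0.2, a1, b1, a2, b2)) > 0.01

# The repair: combine odds, then divide out the surplus prior-odds factor.
odds = (p1 / (1 - p1)) * (p2 / (1 - p2)) * ((1 - 0.2) / 0.2)
assert abs(odds / (1 + odds) - exact_posterior(0.2, a1, b1, a2, b2)) < 1e-9
```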
The ultimate formula given on that page is correct if P(X)=0.5, but it turns out it's wrong if P(X) isn't 0.5. Reality doesn't care how accurate Smith and Jones are, X occurs with its own probability regardless of what they think. Picture an extreme: suppose reality is such that X *always* occurs. Then p0 must be 0, and so must p2, p4 and p6 (the rows with 'n' in the R column can never happen if R is always 'y'). But then p0+p2+p4+p6 is 0 too, and the equality above implies p7+p5+p3+p1 is also 0. We reach the absurd conclusion that if X always occurs, the probability that X occurs is 0. As approximations to 1 go, 0 could stand some improvement. The math is easy enough to repair, but it may percolate into other parts of your algorithm. Chiefly, I *suspect* you found you needed to boost the "good count" by a factor of 2 because you actually have substantially more non-spam than spam in your inbox, and the scoring step was favoring "spam" more than it should have by virtue of neglecting to correct for the fact that your real-life P(spam) is significantly less than 0.5 (although your training corpora had about the same number of spams as non-spams, so that P(spam)=0.5 was approximately true across your *training* data -- that's another complication). Makes sense? Once our testing setup is trustworthy, I'll try it both ways and report on results. In the meantime, it's something worth pondering. From Samuele Pedroni" ----- Original Message ----- From: Troels Therkelsen Newsgroups: comp.lang.python Sent: Saturday, August 24, 2002 6:42 PM Subject: Security hole in rexec? > Hello everybody, > > I have managed to stumble onto something with the rexec module that I > do not quite understand. As I understand it, the rexec framework is > meant to create a sandbox area within the Python interpreter, > technically with an instance of the rexec.RExec class.
It is supposed > to be impossible to break out of this sandbox unless you do something > careless like inserting non-rexec objects into the rexec namespace. > > Let me demonstrate with some code: > > Python 2.2.1 (#1, Jun 27 2002, 10:29:04) > [GCC 2.95.3 20010315 (release)] on linux2 > Type "help", "copyright", "credits" or "license" for more > information. > >>> import rexec > >>> r = rexec.RExec() > >>> r.r_exec("import sys; print sys.stdout") > Traceback (most recent call last): > File "<string>", line 1, in ? > File "/usr/local/lib/python2.2/rexec.py", line 254, in r_exec > exec code in m.__dict__ > File "<string>", line 1, in ? > AttributeError: 'module' object has no attribute 'stdout' > > This is as you'd expect, 'stdout' is not in the default ok_sys_names > attribute of the rexec.RExec class, so you are not supposed to be able > to see it from within the 'sandbox'. But observe: > > >>> r.r_exec("del __builtins__") > >>> r.r_exec("import sys; print sys.stdout") > <open file '<stdout>', mode 'w' at 0x80fe2a0> > > If __builtins__ is so critical to the operation of the 'sandbox' how > is it possible to break it from within the 'sandbox'? Have I stumbled > across a bug in rexec? Have I misunderstood something important? > > I've used the id() function to get the 'address' of the __builtins__ > object and I have verified that the new __builtins__ which gets > re-added has a different id so it is definitely a different > __builtins__ than the one I used del on. It would appear that exec > and family adds __builtins__ to the namespace it runs in if it doesn't > exist. But where does it get it from? Why doesn't rexec deal with > this quirk of exec? Maybe it's a new feature/bug of exec? > > I'll stop with the questions now. Suffice to say, I really need rexec > :-) > > Best regards, > > Troels Therkelsen From tim.one@comcast.net Sat Aug 24 21:03:13 2002 From: tim.one@comcast.net (Tim Peters) Date: Sat, 24 Aug 2002 16:03:13 -0400 Subject: [Python-Dev] Why no math.fac?
In-Reply-To: <20020824183056.GA1859@lilith.ghaering.test> Message-ID: [Gerhard Häring] > Any reason why there isn't any factorial function in the math module? math has traditionally just wrapped functions from the platform libm, although it's gotten a *little* smarter than that in a few ways. > I could easily implement one in C (for ints and longs only, right?) A Python function is more suitable. If n is small, fac goes fast no matter how it's written. If n is large, the time spent in long-int multiplication will overwhelmingly swamp whatever little savings you got from writing it in C. More, an intelligent unbounded-int fac function written in Python would likely run much faster than anything you could bear to code in C, because an "intelligent" function for this would strive to balance the sizes of the multiplicands along the way, and that requires bookkeeping that's painful in C. For example, try these under current CVS Python:

from heapq import heapreplace, heappop

def fac1(n):
    if n == 0: return 1
    if n <= 2: return n
    partials = range(2, n+1)
    while len(partials) > 1:
        n1 = heappop(partials)
        n2 = partials[0]
        heapreplace(partials, n1*n2)
    return partials[0]

def fac2(n):
    if n == 0: return 1
    if n <= 2: return n
    result = 2
    for i in xrange(3, n+1):
        result *= i
    return result

fac1 implements a simple balancing scheme that eventually manages to get Karatsuba multiplication (new in 2.3; the heapq module is also new) into play. For n=100000, fac2 takes 10 times longer to run on my box, and wouldn't be significantly faster than that if coded in C. From bsder@mail.allcaps.org Sat Aug 24 21:55:04 2002 From: bsder@mail.allcaps.org (Andrew P. Lentvorski) Date: Sat, 24 Aug 2002 13:55:04 -0700 (PDT) Subject: [Python-Dev] Why no math.fac?
In-Reply-To: Message-ID: <20020824134520.R42747-100000@mail.allcaps.org> On Sat, 24 Aug 2002, Tim Peters wrote: > [Gerhard Häring] > > Any reason why there isn't any factorial function in the math module? > Since factorial is really a special case of the gamma function, wouldn't it be better to put it in a separate module that handles such complex mathematical functions? (orthonormal polynomials, implicitly defined functions, etc.) -a From pinard@iro.umontreal.ca Sat Aug 24 21:59:16 2002 From: pinard@iro.umontreal.ca (François Pinard) Date: 24 Aug 2002 16:59:16 -0400 Subject: [Python-Dev] Re: Why no math.fac? In-Reply-To: References: Message-ID: [Tim Peters] > [...] an "intelligent" function for this would strive to balance the > sizes of the multiplicands along the way [...] > from heapq import heapreplace, heappop > def fac1(n): [...] Simple and clever. A real pleasure to read! :-) Wouldn't it make a wonderful example of `heapq' in the Library Reference? -- François Pinard http://www.iro.umontreal.ca/~pinard From aleax@aleax.it Sat Aug 24 22:33:38 2002 From: aleax@aleax.it (Alex Martelli) Date: Sat, 24 Aug 2002 23:33:38 +0200 Subject: [Python-Dev] type categories In-Reply-To: <20020824200631.B8498@hishome.net> References: <200208131802.g7DI2Ro27807@europa.research.att.com> <20020824161522.B1DC12044B@gandalf.hishome.net> <20020824200631.B8498@hishome.net> Message-ID: On Saturday 24 August 2002 07:06 pm, Oren Tirosh wrote: > On Sat, Aug 24, 2002 at 05:37:36PM +0200, Alex Martelli wrote: > > On Saturday 24 August 2002 05:15 pm, Jeremy Hylton wrote: > > > Good point, Oren. We now have two requirements for interfaces that > > > are different than the standard inheritance mechanism.
It should be > > > possible to: > > > > > > - inherit from a class without implementing that class's interfaces > > > > > > - declare that a class implements an interface outside the class > > > statement > > > > > > It's harder to support the second requirement using the current > > > inheritance mechanism. > > > > The second requirement is a good part of what adaptation is meant > > to do. > > I am not talking about situations where the object does not meet your > expectations and needs to be adapted - I'm talking about situations where > it actually does and the only problem is how to describe that fact > properly. Adaptation IS one way to "describe that fact properly", given that checks are anyway constrained to happen at runtime. You just install an adapter from objects x of class X to protocol Y that receives x as an argument and whose body is just "return x" -- that's all. You may consider the adaptation mechanism too general to bend it to this purpose, but I look at it differently, namely: what's the gain that would justify a further, special-purpose mechanism that's usable only (e.g.) when all of X's methods already have the right name and order of parameters, but then we'd have to switch to another if there is renaming or reordering to be done? Unless some huge gain can be shown to come from having multiple mechanisms, I'd rather have just one -- "entities must not be multiplied without necessity". Should some caller, for some weird reason, need to distinguish whether object x was adapter to Y through an actual wrapper, or without the need for one, the caller can test "if x is adapt(x, Y):" -- I can't easily think of actual use cases, but, if there are any, they are covered anyway. Incldentally, I consider the best compile-time equivalent of adaptation I know to be Haskell's "instance" statement. Don't let the name mislead you -- Haskell is FP, not OO, and doesn't use "instance" to talk about what in Python we'd call instances of a type. 
Rather, Haskell uses statement "instance" to assert that a type T is an instance of a typeclass C. A typeclass is Haskell's equivalent of an interface (actually of a stateless abstract class, and then some, but that's another issue), and "type T instances typeclass C" is Haskell's way to say "type T implements interface C". Renaming IS generally necessary. If you have an installation of the Haskell interpreter HUGS (comes with many Linux distros, for example -- can be downloaded from www.haskell.org also for Windows), have a look at demos/Lattice.hs -- you may find it readable even without knowing Haskell, since Haskell uses significant whitespace much like Python and has much notation in common with maths and other FP languages. Lattice.hs defines a typeclass "Lattice", and asserts that Bool instances Lattice (then goes on from there, of course, but let's stop at this part). But of course the key functions (would-be methods for us, in an OO language) in Lattice are called meet and join (standard math terms, after all), while in Bool the corresponding functionality is given by functions named && and ||. No problem, of course: the instance statement is (MUST be!) able to "rename" -- to assert that, when using Bool as a Lattice, meet means && and join means ||. Since instance is a compile-time thing, it doesn't need any 'wrapper' -- just some appendix to the compiler's symbol tables, of course. But if we want to remain OO, dynamic, and do name dispatching of methods, we WOULD need a wrapper of some kind to perform the same renaming. A facility that is SO special-purpose that it doesn't let me say "I have conceived this new interface Lattice, and existing class bool is an example of it" -- or forces me to distort Lattice's method names away from standards such as meet and join in order to fit them to the preexisting names of bool's methods/operators (and then how will I go about asserting that OTHER classes are also lattices...?), does not seem a good idea to me.
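To make the runtime counterpart of this concrete, here is a hedged sketch in Python of the kind of adapter registry being discussed. Note that `register_adapter`, `adapt`, `Lattice`, and `BoolLattice` are all invented names for illustration, not an existing Python API:

```python
# Hypothetical sketch of an adaptation registry in the spirit of the
# discussion above. register_adapter/adapt are invented names, and
# Lattice/BoolLattice are illustrative only.
_adapters = {}

def register_adapter(factory, cls, protocol):
    _adapters[(cls, protocol)] = factory

def adapt(obj, protocol):
    # walk the class hierarchy looking for a registered adapter
    for cls in type(obj).__mro__:
        factory = _adapters.get((cls, protocol))
        if factory is not None:
            return factory(obj)
    raise TypeError("cannot adapt %r to %s" % (obj, protocol.__name__))

class Lattice:
    """Protocol marker: things offering meet() and join()."""

class BoolLattice:
    """Renaming adapter: Lattice's meet/join mapped onto bool's and/or."""
    def __init__(self, b):
        self.b = bool(b)
    def meet(self, other):
        return self.b and other
    def join(self, other):
        return self.b or other

register_adapter(BoolLattice, bool, Lattice)
```

With this in place, `adapt(True, Lattice).join(False)` yields `True`; and asserting that a class already satisfies a protocol as-is is just `register_adapter(lambda x: x, SomeClass, SomeProtocol)` -- the "body is just return x" adapter described earlier in the thread.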
> Adaptation is cool, but I don't see it as a replacement for anything that > interfaces are supposed to achieve. Effective adaptation requires some > kind of interface definition mechanism to work on top of. The latter is a widespread opinion, but one with which I disagree. Using types as the "protocols" that adaptation works with is, IMHO, quite workable. And some of adaptation's aspects provide facilities, such as "third-party adapters" also working for renaming and similar issues, without which you could not achieve all "that interfaces are supposed to achieve" -- and I don't think those aspects should ALSO be duplicated by adding other mechanisms AS WELL AS adaptation. It seems to me Zope3 has it right in this respect (even though I think I disagree on other design choices -- I won't know for sure until I get a chance to try it out in production code), by making adaptation a key part of the interfaces' mechanisms. Alex From pinard@iro.umontreal.ca Sat Aug 24 23:00:40 2002 From: pinard@iro.umontreal.ca (François Pinard) Date: 24 Aug 2002 18:00:40 -0400 Subject: [Python-Dev] heapq method names Message-ID: Hi, people. In the `heapq' module, I'm a little bothered by the fact the function names have `heap' as a prefix. If the methods had been installed as standard list methods, it would be quite understandable, but it has been decided otherwise. The most usual way of using a module is:

import MODULE
...
MODULE.METHOD(ARGUMENTS)

rather than:

from MODULE import METHOD
...
METHOD(ARGUMENTS)

and we should name METHODs accordingly, not repeating the MODULE as a prefix. This is a rather common usage, almost everywhere in the Python library. So my suggestion for changing now, before `heapq' gets released for real:

heappush -> push
heappop -> pop
heapreplace -> replace

I guess that `heapify' is OK as it stands.
The example should be changed accordingly, that is, using `import heapq' instead of `from heapq import such-and-such', using `heapq.push' instead of `heappush' and `heapq.pop' instead of `heappop'. -- François Pinard http://www.iro.umontreal.ca/~pinard From oren-py-d@hishome.net Sat Aug 24 23:14:31 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Sat, 24 Aug 2002 18:14:31 -0400 Subject: [Python-Dev] type categories In-Reply-To: References: <200208131802.g7DI2Ro27807@europa.research.att.com> <20020824161522.B1DC12044B@gandalf.hishome.net> <20020824200631.B8498@hishome.net> Message-ID: <20020824221431.GA97219@hishome.net> On Sat, Aug 24, 2002 at 11:33:38PM +0200, Alex Martelli wrote: > > I am not talking about situations where the object does not meet your > > expectations and needs to be adapted - I'm talking about situations where > > it actually does and the only problem is how to describe that fact > > properly. > > Adaptation IS one way to "describe that fact properly", given that checks > are anyway constrained to happen at runtime. You just install an > adapter from objects x of class X to protocol Y that receives x as > an argument and whose body is just "return x" -- that's all. I don't take it as given that "checks are anyway constrained to happen at runtime". I prefer a system that is future-proof enough to evolve into something that the compiler can use to do type inference. That is one of the reasons I don't want a typeclass / type category / interface / type expression / whateveryouwannacallit to call any user-written Python code. (I don't want Python to become one of those languages where user code can execute at compile time :-) > ... what's the gain that would justify a further, special-purpose > mechanism that's usable only (e.g.) when all of X's methods already have > the right name and order of parameters, but then we'd have to switch to > another if there is renaming or reordering to be done?
Being able to eventually perform many type checks earlier - at compile time or at module load time. Renaming and reordering really does have to be done at runtime in a dynamically typed language. Oren From guido@python.org Sun Aug 25 00:55:00 2002 From: guido@python.org (Guido van Rossum) Date: Sat, 24 Aug 2002 19:55:00 -0400 Subject: [Python-Dev] Fw: Security hole in rexec? In-Reply-To: Your message of "Sat, 24 Aug 2002 21:29:24 +0200." <038e01c24ba4$8e78db00$6d94fea9@newmexico> References: <038e01c24ba4$8e78db00$6d94fea9@newmexico> Message-ID: <200208242355.g7ONt0326777@pcp02138704pcs.reston01.va.comcast.net> [rexec compromised by deleting __builtins__] This has been known for a while, see python.org/sf/577530. My recommendation is the same as always: don't trust rexec. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sun Aug 25 01:03:59 2002 From: guido@python.org (Guido van Rossum) Date: Sat, 24 Aug 2002 20:03:59 -0400 Subject: [Python-Dev] heapq method names In-Reply-To: Your message of "Sat, 24 Aug 2002 18:00:40 EDT." References: Message-ID: <200208250003.g7P03xJ26851@pcp02138704pcs.reston01.va.comcast.net> > In the `heapq' module, I'm a little bothered by the fact the function names have > `heap' as a prefix. If the methods had been installed as > standard list methods, it would be quite understandable, but it has been > decided otherwise. > > The most usual way of using a module is:
>
> import MODULE
> ...
> MODULE.METHOD(ARGUMENTS)
>
> rather than:
>
> from MODULE import METHOD
> ...
> METHOD(ARGUMENTS)
>
> and we should name METHODs accordingly, not repeating the MODULE as a prefix. > This is a rather common usage, almost everywhere in the Python library. > > So my suggestion for changing now, before `heapq' gets released for real:
>
> heappush -> push
> heappop -> pop
> heapreplace -> replace
>
> I guess that `heapify' is OK as it stands.
> > The example should be changed accordingly, that is, using `import heapq' > instead of `from heapq import such-and-such', using `heapq.push' instead of > `heappush' and `heapq.pop' instead of `heappop'. -1. The names 'push', 'pop' and 'replace' are too generic. The module seems to "invite" the ``from heapq import heappush, heappop'' syntax, and I'd like to honor that. --Guido van Rossum (home page: http://www.python.org/~guido/) From aleax@aleax.it Sun Aug 25 08:33:55 2002 From: aleax@aleax.it (Alex Martelli) Date: Sun, 25 Aug 2002 09:33:55 +0200 Subject: [Python-Dev] type categories In-Reply-To: <20020824221431.GA97219@hishome.net> References: <200208131802.g7DI2Ro27807@europa.research.att.com> <20020824221431.GA97219@hishome.net> Message-ID: <02082509335500.13476@arthur> On Sunday 25 August 2002 00:14, Oren Tirosh wrote: > On Sat, Aug 24, 2002 at 11:33:38PM +0200, Alex Martelli wrote: > > > I am not talking about situations where the object does not meet your > > > expectations and needs to be adapted - I'm talking about situations > > > where it actually does and the only problem is how to describe that > > > fact properly. > > > > Adaptation IS one way to "describe that fact properly", given that > > checks are anyway constrained to happen at runtime. You just install > > an adapter from objects x of class X to protocol Y that receives x as > > an argument and whose body is just "return x" -- that's all. > > I don't take it as given that "checks are anyway constrained to happen at > runtime". I prefer a system that is future-proof enough to evolve into > something that the compiler can use to do type inference.
That is one of the reasons I don't want a typeclass / type category / interface / type expression / whateveryouwannacallit to call any user-written Python code. A compiler able to do type inference had better be smart enough to recognize the special-case pattern:

def noadapt(obj, proto):
    return obj

install_adapter(noadapt, someclass, someprotocol)

If that hypothetical compiler is unable to recognize this pattern (with whatever change of names except for a built-in install_adapter), its hypothetical type inference is FAR too puny for me to be happy to pay any substantial price for it. In particular, defining multiple mechanisms that partially overlap for the same tasks, for the sole purpose of making it hypothetically and marginally easier to draw the sole distinction of compile time versus runtime, IS a substantial price to pay in terms of language complication. Conceptual distinctions between compile time and runtime are already "a price". One that may be worth paying, in general, for performance and in order to get error messages earlier. But, I think, one we should be quite wary to _extend_ -- particularly to extend to areas where we might well get away WITHOUT paying it. > Being able to eventually perform many type checks earlier - at compile > time or at module load time. Renaming and reordering really does have to > be done at runtime in a dynamically typed language. Not necessarily, given _decent_ (hypothetical) type inference. The hypothetical decent type-inferring compiler would know about the install_adapter builtin. It could then hypothetically special-case method renaming by recognizing in the adapter pure-renaming patterns such as:

def interfacemethod(self, *args):
    return self.obj.objmethod(*args)

and generate code suitably when it recognizes that a given adapter does nothing but renaming. Blue-sky to some extent, but that sort of thing IS a good part of what type *inference* is about.
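A runtime cousin of that pure-renaming pattern can be written once, generically. This is only an illustrative sketch (the `make_renaming_adapter` helper is invented, not a proposed builtin):

```python
# Illustrative sketch of a generic "pure renaming" adapter: every method
# lookup simply forwards to a differently-named attribute on the wrapped
# object. make_renaming_adapter is an invented helper, not a real API.
def make_renaming_adapter(renames):
    class Adapter:
        def __init__(self, obj):
            self._obj = obj
        def __getattr__(self, name):
            # fall back to the same name when no rename is registered
            return getattr(self._obj, renames.get(name, name))
    return Adapter

# e.g. expose list.append under the interface name 'push'
StackAdapter = make_renaming_adapter({"push": "append"})
s = StackAdapter([])
s.push(1)
s.push(2)
```

Since such an adapter does nothing but renaming, a sufficiently smart type-inferring compiler could in principle compile it away entirely, which is exactly the point being made here.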
Alex From skip@manatee.mojam.com Sun Aug 25 13:00:16 2002 From: skip@manatee.mojam.com (Skip Montanaro) Date: Sun, 25 Aug 2002 07:00:16 -0500 Subject: [Python-Dev] Weekly Python Bug/Patch Summary Message-ID: <200208251200.g7PC0GX1011371@manatee.mojam.com>

Bug/Patch Summary
-----------------
273 open / 2793 total bugs (+4)
109 open / 1663 total patches (+3)

New Bugs
--------
Empty genindex.html pages (2002-07-26) http://python.org/sf/586926
import cycle in distutils (2002-08-19) http://python.org/sf/597604
spawn*() doesn't handle errors well (2002-08-20) http://python.org/sf/597795
exec*() doesn't handle errors well (2002-08-20) http://python.org/sf/597797
compiler package and SET_LINENO (2002-08-20) http://python.org/sf/597919
Core dump when using mmap. (2002-08-20) http://python.org/sf/597938
import _tkinter python dumps core. (2002-08-21) http://python.org/sf/598160
The KeyError message doesn't use repr on the key value reported (2002-08-21) http://python.org/sf/598451
HTTPConnection memory leak (2002-08-22) http://python.org/sf/598797
Python not handling cText (2002-08-22) http://python.org/sf/598981
execfile() not show filename when IOErro (2002-08-23) http://python.org/sf/599163
ext module generation problem (2002-08-23) http://python.org/sf/599248
re searches don't work with 4-byte unico (2002-08-23) http://python.org/sf/599377
Method resolution order in Py 2.2 - 2.3 (2002-08-23) http://python.org/sf/599452
CRAM-MD5 module (2002-08-24) http://python.org/sf/599679
SocketServer wrong about allow_reuse_add (2002-08-24) http://python.org/sf/599681
sub[n] not working as expected. (2002-08-24) http://python.org/sf/599757
httplib.connect broken in 2.1 branch (2002-08-25) http://python.org/sf/599838
NameError value is not the name error (2002-08-25) http://python.org/sf/599869

New Patches
-----------
Pure Python strptime() (PEP 42) (2001-10-23) http://python.org/sf/474274
"simplification" to ceval.c (2002-08-19) http://python.org/sf/597221
Oren Tirosh's fastnames patch (2002-08-20) http://python.org/sf/597907
textwrap.dedent, inspect.getdoc-ish (2002-08-21) http://python.org/sf/598163
Failure building the documentation (2002-08-22) http://python.org/sf/598996
PEP 269 Implementation (2002-08-23) http://python.org/sf/599331
Bugfix for urllib2.py (2002-08-25) http://python.org/sf/599836

Closed Bugs
-----------
Summary: "BuildApplet can destory the source file on Mac OS X" (2002-01-18) http://python.org/sf/505562
** in doc/current/lib/operator-map.html (2002-07-04) http://python.org/sf/577513
bug in splituser(host) in urllib (2002-07-14) http://python.org/sf/581529
imaplib: prefix-quoted strings (2002-07-30) http://python.org/sf/588711
add main to py_pycompile (2002-07-30) http://python.org/sf/588768
Mixin broken for new-style classes (2002-08-05) http://python.org/sf/591135
comments taken as values in ConfigParser (2002-08-08) http://python.org/sf/592527
string method bugs w/ 8bit, unicode args (2002-08-14) http://python.org/sf/595350
pythonw has a console on Win98 (2002-08-15) http://python.org/sf/595537
IDLE/Command Line Output Differ (2002-08-15) http://python.org/sf/595791
pickle_complex in copy_reg.py (2002-08-15) http://python.org/sf/595837
popenN return only text mode pipes (2002-08-16) http://python.org/sf/595919
textwrap has problems wrapping hyphens (2002-08-17) http://python.org/sf/596434

Closed Patches
--------------
Alternative implementation of interning (2002-07-01) http://python.org/sf/576101
new version of Set class (2002-07-13) http://python.org/sf/580995
Update environ for CGIHTTPServer.py (2002-08-15)
http://python.org/sf/595846
urllib.splituser(): '@' in usrname (2002-08-17) http://python.org/sf/596581

From pinard@iro.umontreal.ca Sun Aug 25 13:04:20 2002 From: pinard@iro.umontreal.ca (François Pinard) Date: 25 Aug 2002 08:04:20 -0400 Subject: [Python-Dev] Re: heapq method names In-Reply-To: <200208250003.g7P03xJ26851@pcp02138704pcs.reston01.va.comcast.net> References: <200208250003.g7P03xJ26851@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Guido van Rossum] > > So my suggestion for changing now, before `heapq' gets released for real:
> >
> > heappush -> push
> > heappop -> pop
> > heapreplace -> replace
> -1. The names 'push', 'pop' and 'replace' are too generic. The module > seems to "invite" the ``from heapq import heappush, heappop'' syntax, > and I'd like to honor that. May I invite you to reconsider? We are going to live with that one for a loong time, you know... Quite granted, as it stands, the module invites the long form of import (from MODULE import LIST-OF-NAMES). This _is_ what I question. Writing `heapq.heapXXX' is kind of ugly; people are going to spontaneously avoid it, especially given that the documentation says to do so. Yet, the long import line is uselessly tedious to write. I would not think the author really wrote `heappush' and `heappop' with the intent that they could sit in a module and be imported with the long form, but rather as inlinable `def's, or maybe as built-in methods for `list' objects. That intent having changed, the method names are asking to be revised. There are not many cases in the Python library where the `from ... import' form is forced upon users in practice. The `BaseHTTPServer' module and friends are the only examples that come to mind, and I find these import lines especially cumbersome to write: hopefully, these are not to be used often. The `heapq' module is different, as for some programmers it might be used often, and I do not see a real reason for making it tedious or different.
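For concreteness, the two import styles being compared look like this with the names as they actually ship in heapq:

```python
import heapq

heap = []
for value in [5, 1, 4, 2, 3]:
    heapq.heappush(heap, value)   # module-qualified style
assert heapq.heappop(heap) == 1   # smallest item comes out first

# the style the current names "invite":
from heapq import heappush, heappop
heappush(heap, 0)
assert heappop(heap) == 0
```

With the `heap' prefix already in the function names, the second style reads naturally; with François' proposed short names, only the first style would.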
As for `push' etc. being too generic, they are used in the context of a specialised module, which gives these words their specialised meaning, so genericity is not a real argument. Other modules already qualify simple words. Has it been a problem? Even if some people really want to write `from MODULE import *' or `from MODULE import SUCH-AND-SUCH' for a lot of modules at global scope, something which is not to be encouraged anyway, these users still have `from ... import ... as' to help them. Please consider altering the current `heapq' module so as _not_ to invite a different importing style. Make it more similar to the rest of the library; there is probably no real need for a difference. Let it be nicer to use! -- François Pinard http://www.iro.umontreal.ca/~pinard From Samuele Pedroni" <20020824161522.B1DC12044B@gandalf.hishome.net> <20020824200631.B8498@hishome.net> Message-ID: <003401c24c3e$95df06e0$6d94fea9@newmexico> This is straight from the library (xml.sax.saxutils) [chosen more because at least it's real code than because nobody can argue that it is irrelevant. I find it fascinating how powerful the argument "nobody does that often" is in discussions about language design]:

def prepare_input_source(source, base = ""):
    """This function takes an InputSource and an optional base URL and
    returns a fully resolved InputSource object ready for reading."""
    if type(source) in _StringTypes:
        source = xmlreader.InputSource(source)
    elif hasattr(source, "read"):
        f = source
        source = xmlreader.InputSource()
        source.setByteStream(f)
        if hasattr(f, "name"):
            source.setSystemId(f.name)
    ...

the first problem is the "ontology" problem and how much we want to support someone who wants to strictly check (a) "source has intentionally a file-like read" vs. just (b) "source has some read method"... This is independent of whether we have adapt or declarative interfaces or both.
The above could be written as:

source = adapt(source, xmlreader.InputSource)

moving the code inside xmlreader.InputSource.__adapt__. But does this address (a)? I would say no. Then one could simply not implement __adapt__ but leave the burden to the users to define my-type-with-a-good-read to xmlreader.InputSource adaptations. Or put the code inside xmlreader.InputSource.__adapt__ and substitute

elif hasattr(source, "read"):

with

f = adapt(source, ???)

so the "ontology" problem is back. My point is not against adaptation, but that adaptation does not automagically solve all our problems without further thinking. I repeat, with both adaptation and interfaces, if one cares about contracts vs. just signatures, the ontology problem is with us. Adaptation is probably expressive enough. But the choice between

(I) - mechanisms to ask and declare whether an object implements a protocol
    - mechanisms to register adapter factories between protocol A and B
      [I think this is the Zope3 model]

or

(II) - just adaptation

should be a choice also about convenience, readability, ... both ways the ontology problem is there. regards. From aahz@pythoncraft.com Sun Aug 25 15:47:43 2002 From: aahz@pythoncraft.com (Aahz) Date: Sun, 25 Aug 2002 10:47:43 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8 In-Reply-To: References: <20020824044416.GA25299@thyrsus.com> Message-ID: <20020825144743.GA1443@panix.com> On Sat, Aug 24, 2002, Tim Peters wrote: > > Given that in real life most people still get more not-spam than spam, > removing the counter-bias in the scoring math may boost the false negative > rate.
as of six months ago, i no longer believe this to be necessarily true -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/ From md9ms@mdstud.chalmers.se Sun Aug 25 15:50:14 2002 From: md9ms@mdstud.chalmers.se (Martin Sjögren) Date: 25 Aug 2002 16:50:14 +0200 Subject: [Python-Dev] Re: heapq method names In-Reply-To: References: <200208250003.g7P03xJ26851@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <1030287015.559.9.camel@winterfell> sön 2002-08-25 klockan 14.04 skrev François Pinard: > As for `push' etc. being too generic, they are used in the context of a > specialised module, which gives these words their specialised meaning, so > genericity is not a real argument. Other modules already qualify simple > words. Has it been a problem? Even if some people really want > to write `from MODULE import *' or `from MODULE import SUCH-AND-SUCH' > for a lot of modules at global scope, something which is not to be encouraged > anyway, these users still have `from ... import ... as' to help them. > > Please consider altering the current `heapq' module so as _not_ to invite a > different importing style. Make it more similar to the rest of the library, > there is probably no real need for a difference. Let it be nicer to use! I agree completely. And as for the names being too generic: well, gee, they *are* usually used for putting an element in the data structure and removing an element from the data structure, often regardless of the data structure/ADT. I've always used push, pop (and peek etc.) for heaps, as well as stacks.
If anything, I think that the pop method of lists is confusing :-) Regards, Martin From tim.one@comcast.net Sun Aug 25 17:53:24 2002 From: tim.one@comcast.net (Tim Peters) Date: Sun, 25 Aug 2002 12:53:24 -0400 Subject: [Python-Dev] Re: heapq method names In-Reply-To: Message-ID: Note that it's common to use the bisect module in the from bisect import bisect_right, bisect, insort way too, rather than spell out bisect.bisect (etc) each time. That's "the other" module that (conceptually) adds new methods to lists. If you want simpler names, I'm finding this little module quite pleasant to use:

import heapq

class Heap(list):
    def __init__(self, iterable=[]):
        self.extend(iterable)
    push = heapq.heappush
    popmin = heapq.heappop
    replace = heapq.heapreplace
    heapify = heapq.heapify

That is, it creates a Heap type that's just a list with some extra methods. Note that the "pop" method can't be named "pop"! If you try, you'll soon get unbounded recursion because the heapq functions need list.pop to access the list meaning of "pop". Guido suggested a long time ago that such a class could be added to heapq, and I like it a lot in real life. From guido@python.org Sun Aug 25 22:14:07 2002 From: guido@python.org (Guido van Rossum) Date: Sun, 25 Aug 2002 17:14:07 -0400 Subject: [Python-Dev] Re: heapq method names In-Reply-To: Your message of "Sun, 25 Aug 2002 08:04:20 EDT." References: <200208250003.g7P03xJ26851@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200208252114.g7PLE7k03532@pcp02138704pcs.reston01.va.comcast.net> > May I invite you to reconsider?
We are going to live with that one for > a loong time, you know... I know. I have read and re-read your arguments, but I see nothing to change my mind. Somehow the short names you suggest just seem wrong to me. We can agree to disagree, but I feel strongly that the names should not be changed. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Sun Aug 25 22:56:46 2002 From: guido@python.org (Guido van Rossum) Date: Sun, 25 Aug 2002 17:56:46 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Sat, 24 Aug 2002 14:41:16 EDT." References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <20020820231738.GA21011@thyrsus.com> <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> <20020821025725.GB28198@thyrsus.com> <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com> <20020824163848.C10202@idi.ntnu.no> Message-ID: <200208252156.g7PLukI03662@pcp02138704pcs.reston01.va.comcast.net> [François] > Allow me some random thoughts. (Aren't they always random, anyway? :-) Maybe you could contribute some not-so-random code? Words we've got enough. :-) > When I saw some of the suggestions on this list for "generating" > elements of a cartesian product, despite sometimes elegant, I > thought "Too much done, too soon.". But the truth is that I did not > give the thing a serious try, I'm not sure I would be able to offer > anything better. > > One nice thing, with a dict or a set, is that we can quickly access > how many entries there are in there. Is there some internal way to > efficiently fetch the N'th element, from the order in which the keys > would be naturally listed? If not, one could always pay some extra > temporary memory to build a list of these keys first. 
If you have > to "generate" a cartesian product for N sets, you could set up a > compound counter as a list of N indices, the K'th meant to run from > 0 up to the cardinality C[K] of the K'th set, and devise simple > recipes to yield the element of the product represented by the > counter, and to bump it. Moreover, it would be trivial to equip > this generator with a `__len__' function able to predict the > cardinality CCC of the whole result, and quite easy being able to > transform any KKK between 0 and NNN into an equivalent compound > counter, and from there, access any member of the cartesian product > at constant speed, without generating it all. Since the user can easily multiply the length of the input sets together, what's the importance of the __len__? And what's the use case for randomly accessing the members of a cartesian product? IMO, the Cartesian product is mostly useful for abstract mathematical thought, not for solving actual programming problems. > All the above is pretty simple, and meant to introduce a few > suggestions that might solve once and for all, if we could do it > well enough, a re-occurring request on the Python list about how to > produce permutations et al. We might try to rewrite the recipes > behind a "generating" cartesian product of many sets, illustrated > above, into a similar generating function able to produce all > permutations of a single set. So let's say: > > Set([1, 2, 3]).permutations() > > would lazily produce the equivalent of: > > Set([(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]) > > That generator could offer a `__len__' function predicting the > cardinality CCC of the result, and some trickery could be used to > map integers from 0 to CCC into various permutations.
Once on the > road, it would be worth offering combinations and arrangements just > as well, and maybe also offering the "power" set, I mean here the > set of all subsets of a given set, all with predictable cardinality > and constant speed access to any member by index. Obviously you were inspired here by Eric Raymond's implementation of the powerset generator... But again I ask, what's the practical use of random access to permutations? (I know there are plenty of uses of permutations, but I doubt the need for random access.) > Yet, many questions or objections may arise. Using tuples to > represent each element of a cartesian product of many sets is pretty > natural, but it is slightly less clear that a tuple is the best fit > for representing an ordered set as in permutations and arrangements, > as tuples may allow elements to be repeated, while an ordered set > does not. I think that sets are to be preferred over tuples for > returning combinations or subsets. You must have temporarily stopped thinking clearly there. Seen as sets, all permutations of a set's elements are the same! The proper output for a generator of permutations is a list; for generality, its input should be any iterable (which includes sets). If the input contains duplicates, well, that's the caller's problem. For combinations, sets are suitable as output, but again, I think it would be just as suitable to take a list and generate lists -- after all the lists are trivially turned into sets. > While it is natural to speak and think of subsets of a set, or > permutations, arrangements and combinations of a set, some people > might prefer to stay closer to an underlying implementation with > lists (sublists of a list, permutations, arrangements or > combinations of a list), and would feel that going through sets is an unwelcome detour for their applications.
Indeed, what's the real > use and justification for hashing keys and such things, when one > wants nothing else than arrangements from a list? Right. > Another aspect worth some thinking is that permutations, in > particular, are mathematical objects in themselves: we can notably > multiply permutations or take the inverse of a permutation. That would be a neat class indeed. How useful it would be in practice remains to be seen. Do you do many ad-hoc permutation calculations? > Arrangements are in fact permutations over combinations' elements. > Some thought is surely needed for properly reflecting mathematical > elegance into how the set API is extended for the above, and not > merely burying that elegance under practical concerns. And, on the other hand, practicality beats purity. > Some people may think that these are all problems which are > orthogonal to the design of a basic set feature, and which should be > addressed in separate Python modules. On the other hand, I think > that a better and nicer integration might result if all these things > were thought together, and thought sooner than later. Moreover, my > intuition tells me that with some care and luck (both are needed), > these extra set features could be small enough additions to the > `sets' module to not be worth another one. Besides, if appropriate, > such facilities would surely add a lot of zest and punch into the > interest raised by the `sets' module when it gets published. I'd rather see the zest added to Python as a whole -- sets are a tiny part, and if you read PEP 218, you'll see that the sets module is only a modest first step of that PEP's program. --Guido van Rossum (home page: http://www.python.org/~guido/) From greg@cosc.canterbury.ac.nz Mon Aug 26 01:08:03 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 26 Aug 2002 12:08:03 +1200 (NZST) Subject: [Python-Dev] Re: Automatic flex interface for Python?
In-Reply-To: <007b01c24a7a$a081c580$0900a8c0@spiff> Message-ID: <200208260008.g7Q083bq004171@kuku.cosc.canterbury.ac.nz> > you can do that without even looking at the characters? No, but the original complaint was that immutability of strings made lexing difficult. I was pointing out that it's possible to do it without mutating anything per-character. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Mon Aug 26 01:37:22 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 26 Aug 2002 12:37:22 +1200 (NZST) Subject: [Python-Dev] Questions about sets.py In-Reply-To: <200208232121.g7NLLHh16924@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200208260037.g7Q0bMuJ004191@kuku.cosc.canterbury.ac.nz> Guido van Rossum : > But in order to be a good > citizen in the world of binary operators, __or__ should not raise > TypeError; > > if the other argument implements __ror__, union() > will acquire this ability. Another possible reason is so that if a subclass overrides __or__, union() will get the new behaviour too. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Mon Aug 26 01:58:10 2002 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 26 Aug 2002 12:58:10 +1200 (NZST) Subject: [Python-Dev] Why no math.fac? In-Reply-To: <20020824183056.GA1859@lilith.ghaering.test> Message-ID: <200208260058.g7Q0wAi8004211@kuku.cosc.canterbury.ac.nz> Gerhard Häring: > Any reason why there isn't any factorial function in the math > module?
Probably because there isn't one in the C math library. :-) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From python-dev@zesty.ca Mon Aug 26 08:57:30 2002 From: python-dev@zesty.ca (Ka-Ping Yee) Date: Mon, 26 Aug 2002 00:57:30 -0700 (PDT) Subject: [Python-Dev] type categories In-Reply-To: <200208231715.g7NHFRl12405@pcp02138704pcs.reston01.va.comcast.net> Message-ID: On Fri, 23 Aug 2002, Guido van Rossum wrote: > I haven't given up the hope that inheritance and interfaces could use > the same mechanisms. But Jim Fulton, based on years of experience in > Zope, claims they really should be different. I wish I understood why > he thinks so. If i may hazard a guess, i'd imagine that Jim's answer would simply be that inheritance (of implementation) doesn't imply subtyping, and subtyping doesn't imply inheritance. That is, you may often want to re-use the implementation of one class in another class, but this doesn't mean the new class will meet all of the commitments of the old. Conversely, you may often want to declare that different classes adhere to the same set of commitments (i.e. provide the same interface) even if they have different implementations. (A common situation where the latter occurs is when the implementations are written by different people.) > Agreeing on an ontology seems the hardest part to me. Indeed. One of the advantages of separating inheritance and subtyping is that this can give you a bit more flexibility in setting up the ontology, which may make it easier to settle on something good. 
-- ?!ng From Jack.Jansen@oratrix.com Mon Aug 26 10:42:29 2002 From: Jack.Jansen@oratrix.com (Jack Jansen) Date: Mon, 26 Aug 2002 11:42:29 +0200 Subject: [Python-Dev] [development doc updates] In-Reply-To: <15718.47035.3545.985252@grendel.zope.com> Message-ID: <23254D39-B8D8-11D6-A6B1-0030655234CE@oratrix.com> On Saturday, August 24, 2002, at 12:31, Fred L. Drake, Jr. wrote: > > Jack Jansen writes: >> how much work would it be to make at least the html tarfile >> available too under www.python.org/doc/2.3? > > Are you looking for the tarfile or for an online documentation set? I'm looking for the tarfile. The documentation builder downloads it, annotates the HTML files with some stuff and then feeds it through the Help Builder. The result can be searched and browsed by Apple Help Viewer (except for the minor detail that we can't yet convince AHV that the Python documentation actually exists:-) > >> I'm looking at making the documentation friendly to the Mac help >> viewer (actually, Bill Fancher donated the code), and it would >> help the build process if there was a fixed URL based on the >> version number where I could always find the latest docs for the >> current version. > > Is there online documentation for the Mac OS help viewer? I don't > know anything about it. It's all a bit fragmented, but "providing user assistance with Apple Help" has most of the high-level info. "Apple Help Reference" has the API a program can use to interface to the help manager (at least, some of it:-). On OSX these are online if you've installed the developer tools. Otherwise you can find them on the Apple website too.
-- - Jack Jansen http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From laszlo.asboth@yageo.com Mon Aug 26 11:31:53 2002 From: laszlo.asboth@yageo.com (Laszlo Asboth) Date: Mon, 26 Aug 2002 11:31:53 +0100 Subject: [Python-Dev] Introduction of new project Message-ID: Dear all, I would like to inform you that I started a new project. Please see attached file. Many thanks and best regards, Laszlo Asboth
Content-Type: text/plain; name="Takas Introduction.txt" Content-Disposition: attachment; filename="Takas Introduction.txt"

"TAKAS" Project Introduction

History: The idea for this project arose 10 years ago. At that time in Hungary mostly computers with Intel 386 processors and the Novell 3.1 operating system were in use. As information technology developed, many new programs were written, and I found a problem with them: for each task we used a separate standalone program, in most cases made by a totally different supplier or programmer. The bottleneck of this way of working is that each program needs to be updated with the same data, which is a waste of energy. So an idea came up: why do we not use only one system for all the tasks that can be done by computer within a company?

At my next workplace I found the same situation. Since the company was a multinational, decisions about every important issue came from abroad. Here we used even more operating systems than before (Windows 3.1, Novell, AS/400).

At my current workplace (where I started on the 2nd of February 1998) this so-called "diffusion" has grown further; we use several ERP & CRM systems such as Baan, SAP, etc. Fortunately, a lot of interfaces were built between these systems to share data (via automatic ftp jobs). After studying the structure of the IT system I found that we use these systems only partly. This shows me that such systems cannot fulfil the fundamental (or all) requirements of a normal company.

After more than 4 years of preparation and collecting experience, I decided to start building my own ERP & CRM system, applying my existing knowledge and that of the volunteers who join this idea.

My main philosophy is that the computer should serve people and not the opposite (as it works nowadays).
Meaning of the project's name: The name has no definite meaning. At my workplace I work together with a colleague; when I discussed this idea with him, he thought there was something in it too, and we started to plan it. Since we were already under way, we needed to call the project something. After some thinking and intuition the name was found: TAKAS. It comes from two slices, "TAK" and "AS", both taken from our names (TAKacs, ASboth).

Purpose of the project: The project has the following purposes:
* Build a single new, homogeneous computer system that helps operate and maintain the company's data.
* Create a framework in which solutions can be found and realised in an easier, faster way.

Advantages:
* Robust, fast, easy to use
* Covers all the IT tasks of an arbitrarily big organisation
* Decreases time lost to maladministration
* Increases the level of service within and between companies
* Far fewer opportunities for bugs (no interfaces necessary)
* Each piece of data is entered once, and only at its point of origin (efficiency, more disengagedness)
* The user works in a uniform interface (less learning time)
* An economical solution, since it can be modified easily

The project's aim: all data that can be handled by computers should be handled in this system only. That means integrating every computer task into this program (besides the normal running of the company, the system should handle e-mail, fax, personal tasks, contacts, etc. too).

Philosophy of the project: From the company management's point of view, the purpose of using this system is to build an effective, economical organisation with it. A system is good when the user can use it faster and more easily. Another important issue is the cost of the programs; there are big differences of opinion about this. Most managers think that buying programs from software companies shares out the responsibility for their use.
The consequence of this kind of thinking is higher cost. There are many programs that cost very little or nothing (shareware, freeware or open source programs), and my experience shows that programs of both categories work fine.

At the beginning of the project it should be decided which category this system will join. We have already discussed this. It may seem evident that experts can only be found for pay; my opinion was, and still is, that good people can be found who join the project for the sake of creation and psychological appreciation alone. On the other hand, I am sure that later on we will get our efforts back in material form too. This means that the project will be in the open source category. Certainly, before the real implementation we have to agree on how the project can continue to exist and grow smoothly. This philosophy has already brought big results in many projects on the Internet in which a lot of experts participate.

There is another point of view: the project also has the purpose of helping people grow and develop themselves. New ideas and proposals change the world for the better as we go. I think this will become much more concrete later on.

Backgrounds: To be successful, this project has to have a very good foundation. It should be defined which operating system, graphical interface, programming language and database are best for this project.

Operating system: Besides Windows there are many other operating systems. Although there are many good things in Windows, I think it is better for us to choose FreeBSD, because it is organised by a "core team", works robustly, grows dynamically and is well documented.

Programming language: Here we can find many of them, from low-level to high-level languages. My opinion is that we need a high-level programming language for this project. After a long search on the Internet I found the Python programming language.
This language best reflects the object-oriented philosophy. The advantage of Python is that it can be used for both low-level and high-level work. There are a lot of packages for it, which are very useful for creating arbitrary programs. The possibility of using it on the most widely used operating systems is a very good aspect too.

Graphical interface: We need a graphical interface too, since in my opinion a reliable system in our world should have one. There are several GUI toolkits for Python; I think wxPython is the best choice for us.

Database manager: This point is the most important for such a big system. There are 2 different types of database managers: relational and object-oriented. In the past I worked with relational databases only. After getting to know Python I found the Zope Object DataBase (ZODB). Its main advantage is that it is written in Python. There is an "extension" for it, Zope Enterprise Objects (ZEO), which makes it possible to connect to ZODB over the network. My experience so far is that ZEO is not ready to handle more than 1 database at the same time (I hope this project may give the chance to go further in this direction).

The above-mentioned components are the fundamentals of the project. Certainly, every idea that can help produce better solutions is welcome. I know there are many experts who know much more about these and other components than I do. This is why I would like to ask anybody to join the project.

Introduction of the author: I would like to introduce myself too. My name is László Asbóth. I am 38 years old. I have 2 children (my daughter Dóra, 13, and my son Ádám, 7). I live in Hungary, at 5 Kiss J. str, in Sárvár. My professions are land surveyor, computer programmer and preventive parapsychologist. My current workplace is Phycomp Hungary Kft in Szombathely, where I work as a program engineer.
My first encounter with computers came when the first IBM AT computers arrived at Matav Rt (the biggest telecommunications company of Hungary). I did not yet know the command "dir". I started to play with them because it was very interesting. After some months I had learned a lot about DOS and started to write a program in the Basic language. This was a "TOTO" program; although we won some money in the second week, we could not make a fortune. When my boss saw my interest in computers he assigned me to a new post. Then I started to learn programming, which I finished in 1993, and I learned several computer languages (Clipper, Pascal, C). At my current workplace, in 1998, I met the Unix operating system and the Baan ERP system; I taught myself to program in Baan. Last year I started seriously searching for components for the project.

In my spare time I am interested in self-development; I use and teach some methods in this area.

Invitation: If you think this project has something for you, or you are interested in joining, please contact me. If you can only give some advice, that is welcome too. Thank you very much for reading this introduction (and excuse my English).

In Sárvár, on the 19th of August, 2002. László Asbóth

From mwh@python.net Mon Aug 26 12:16:43 2002 From: mwh@python.net (Michael Hudson) Date: 26 Aug 2002 12:16:43 +0100 Subject: [Python-Dev] utf8 issue In-Reply-To: Guido van Rossum's message of "Fri, 23 Aug 2002 17:05:27 -0400" References: <200208232105.g7NL5RE16863@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <2mznv9c1k4.fsf@starship.python.net> Guido van Rossum writes: > This might belong on SF, except it's already been solved in Python > 2.3, and I need guidance about what to do for Python 2.2.2. > > In 2.2.1, a lone surrogate encoded into utf8 gives an utf8 string that > cannot be decoded back. In 2.3, this is fixed. Should this be fixed > in 2.2.2 as well?
I think this was discussed really quite a long time ago, like six months or so. > I'm asking because it caused problems with reading .pyc files: if > there's a Unicode literal containing a lone surrogate, reading the > .pyc file causes an exception: > > UnicodeError: UTF-8 decoding error: unexpected code byte > > It looks like revision 2.128 fixed this for 2.3, but that patch > doesn't cleanly apply to the 2.2 maintenance branch. Can someone > help? I think the reason this didn't get fixed in 2.2.1 is that it necessitates bumping MAGIC. I can probably dig up more references if you want. Cheers, M. -- 34. The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information. -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html From pinard@iro.umontreal.ca Mon Aug 26 13:24:49 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: 26 Aug 2002 08:24:49 -0400 Subject: [Python-Dev] Re: heapq method names In-Reply-To: <200208252114.g7PLE7k03532@pcp02138704pcs.reston01.va.comcast.net> References: <200208250003.g7P03xJ26851@pcp02138704pcs.reston01.va.comcast.net> <200208252114.g7PLE7k03532@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Guido van Rossum] > > May I invite you to reconsider? We are going to live with that one for > > a loong time, you know... > I know. I have read and re-read your arguments, but I see nothing to > change my mind. Somehow the short names you suggest just seem wrong > to me. We can agree to disagree, but I feel strongly that the names > should not be changed. As you did read my arguments (from your first reply, I thought you missed them, and this is why I tried explaining them better), and that I have nothing substantially new to offer, that closes this discussion... We are solidly set on the current documentation. To be or not to be, ... etc. :-) Have a good day, and keep happy! 
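[A runnable consolidation of the Heap-as-a-list-subclass idea discussed in this heapq thread, including the destructive `__iter__` suggested later on. This is a sketch with some adjustments, not the snippet as posted: the method bodies are written out explicitly (C-implemented heapq functions would not bind as methods the way the pure-Python ones of that era did), `__init__` heapifies its input, and the iterator stops cleanly on an empty heap instead of raising IndexError.]

```python
import heapq

class Heap(list):
    """A list that maintains the heap invariant via the heapq functions."""

    def __init__(self, iterable=()):
        self.extend(iterable)
        heapq.heapify(self)          # repair the invariant for arbitrary input

    def push(self, item):
        heapq.heappush(self, item)

    def popmin(self):
        # Note: can't be named "pop" -- heapq needs list.pop internally.
        return heapq.heappop(self)

    def replace(self, item):
        return heapq.heapreplace(self, item)

    def __iter__(self):
        # Iteration consumes the heap, yielding items in sorted order.
        while self:
            yield self.popmin()

h = Heap([5, 1, 4, 2, 3])
h.push(0)
assert h.popmin() == 0
assert [x for x in h] == [1, 2, 3, 4, 5]
assert not h                         # iteration emptied the heap
```

As Jeremy points out below in the thread, list methods such as append() or insert() still break the invariant; heapify() is the escape hatch after such mutations.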
-- François Pinard http://www.iro.umontreal.ca/~pinard From pinard@iro.umontreal.ca Mon Aug 26 13:32:02 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: 26 Aug 2002 08:32:02 -0400 Subject: [Python-Dev] Re: heapq method names In-Reply-To: References: Message-ID: [Tim Peters] > Note that it's common to use the bisect module in the > from bisect import bisect_right, bisect, insort > way too, rather than spell out bisect.bisect (etc) each time. That's "the > other" module that (conceptually) adds new methods to lists. Wow! You just put the finger on it... I wondered a few times why this module never attracted me! :-) :-) > If you want simpler names, I'm finding this little module quite pleasant > to use: [...] That is, it creates a Heap type that's just a list with > some extra methods. Very elegant indeed. Something like this was discussed earlier, but faded out of my memory. Thanks for the tip, Tim! > Note that the "pop" method can't be named "pop"! If you try, you'll soon > get unbounded recursion because the heapq functions need list.pop to access > the list meaning of "pop". Sold! `popmin' is adequate and clear. -- François Pinard http://www.iro.umontreal.ca/~pinard From jeremy@alum.mit.edu Mon Aug 26 14:08:05 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Mon, 26 Aug 2002 09:08:05 -0400 Subject: [Python-Dev] Re: heapq method names In-Reply-To: References: Message-ID: <15722.10293.687009.656621@slothrop.zope.com> >>>>> "TP" == Tim Peters writes: TP> That is, it creates a Heap type that's just a list with some TP> extra methods. Note that the "pop" method can't be named "pop"! TP> If you try, you'll soon get unbounded recursion because the TP> heapq functions need list.pop to access the list meaning of TP> "pop". TP> Guido suggested a long time ago that such a class could be added TP> to heapq, and I like it a lot in real life. You can't use all of the regular list methods, right? 
If I'd called append() on a Heap(), it wouldn't maintain the heap invariant. I would think the same is true of insert() and lots of other methods. If we add a Heap class, which seems quite handy, maybe we should disable methods that don't work. Interesting to note that if you disable the invalid methods of list, then you've got a subclass of list that is not a subtype. Jeremy From guido@python.org Mon Aug 26 15:05:20 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 26 Aug 2002 10:05:20 -0400 Subject: [Python-Dev] utf8 issue In-Reply-To: Your message of "Mon, 26 Aug 2002 12:16:43 BST." <2mznv9c1k4.fsf@starship.python.net> References: <200208232105.g7NL5RE16863@pcp02138704pcs.reston01.va.comcast.net> <2mznv9c1k4.fsf@starship.python.net> Message-ID: <200208261405.g7QE5Of05199@pcp02138704pcs.reston01.va.comcast.net> > Guido van Rossum writes: > > > This might belong on SF, except it's already been solved in Python > > 2.3, and I need guidance about what to do for Python 2.2.2. > > > > In 2.2.1, a lone surrogate encoded into utf8 gives an utf8 string that > > cannot be decoded back. In 2.3, this is fixed. Should this be fixed > > in 2.2.2 as well? > > I think this was discussed really quite a long time ago, like six > months or so. > > > I'm asking because it caused problems with reading .pyc files: if > > there's a Unicode literal containing a lone surrogate, reading the > > .pyc file causes an exception: > > > > UnicodeError: UTF-8 decoding error: unexpected code byte > > > > It looks like revision 2.128 fixed this for 2.3, but that patch > > doesn't cleanly apply to the 2.2 maintenance branch. Can someone > > help? > > I think the reason this didn't get fixed in 2.2.1 is that it > necessitates bumping MAGIC. > > I can probably dig up more references if you want. Please do. Bumping MAGIC is a no-no between dot releases. But I don't understand why that is necessary?
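[For reference, the lone-surrogate behaviour under discussion is easy to demonstrate with a modern Python 3 interpreter, where the strict UTF-8 codec rejects lone surrogates outright and the `surrogatepass` error handler round-trips them; this illustrates today's API, not the 2.2-era codec being patched.]

```python
# A lone surrogate is not valid in UTF-8: the default (strict) codec refuses it.
lone = '\ud800'
try:
    lone.encode('utf-8')
except UnicodeEncodeError:
    pass  # strict UTF-8 rejects the lone surrogate

# CPython's 'surrogatepass' handler encodes and decodes it anyway -- essentially
# the round-trip that marshalling Unicode literals into .pyc files needs.
raw = lone.encode('utf-8', 'surrogatepass')
assert raw == b'\xed\xa0\x80'
assert raw.decode('utf-8', 'surrogatepass') == lone
```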
--Guido van Rossum (home page: http://www.python.org/~guido/) From pinard@iro.umontreal.ca Mon Aug 26 15:09:45 2002 From: pinard@iro.umontreal.ca (François Pinard) Date: 26 Aug 2002 10:09:45 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <200208252156.g7PLukI03662@pcp02138704pcs.reston01.va.comcast.net> References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <20020820231738.GA21011@thyrsus.com> <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> <20020821025725.GB28198@thyrsus.com> <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com> <20020824163848.C10202@idi.ntnu.no> <200208252156.g7PLukI03662@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Guido van Rossum] > Since the user can easily multiply the length of the input sets together, > what's the importance of the __len__? And what's the use case for > randomly accessing the members of a cartesian product? For cartesian product, permutations, combinations, arrangements and power sets, I do not see real use to __len__ or random accessing of members besides explicit looping (or maybe saving an index for later resumption). In my own case, I guess iterators (or generators) would leave me happy enough that I do not really need more. > IMO, the Cartesian product is mostly useful for abstract mathematical > thought, not for solving actual programming problems. One practical application pops to mind. People might progressively use and abuse the paradigm of looping over the members of a cartesian product, instead of relying on nests of embedded loops over each of the set members.
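[The paradigm just described -- one flat loop over a lazily generated cartesian product instead of a nest of embedded loops -- can be sketched with a small generator. Modern Python's `itertools.product` provides exactly this; the `cartesian` helper below is an illustrative stand-in, not the sets-module API under discussion.]

```python
def cartesian(*seqs):
    """Lazily yield tuples from the cartesian product of the given sequences.

    The arguments must be re-iterable sequences (not one-shot iterators),
    since the tail sequences are traversed once per item of the head.
    """
    if not seqs:
        yield ()
        return
    head, rest = seqs[0], seqs[1:]
    for item in head:
        for tail in cartesian(*rest):
            yield (item,) + tail

# One flat loop replaces a nest of embedded loops:
for size, colour in cartesian(('S', 'M'), ('red', 'blue')):
    print(size, colour)
```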
> > So let's say: > > Set([1, 2, 3]).permutations() > > would lazily produce the equivalent of: > > Set([(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]) > > > > That generator could offer a `__len__' function predicting the > > cardinality CCC of the result, and some trickery could be used to map > > integers from 0 to CCC into various permutations. Once on the road, > > it would be worth offering combinations and arrangements just as well, > > and maybe also offering the "power" set, I mean here the set of all > > subsets of a given set, all with predictable cardinality and constant > > speed access to any member by index. > Obviously you were inspired here by Eric Raymond's implementation of > the powerset generator... I do not think so. I was inspired by the remembering I have from the SSP package (something like Social Science Package - a FORTRAN library - this was before SPSS). It was especially clever about enumerating permutations. > But again I ask, what's the practical use of random access to permutations? The only one I see is for resuming an interrupted enumeration. > > Using tuples to represent each element of a cartesian product of many > > sets is pretty natural, but it is slightly less clear that a tuple > > is the best fit for representing an ordered set as in permutations > > and arrangements, as tuples may allow elements to be repeated, while > > an ordered set does not. I think that sets are to be preferred over > > tuples for returning combinations or subsets. > You must have temporarily stopped thinking clearly there. Maybe not, but I do not always express myself clearly. Sorry. > Seen as sets, all permutations of a set's elements are the same! It's no use seeing each individual permutation as a set, of course. But the _set_ of all permutations, each of which being a tuple, is meaningful, at least from the fact that permutations are conceptually unordered. 
However, it might be (sometimes, maybe not that often) useful to enumerate permutations in some canonical order. > For combinations, sets are suitable as output, but again, I think it > would be just a suitable to take a list and generate lists -- after > all the lists are trivially turned into sets. Quite agreed. > > Another aspect worth some thinking is that permutations, in > > particular, are mathematical objects in themselves: we can notably > > multiply permutations or take the inverse of a permutation. > That would be a neat class indeed. How useful it would be in practice > remains to be seen. Do you do much ad-hoc permutation calculations? A few common algorithms also make use of permutations without naming them. `sort' applies a precise permutation, and it is often convenient to inverse that permutation to recover the original order after having enriched the resulting structure, say. I quite often resorted to the above trick. Notice that `string.translate' applies a permutation. In another life, I wrote a program (unknown outside CDC Cyber space) that was efficiently comparing possibly big files, and I remember I had to work a lot with permutation arithmetic. This was a bit specialised, however. I once wrote a C application named `recode' that mainly does charset conversions. It does some arithmetic on permutations at a few places, either for optimisation or while seeking reversibility, and this is really nothing far stretched or un-natural. I sometimes plan to parallel `recode' with a Python implementation, because from experience, prototyping ideas in C is rather painful, while I foresee it would be far lot easier in Python. Undoubtedly then, I would formalise permutations. > > Some thought is surely needed for properly reflecting mathematical > > elegance into how the set API is extended for the above, and not > > merely burying that elegance under practical concerns. > And, on the other hand, practicality beats purity. 
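[The sort-then-invert trick described above -- recovering the original order after sorting, and permutation arithmetic generally -- can be made concrete. A minimal sketch treating a permutation as a list of indices; the helper names `compose` and `inverse` are illustrative, not an existing API.]

```python
def compose(p, q):
    """Permutation product: apply q first, then p (compose(p, q)[i] == p[q[i]])."""
    return [p[i] for i in q]

def inverse(p):
    """Inverse of a permutation given as a list of indices."""
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return inv

data = ['b', 'd', 'a', 'c']
# The permutation that sorting applies to `data`:
order = sorted(range(len(data)), key=data.__getitem__)
assert [data[i] for i in order] == sorted(data)
# Applying its inverse to the sorted list recovers the original order:
assert [sorted(data)[i] for i in inverse(order)] == data
# A permutation times its inverse is the identity:
assert compose(order, inverse(order)) == list(range(len(data)))
```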
Only when both conflict without any hope of resolution. But when practicality and purity can coexist, that's even better. So much better in fact, that it's always worth very seriously trying to seek coexistence. -- François Pinard http://www.iro.umontreal.ca/~pinard From arigo@ulb.ac.be Mon Aug 26 15:20:35 2002 From: arigo@ulb.ac.be (Armin Rigo) Date: Mon, 26 Aug 2002 16:20:35 +0200 (CEST) Subject: [Python-Dev] SET_LINENO removal bugs Message-ID: Hello everybody, A few core bugs with the line tracing in the new SET_LINENO-free world: http://www.python.org/sf/587993 Armin From tim.one@comcast.net Mon Aug 26 15:27:53 2002 From: tim.one@comcast.net (Tim Peters) Date: Mon, 26 Aug 2002 10:27:53 -0400 Subject: [Python-Dev] Re: heapq method names In-Reply-To: <15722.10293.687009.656621@slothrop.zope.com> Message-ID: [Jeremy Hylton] > You can't use all of the regular list methods, right? If I'd called > append() an a Heap(), it wouldn't maintain the heap invariant. I > would think the same is true of insert() and lots of other methods. Well, you *can* use all list methods, just as you can already call heapq.heappop and pass any old list. heapify exists so that you can repair the heap invariant if you have reason to believe you may have broken it. > If we add a Heap class, which seems quite handy, maybe we should > disable methods that don't work. If it were called PriorityQueue, definitely, but "a heap" is a specific implementation and heap users can exploit the list representation in lots of interesting ways. > Interesting to note that if you disable the invalid methods of list, > then you've got a subclass of list that is not a subtype. It is an interesting case this way! There's a related case that's perhaps of more pressing interest: Raymond Hettinger has pointed out that, e.g., 3 in Set is much slower than 3 in dict This is because Set.__contains__ is a Python-level call. 
I sped up almost all the binary set operations yesterday by factors of 2 to 5, mostly via just using the underlying dict.__contains__ under the covers instead of appealing to Set.__contains__. For "simple sets" (sets that don't magically try to convert mutable objects-- like sets --into immutable objects), the speed of 3 in Set could be restored by subclassing from dict and inheriting its __contains__. But Set is trying not to make any promises about representation, so this is a clearer case for some form of "(only) implementation inheritance". The desire is driven only by speed, but that can be a legit concern too (I'm not sure it's a killer concern in this particular case). From guido@python.org Mon Aug 26 15:45:28 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 26 Aug 2002 10:45:28 -0400 Subject: [Python-Dev] type categories In-Reply-To: Your message of "Mon, 26 Aug 2002 00:57:30 PDT." References: Message-ID: <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> [Guido] > > I haven't given up the hope that inheritance and interfaces could use > > the same mechanisms. But Jim Fulton, based on years of experience in > > Zope, claims they really should be different. I wish I understood why > > he thinks so. [Ping] > If i may hazard a guess, i'd imagine that Jim's answer would simply be > that inheritance (of implementation) doesn't imply subtyping, and > subtyping doesn't imply inheritance. Well, yes, of course. But I strongly believe that in *most* cases, inheritance and subtyping go hand in hand. I'd rather invent a mechanism to deal with the exceptions rather than invent two parallel mechanisms that must both be deployed separately to get the full benefit out of them. > That is, you may often want to re-use the implementation of one class > in another class, but this doesn't mean the new class will meet all of > the commitments of the old. 
Conversely, you may often want to declare > that different classes adhere to the same set of commitments (i.e. > provide the same interface) even if they have different implementations. > (A common situation where the latter occurs is when the implementations > are written by different people.) Nevertheless, these are exceptions to the general rule. > > Agreeing on an ontology seems the hardest part to me. > > Indeed. One of the advantages of separating inheritance and subtyping > is that this can give you a bit more flexibility in setting up the > ontology, which may make it easier to settle on something good. Really? Given that there are no inheritance relationships between the existing built-in types, I would think that you could define an ontology consisting entirely of abstract types, and then graft the concrete types on it. I don't see what having separate interfaces would buy you. But perhaps you can give an example that shows your point? --Guido van Rossum (home page: http://www.python.org/~guido/) From python@rcn.com Mon Aug 26 15:55:15 2002 From: python@rcn.com (Raymond Hettinger) Date: Mon, 26 Aug 2002 10:55:15 -0400 Subject: [Python-Dev] Re: heapq method names References: Message-ID: <002901c24d10$97144a20$0d61accf@othello> From: "Tim Peters" > If you want simpler names, I'm finding this little module quite pleasant to > use: > > """ > import heapq > > class Heap(list): > def __init__(self, iterable=[]): > self.extend(iterable) > push = heapq.heappush > popmin = heapq.heappop > replace = heapq.heapreplace > heapify = heapq.heapify > """ And perhaps: def __iter__(self): while True: yield self.popmin() Raymond Hettinger From Samuele Pedroni" <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <016b01c24d13$3bc6d900$6d94fea9@newmexico> [GvR] > [Ping] > > If i may hazard a guess, i'd imagine that Jim's answer would simply be > > that inheritance (of implementation) doesn't imply subtyping, and > > subtyping doesn't imply 
inheritance. > > Well, yes, of course. But I strongly believe that in *most* cases, > inheritance and subtyping go hand in hand. I'd rather invent a > mechanism to deal with the exceptions rather than invent two parallel > mechanisms that must both be deployed separately to get the full > benefit out of them. One exception being able to declare conformance to an interface after-the-fact in some sweet way. > > > Agreeing on an ontology seems the hardest part to me. > > > > Indeed. One of the advantages of separating inheritance and subtyping > > is that this can give you a bit more flexibility in setting up the > > ontology, which may make it easier to settle on something good. > > Really? Given that there are no inheritance relationships between the > existing built-in types, I would think that you could define an > ontology consisting entirely of abstract types, and then graft the > concrete types on it. I don't see what having separate interfaces > would buy you. But perhaps you can give an example that shows your > point? > E.g. my ideas of declaring partial conformance and of super-interfaces identified as a base-interface plus a subset of signatures do not fit so well in a just-abstract-classes model. But OTOH I insist, IMO, given how python code is written now, they would be handy although complex. regards. From barry@python.org Mon Aug 26 16:29:55 2002 From: barry@python.org (Barry A. Warsaw) Date: Mon, 26 Aug 2002 11:29:55 -0400 Subject: [Python-Dev] type categories References: <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> <016b01c24d13$3bc6d900$6d94fea9@newmexico> Message-ID: <15722.18803.575337.718874@anthem.wooz.org> >>>>> "SP" == Samuele Pedroni writes: SP> One exception being able to declare conformance to an SP> interface after-the-fact in some sweet way. This is a very important use case, IMO.
I'm leary of trying to weave some interface taxonomy into the standard library and types without having a lot of experience in using this for real world applications. Even then, it's possible that there will be a lot of disagreement on the shape of the type hierarchy. So one strategy would be to not classify the existing types and classes ahead of time, but to provide a way for an application to declare conformance to existing types in a way that makes sense for the application (or library). The downside of this is that it may lead to a raft of incompatible interface declarations, but I also think that eventually we'd see convergence as we gain more experience. My guess would be that of all the interfaces that get defined and used in the Python community, on a few that are commonly agreed on or become ubiquitous idioms will migrate into the core. I don't think we need to solve this "problem" for the core types right away. Let's start by providing mechanism and not policy. -Barry From guido@python.org Mon Aug 26 16:31:34 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 26 Aug 2002 11:31:34 -0400 Subject: [Python-Dev] type categories In-Reply-To: Your message of "Mon, 26 Aug 2002 17:14:11 +0200." <016b01c24d13$3bc6d900$6d94fea9@newmexico> References: <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> <016b01c24d13$3bc6d900$6d94fea9@newmexico> Message-ID: <200208261531.g7QFVYr05740@pcp02138704pcs.reston01.va.comcast.net> > [GvR] > > [Ping] > > > If i may hazard a guess, i'd imagine that Jim's answer would simply be > > > that inheritance (of implementation) doesn't imply subtyping, and > > > subtyping doesn't imply inheritance. > > > > Well, yes, of course. But I strongly believe that in *most* cases, > > inheritance and subtyping go hand in hand. I'd rather invent a > > mechanism to deal with the exceptions rather than invent two parallel > > mechanisms that must both be deployed separately to get the full > > benefit out of them. 
[Samuele] > One exception being to able to declare conformance to an interface > after-the-fact in some sweet way. I've heard of people who add mix-in base classes after the fact by using assignment to __bases__. (This is currently not supported by new-style classes, but it's on my list of things to fix.) If that's not acceptable (it certainly looks questionable to me :-), I guess a separate registry may have to be created; ditto for deviations in the other direction (implementation inheritance without interface conformance). > E.g. > my ideas of declaring partial conformance and of super-interfaces > identified as a base-interface plus a subset of signatures do not > fit so well in a just-abstract-classes model. But OTOH I insist, > IMO, given how python code is written now, they would be handy > although complex. Yes, I'll have to think about that idea some more. It's appealing because it matches current Pythonic practice better than anything else. OTOH I want a solution that can be verified at compile time. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Aug 26 16:41:11 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 26 Aug 2002 11:41:11 -0400 Subject: [Python-Dev] type categories In-Reply-To: Your message of "Mon, 26 Aug 2002 11:29:55 EDT." <15722.18803.575337.718874@anthem.wooz.org> References: <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> <016b01c24d13$3bc6d900$6d94fea9@newmexico> <15722.18803.575337.718874@anthem.wooz.org> Message-ID: <200208261541.g7QFfB205791@pcp02138704pcs.reston01.va.comcast.net> > I'm leary of trying to weave some interface taxonomy into the standard > library and types without having a lot of experience in using this for > real world applications. Even then, it's possible that there > will be a lot of disagreement on the shape of the type hierarchy. That's what I said when I predicted it would be hard to come up with an ontology. 
:-) > So one strategy would be to not classify the existing types and > classes ahead of time, but to provide a way for an application to > declare conformance to existing types in a way that makes sense for > the application (or library). The downside of this is that it may > lead to a raft of incompatible interface declarations, but I also > think that eventually we'd see convergence as we gain more experience. This is what Zope does. One problem is that it has its own notion of what makes a "sequence", a "mapping", etc. that don't always match the Pythonic convention. > My guess would be that of all the interfaces that get defined and used > in the Python community, on a few that are commonly agreed on or > become ubiquitous idioms will migrate into the core. I don't think we > need to solve this "problem" for the core types right away. Let's > start by providing mechanism and not policy. Sure. But does that mean the mechanism needs to be necessarily separate from the inheritance mechanism? --Guido van Rossum (home page: http://www.python.org/~guido/) From alessandro_amici@telespazio.it Mon Aug 26 16:42:25 2002 From: alessandro_amici@telespazio.it (Amici Alessandro) Date: Mon, 26 Aug 2002 17:42:25 +0200 Subject: [Python-Dev] Large file support for the mmap module? Message-ID: hi, while looking for efficient ways to manipulate large files (>2Gb) with python i noted an artificial limitation in the mmap module present in the standard library. right now mmap objects behave like an hybrid between a file and a string, but their size is limited to 2Gb files on 32bit architectures (the offset argument in the mmap call is always set to 0 and several members of the structure have type size_t). adding a rough implementation for 64bit offset in the mmap call is trivial (i have done it, cutting and pasting from fileobject.c), but it is not obvious how the file-like soul of the mmap object should be affected by the offset. 
actually, it is not clear to me why the file-like behavior is present at all. is there any plan to add LFS to the mmap module? are there known workarounds? thanks, alessandro From mclay@nist.gov Mon Aug 26 16:42:33 2002 From: mclay@nist.gov (Michael McLay) Date: Mon, 26 Aug 2002 11:42:33 -0400 Subject: [Python-Dev] type categories In-Reply-To: <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> References: <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200208261142.33324.mclay@nist.gov> On Monday 26 August 2002 10:45 am, Guido van Rossum wrote: > > Indeed. One of the advantages of separating inheritance and subtyping > > is that this can give you a bit more flexibility in setting up the > > ontology, which may make it easier to settle on something good. > > Really? Given that there are no inheritance relationships between the > existing built-in types, I would think that you could define an > ontology consisting entirely of abstract types, and then graft the > concrete types on it. I don't see what having separate interfaces > would buy you. But perhaps you can give an example that shows your > point? Several posts have expressed a need to support multiple ontologies for a given set of classes. This doesn't preclude using the ontology that is defined by the class hierarchy as one method for defining an ontology. That could be the default ontology. What is missing is the ability to also place a class into an alternate ontology that may be specific to an application. The problem could be solved if applications had the ability to add and delete references to the type interface definitions that apply to a class.

    Aclass.declaretype(AnInterface)
    Aclass.deletetype(AnInterface)

Perhaps the interface definitions should also be able to add themselves to class definitions. That way common interface patterns that apply to standard libraries could be defined in the standard library.
This would eliminate the repeated addition of interfaces to classes in each application.

    Interface I:
        pass

    I.appliesto(Aclass)

Removing an interface from a class might not be possible, or it may require a second class implementation to be created at compile time, because usage of that interface may be required in some other module. I suspect having two implementations of the same class might be somewhat confusing to the user. Perhaps removal of a required interface would trigger an exception. From barry@python.org Mon Aug 26 16:50:36 2002 From: barry@python.org (Barry A. Warsaw) Date: Mon, 26 Aug 2002 11:50:36 -0400 Subject: [Python-Dev] type categories References: <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> <016b01c24d13$3bc6d900$6d94fea9@newmexico> <15722.18803.575337.718874@anthem.wooz.org> <200208261541.g7QFfB205791@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <15722.20044.30593.867428@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: >> I'm leary of trying to weave some interface taxonomy into the >> standard library and types without having a lot of experience >> in using this for real world applications. Even then, it's >> possible that there will be a lot of disagreement on the >> shape of the type hierarchy. GvR> That's what I said when I predicted it would be hard to come GvR> up with an ontology. :-) Actually, it'll be easy, that's why we'll do it a hundred times. :) >> So one strategy would be to not classify the existing types and >> classes ahead of time, but to provide a way for an application >> to declare conformance to existing types in a way that makes >> sense for the application (or library). The downside of this >> is that it may lead to a raft of incompatible interface >> declarations, but I also think that eventually we'd see >> convergence as we gain more experience. GvR> This is what Zope does. One problem is that it has its own GvR> notion of what makes a "sequence", a "mapping", etc.
that GvR> don't always match the Pythonic convention. Yep, that's a problem. One approach might be to provide some blessed or common interfaces in a module, but don't weave them into the types. OTOH, I suspect that big apps and frameworks like Zope may want their own notion anyway, and hopefully it'll be fairly easy for components that want to play with Zope to add the proper interface conformance assertions. >> My guess would be that of all the interfaces that get defined >> and used in the Python community, on a few that are commonly >> agreed on or become ubiquitous idioms will migrate into the >> core. I don't think we need to solve this "problem" for the >> core types right away. Let's start by providing mechanism and >> not policy. GvR> Sure. But does that mean the mechanism needs to be GvR> necessarily separate from the inheritance mechanism? It definitely means that there has to be a way to separate them that is largely transparent to all the code that checks, uses, asserts, etc. interfaces. IOW, if we allow all of inheritance, __implements__, and a registry to assert conformance to an interface, the built-in conformsto() -- or whatever we call it -- has to know about all these accepted variants and should return True for any match. -Barry From ark@research.att.com Mon Aug 26 16:57:31 2002 From: ark@research.att.com (Andrew Koenig) Date: 26 Aug 2002 11:57:31 -0400 Subject: [Python-Dev] type categories In-Reply-To: <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> References: <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> Message-ID: Guido> Well, yes, of course. But I strongly believe that in *most* cases, Guido> inheritance and subtyping go hand in hand. I'd rather invent a Guido> mechanism to deal with the exceptions rather than invent two parallel Guido> mechanisms that must both be deployed separately to get the full Guido> benefit out of them. 
I think there are (at least) three important kinds of problems, not just two. 1) You want to define a new class that inherit from a class that already exists, and you intend your class to be in all of the categories of its base class(es). This case is probably the most common, so whatever mechanism we adopt should make it easy to write. 2) You want to define a new class that inherits from a class that already exists, but you do *not* want your class to be in all of the categories of its base classes. Perhaps you are inheriting for implementation purposes only. I think we all agree that this case is relatively rare--but it is not completely unheard of, so there should be a way of coping with it. 3) You want to define a new type category, and assert that some classes that already exist are members of this category. You do not want to modify the definitions of these classes in order to do so. Defining new classes that inherit from the existing ones does not solve the problem, because you would then have to change code all over the place to make it use the new classes. Here is an example of (3). I might want to define a TotallyOrdered category, together with a sort function that accepts only a container with elements that are TotallyOrdered. For example: def sort(x): if __debug__: for i in x: assert i in TotallyOrdered # or whatever # continue with the sort here. If someone uses my sort function to sort a container of objects that are not of built-in types, I don't mind imposing the requirement on the user to affirm that those types do indeed meet the TotallyOrdered requirement. What I do not want to do, however, is require that the person making the claim is also the author of the class about which the claim is being made. Incidentally, it just occurred to me that if we regard categories as claims about types (or, if you like, predicate functions with type arguments), then it makes sense to include (Cartesian) product types. 
What I mean by this is that the TotallyOrdered category is really a category of pairs of types. Note that the comparison operators generally work on arguments of different type, so to make a really appropriate claim about total ordering, I really need a way to say not just that a single type (such as int) is totally ordered, but that all combinations of types from a particular set are totally ordered (or not -- recall that on most implementations it is possible to find three numbers x, y, z such that total ordering fails, as long as you mix int and float with sufficiently evil intent). -- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark From Samuele Pedroni" <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> <016b01c24d13$3bc6d900$6d94fea9@newmexico> <200208261531.g7QFVYr05740@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <01e301c24d19$a1aa2a00$6d94fea9@newmexico> [GvR] > [me] > > E.g. > > my ideas of declaring partial conformance and of super-interfaces > > identified as a base-interface plus a subset of signatures do not > > fit so well in a just-abstract-classes model. But OTOH I insist, > > IMO, given how python code is written now, they would be handy > > although complex. > > Yes, I'll have to think about that idea some more. It's appealing > because it matches current Pythonic practice better than anything > else. Thanks, I was under the impression nobody cared. In the end you could discard the notion, the semantics are maybe too complex, but I think it is really worth some thinking. > OTOH I want a solution that can be verified at compile time. Here I don't get what you are referring to. I have indicated some possible sloppy interpretations but just in order to care for transitioning code. But under the precise interpretation they are checkable (maybe it is costly and complex to do so and that's your point?): class Source: def read(self): ... # other methods e.g. 
could declare to implement partially FileLike (that means the matching subset of signatures), or be very precise and declare that it implements FileLike{read} and FileLike{read} given FileLike has a very precise interpretation even at compile-time. regards. From guido@python.org Mon Aug 26 17:05:26 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 26 Aug 2002 12:05:26 -0400 Subject: [Python-Dev] type categories In-Reply-To: Your message of "Mon, 26 Aug 2002 17:59:59 +0200." <01e301c24d19$a1aa2a00$6d94fea9@newmexico> References: <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> <016b01c24d13$3bc6d900$6d94fea9@newmexico> <200208261531.g7QFVYr05740@pcp02138704pcs.reston01.va.comcast.net> <01e301c24d19$a1aa2a00$6d94fea9@newmexico> Message-ID: <200208261605.g7QG5QV05977@pcp02138704pcs.reston01.va.comcast.net> > > OTOH I want a solution that can be verified at compile time. > > Here I don't get what you are referring to. Not specifically to your proposal. > I have indicated some possible > sloppy interpretations but just in order to care for transitioning code. But > under the precise interpretation they are checkable > (maybe it is costly and complex to do so and that's your point?): > > class Source: > def read(self): > ... > > # other methods > > e.g. could declare to implement partially FileLike (that means > the matching subset of signatures), or be very precise > and declare that it implements FileLike{read} > > and FileLike{read} given FileLike has a very precise > interpretation even at compile-time. That's great. 
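A rough sketch of what such a partial-conformance check could look like in current Python; the names here (conforms_to, FileLike, Source) are illustrative only, not a proposed API:

```python
import inspect

class FileLike:
    def read(self, size=-1): ...
    def write(self, data): ...

def conforms_to(cls, interface, subset=None):
    """True if cls defines every method of `interface` (or only those
    named in `subset`, the FileLike{read} spelling) with a matching
    signature."""
    names = subset if subset is not None else [
        n for n, v in vars(interface).items() if inspect.isfunction(v)]
    for name in names:
        impl = getattr(cls, name, None)
        if impl is None or (inspect.signature(impl)
                            != inspect.signature(getattr(interface, name))):
            return False
    return True

class Source:
    def read(self, size=-1):
        return b""
```

With this sketch, `conforms_to(Source, FileLike, subset=["read"])` holds while `conforms_to(Source, FileLike)` does not, which is exactly the distinction between FileLike{read} and full FileLike conformance.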
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Mon Aug 26 17:47:35 2002 From: tim.one@comcast.net (Tim Peters) Date: Mon, 26 Aug 2002 12:47:35 -0400 Subject: [Python-Dev] Re: heapq method names In-Reply-To: <002901c24d10$97144a20$0d61accf@othello> Message-ID: [Raymond Hettinger, on """ import heapq class Heap(list): def __init__(self, iterable=[]): self.extend(iterable) push = heapq.heappush popmin = heapq.heappop replace = heapq.heapreplace heapify = heapq.heapify """ ] > And perhaps: > def __iter__(self): > while True: > yield self.popmin() If we were trying to hide the list nature, yes, but I don't think so if the intent is that a heapq heap *is* a list with a few extra methods. For example, I know I've already done max(h) at least once for a Heap of this type, with the list meaning in mind, and it would have been at best irritating if that had emptied the heap as a side effect. Renaming __iter__ to heapiter would be cool, though! From oren-py-d@hishome.net Mon Aug 26 19:40:52 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Mon, 26 Aug 2002 14:40:52 -0400 Subject: [Python-Dev] type categories In-Reply-To: <15722.18803.575337.718874@anthem.wooz.org> References: <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> <016b01c24d13$3bc6d900$6d94fea9@newmexico> <15722.18803.575337.718874@anthem.wooz.org> Message-ID: <20020826184052.GA95376@hishome.net> On Mon, Aug 26, 2002 at 11:29:55AM -0400, Barry A. Warsaw wrote: > I'm leary of trying to weave some interface taxonomy into the standard > library and types without having a lot of experience in using this for > real world applications. Even then, it's possible that there > will be a lot of disagreement on the shape of the type hierarchy. 
> > So one strategy would be to not classify the existing types and > classes ahead of time, but to provide a way for an application to > declare conformance to existing types in a way that makes sense for > the application (or library). The downside of this is that it may > lead to a raft of incompatible interface declarations, but I also > think that eventually we'd see convergence as we gain more experience. > > My guess would be that of all the interfaces that get defined and used > in the Python community, on a few that are commonly agreed on or > become ubiquitous idioms will migrate into the core. I don't think we > need to solve this "problem" for the core types right away. Let's > start by providing mechanism and not policy. +1 for a non-creationist approach to type categories. Oren From walter@livinglogic.de Mon Aug 26 19:45:22 2002 From: walter@livinglogic.de (=?ISO-8859-15?Q?Walter_D=F6rwald?=) Date: Mon, 26 Aug 2002 20:45:22 +0200 Subject: [Python-Dev] To commit or not to commit Message-ID: <3D6A7742.1030005@livinglogic.de> I'm ready to commit the PEP293 implementation. I've added LaTeX documentation in libcodecs.tex and libexcs.tex. There are only a few minor open issues (reflecting exception attribute modifications in args, PyString_DecodeEscape), but I guess we'll fix/document those in time. Any objections against committing the patch? Bye, Walter Dörwald From guido@python.org Mon Aug 26 19:47:18 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 26 Aug 2002 14:47:18 -0400 Subject: [Python-Dev] To commit or not to commit In-Reply-To: Your message of "Mon, 26 Aug 2002 20:45:22 +0200." <3D6A7742.1030005@livinglogic.de> References: <3D6A7742.1030005@livinglogic.de> Message-ID: <200208261847.g7QIlI806850@pcp02138704pcs.reston01.va.comcast.net> > I'm ready to commit the PEP293 implementation. I've added > LaTeX documentation in libcodecs.tex and libexcs.tex. 
> > There are only a few minor open issues (reflecting exception > attribute modifications in args, PyString_DecodeEscape), but > I guess we'll fix/document those in time. > > Any objections against committing the patch? What do MvL and MAL say? --Guido van Rossum (home page: http://www.python.org/~guido/) From oren-py-d@hishome.net Mon Aug 26 20:03:22 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Mon, 26 Aug 2002 15:03:22 -0400 Subject: [Python-Dev] type categories In-Reply-To: References: <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020826190322.GB95376@hishome.net> On Mon, Aug 26, 2002 at 11:57:31AM -0400, Andrew Koenig wrote: > Incidentally, it just occurred to me that if we regard categories > as claims about types (or, if you like, predicate functions with type > arguments), then it makes sense to include (Cartesian) product types. Would such as product type be anything more than than a predicate about tuples? Something like the (T1, T2, T3) case in Guido's static typing presentation[1] where T1, T2 and T3 are type predicates rather than just types. [1] http://www.python.org/~guido/static-typing/sld008.htm ) Oren From walter@livinglogic.de Mon Aug 26 20:08:26 2002 From: walter@livinglogic.de (=?ISO-8859-15?Q?Walter_D=F6rwald?=) Date: Mon, 26 Aug 2002 21:08:26 +0200 Subject: [Python-Dev] To commit or not to commit References: <3D6A7742.1030005@livinglogic.de> <200208261847.g7QIlI806850@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <3D6A7CAA.2060205@livinglogic.de> Guido van Rossum wrote: >>I'm ready to commit the PEP293 implementation. I've added >>LaTeX documentation in libcodecs.tex and libexcs.tex. >> >>There are only a few minor open issues (reflecting exception >>attribute modifications in args, PyString_DecodeEscape), but >>I guess we'll fix/document those in time. >> >>Any objections against committing the patch? > > > What do MvL and MAL say? AFAIK they're both on vacation. 
Bye, Walter Dörwald From guido@python.org Mon Aug 26 20:05:47 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 26 Aug 2002 15:05:47 -0400 Subject: [Python-Dev] To commit or not to commit In-Reply-To: Your message of "Mon, 26 Aug 2002 21:08:26 +0200." <3D6A7CAA.2060205@livinglogic.de> References: <3D6A7742.1030005@livinglogic.de> <200208261847.g7QIlI806850@pcp02138704pcs.reston01.va.comcast.net> <3D6A7CAA.2060205@livinglogic.de> Message-ID: <200208261905.g7QJ5lk07136@pcp02138704pcs.reston01.va.comcast.net> > > What do MvL and MAL say? > > AFAIK they're both on vacation. Then wait until they're back. --Guido van Rossum (home page: http://www.python.org/~guido/) From ark@research.att.com Mon Aug 26 20:10:59 2002 From: ark@research.att.com (Andrew Koenig) Date: Mon, 26 Aug 2002 15:10:59 -0400 (EDT) Subject: [Python-Dev] type categories In-Reply-To: <20020826190322.GB95376@hishome.net> (message from Oren Tirosh on Mon, 26 Aug 2002 15:03:22 -0400) References: <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> <20020826190322.GB95376@hishome.net> Message-ID: <200208261910.g7QJAxs02173@europa.research.att.com> Oren> On Mon, Aug 26, 2002 at 11:57:31AM -0400, Andrew Koenig wrote: >> Incidentally, it just occurred to me that if we regard categories >> as claims about types (or, if you like, predicate functions with type >> arguments), then it makes sense to include (Cartesian) product types. Oren> Would such as product type be anything more than than a Oren> predicate about tuples? No, I don't think it would. Indeed, ML completely unifies Cartesian product types and tuples in a very, very cool way: Every function takes exactly one argument and yields exactly one result. However, the argument or result can be a tuple. So in ML, when I write f(x,y) that really means to bundle x and y into a tuple, and call f with that tuple as its argument. 
So, for example, if I write val xy = (x,y) which defines a variable named xy and binds it to the tuple (x,y), then f xy means exactly the same thing as f(x,y) The parentheses are really tuple constructors, and ML doesn't require parentheses for function calls at all. However, if you're going to define predicates over tuples of (Python) types, then you had better not try to define those predicates as part of the tuples' class definitions, because they don't have one. From guido@python.org Mon Aug 26 20:26:34 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 26 Aug 2002 15:26:34 -0400 Subject: [Python-Dev] type categories In-Reply-To: Your message of "Mon, 26 Aug 2002 15:10:59 EDT." <200208261910.g7QJAxs02173@europa.research.att.com> References: <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> <20020826190322.GB95376@hishome.net> <200208261910.g7QJAxs02173@europa.research.att.com> Message-ID: <200208261926.g7QJQYl07432@pcp02138704pcs.reston01.va.comcast.net> > Indeed, ML completely unifies Cartesian product types and tuples in a > very, very cool way: > > Every function takes exactly one argument > and yields exactly one result. > > However, the argument or result can be a tuple. > > So in ML, when I write > > f(x,y) > > that really means to bundle x and y into a tuple, and call f with that > tuple as its argument. So, for example, if I write > > val xy = (x,y) > > which defines a variable named xy and binds it to the tuple (x,y), then > > f xy > > means exactly the same thing as > > f(x,y) > > The parentheses are really tuple constructors, and ML doesn't require > parentheses for function calls at all. ABC did this, and very early Python did this, too (but Python always required parentheses for calls). However, adding optional arguments caused trouble: after def f(a, b=1): print a*b t = (1, 2) what should f(t) mean? It could mean either f((1, 2), 1) or f(1, 2). So we had to get rid of that. 
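The ambiguity can be sketched in today's Python, where the two readings of f(t) must be spelled differently:

```python
def f(a, b=1):
    return a * b

t = (1, 2)

f(t)    # the tuple is a single argument: f((1, 2), 1) -> (1, 2)
f(*t)   # the tuple is unpacked into two arguments: f(1, 2) -> 2
```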
I suppose ML doesn't have optional arguments (in the sense of Python), so the problem doesn't occur there; that's why it wasn't a problem in ABC. --Guido van Rossum (home page: http://www.python.org/~guido/) From oren-py-d@hishome.net Mon Aug 26 20:36:56 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Mon, 26 Aug 2002 15:36:56 -0400 Subject: [Python-Dev] type categories In-Reply-To: <200208261910.g7QJAxs02173@europa.research.att.com> References: <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> <20020826190322.GB95376@hishome.net> <200208261910.g7QJAxs02173@europa.research.att.com> Message-ID: <20020826193656.GC95376@hishome.net> On Mon, Aug 26, 2002 at 03:10:59PM -0400, Andrew Koenig wrote: > Oren> Would such as product type be anything more than than a > Oren> predicate about tuples? > > No, I don't think it would. > (explanation about ML functions deleted) > Can you give a more concrete example of what could a cartesian product of type predicates actually stand for in Python? > However, if you're going to define predicates over tuples of (Python) > types, then you had better not try to define those predicates as part > of the tuples' class definitions, because they don't have one. Naturally. 
Andrew, in reply to your "scribble in the margin" question about two weeks ago see http://www.tothink.com/python/predicates Oren From ark@research.att.com Mon Aug 26 20:46:46 2002 From: ark@research.att.com (Andrew Koenig) Date: Mon, 26 Aug 2002 15:46:46 -0400 (EDT) Subject: [Python-Dev] type categories In-Reply-To: <200208261926.g7QJQYl07432@pcp02138704pcs.reston01.va.comcast.net> (message from Guido van Rossum on Mon, 26 Aug 2002 15:26:34 -0400) References: <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> <20020826190322.GB95376@hishome.net> <200208261910.g7QJAxs02173@europa.research.att.com> <200208261926.g7QJQYl07432@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200208261946.g7QJkk102509@europa.research.att.com> Guido> ABC did this, and very early Python did this, too (but Python always Guido> required parentheses for calls). However, adding optional arguments Guido> caused trouble: after Guido> def f(a, b=1): Guido> print a*b Guido> t = (1, 2) Guido> what should Guido> f(t) Guido> mean? It could mean either f((1, 2), 1) or f(1, 2). So we had to get Guido> rid of that. I suppose ML doesn't have optional arguments (in the Guido> sense of Python), so the problem doesn't occur there; that's why it Guido> wasn't a problem in ABC. Right -- ML doesn't have optional arguments. It does, however, have clausal definitions, which can serve a similar purpose: fun f[a, b] = a*b | f[a] = a Here, the square brackets denote lists, much as they do in Python. So you can call this function with a list that has one or two elements. The list's arguments must be integers, because if you don't say what type the operands of * are, it assumes int. If you were to call this function with a list with other than one or two elements, it would raise an exception. 
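A rough Python counterpart of the clausal definition fun f[a, b] = a*b | f[a] = a, dispatching on the shape of a list argument and raising an exception for any other shape (illustrative only; Python has no clausal definitions):

```python
def f(lst):
    # First "clause": a two-element list multiplies its elements.
    if len(lst) == 2:
        a, b = lst
        return a * b
    # Second "clause": a one-element list yields its element.
    if len(lst) == 1:
        return lst[0]
    # No clause matches: raise, as ML would.
    raise ValueError("expected a list of one or two elements")
```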
You can't do the analogous thing with tuples in ML: fun f(a, b) = a*b | f(a) = a for a rather surprising reason: The ML type inference mechanism sees from the first clause (f(a, b) = a*b) that the argument to f must be a 2-element tuple, which means that in the *second* clause, `a' must also be a 2-element tuple. Otherwise the argument of f would not have a single, well-defined type. But if `a' is a 2-element tuple, that means that the type of the result of f is also a 2-element tuple. That type is inconsistent with the type of a*b, which is int. So the compiler will complain about this definition because the function f cannot return both an int and a tuple at the same time. If we were to define it this way: fun f(a, b) = a*b | f(a) = 42 the compiler would now accept it. However, it would give a warning that the second clause is irrelevant, because there is no argument you can possibly give to f that would cause the second clause to match without first causing the first clause to match. From ark@research.att.com Mon Aug 26 20:51:13 2002 From: ark@research.att.com (Andrew Koenig) Date: Mon, 26 Aug 2002 15:51:13 -0400 (EDT) Subject: [Python-Dev] type categories In-Reply-To: <20020826193656.GC95376@hishome.net> (message from Oren Tirosh on Mon, 26 Aug 2002 15:36:56 -0400) References: <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> <20020826190322.GB95376@hishome.net> <200208261910.g7QJAxs02173@europa.research.att.com> <20020826193656.GC95376@hishome.net> Message-ID: <200208261951.g7QJpDC02523@europa.research.att.com> Oren> Can you give a more concrete example of what could a cartesian Oren> product of type predicates actually stand for in Python? Consider my TotallyOrdered suggestion from before. I would like to have a way of saying that for any two types T1 and T2 (where T1 might equal T2) chosen from the set {int, long, float}, < imposes a total ordering on values of those types. Come to think of it, that's not really a Cartesian product. 
Rather, it's a claim about the members of the set union(int,union(long, float)). From oren-py-d@hishome.net Mon Aug 26 21:25:16 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Mon, 26 Aug 2002 23:25:16 +0300 Subject: [Python-Dev] type categories In-Reply-To: <200208261951.g7QJpDC02523@europa.research.att.com>; from ark@research.att.com on Mon, Aug 26, 2002 at 03:51:13PM -0400 References: <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> <20020826190322.GB95376@hishome.net> <200208261910.g7QJAxs02173@europa.research.att.com> <20020826193656.GC95376@hishome.net> <200208261951.g7QJpDC02523@europa.research.att.com> Message-ID: <20020826232516.A22870@hishome.net> On Mon, Aug 26, 2002 at 03:51:13PM -0400, Andrew Koenig wrote: > Oren> Can you give a more concrete example of what could a cartesian > Oren> product of type predicates actually stand for in Python? > > Consider my TotallyOrdered suggestion from before. I would like to > have a way of saying that for any two types T1 and T2 (where T1 might > equal T2) chosen from the set {int, long, float}, < imposes a total > ordering on values of those types. > > Come to think of it, that's not really a Cartesian product. Rather, > it's a claim about the members of the set union(int,union(long, float)). Isn't it easier to just spell it union(int, long, float)? Your example helped me make the distinction between two very different types of type categories:

1. Type categories based on form: presence of methods, call signatures, etc.
2. Type categories based on semantics.

Semantic categories only live within a single form category. A method call cannot possibly be semantically correct if it isn't well-formed: it will cause a runtime error. But a method call that is well-formed may or may not be semantically correct. A language *can* verify well-formedness. It cannot verify semantic correctness but it can provide tools to help developers communicate their semantic expectations.
Form-based categories may be used to convey semantic categories: just add a dummy method or member to serve as a marker. It can force an interface with an otherwise identical form to be intentionally incompatible to help you detect semantic categorization errors. The opposite is not true: semantic categories cannot be used to enforce well-formedness. You can mark a class as implementing the "TotallyOrdered" interface when it doesn't even have a comparison method. A similar case can happen when using inheritance for categorization: a subclass may modify the call signatures, making the class form-incompatible but it still retains its ancestry which may be interpreted in some cases as a marker of a semantic category. Oren From ark@research.att.com Mon Aug 26 21:28:58 2002 From: ark@research.att.com (Andrew Koenig) Date: Mon, 26 Aug 2002 16:28:58 -0400 (EDT) Subject: [Python-Dev] type categories In-Reply-To: <20020826232516.A22870@hishome.net> (message from Oren Tirosh on Mon, 26 Aug 2002 23:25:16 +0300) References: <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> <20020826190322.GB95376@hishome.net> <200208261910.g7QJAxs02173@europa.research.att.com> <20020826193656.GC95376@hishome.net> <200208261951.g7QJpDC02523@europa.research.att.com> <20020826232516.A22870@hishome.net> Message-ID: <200208262028.g7QKSwa02692@europa.research.att.com> From ark@research.att.com Mon Aug 26 21:33:53 2002 From: ark@research.att.com (Andrew Koenig) Date: Mon, 26 Aug 2002 16:33:53 -0400 (EDT) Subject: [Python-Dev] type categories In-Reply-To: <20020826232516.A22870@hishome.net> (message from Oren Tirosh on Mon, 26 Aug 2002 23:25:16 +0300) References: <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> <20020826190322.GB95376@hishome.net> <200208261910.g7QJAxs02173@europa.research.att.com> <20020826193656.GC95376@hishome.net> <200208261951.g7QJpDC02523@europa.research.att.com> <20020826232516.A22870@hishome.net> Message-ID: 
<200208262033.g7QKXr902705@europa.research.att.com> Oren> On Mon, Aug 26, 2002 at 03:51:13PM -0400, Andrew Koenig wrote: Oren> Can you give a more concrete example of what could a cartesian Oren> product of type predicates actually stand for in Python? >> Consider my TotallyOrdered suggestion from before. I would like to >> have a way of saying that for any two types T1 and T2 (where T1 >> might equal T2) chosen from the set {int, long, float}, < imposes a >> total ordering on values of those types. >> Come to think of it, that's not really a Cartesian product. >> Rather, it's a claim about the members of the set >> union(int,union(long, float)). Oren> Isn't it easier to just spell it union(int, long, float)? Yes, but I have a cold today, so I'm not thinking clearly. Oren> Your example helped me make the distinction between two very Oren> different types of type categories: Oren> 1. Type categories based on form: presence of methods, call signatures, etc. Oren> 2. Type categories based on semantics. Oren> Semantic categories only live within a single form category. A Oren> method call cannot possibly be semantically correct if it isn't Oren> well-formed: it will cause a runtime error. But a method call Oren> that is well-formed may or may not be semantically correct. Yes. Oren> A language *can* verify well-formedness. It cannot verify Oren> semantical correctness but it can provide tools to help Oren> developers communicate their semantic expectations. Yes. Oren> Form-based categories may be used to convey semantic categories: Oren> just add a dummy method or member to serve as a marker. It can Oren> force an interface with an otherwise identical form to be Oren> intentionally incompatible to help you detect semantic Oren> categorization errors. Remember that one thing I consider important is the ability to claim that classes written by others belong to a category defined by me. I do not want to have to modify those classes in order to do so.
So, for example, if I want to say that int is TotallyOrdered, I do not want to have to modify the definition of int to do so. Oren> The opposite is not true: semantic categories cannot be used to Oren> enforce well-formedness. You can mark a class as implementing Oren> the "TotallyOrdered" interface when it doesn't even have a Oren> comparison method. Yes. But semantic categories are useful anyway. Oren> A similar case can happen when using inheritance for Oren> categorization: a subclass may modify the call signatures, Oren> making the class form-incompatible but it still retains its Oren> ancestry which may be interpreted in some cases as a marker of a Oren> semantic category. Right. And several people have noted that it can be desirable for subclasses sometimes not to be members of all of their base classes' categories. From oren-py-d@hishome.net Mon Aug 26 22:13:02 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Tue, 27 Aug 2002 00:13:02 +0300 Subject: [Python-Dev] type categories In-Reply-To: <200208262033.g7QKXr902705@europa.research.att.com>; from ark@research.att.com on Mon, Aug 26, 2002 at 04:33:53PM -0400 References: <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> <20020826190322.GB95376@hishome.net> <200208261910.g7QJAxs02173@europa.research.att.com> <20020826193656.GC95376@hishome.net> <200208261951.g7QJpDC02523@europa.research.att.com> <20020826232516.A22870@hishome.net> <200208262033.g7QKXr902705@europa.research.att.com> Message-ID: <20020827001302.A23447@hishome.net> On Mon, Aug 26, 2002 at 04:33:53PM -0400, Andrew Koenig wrote: > Oren> Form-based categories may be used to convey semantic categories: > Oren> just add a dummy method or member to serve as a marker. It can > Oren> force an interface with an otherwise identical form to be > Oren> intentionally incompatible to help you detect semantic > Oren> categorization errors. 
> > Remember that one thing I consider important is the ability to claim > that classes written by others belong to a category defined by me. I do not > want to have to modify those classes in order to do so. How about union(int, long, float, has_marker("TotallyOrdered")) ? This basically means "I know that int, long and float are totally ordered and I'm willing to take your word for it if you claim that your type is also totally ordered". If the set of types that match a predicate is cached it should be at least as efficient as any other form of runtime interface checking. > Oren> The opposite is not true: semantic categories cannot be used to > Oren> enforce well-formedness. You can mark a class as implementing > Oren> the "TotallyOrdered" interface when it doesn't even have a > Oren> comparison method. > > Yes. But semantic categories are useful anyway. Sure they are, but the fact that form-based categories can be used to define semantic categories, and not the other way around, makes a point in favor of using form-based categories as the basic form for categories implemented by the language. Inheritance of implementation also inherits the form (methods and call signatures). If you don't go out of your way to modify it, a subclass will usually also be a subcategory, so this should be pretty transparent most of the time. Form-based categories are a tool for making claims about code: "under condition X the method Y should not raise NameError or TypeError". If you want, you can also use this tool to make semantic claims about your data types. With compile-time type inference these claims can be upgraded to the level of formal proofs.
Oren From ark@research.att.com Mon Aug 26 22:17:28 2002 From: ark@research.att.com (Andrew Koenig) Date: Mon, 26 Aug 2002 17:17:28 -0400 (EDT) Subject: [Python-Dev] type categories In-Reply-To: <20020827001302.A23447@hishome.net> (message from Oren Tirosh on Tue, 27 Aug 2002 00:13:02 +0300) References: <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> <20020826190322.GB95376@hishome.net> <200208261910.g7QJAxs02173@europa.research.att.com> <20020826193656.GC95376@hishome.net> <200208261951.g7QJpDC02523@europa.research.att.com> <20020826232516.A22870@hishome.net> <200208262033.g7QKXr902705@europa.research.att.com> <20020827001302.A23447@hishome.net> Message-ID: <200208262117.g7QLHSf03929@europa.research.att.com> >> Remember that one thing I consider important is the ability to claim >> that classes written by others belong to a category defined by me. I do not >> want to have to modify those classes in order to do so. Oren> How about union(int, long, float, has_marker("TotallyOrdered")) ? How about it? There is still the question of how to make such claims. Oren> Inheritance of implementation also inherits the form (methods Oren> and call signatures). If you don't go out of your way to modify Oren> it a subclass will usually also be a subcategory so this should Oren> be pretty transparent most of the time. Right. So how do you define a subclass that you do not want to be a subcategory? 
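The union(int, long, float, has_marker("TotallyOrdered")) spelling that this exchange keeps returning to could behave roughly as follows (my sketch, not code from the thread; `long` is folded into `int` in today's Python, and the marker attribute name is invented):

```python
def has_marker(name):
    # Matches any object whose class (or a base class) carries the marker.
    return lambda obj: bool(getattr(type(obj), name, False))

def union(*members):
    # Each member is either a type or a predicate such as has_marker(...).
    def contains(obj):
        for member in members:
            if isinstance(member, type):
                if isinstance(obj, member):
                    return True
            elif member(obj):
                return True
        return False
    return contains

totally_ordered = union(int, float, has_marker("_totally_ordered"))

class Rational:
    _totally_ordered = True   # the class author's semantic claim

print(totally_ordered(3))           # True: int is in the union
print(totally_ordered(Rational()))  # True: marker present
print(totally_ordered(3j))          # False: complex makes no such claim
```

Note this still does not answer Koenig's objection for classes he cannot modify; those would need the explicit type members of the union, or some external registry.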
From oren-py-d@hishome.net Mon Aug 26 22:51:51 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Tue, 27 Aug 2002 00:51:51 +0300 Subject: [Python-Dev] type categories In-Reply-To: <200208262117.g7QLHSf03929@europa.research.att.com>; from ark@research.att.com on Mon, Aug 26, 2002 at 05:17:28PM -0400 References: <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> <20020826190322.GB95376@hishome.net> <200208261910.g7QJAxs02173@europa.research.att.com> <20020826193656.GC95376@hishome.net> <200208261951.g7QJpDC02523@europa.research.att.com> <20020826232516.A22870@hishome.net> <200208262033.g7QKXr902705@europa.research.att.com> <20020827001302.A23447@hishome.net> <200208262117.g7QLHSf03929@europa.research.att.com> Message-ID: <20020827005151.A24077@hishome.net> On Mon, Aug 26, 2002 at 05:17:28PM -0400, Andrew Koenig wrote: > >> Remember that one thing I consider important is the ability to claim > >> that classes written by others belong to a category defined by me. I do not > >> want to have to modify those classes in order to do so. > > Oren> How about union(int, long, float, has_marker("TotallyOrdered")) ? > > How about it? There is still the question of how to make such claims. > > Oren> Inheritance of implementation also inherits the form (methods > Oren> and call signatures). If you don't go out of your way to modify > Oren> it a subclass will usually also be a subcategory so this should > Oren> be pretty transparent most of the time. > > Right. So how do you define a subclass that you do not want to be > a subcategory? If you make an incompatible change to a method call signature it will just happen by itself. If that's what you meant - good. If that wasn't what you meant, this serves as a form of error checking. It gets harder if you want to remove a method or marker. The problem is that there is currently no way to mask inherited attributes.
This will require either a language extension that will allow you to del them or using some other convention for this purpose. Oren From pinard@iro.umontreal.ca Mon Aug 26 22:56:22 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: 26 Aug 2002 17:56:22 -0400 Subject: [Python-Dev] A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib] In-Reply-To: References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <20020820231738.GA21011@thyrsus.com> <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> <20020821025725.GB28198@thyrsus.com> <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com> <20020824163848.C10202@idi.ntnu.no> Message-ID: --=-=-= Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit [François Pinard] > Allow me some random thoughts. [...] Some people may think that these > are all problems which are orthogonal to the design of a basic set feature, > and which should be addressed in separate Python modules. From the received comments, I wrote a simple module reading sequences and generating lists, instead of reading and producing sets, and taking care of generating cartesian products, power sets, combinations, arrangements and permutations. I took various ideas here and there, like from previously published messages on the Python list, and made them look a bit alike. The module could be called `cogen', abbreviation for COmbinatorial GENerators. Here is a first throw, to be criticised and improved. --=-=-= Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: attachment; filename=cogen.py Content-Transfer-Encoding: 8bit

#!/usr/bin/env python
# Copyright © 2002 Progiciels Bourbeau-Pinard inc.
# François Pinard , 2002-08.

"""\
Combinatorial generators.

All generators below have the property of yielding successive results
in sorted order, given that input sequences were already sorted.
"""

from __future__ import generators

def cartesian(*sequences):
    """\
Generate the `cartesian product' of all SEQUENCES.  Each member of the
product is a list containing an element taken from each original sequence.
"""
    if len(sequences) == 0:
        yield []
    else:
        first, remainder = sequences[0], sequences[1:]
        for element in first:
            for result in cartesian(*remainder):
                result.insert(0, element)
                yield result

def subsets(sequence):
    """\
Generate all subsets of a given SEQUENCE.  Each subset is delivered
as a list holding zero or more elements of the original sequence.
"""
    yield []
    if len(sequence) > 0:
        first, remainder = sequence[0], sequence[1:]
        # Some subsets retain FIRST.
        for result in subsets(remainder):
            result.insert(0, first)
            yield result
        # Some subsets do not retain FIRST.
        for result in subsets(remainder):
            if len(result) > 0:
                yield result

def combinations(sequence, number):
    """\
Generate all combinations of NUMBER elements from list SEQUENCE.
"""
    # Adapted from Python 2.2 `test/test_generators.py'.
    if number > len(sequence):
        return
    if number == 0:
        yield []
    else:
        first, remainder = sequence[0], sequence[1:]
        # Some combinations retain FIRST.
        for result in combinations(remainder, number-1):
            result.insert(0, first)
            yield result
        # Some combinations do not retain FIRST.
        for result in combinations(remainder, number):
            yield result

def arrangements(sequence, number):
    """\
Generate all arrangements of NUMBER elements from list SEQUENCE.
"""
    # Adapted from PERMUTATIONS below.
    if number > len(sequence):
        return
    if number == 0:
        yield []
    else:
        cut = 0
        for element in sequence:
            for result in arrangements(sequence[:cut] + sequence[cut+1:],
                                       number-1):
                result.insert(0, element)
                yield result
            cut += 1

def permutations(sequence):
    """\
Generate all permutations from list SEQUENCE.
"""
    # Adapted from Gerhard Häring , 2002-08-24.
    if len(sequence) == 0:
        yield []
    else:
        cut = 0
        for element in sequence:
            for result in permutations(sequence[:cut] + sequence[cut+1:]):
                result.insert(0, element)
                yield result
            cut += 1

def test():
    if True:
        print '\nTesting CARTESIAN.'
        for result in cartesian((5, 7), [8, 9], 'abc'):
            print result
    if True:
        print '\nTesting SUBSETS.'
        for result in subsets(range(1, 5)):
            print result
    if True:
        print '\nTesting COMBINATIONS.'
        sequence = range(1, 5)
        for counter in range(len(sequence) + 2):
            print "%d-combs of %s:" % (counter, sequence)
            for combination in combinations(sequence, counter):
                print "   ", combination
    if True:
        print '\nTesting ARRANGEMENTS.'
        sequence = range(1, 5)
        for counter in range(len(sequence) + 2):
            print "%d-arrs of %s:" % (counter, sequence)
            for combination in arrangements(sequence, counter):
                print "   ", combination
    if True:
        print '\nTesting PERMUTATIONS.'
        for permutation in permutations(range(1, 5)):
            print permutation

if __name__ == '__main__':
    test()

--=-=-= Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit -- François Pinard http://www.iro.umontreal.ca/~pinard --=-=-=-- From guido@python.org Mon Aug 26 22:56:52 2002 From: guido@python.org (Guido van Rossum) Date: Mon, 26 Aug 2002 17:56:52 -0400 Subject: [Python-Dev] type categories In-Reply-To: Your message of "Tue, 27 Aug 2002 00:51:51 +0300." <20020827005151.A24077@hishome.net> References: <200208261445.g7QEjSE05440@pcp02138704pcs.reston01.va.comcast.net> <20020826190322.GB95376@hishome.net> <200208261910.g7QJAxs02173@europa.research.att.com> <20020826193656.GC95376@hishome.net> <200208261951.g7QJpDC02523@europa.research.att.com> <20020826232516.A22870@hishome.net> <200208262033.g7QKXr902705@europa.research.att.com> <20020827001302.A23447@hishome.net> <200208262117.g7QLHSf03929@europa.research.att.com> <20020827005151.A24077@hishome.net> Message-ID: <200208262156.g7QLuqT08495@pcp02138704pcs.reston01.va.comcast.net> > It gets harder if you want to remove a method or marker.
The problem is > that there is currently no way to mask inherited attributes. This will > require either a language extension that will allow you to del them or > using some other convention for this purpose. Can't you use this?

class B:
    def foo(self): pass

class C:
    foo = None  # Don't implement foo

--Guido van Rossum (home page: http://www.python.org/~guido/) From mclay@nist.gov Tue Aug 27 00:19:13 2002 From: mclay@nist.gov (Michael McLay) Date: Mon, 26 Aug 2002 19:19:13 -0400 Subject: [Python-Dev] Move sets and `cogen' into a math module In-Reply-To: References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200208261919.13341.mclay@nist.gov> On Monday 26 August 2002 05:56 pm, François Pinard wrote: > The module could be called `cogen', abbreviation for COmbinatorial > GENerators. Here is a first throw, to be criticised and improved. With sets and possibly cogen being added to the standard library, is it likely that additional interesting math capabilities will creep into the standard Python libraries? Reducing the clutter of the top level namespace is hard to do if code depends on it, so better to do it right from the start. Do these modules belong at the top level? Would it make sense to change the math module into a package and move the new module inside the math package? From pinard@iro.umontreal.ca Tue Aug 27 01:38:57 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: 26 Aug 2002 20:38:57 -0400 Subject: [Python-Dev] Re: Move sets and `cogen' into a math module In-Reply-To: <200208261919.13341.mclay@nist.gov> References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208261919.13341.mclay@nist.gov> Message-ID: [Michael McLay] > On Monday 26 August 2002 05:56 pm, François Pinard wrote: > > The module could be called `cogen', abbreviation for COmbinatorial > > GENerators. Here is a first throw, to be criticised and improved. > With [...]
possibly cogen being added to the standard library [...] > Would it make sense to change the math module into a package and move > the new module inside the math package? For one, I do not feel that `cogen' is especially mathematical, or geared especially towards mathematical or numerical problems. Algorithmic, maybe... But a good part of the Python library already shares that property, doesn't it? :-) -- François Pinard http://www.iro.umontreal.ca/~pinard From jepler@unpythonic.net Tue Aug 27 02:39:44 2002 From: jepler@unpythonic.net (jepler@unpythonic.net) Date: Mon, 26 Aug 2002 20:39:44 -0500 Subject: [Python-Dev] type categories In-Reply-To: <200208262156.g7QLuqT08495@pcp02138704pcs.reston01.va.comcast.net> References: <20020826190322.GB95376@hishome.net> <200208261910.g7QJAxs02173@europa.research.att.com> <20020826193656.GC95376@hishome.net> <200208261951.g7QJpDC02523@europa.research.att.com> <20020826232516.A22870@hishome.net> <200208262033.g7QKXr902705@europa.research.att.com> <20020827001302.A23447@hishome.net> <200208262117.g7QLHSf03929@europa.research.att.com> <20020827005151.A24077@hishome.net> <200208262156.g7QLuqT08495@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020826203944.A1314@unpythonic.net> On Mon, Aug 26, 2002 at 05:56:52PM -0400, Guido van Rossum wrote: > > It gets harder if you want to remove a method or marker. The problem is > > that there is currently no way to mask inherited attributes. This will > > require either a language extension that will allow you to del them or > > using some other convention for this purpose. > > Can't you use this?
> > class B: > def foo(self): pass > > class C: > foo = None # Don't implement foo This comes closer:

def raise_attributeerror(self):
    raise AttributeError

RemoveAttribute = property(raise_attributeerror)

class A(object):
    def f(self): print "method A.f"
    def g(self): print "method A.g"

class B(A):
    f = RemoveAttribute

a = A()
b = B()
a.f()
a.g()
print hasattr(b, "f"), hasattr(B, "f"), hasattr(b, "g"), hasattr(B, "g")
try:
    b.f
except AttributeError:
    print "b.f does not exist (correctly)"
else:
    print "Expected AttributeError not raised"
b.g()

writing 'b.f' will raise AttributeError, but unfortunately hasattr(B, 'f') will still return True. Jeff From oren-py-d@hishome.net Tue Aug 27 06:18:28 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Tue, 27 Aug 2002 08:18:28 +0300 Subject: [Python-Dev] type categories In-Reply-To: <20020826203944.A1314@unpythonic.net>; from jepler@unpythonic.net on Mon, Aug 26, 2002 at 08:39:44PM -0500 References: <200208261910.g7QJAxs02173@europa.research.att.com> <20020826193656.GC95376@hishome.net> <200208261951.g7QJpDC02523@europa.research.att.com> <20020826232516.A22870@hishome.net> <200208262033.g7QKXr902705@europa.research.att.com> <20020827001302.A23447@hishome.net> <200208262117.g7QLHSf03929@europa.research.att.com> <20020827005151.A24077@hishome.net> <200208262156.g7QLuqT08495@pcp02138704pcs.reston01.va.comcast.net> <20020826203944.A1314@unpythonic.net> Message-ID: <20020827081828.A27639@hishome.net> On Mon, Aug 26, 2002 at 08:39:44PM -0500, jepler@unpythonic.net wrote: > On Mon, Aug 26, 2002 at 05:56:52PM -0400, Guido van Rossum wrote: > > > It gets harder if you want to remove a method or marker. The problem is > > > that there is currently no way to mask inherited attributes. This will > > > require either a language extension that will allow you to del them or > > > using some other convention for this purpose. > > > > Can't you use this?
> > > > class B: > > def foo(self): pass > > > > class C: > > foo = None # Don't implement foo > > This comes closer: > > def raise_attributeerror(self): > raise AttributeError > > RemoveAttribute = property(raise_attributeerror) > > class A: > def f(self): print "method A.f" > def g(self): print "method A.g" > > class B: > f = RemoveAttribute Yes, that's a good solution. But it should be some special builtin out-of-band value, not a user-defined property. > writing 'b.f' will raise AttributeError, but unfortunately hasattr(B, 'f') > will still return True. This isn't necessarily a problem but hasattr could be taught about this out-of-band value. -- Proposed hierarchy for categories, types and interfaces:

+ category
  + type
    + int
    + str
    etc.
  + interface
    + Iattribute
    + Icallsignature
    + Iunion
    + Iintersection
    etc.

Both types and interfaces define a set. The set membership test is the 'isinstance' function (implemented by a new slot). For types the set membership is defined by inheritance - the isinstance handler will get the first argument's type and crawl up the __bases__ DAG to see if it finds itself. Interfaces check the object's form instead of its ancestry. An Iattribute interface checks for the presence of a single attribute and applies another interface check to its value. An Icallsignature interface checks if the argument is a callable object with a specified number of arguments, default arguments, etc. An Iintersection interface checks that the argument matches a set of categories. example:

interface readable:
    def read(bytes: int): str
    def readline(): str
    def readlines(): [str]

is just a more convenient way to write:

readable = Iintersection(
    Iattribute('read', Icallsignature(str, ('bytes', int) )),
    Iattribute('readline', Icallsignature(str)),
    Iattribute('readlines', Icallsignature(Ilistof(str)))
)

The name 'readable' is simply bound to the resulting object; interfaces are defined by their value, not their name.
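A rough, runnable approximation of these interface objects, using today's introspection tools (the matching rules are my guess at the intended semantics, and a `matches` method stands in for the proposed isinstance slot):

```python
import inspect

class Iattribute:
    # Matches objects that have a named attribute, optionally checking
    # the attribute's value against a further interface.
    def __init__(self, name, inner=None):
        self.name, self.inner = name, inner
    def matches(self, obj):
        if not hasattr(obj, self.name):
            return False
        value = getattr(obj, self.name)
        return self.inner is None or self.inner.matches(value)

class Icallsignature:
    # Simplified: checks only that the value is callable with the given
    # number of (non-self) parameters; types are not checked, as in the
    # proposal's first stage.
    def __init__(self, *argnames):
        self.nargs = len(argnames)
    def matches(self, obj):
        if not callable(obj):
            return False
        try:
            return len(inspect.signature(obj).parameters) == self.nargs
        except (TypeError, ValueError):
            return True   # uninspectable builtins: give benefit of the doubt

class Iintersection:
    def __init__(self, *parts):
        self.parts = parts
    def matches(self, obj):
        return all(part.matches(obj) for part in self.parts)

readable = Iintersection(
    Iattribute('read', Icallsignature('bytes')),
    Iattribute('readline', Icallsignature()),
)

class MyFileLike:
    def read(self, nbytes): return ''
    def readline(self): return ''

print(readable.matches(MyFileLike()))   # True: the form is present
print(readable.matches(42))             # False: no read method at all
```

The point the message goes on to make holds here too: no class has to declare itself readable; matching is purely by form.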
The types of arguments and return values will not be checked at first and only serve as documentation. Note that they don't necessarily have to be types - they can be interfaces, too. For example, 'str|int' in an interface declaration will be converted to Iunion(str, int).

>>> isinstance(file('/dev/null'), readable)
True
>>> isinstance(MyFileLikeClass(), readable)
True

The MyFileLikeClass or file classes do not have to be explicitly declared as implementing the readable interface. The benefit of explicit interface declarations is that you will get an error if you write a method that does not match the declaration. If you try to implement two conflicting interfaces this can also be detected immediately - the intersection of the two interfaces will reduce to the empty interface. For now this will only catch the same method name with a different number of arguments but in the future it may detect conflicting argument or return value types. doesn't-have-anything-better-to-do-at-6-am-ly yours, Oren From oren-py-d@hishome.net Tue Aug 27 06:29:11 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Tue, 27 Aug 2002 01:29:11 -0400 Subject: [Python-Dev] A `cogen' module - an observation In-Reply-To: References: <20020820231738.GA21011@thyrsus.com> <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> <20020821025725.GB28198@thyrsus.com> <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com> <20020824163848.C10202@idi.ntnu.no> Message-ID: <20020827052911.GA87669@hishome.net>

[f(x, y) for x in X for y in Y]

is equivalent to:

[f(x, y) for x, y in cartesian(X, Y)]

I never found any real use for list comprehensions with more than one dimension. When I use nested loops they are usually for something that cannot be expressed as a list comprehension.
Oren From python@rcn.com Tue Aug 27 06:56:36 2002 From: python@rcn.com (Raymond Hettinger) Date: Tue, 27 Aug 2002 01:56:36 -0400 Subject: [Python-Dev] A `cogen' module - an observation References: <20020820231738.GA21011@thyrsus.com> <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> <20020821025725.GB28198@thyrsus.com> <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com> <20020824163848.C10202@idi.ntnu.no> <20020827052911.GA87669@hishome.net> Message-ID: <00a701c24d8e$8222aac0$6cb63bd0@othello> From: "Oren Tirosh" > [f(x, y) for x in X for y in Y] > > is equivalent to: > > [f(x, y) for x, y in cartesian(X, Y)] Is the order guaranteed to be the same? Will each work the same for a non-restartable iterator, say a file object (equivalently put, does the second one read Y once or many times)? Would Descartes object to his name being used thusly? Raymond Hettinger From python@rcn.com Tue Aug 27 06:59:52 2002 From: python@rcn.com (Raymond Hettinger) Date: Tue, 27 Aug 2002 01:59:52 -0400 Subject: [Python-Dev] Move sets and `cogen' into a math module References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208261919.13341.mclay@nist.gov> Message-ID: <00b201c24d8e$f6e9afc0$6cb63bd0@othello> I think of sets as being more closely affiliated with heapq, UserDict, and array and being less affiliated with math, cmath, random, etc. Raymond Hettinger ----- Original Message ----- From: "Michael McLay" Message-ID: <200208270614.g7R6EpTh005191@kuku.cosc.canterbury.ac.nz> > [f(x, y) for x in X for y in Y] > > is equivalent to: > > [f(x, y) for x, y in cartesian(X, Y)] Hmmm, in other words, cartesian() is a lazy version of zip(). (Given its intended use, I always thought zip() should have been lazy from the beginning, but the BDFL thought otherwise.) 
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From oren-py-d@hishome.net Tue Aug 27 07:15:54 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Tue, 27 Aug 2002 02:15:54 -0400 Subject: [Python-Dev] A `cogen' module - an observation In-Reply-To: <200208270614.g7R6EpTh005191@kuku.cosc.canterbury.ac.nz> References: <20020827052911.GA87669@hishome.net> <200208270614.g7R6EpTh005191@kuku.cosc.canterbury.ac.nz> Message-ID: <20020827061554.GA96021@hishome.net> On Tue, Aug 27, 2002 at 06:14:51PM +1200, Greg Ewing wrote: > > [f(x, y) for x in X for y in Y] > > > > is equivalent to: > > > > [f(x, y) for x, y in cartesian(X, Y)] > > Hmmm, in other words, cartesian() is a lazy version > of zip(). Nope.

>>> zip([1, 2], ['a', 'b'])
[(1, 'a'), (2, 'b')]
>>> list(cartesian([1, 2], ['a', 'b']))
[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]

Oren From oren-py-d@hishome.net Tue Aug 27 07:27:40 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Tue, 27 Aug 2002 02:27:40 -0400 Subject: [Python-Dev] A `cogen' module - an observation In-Reply-To: <00a701c24d8e$8222aac0$6cb63bd0@othello> References: <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> <20020821025725.GB28198@thyrsus.com> <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com> <20020824163848.C10202@idi.ntnu.no> <20020827052911.GA87669@hishome.net> <00a701c24d8e$8222aac0$6cb63bd0@othello> Message-ID: <20020827062740.GB96021@hishome.net> On Tue, Aug 27, 2002 at 01:56:36AM -0400, Raymond Hettinger wrote: > From: "Oren Tirosh" > > > > [f(x, y) for x in X for y in Y] > > > > is equivalent to: > > > > [f(x, y) for x, y in cartesian(X, Y)] > > Is the order guaranteed to be the same? Yes.
"""\ Combinatorial generators. All generators below have the property of yielding successive results in sorted order, given that input sequences were already sorted. """ > Will each work the same for a non-restartable > iterator, say a file object (equivalently put, > does the second one read Y once or many times)? They have exactly the same re-iterability wart as nested loops or list comprehensions - an exhausted iterator is indistinguishable from an empty container. > Would Descartes object to his name being used thusly? The cartesian product is a set operation and therefore has no defined order. When generating it you need some specific order, and this one makes the most sense. If you use it with a 'nested loop' mindset instead of a 'set theory' mindset, René would have had some grounds for objection :-) Oren From mcherm@destiny.com Tue Aug 27 13:58:16 2002 From: mcherm@destiny.com (Michael Chermside) Date: Tue, 27 Aug 2002 08:58:16 -0400 Subject: [Python-Dev] Fw: Security hole in rexec? Message-ID: <3D6B7768.5000804@destiny.com> > [rexec compromised by deleting __builtins__] > > This has been known for a while, see python.org/sf/577530. > > My recommendation is the same as always: don't trust rexec. > > --Guido van Rossum (home page: http://www.python.org/~guido/) I think it is a VERY BAD idea to advertise publicly that rexec can be used to "safely" restrict execution, while admitting only privately (ie, in the above postings to a developers-only list and to sourceforge) that it cannot. Therefore I propose that the official documentation in the Python Library Reference for the module rexec be modified to add a note saying that rexec is not completely reliable and can be undermined by a knowledgeable hacker. The current documentation STRONGLY implies this is NOT the case by explaining in detail the more minor susceptibility to DOS attacks (memory or CPU time) and raising SystemExit.
Why not add something like the following to the beginning of the module documentation: """ Warning: While the rexec module is designed to perform as described below, it does have a few known vulnerabilities which could be exploited by carefully written code. Thus it should not be relied upon in situations requiring "production ready" security. In such situations, execution via sub-processes (a separate Python executable) or very careful "cleansing" of data to be processed may be necessary. Alternatively, help in patching known rexec vulnerabilities would be welcomed. """ Admitting to library weaknesses (especially in the area of security) doesn't make great PR, but at least it's honest! -- Michael Chermside From mcherm@destiny.com Tue Aug 27 16:30:07 2002 From: mcherm@destiny.com (Michael Chermside) Date: Tue, 27 Aug 2002 11:30:07 -0400 Subject: [Python-Dev] Re: Fw: Security hole in rexec? References: <3D6B7768.5000804@destiny.com> <200208271502.g7RF2OG12350@odiug.zope.com> Message-ID: <3D6B9AFF.7020903@destiny.com> >>> [rexec is broke] >> [let's document that] > Yes. This should be done. > > --Guido van Rossum (home page: http://www.python.org/~guido/) OK. I'll submit a patch. -- Michael Chermside From guido@python.org Tue Aug 27 16:02:24 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 27 Aug 2002 11:02:24 -0400 Subject: [Python-Dev] Fw: Security hole in rexec? In-Reply-To: Your message of "Tue, 27 Aug 2002 08:58:16 EDT." <3D6B7768.5000804@destiny.com> References: <3D6B7768.5000804@destiny.com> Message-ID: <200208271502.g7RF2OG12350@odiug.zope.com> > > [rexec compromised by deleting __builtins__] > > > > This has been known for a while, see python.org/sf/577530. > > > > My recommendation is the same as always: don't trust rexec. 
> > > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > I think it is a VERY BAD idea to advertise publicly that rexec can be > used to "safely" restrict execution, while privately (ie, the above > postings to a developers-only list and to sourceforge). > > Therefore I propose that the official documentation to the Python > Library Reference for the module rexec be modified to add a note saying > that rexec is not completely reliable and can be undermined by a > knowledgable hacker. The current documentation STRONGLY implies this is > NOT the case by explaining in detail the more minor susceptibility to > DOS attacks (memory or CPU time) and raising SystemExit. > > Why not add something like the following to the beginning of the module > documentation: > > """ > Warning: While the rexec module is designed to perform as described > below, it does have a few known vulnerabilities which could be exploited > by carefully written code. Thus it should not be relied upon in > situations requiring "production ready" security. In such situations, > execution via sub-processes (a separate Python executable) or very > careful "cleansing" of data to be processed may be necessary. > Alternatively, help in patching known rexec vulnerabilities would be > welcomed. > """ > > Admitting to library weaknesses (especially in the area of security) > doesn't make great PR, but at least it's honest! Yes. This should be done. --Guido van Rossum (home page: http://www.python.org/~guido/) From mmatus@dinha.acms.arizona.edu Tue Aug 27 20:37:55 2002 From: mmatus@dinha.acms.arizona.edu (Marcelo Matus) Date: Tue, 27 Aug 2002 12:37:55 -0700 Subject: [Python-Dev] valgrind and python? Message-ID: <3D6BD513.4050902@acms.arizona.edu> Did somebody already test python 2.2.1 using valgrind 1.0.1? 
http://developer.kde.org/~sewardj/ because testing one of my own modules, I get the following recurrent error report whenever I import something: ==19951== Conditional jump or move depends on uninitialised value(s) ==19951== at 0x8094B85: find_module (in /home/mmatus/oss2/gcc3/bin/python) ==19951== by 0x8095DE2: import_submodule (Python/import.c:1887) ==19951== by 0x80959B8: load_next (Python/import.c:1752) ==19951== by 0x8097608: import_module_ex (Python/import.c:1603) and I don't know if I can ignore it or if this is a real python error. Marcelo PS: the error appears whenever you import a module, so, after installing valgrind, try doing: echo import math > test.py valgrind /usr/local/bin/python test.py if you have python installed in the /usr/local/bin directory, of course. From steven.robbins@videotron.ca Tue Aug 27 20:57:57 2002 From: steven.robbins@videotron.ca (Steve M. Robbins) Date: Tue, 27 Aug 2002 15:57:57 -0400 Subject: [Python-Dev] Re: Weird error handling in os._execvpe Message-ID: <20020827195757.GD27742@nyongwa.montreal.qc.ca> Hello, I think the patch associated with this thread has an unintended consequence. In http://mail.python.org/pipermail/python-dev/2002-August/027229.html Zack pointed out three flaws in the original code: [...] Third, if an error other than the expected one comes back, the loop clobbers the saved exception info and keeps going. Consider the situation where PATH=/bin:/usr/bin, /bin/foobar exists but is not executable by the invoking user, and /usr/bin/foobar does not exist. The exception thrown will be 'No such file or directory', not the expected 'Permission denied'. The patch, as I understand it, changes the behaviour so as to raise the exception "Permission denied" in this case. Consider a similar situation in which both /bin/foobar (not executable by the user) and /usr/bin/foobar (executable by the user) exist. Given the command "foobar", the shell will execute /usr/bin/foobar. 
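The shell's PATH rule just described -- an entry that exists but isn't executable is skipped, and "Permission denied" is only reported if nothing later on the path works -- can be sketched as follows. This is a hypothetical helper, not the actual os._execvpe code; the lookup predicates are parameters only so the sketch can be exercised without touching the filesystem.

```python
import errno
import os

def shell_which(name, path_dirs, access=os.access, exists=os.path.exists):
    """Return the first *executable* `name` along `path_dirs`, the way
    the shell searches PATH.  A match that exists but isn't executable
    is saved, not fatal -- the search continues."""
    denied = None
    for d in path_dirs:
        cand = os.path.join(d, name)
        if not exists(cand):
            continue
        if access(cand, os.X_OK):
            return cand
        if denied is None:
            denied = cand          # remember, but keep looking
    if denied is not None:
        # Only now report the saved "Permission denied".
        raise OSError(errno.EACCES, "Permission denied", denied)
    raise OSError(errno.ENOENT, "No such file or directory", name)
```

With PATH=/bin:/usr/bin, a non-executable /bin/foobar and an executable /usr/bin/foobar, this returns /usr/bin/foobar, matching the shell.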
If I understand the patch correctly, python will give up when it encounters /bin/foobar and raise the "Permission denied" exception. I believe this just happened to me today. I had a shell script named "gcc" in ~/bin (first on my path) some months back. When I was finished with it, I just did "chmod -x ~/bin/gcc" and forgot about it. Today was the first time since this patch went in that I ran gcc via python (using scipy's weave). Boy was I surprised at the message "unable to execute gcc: Permission denied"! I guess the fix is to save the EPERM exception and keep going in case there is an executable later in the path. Regards, -Steve From neal@metaslash.com Wed Aug 28 00:17:59 2002 From: neal@metaslash.com (Neal Norwitz) Date: Tue, 27 Aug 2002 19:17:59 -0400 Subject: [Python-Dev] valgrind and python? References: <3D6BD513.4050902@acms.arizona.edu> Message-ID: <3D6C08A7.4B3007D7@metaslash.com> Marcelo Matus wrote: > > Did somebody already test python 2.2.1 using valgrind 1.0.1? I have used valgrind on python, but mostly the latest CVS version, not 2.2.1. valgrind and python were built with gcc 2.96 (redhat 7.2). I also use purify from time to time when it works (pretty rare). > because testing one of my own modules, I get the following > recurrent error report whenever I import something: > > ==19951== Conditional jump or move depends on uninitialised value(s) > ==19951== at 0x8094B85: find_module (in > /home/mmatus/oss2/gcc3/bin/python) > ==19951== by 0x8095DE2: import_submodule (Python/import.c:1887) > ==19951== by 0x80959B8: load_next (Python/import.c:1752) > ==19951== by 0x8097608: import_module_ex (Python/import.c:1603) > > and I don't know if I can ignore it or if this is a real python error. Unlikely a python problem. I just tried valgrind with the current CVS version of 2.2.1+ (what will eventually become 2.2.2). There were no problems reported doing import math. 
You can try several things to fix the warning:
 * run without your module
 * use a different compiler (version)
 * run the CVS version: cvs upd -r release22-maint
 * pass --workaround-gcc296-bugs=yes option to valgrind

Neal

From mmatus@dinha.acms.arizona.edu  Wed Aug 28 01:25:53 2002
From: mmatus@dinha.acms.arizona.edu (Marcelo Matus)
Date: Tue, 27 Aug 2002 17:25:53 -0700
Subject: [Python-Dev] valgrind and python?
References: <3D6BD513.4050902@acms.arizona.edu> <3D6C08A7.4B3007D7@metaslash.com>
Message-ID: <3D6C1891.5040400@acms.arizona.edu>

hmmm... I tried what you said:

- I used the last cvs 2.2 version (cvs upd -r release22-maint)
- I never loaded my module
- I used gcc 3.1.1
- I even passed the option --workaround-gcc296-bugs=yes

but still I've got the same kind of error report..... maybe the compiler
is too new, but I can't go back to gcc 2.96 (because of the c++ support).

So, in the meantime, I will just ignore the reports......

Thanks
Marcelo.

Neal Norwitz wrote:

>Marcelo Matus wrote:
>
>>Did somebody already test python 2.2.1 using valgrind 1.0.1?
>>
>
>I have used valgrind on python, but mostly the latest CVS version,
>not 2.2.1.  valgrind and python were built with gcc 2.96 (redhat 7.2).
>I also use purify from time to time when it works (pretty rare).
>
>>because testing one of my own modules, I get the following
>>recurrent error report whenever I import something:
>>
>>==19951== Conditional jump or move depends on uninitialised value(s)
>>==19951==    at 0x8094B85: find_module (in /home/mmatus/oss2/gcc3/bin/python)
>>==19951==    by 0x8095DE2: import_submodule (Python/import.c:1887)
>>==19951==    by 0x80959B8: load_next (Python/import.c:1752)
>>==19951==    by 0x8097608: import_module_ex (Python/import.c:1603)
>>
>>and I don't know if I can ignore it or if this is a real python error.
>>
>
>Unlikely a python problem.  I just tried valgrind with the current
>CVS version of 2.2.1+ (what will eventually become 2.2.2).
>There were no problems reported doing import math.
>
>You can try several things to fix the warning:
> * run without your module
> * use a different compiler (version)
> * run the CVS version: cvs upd -r release22-maint
> * pass --workaround-gcc296-bugs=yes option to valgrind
>
>Neal

From greg@cosc.canterbury.ac.nz  Wed Aug 28 01:40:30 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed, 28 Aug 2002 12:40:30 +1200 (NZST)
Subject: [Python-Dev] A `cogen' module - an observation
In-Reply-To: <20020827061554.GA96021@hishome.net>
Message-ID: <200208280040.g7S0eUVF007038@kuku.cosc.canterbury.ac.nz>

Oren Tirosh :

> > Hmmm, in other words, cartesian() is a lazy version
> > of zip().
>
> Nope.
>
> >>> zip([1, 2], ['a', 'b'])
> [(1, 'a'), (2, 'b')]
>
> >>> list(cartesian([1, 2], ['a', 'b']))
> [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]

Sorry, BrainError.  In that case, it's probably
faster to use the nested loops -- unless
cartesian() were implemented in C.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From jepler@unpythonic.net  Wed Aug 28 02:08:43 2002
From: jepler@unpythonic.net (jepler@unpythonic.net)
Date: Tue, 27 Aug 2002 20:08:43 -0500
Subject: [Python-Dev] valgrind and python?
In-Reply-To: <3D6C1891.5040400@acms.arizona.edu>
References: <3D6BD513.4050902@acms.arizona.edu> <3D6C08A7.4B3007D7@metaslash.com> <3D6C1891.5040400@acms.arizona.edu>
Message-ID: <20020827200842.B1248@unpythonic.net>

On Tue, Aug 27, 2002 at 05:25:53PM -0700, Marcelo Matus wrote:
> So, in the meantime, I will just ignore the reports......

I hope you know that you can write a "suppressions" file to make valgrind
never print this message for this location in Python.  If not, read the
documentation.
Jeff From guido@python.org Wed Aug 28 02:02:23 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 27 Aug 2002 21:02:23 -0400 Subject: [Python-Dev] Re: Weird error handling in os._execvpe In-Reply-To: Your message of "Tue, 27 Aug 2002 15:57:57 EDT." <20020827195757.GD27742@nyongwa.montreal.qc.ca> References: <20020827195757.GD27742@nyongwa.montreal.qc.ca> Message-ID: <200208280102.g7S12Nh10976@pcp02138704pcs.reston01.va.comcast.net> > I think the patch associated with this thread has an unintended > consequence. > > In http://mail.python.org/pipermail/python-dev/2002-August/027229.html > Zack pointed out three flaws in the original code: > > [...] > Third, if an error other than the expected one comes back, the > loop clobbers the saved exception info and keeps going. Consider > the situation where PATH=/bin:/usr/bin, /bin/foobar exists but is > not executable by the invoking user, and /usr/bin/foobar does not > exist. The exception thrown will be 'No such file or directory', > not the expected 'Permission denied'. > > The patch, as I understand it, changes the behaviour so as to raise > the exception "Permission denied" in this case. > > Consider a similar situation in which both /bin/foobar (not executable > by the user) and /usr/bin/foobar (executable by the user) exist. > Given the command "foobar", the shell will execute /usr/bin/foobar. > If I understand the patch correctly, python will give up when it > encounters /bin/foobar and raise the "Permission denied" exception. > > I believe this just happened to me today. I had a shell script named > "gcc" in ~/bin (first on my path) some months back. When I was > finished with it, I just did "chmod -x ~/bin/gcc" and forgot about it. > Today was the first time since this patch went in that I ran gcc via > python (using scipy's weave). Boy was I surprised at the message > "unable to execute gcc: Permission denied"! 
> > I guess the fix is to save the EPERM exception and keep going > in case there is an executable later in the path. This is definitely a bug. Can you or Zack provide a patch? I've opened a bug report: http://python.org/sf/601077 --Guido van Rossum (home page: http://www.python.org/~guido/) From mmatus@dinha.acms.arizona.edu Wed Aug 28 02:11:48 2002 From: mmatus@dinha.acms.arizona.edu (Marcelo Matus) Date: Tue, 27 Aug 2002 18:11:48 -0700 Subject: [Python-Dev] valgrind and python? References: <3D6BD513.4050902@acms.arizona.edu> <3D6C08A7.4B3007D7@metaslash.com> <3D6C1891.5040400@acms.arizona.edu> <20020827200842.B1248@unpythonic.net> Message-ID: <3D6C2354.7070704@acms.arizona.edu> Thanks, now that I know that there is nothing wrong with python, I'll check how to ignore the reports automatically. Thanks again Marcelo jepler@unpythonic.net wrote: >On Tue, Aug 27, 2002 at 05:25:53PM -0700, Marcelo Matus wrote: > > >>So, in the meantime, I will just ignore the reports...... >> >> > >I hope you know that you can write a "supressions" file to make valgrind >never print this message for this location in Python. If not, read the >documentation .. > >Jeff > > From guido@python.org Wed Aug 28 02:26:00 2002 From: guido@python.org (Guido van Rossum) Date: Tue, 27 Aug 2002 21:26:00 -0400 Subject: [Python-Dev] A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib] In-Reply-To: Your message of "Mon, 26 Aug 2002 17:56:22 EDT." 
References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <20020820231738.GA21011@thyrsus.com> <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> <20020821025725.GB28198@thyrsus.com> <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com> <20020824163848.C10202@idi.ntnu.no>
Message-ID: <200208280126.g7S1Q1U15876@pcp02138704pcs.reston01.va.comcast.net>

> def cartesian(*sequences):
>     """\
>     Generate the `cartesian product' of all SEQUENCES.  Each member of the
>     product is a list containing an element taken from each original sequence.
>     """
>     if len(sequences) == 0:
>         yield []
>     else:
>         first, remainder = sequences[0], sequences[1:]
>         for element in first:
>             for result in cartesian(*remainder):
>                 result.insert(0, element)
>                 yield result

It occurred to me that this is rather inefficient because it invokes
itself recursively many times (once for each element in the first
sequence).  This version is much faster, because iterating over a
built-in sequence (like a list) is much faster than iterating over a
generator:

def cartesian(*sequences):
    if len(sequences) == 0:
        yield []
    else:
        head, tail = sequences[:-1], sequences[-1]
        for x in cartesian(*head):
            for y in tail:
                yield x + [y]

I also wonder if perhaps ``tail = list(tail)'' should be inserted just
before the for loop, so that the arguments may be iterators as well.

I would have more suggestions (I expect that Eric Raymond's powerset is
much faster than your recursive subsets()) but my family is calling me...

--Guido van Rossum (home page: http://www.python.org/~guido/)

From fdrake@acm.org  Wed Aug 28 03:41:18 2002
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 27 Aug 2002 22:41:18 -0400
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Doc/lib libexcs.tex,1.43,1.43.6.1
In-Reply-To: <15724.7761.824921.958955@12-248-11-90.client.attbi.com>
References: <15724.7761.824921.958955@12-248-11-90.client.attbi.com>
Message-ID: <15724.14414.613022.355305@grendel.zope.com>

[Following up to a message that went to the checkins list.]

Raymond sez:
> Note change in behavior from 1.5.2.  The new argument to
> NameError is an error message and not just the missing name.

Skip Montanaro writes:
> It seems to me that somewhere in the docs it would be worthwhile to state
>
>     Messages to exceptions are not part of the Python API.  Their contents
>     may change from one version of Python to the next without warning and
>     should not be relied on for code which will be run with multiple
>     versions of the interpreter.

Definitely!  The catch, of course, is that it's not clear (perhaps only to
me?) that what changed was a message.  I'd interpret the original behavior
(if documented, which I won't bother to check) as an API requirement.
AttributeError used to have a similar behavior; I don't know how rigorously
that's been maintained either.

In either case, I think the ideal solution to the problem of figuring out
what went wrong, from within the executing program, is for these errors to
have an attribute that identifies the missing name ("name" would be a good
name for it).  KeyError could similarly have an attribute "key".  To deal
with existing code, the attributes would not be set.  Additional C
functions could be provided for use in code that is modified to provide
the information.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Zope Corporation

From tim.one@comcast.net  Wed Aug 28 03:43:06 2002
From: tim.one@comcast.net (Tim Peters)
Date: Tue, 27 Aug 2002 22:43:06 -0400
Subject: [Python-Dev] FW: Your message to Python-Dev awaits moderator approval
Message-ID: 

Well, if SpamAssassin wasn't so stupid, I suppose you could have read this
msg <wink>.
-----Original Message-----
From: python-dev-admin@python.org [mailto:python-dev-admin@python.org]
Sent: Tuesday, August 27, 2002 10:38 PM
To: tim.one@comcast.net
Subject: Your message to Python-Dev awaits moderator approval

Your mail to 'Python-Dev' with the subject

    The first trustworthy GBayes results

Is being held until the list moderator can review it for approval.

The reason it is being held:

    Message has a suspicious header

Either the message will get posted to the list, or you will receive
notification of the moderator's decision.

From tim.one@comcast.net  Wed Aug 28 03:36:17 2002
From: tim.one@comcast.net (Tim Peters)
Date: Tue, 27 Aug 2002 22:36:17 -0400
Subject: [Python-Dev] The first trustworthy GBayes results
Message-ID: 

Setting this up has been a bitch.  All early attempts floundered because it
turned out there was *some* systematic difference between the ham and spam
archives that made the job trivial.

The ham archive:  I selected 20,000 messages, and broke them into 5 sets of
4,000 each, at random, from a python-list archive Barry put together,
containing msgs only after SpamAssassin was put into play on python.org.
It's hoped that's pretty clean, but nobody checked all ~= 160,000+ msgs.
As will be seen below, it's not clean enough.

The spam archive:  This is essentially all of Bruce Guenter's 2002 spam
collection, at .  It was broken at random into 5 sets of 2,750 spams each.

Problems included:

+ Mailman added distinctive headers to every message in the ham archive,
  which appear nowhere in the spam archive.  A Bayesian classifier picks
  up on that immediately.

+ Mailman also adds "[name-of-list]" to every Subject line.

+ The spam headers had tons of clues about Bruce Guenter's mailing
  addresses that appear nowhere in the ham headers.

+ The spam archive has Windows line ends (\r\n), but the ham archive plain
  Unix \n.  This turned out to be a killer clue(!) in the simplest
  character n-gram attempts.

(Note:  I can't use text mode to read msgs, because there are binary
characters in the archives that Windows treats as EOF in text mode --
indeed, 400MB of the ham archive vanishes when read in text mode!)
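The 5-way random split described here is ordinary cross-validation bookkeeping; a sketch of one way to do it (helper name hypothetical):

```python
import random

def split_corpus(msgs, nsets, seed=None):
    """Shuffle `msgs` and deal them into `nsets` near-equal buckets --
    e.g. 20,000 hams into 5 sets of 4,000 each, as above."""
    msgs = list(msgs)
    random.Random(seed).shuffle(msgs)
    return [msgs[i::nsets] for i in range(nsets)]
```

Training on one bucket and testing on the rest then gives several independent estimates of the error rates, which is what makes the closeness of the numbers below meaningful.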
(Note: I can't use text mode to read msgs, because there are binary characters in the archives that Windows treats as EOF in text mode -- indeed, 400MB of the ham archive vanishes when read in text mode!) What I'm reporting on here is after normalizing all line-ends to \n, = and ignoring the headers *completely*. There are obviously good clues in= the headers, the problem is that they're killer-good clues for accidental reasons in this test data. I don't want to write code to suppress th= ese clues either, as then I'd be testing some mix of my insights (or lack thereof) with what a blind classifier would do. But I don't care how= good I am, I only care about how well the algorithm does. Since it's ignoring the headers, I think it's safe to view this as a = lower bound on what can be achieved. There's another way this should be a = lower bound: def tokenize_split(string): for w in string.split(): yield w tokenize =3D tokenize_split class Msg(object): def __init__(self, dir, name): path =3D dir + "/" + name self.path =3D path f =3D file(path, 'rb') guts =3D f.read() f.close() # Skip the headers. i =3D guts.find('\n\n') if i >=3D 0: guts =3D guts[i+2:] self.guts =3D guts def __iter__(self): return tokenize(self.guts) This is about the stupidest tokenizer imaginable, merely splitting th= e body on whitespace. Here's the output from the first run, training agains= t one pair of spam+ham groups, then seeing how its predictions stack up aga= inst each of the four other pairs of spam+ham groups: Training on Data/Ham/Set1 and Data/Spam/Set1 ... 
4000 hams and 2750 spams
    testing against Data/Spam/Set2 and Data/Ham/Set2
    tested 4000 hams and 2750 spams
    false positive: 0.00725 (i.e., under 1%)
    false negative: 0.0530909090909 (i.e., over 5%)

    testing against Data/Spam/Set3 and Data/Ham/Set3
    tested 4000 hams and 2750 spams
    false positive: 0.007
    false negative: 0.056

    testing against Data/Spam/Set4 and Data/Ham/Set4
    tested 4000 hams and 2750 spams
    false positive: 0.0065
    false negative: 0.0545454545455

    testing against Data/Spam/Set5 and Data/Ham/Set5
    tested 4000 hams and 2750 spams
    false positive: 0.00675
    false negative: 0.0516363636364

It's a Good Sign that the false positive/negative rates are very close
across the four test runs.  It's possible to quantify just how good a sign
that is, but they're so close by eyeball that there's no point in
bothering.

This is using the new Tester.py in the sandbox, and that class
automatically remembers the false positives and negatives.  Here's the
start of the first false positive from the first run:

"""
It's not really hard!!  Turn $6.00 into $1,000 or more...read this to find
out how!!

READING THIS COULD CHANGE YOUR LIFE!!  I found this on a bulletin board and
decided to try it.  A little while back, while chatting on the internet, I
came across an article similar to this that said you could make thousands
of dollars in cash within weeks with only an initial investment of $6.00!
So I thought, "Yeah right, this must be a scam", but like most of us, I was
curious, so I kept reading.  Anyway, it said that you send $1.00 to each of
the six names and address stated in the article.  You then place your own
name and address in the bottom of the list at #6, and post the article in
at least 200 newsgroups (There are thousands) or e-mail them.  No
Here's the start of the second false positive: """ Please forward this message to anyone you know who is active in the s= tock market. See Below for Press Release xXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxXxX Dear Friends, I am a normal investor same as you. I am not a finance professional= nor am I connected to FDNI in any way. I recently stumbled onto this OTC stock (FDNI) while searching throug= h yahoo for small float, big potential stocks. At the time, the company had r= eleased a press release which stated they were doing a stock buyback. Intrig= ued, I bought 5,000 shares at $.75 each. The stock went to $1.50 and I sold= my shares. I then bought them back at $1.15. The company then circulat= ed another press release about a foreign acquisition (see below). The s= tock jumped to $2.75 (I sold @ $2.50 for a massive profit). I then bought= back in at $1.25 where I am holding until the next major piece of news. """ Here's the start of the third: """ Grand Treasure Industrial Limited Contact Information We are a manufacturer and exporter in Hong Kong for all kinds of plas= tic products, We export to worldwide markets. Recently , we join-ventured with a ba= g factory in China produce all kinds of shopping , lady's , traveller's bags.... visit our page and send us your enquiry by email now. Contact Address : Rm. 1905, Asian Trade Centre , 79 Lei Muk Rd, Tsuen Wan , Hong Kong. Telephone : ( 852 ) 2408 9382 """ That is, all the "false positives" there are blatant spam. It will t= ake a long time to sort this all out, but I want to make a point here now: = the classifier works so well that it can *help* clean the ham corpus! I = haven't found a non-spam among the "false positives" yet. Another lesson rei= nforces one from my previous life in speech recognition: rigorous data colle= ction, cleaning, tagging and maintenance is crucial when working with statis= ical approaches, and is damned expensive to do. 
Here's the start of the first "false negative" (including the headers= ): """ Return-Path: <911@911.COM> Delivered-To: em-ca-bruceg@em.ca Received: (qmail 24322 invoked from network); 28 Jul 2002 12:51:44 -0= 000 Received: from unknown (HELO PC-5.) (61.48.16.65) by churchill.factcomp.com with SMTP; 28 Jul 2002 12:51:44 -0000 x-esmtp: 0 0 1 Message-ID: <1604543-22002702894513952@smtp.vip.sina.com> To: "NEW020515" <911@911.COM> =46rom: "=D6=D0=B9=FAIT=CA=FD=BE=DD=BF=E2=CD=F8=D5=BE=A3=A8www.itdata= base.net =A3=A9" <911@911.COM> Subject: =D6=D0=B9=FAIT=CA=FD=BE=DD=BF=E2=CD=F8=D5=BE=A3=A8www.itdata= base.net =A3=A9 Date: Sun, 28 Jul 2002 17:45:13 +0800 MIME-Version: 1.0 Content-type: text/plain; charset=3Dgb2312 Content-Transfer-Encoding: quoted-printable Content-Length: 977 =3DD6=3DD0=3DB9=3DFAIT=3DCA=3DFD=3DBE=3DDD=3DBF=3DE2=3DCD=3DF8=3DD5= =3DBE=3DA3=3DA8www=3D2Eitdatabase=3D2Enet =3DA3=3D =3DA9=3DCC=3DE1=3DB9=3DA9=3DB4=3DF3=3DC1=3DBF=3DD3=3DD0=3DB9=3DD8= =3DD6=3DD0=3DB9=3DFAIT/=3DCD=3DA8=3DD0=3DC5=3DCA=3DD0=3DB3=3D =3DA1=3DD2=3DD4=3DBC=3DB0=3DC8=3DAB=3DC7=3DF2IT/=3DCD=3DA8=3DD0=3DC5= =3DCA=3DD0=3DB3=3DA1=3DB5=3DC4=3DCF=3DE0=3DB9=3DD8=3DCA=3D =3DFD=3DBE=3DDD=3DBA=3DCD=3DB7=3DD6=3DCE=3DF6=3DA1=3DA3 =3DB1=3DBE=3DCD=3DF8=3DD5=3DBE=3DC9=3DE6=3DBC=3DB0=3DD3=3DD0=3DB9= =3DD8=3D =3DB5=3DE7=3DD0=3DC5=3DD4=3DCB=3DD3=3DAA=3DCA=3DD0=3DB3=3DA1=3DA1= =3DA2=3DB5=3DE7=3DD0=3DC5=3DD4=3DCB=3DD3=3DAA=3DC9=3DCC=3DA1=3D """ Since I'm ignoring the headers, and the tokenizer is just a whitespac= e split, each line of quoted-printable looks like a single word to the classifier. Since it's never seen these "words" before, it has no re= ason to believe they're either spam or ham indicators, and favors calling it = ham. One more mondo cool thing and that's it for now. The GrahamBayes cla= ss keeps track of how many times each word makes it into the list of the= 15 strongest indicators. These are the "killer clues" the classifier ge= ts the most value from. 
The most valuable spam indicator turned out to be "<br>" -- there's simply
almost no HTML mail in the ham archive (but note that this clue would be
missed if you stripped HTML!).  You're never going to guess what the most
valuable non-spam indicator was, but it's quite plausible after you see it.
Go ahead, guess.  Chicken <wink>.

Here are the 15 most-used killer clues across the runs shown above: the
repr of the word, followed by the # of times it made into the 15-best list,
and the estimated probability that a msg is spam if it contains this word:

testing against Data/Spam/Set2 and Data/Ham/Set2
best discrimators:
    'Helvetica,' 243 0.99
    'object' 245 0.01
    'language' 258 0.01
    '<font>' 292 0.99
    '>' 339 0.179104
    'def' 397 0.01
    'article' 423 0.01
    'module' 436 0.01
    'import' 499 0.01
    '<br>' 652 0.99
    '>>>' 667 0.01
    'wrote' 677 0.01
    'python' 755 0.01
    'Python' 1947 0.01
    'wrote:' 1988 0.01

testing against Data/Spam/Set3 and Data/Ham/Set3
best discrimators:
    'string' 494 0.01
    'Helvetica,' 496 0.99
    'language' 524 0.01
    '<font>' 553 0.99
    '>' 687 0.179104
    'article' 851 0.01
    'module' 857 0.01
    'def' 875 0.01
    'import' 1019 0.01
    '<br>' 1288 0.99
    '>>>' 1344 0.01
    'wrote' 1355 0.01
    'python' 1461 0.01
    'Python' 3858 0.01
    'wrote:' 3984 0.01

testing against Data/Spam/Set4 and Data/Ham/Set4
best discrimators:
    'object' 749 0.01
    'Helvetica,' 757 0.99
    'language' 763 0.01
    '<font>' 877 0.99
    '>' 954 0.179104
    'article' 1240 0.01
    'module' 1260 0.01
    'def' 1364 0.01
    'import' 1517 0.01
    '<br>' 1765 0.99
    '>>>' 1999 0.01
    'wrote' 2071 0.01
    'python' 2160 0.01
    'Python' 5848 0.01
    'wrote:' 6021 0.01

testing against Data/Spam/Set5 and Data/Ham/Set5
best discrimators:
    'object' 980 0.01
    'language' 992 0.01
    'Helvetica,' 1005 0.99
    '<font>' 1139 0.99
    '>' 1257 0.179104
    'article' 1678 0.01
    'module' 1702 0.01
    'def' 1846 0.01
    'import' 2003 0.01
    '<br>' 2387 0.99
    '>>>' 2624 0.01
    'wrote' 2743 0.01
    'python' 2864 0.01
    'Python' 7830 0.01
    'wrote:' 8060 0.01

Note that an "intelligent" tokenizer would likely miss that the Python
prompt ('>>>') is a great non-spam indicator on python-list.  I've had this
argument with some of you before <wink>, but the best way to let this kind
of thing be as intelligent as it can be is not to try to help it too much:
it will learn things you'll never dream of, provided only you don't filter
clues out in an attempt to be clever.

everything's-a-clue-ly y'rs  - tim

From skip@pobox.com  Wed Aug 28 04:09:49 2002
From: skip@pobox.com (Skip Montanaro)
Date: Tue, 27 Aug 2002 22:09:49 -0500
Subject: [Python-Dev] FW: Your message to Python-Dev awaits moderator approval
In-Reply-To: References: Message-ID: <15724.16125.557671.521082@12-248-11-90.client.attbi.com>

    Tim> Well, if SpamAssassin wasn't so stupid, I suppose you could have
    Tim> read this msg <wink>.

I think that response was generated by Mailman.  SpamAssassin does nothing
more than tag messages...

Skip

From tim.one@comcast.net  Wed Aug 28 04:20:06 2002
From: tim.one@comcast.net (Tim Peters)
Date: Tue, 27 Aug 2002 23:20:06 -0400
Subject: [Python-Dev] FW: Your message to Python-Dev awaits moderator approval
In-Reply-To: <15724.16125.557671.521082@12-248-11-90.client.attbi.com>
Message-ID: 

[Skip Montanaro]
> I think that response was generated by Mailman.  SpamAssassin does
> nothing more than tag messages...

Like I care who the stupid party is -- the smart party was Martijn Pieters,
who now stays up all night waiting to approve my msgs <wink>.

From tim.one@comcast.net  Wed Aug 28 04:51:09 2002
From: tim.one@comcast.net (Tim Peters)
Date: Tue, 27 Aug 2002 23:51:09 -0400
Subject: [Python-Dev] FW: Your message to Python-Dev awaits moderator approval
In-Reply-To: Message-ID: 

FYI, here's the closest thing to a real false positive I've seen so far:

"""
What is the key for to break the script execution in pythonwin?
Tnx

--
-- Neo
************************************
Follow the White Rabbit...
Knock Knock Neo...
ICQ #42292922
Webmaster di www.thezion.net
La prima community di web developer Ad Free
The first web developer's community Ad Free
************************************
"""

Other "false positives" included a strangely quoted copy of the Nigerian
scam, "STOP PAYING $19.95 or more TODAY for your web site, WHEN YOU CAN
GET ONE FOR ONLY $11.95 PER MONTH!", something entirely composed of
high-bit characters except for a URL pointing to a Russian hosting site,
"I am looking for young models (prefer early teen 14-16 year old female)
for nude and semi-nude photography", "Everyone likes making easy money.
This place pays you 50 cents for every hour you browse the web!!!", "We
want to invited you for a new adult TOP50 on http://www.50freepics.com",
and my favorite:

    These girls are not phonesex workers.  They are horny girls who are
    on the line cuz it's free phonesex for them.

Unfortunately, I don't think I can fudge this system to let msgs thru from
my sisters <wink>.

From skip@pobox.com  Wed Aug 28 05:04:41 2002
From: skip@pobox.com (Skip Montanaro)
Date: Tue, 27 Aug 2002 23:04:41 -0500
Subject: [Python-Dev] FW: Your message to Python-Dev awaits moderator approval
In-Reply-To: References: Message-ID: <15724.19417.618634.375804@12-248-11-90.client.attbi.com>

    Tim> FYI, here's the closest thing to a real false positive I've seen so
    Tim> far:

I have much smaller spam and ham corpora (currently about 400 msgs each),
but both consist only of messages sent to me in the past couple weeks
(though not all messages sent during that interval), so some of the header
clues which skewed Tim's tests shouldn't be present.  Using my currently
undeleted Python mail as "unknown" (but which doesn't actually contain any
spam), I saw two false positives.  One had an attached gif image.  The
other was a one-line text+html message whose "words" were thus dominated
by the HTML tags in the second part.
Once my spam and ham grow to something more like 2000 each I will try Tim's technique of splitting them into smaller chunks, training on one chunk, then testing against the remaining chunks. Skip From barry@python.org Wed Aug 28 05:06:52 2002 From: barry@python.org (Barry A. Warsaw) Date: Wed, 28 Aug 2002 00:06:52 -0400 Subject: [Python-Dev] FW: Your message to Python-Dev awaits moderator approval References: <15724.16125.557671.521082@12-248-11-90.client.attbi.com> Message-ID: <15724.19548.493171.989954@anthem.wooz.org> >>>>> "TP" == Tim Peters writes: TP> [Skip Montanaro] >> I think that response was generated by Mailman. SpamAssassin >> does nothing more than tag messages... TP> Like I care who the stupid party is -- the smart party was TP> Martijn Pieters, who now stays up all night waiting to approve TP> my msgs . Tim's of course joking, because he cares intimately and deeply about all things. But actually it was a combination of brilliant software. SA tagged the message with a X-Spam-Level: header value that the python-dev's Mailman filter was set up to catch as "suspicious". -Barry From mj@zope.com Wed Aug 28 05:17:25 2002 From: mj@zope.com (Martijn Pieters) Date: Wed, 28 Aug 2002 00:17:25 -0400 Subject: [Python-Dev] FW: Your message to Python-Dev awaits moderator approval In-Reply-To: References: <15724.16125.557671.521082@12-248-11-90.client.attbi.com> Message-ID: <20020828041725.GB21804@zope.com> On Tue, Aug 27, 2002 at 11:20:06PM -0400, Tim Peters wrote: > Like I care who the stupid party is -- the smart party was Martijn Pieters, > who now stays up all night waiting to approve my msgs . 
Studying or waiting on Tim Peters makes no difference ;) -- Martijn Pieters | Software Engineer mailto:mj@zope.com | Zope Corporation http://www.zope.com/ | Creators of Zope http://www.zope.org/ | --------------------------------------------- From oren-py-d@hishome.net Wed Aug 28 05:44:50 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Wed, 28 Aug 2002 00:44:50 -0400 Subject: [Python-Dev] A `cogen' module - an observation In-Reply-To: <200208280040.g7S0eUVF007038@kuku.cosc.canterbury.ac.nz> References: <20020827061554.GA96021@hishome.net> <200208280040.g7S0eUVF007038@kuku.cosc.canterbury.ac.nz> Message-ID: <20020828044450.GA79805@hishome.net> On Wed, Aug 28, 2002 at 12:40:30PM +1200, Greg Ewing wrote:
> Oren Tirosh :
> >
> > > Hmmm, in other words, cartesian() is a lazy version
> > > of zip().
> >
> > Nope.
> >
> > >>> zip([1, 2], ['a', 'b'])
> > [(1, 'a'), (2, 'b')]
> >
> > >>> list(cartesian([1, 2], ['a', 'b']))
> > [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
>
> Sorry, BrainError. In that case, it's probably
> faster to use the nested loops -- unless
> cartesian() were implemented in C.

Yes, but a nested loop cannot be easily passed as an argument to a function. Generator functions are pretty efficient, too - yield does not incur the relatively high overhead of Python function calls.
Oren From oren-py-d@hishome.net Wed Aug 28 06:13:55 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Wed, 28 Aug 2002 01:13:55 -0400 Subject: [Python-Dev] A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib] In-Reply-To: <200208280126.g7S1Q1U15876@pcp02138704pcs.reston01.va.comcast.net> References: <20020820231738.GA21011@thyrsus.com> <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> <20020821025725.GB28198@thyrsus.com> <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com> <20020824163848.C10202@idi.ntnu.no> <200208280126.g7S1Q1U15876@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <20020828051355.GB79805@hishome.net> On Tue, Aug 27, 2002 at 09:26:00PM -0400, Guido van Rossum wrote:
> It occurred to me that this is rather inefficient because it invokes
> itself recursively many times (once for each element in the first
> sequence). This version is much faster, because iterating over a
> built-in sequence (like a list) is much faster than iterating over a
> generator:
>
> def cartesian(*sequences):
>     if len(sequences) == 0:
>         yield []
>     else:
>         head, tail = sequences[:-1], sequences[-1]
>         for x in cartesian(*head):
>             for y in tail:
>                 yield x + [y]

My implementation from http://www.tothink.com/python/dataflow/xlazy.py:

def xcartprod(arg1, *args):
    if not args:
        for x in arg1:
            yield (x,)
    elif len(args) == 1:
        arg2 = args[0]
        for x in arg1:
            for y in arg2:
                yield x, y
    else:
        for x in arg1:
            for y in xcartprod(args[0], *args[1:]):
                yield (x,) + y

Special-casing the 2 argument case helps a lot. It brings the performance within 50% of nested loops which means that if you actually do something inside the loop the overhead is quite negligible. The 'x' prefix is shared with other functions in this module: a lazy xmap, xzip and xfilter.
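[Editorial sketch for readers of the archive: the xcartprod generator quoted above, re-typed from the flattened listing and updated for a modern Python (so treat it as a reconstruction, not the original xlazy.py code), behaves like this:]

```python
# Reconstruction of Oren's lazy Cartesian-product generator, with the
# special-cased 2-argument form he describes. Assumed faithful to the
# quoted listing; not the original xlazy.py source.
def xcartprod(arg1, *args):
    if not args:
        for x in arg1:
            yield (x,)
    elif len(args) == 1:  # the special-cased 2-argument form
        arg2 = args[0]
        for x in arg1:
            for y in arg2:
                yield x, y
    else:
        for x in arg1:
            for y in xcartprod(args[0], *args[1:]):
                yield (x,) + y

print(list(xcartprod([1, 2], ['a', 'b'])))
# [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]

# Laziness: pulling one item does not exhaust the inputs.
it = xcartprod(range(1000), range(1000))
print(next(it))  # (0, 0)
```

Note that only the first argument may safely be a one-shot iterator; the later arguments are iterated over repeatedly, which is exactly the re-iterability point argued in this thread.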
> I also wonder if perhaps ``tail = list(tail)'' should be inserted > just before the for loop, so that the arguments may be iterators as > well. Ahh... re-iterability again... This is a good example of a function that *fails silently* for non re-iterable arguments. Slurping the tail into a list loses the lazy efficiency of this function. One of the ways I've used this function is to scan combinations until a condition is satisfied. The iteration is always terminated before reaching the end. Reading ahead may waste computation and memory. All I want is something that will raise an exception if any argument but the first is not re-iterable (e.g. my reiter() proposal). I'll add list() to the argument myself if I really want to. Don't try to guess what I meant. Oren From vinay_sajip@red-dove.com Wed Aug 28 11:45:11 2002 From: vinay_sajip@red-dove.com (Vinay Sajip) Date: Wed, 28 Aug 2002 11:45:11 +0100 Subject: [Python-Dev] PEP 282 Implementation References: <00e001c2261d$19bfc320$652b6992@alpha> <200208092040.g79Ke3S31416@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <006601c24e7f$fcff5440$652b6992@alpha> > In general the code looks good. Only one style nit: I prefer > docstrings that have a one-line summary, then a blank line, and then a > longer description. I will update the docstrings as per your feedback. > There's a lot of code there! Should it perhaps be broken up into > different modules? Perhaps it should become a logging *package* with > submodules that define the various filters and handlers. How strongly do you feel about this? I did think about doing this and in fact the first implementation of the module was as a package. I found this a little more cumbersome than the single-file solution, and reimplemented as logging.py. The module is a little on the large side but the single-file organization makes it a little easier to use. > - Why does the FileHandler open the file with mode "a+" (and later > with "w+")?
> The "+" makes the file readable, but I see no reason to > read it. Am I missing something? No, you're right - using "a" and "w" should work. I'll change the code to lose the "+". > - setRollover(): the explanation isn't 100% clear. I *think* that you > always write to "app.log", and when that's full, you rename it to > app.log.1, and app.log.1 gets renamed to app.log.2, and so on, and > then you start writing to a new app.log, right? Yes. The original implementation was different - it just closed the current file and opened a new file app.log.n. The current implementation is slightly slower due to the need to rename several files, but the user can tell more easily which the latest log file is. I will update the setRollover() docstring to indicate more clearly how it works; I'm assuming that the current algorithm is deemed good enough. > - class SocketHandler: why set yourself up for buffer overflow by > using only 2 bytes for the packet size? You can use the struct > module to encode/decode this, BTW. I also wonder what the > application for this is, BTW. I agree about the 2-byte limit. I can change it to use struct and an integer length. The application for encoding the length is simply to allow a socket-based server to handle multiple events sent by SocketHandler, in the event that the connection is kept open as long as possible and not shut down after every event. > - method send(): in Python 2.2 and later, you can use the sendall() > socket method which takes care of this loop for you. OK. I can update the code to use this in the case of 2.2 and later. > - class DatagramHandler, method send(): I don't think UDP handles > fragmented packets very well -- if you have to break the packet up, > there's no guarantee that the receiver will see the parts in order > (or even all of them). You're absolutely right - I wasn't thinking clearly enough about how UDP actually works. I will replace the loop with a single sendto() call.
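[Editorial sketch: the struct-based length prefix being discussed for SocketHandler could look like this. The function names `frame_event` and `read_event` are illustrative, not part of the logging module's API; the point is just a fixed-width integer length in front of each record.]

```python
import struct

def frame_event(payload: bytes) -> bytes:
    # A 4-byte big-endian length prefix instead of a 2-byte field,
    # so records larger than 64 KB cannot overflow the length.
    return struct.pack('>L', len(payload)) + payload

def read_event(buf: bytes):
    # Return (payload, rest) once a whole frame has arrived, else None.
    if len(buf) < 4:
        return None
    (n,) = struct.unpack('>L', buf[:4])
    if len(buf) < 4 + n:
        return None
    return buf[4:4 + n], buf[4 + n:]

framed = frame_event(b'log record')
print(read_event(framed))  # (b'log record', b'')
```

A server reading a long-lived connection would repeatedly buffer received bytes and call `read_event` until it returns None, which is exactly the "multiple events over one open connection" use Vinay describes.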
> - fileConfig(): Is there documentation for the configuration file? There is some documentation in the python_logging.html file which is part of the distribution and also on the Web at http://www.red-dove.com/python_logging.html - it's in the form of comments in an annotated logconf.ini. I have not polished the documentation in this area as I'm not sure how much of the configuration stuff should be in the logging module itself. Feedback I've had indicates that at least some people object moderately strongly to having a particular configuration design forced on them. I'd appreciate views on this. Many thanks for the feedback, Vinay Sajip From gward@mems-exchange.org Wed Aug 28 14:27:16 2002 From: gward@mems-exchange.org (Greg Ward) Date: Wed, 28 Aug 2002 09:27:16 -0400 Subject: [Python-Dev] Re: The first trustworthy GBayes results In-Reply-To: References: Message-ID: <20020828132716.GA12475@cthulhu.gerg.ca> On 27 August 2002, Tim Peters said: > Setting this up has been a bitch. All early attempts floundered because it > turned out there was *some* systematic difference between the ham and spam > archives that made the job trivial. > > The ham archive: I selected 20,000 messages, and broke them into 5 sets of > 4,000 each, at random, from a python-list archive Barry put together, > containing msgs only after SpamAssassin was put into play on python.org. > It's hoped that's pretty clean, but nobody checked all ~= 160,000+ msgs. As > will be seen below, it's not clean enough. One of the other perennial-seeming topics on spamassassin-devel (a list that I follow only sporadically) is that careful manual cleaning of your corpus is *essential*. The concern of the main SA developers is that spam in your non-spam folder (and vice-versa) will prejudice the genetic algorithm that evolves SA's scores in the wrong direction.
Gut instinct tells me the Bayesian approach ought to be more robust against this sort of thing, but even it must have a breaking point at which misclassified messages throw off the probabilities. But that's entirely consistent with your statement: > Another lesson reinforces > one from my previous life in speech recognition: rigorous data collection, > cleaning, tagging and maintenance is crucial when working with statistical > approaches, and is damned expensive to do. On corpus collection... > The spam archive: This is essentially all of Bruce Guenter's 2002 spam > collection, at . It was broken at random > into 5 sets of 2,750 spams each. One possibility occurs to me: we could build our own corpus by collecting spam on python.org for a few weeks. Here's a rough breakdown of mail rejected by mail.python.org over the last 10 days, eyeball-estimated messages per day:

  bad RCPT                        150 - 300  [1]
  bad sender                       50 - 190  [2]
  relay denied                     20 - 180  [3]
  known spammer addr/domain        15 - 60
  8-bit chars in subject          130 - 200
  8-bit chars in header addrs      10 - 60
  banned charset in subject         5 - 50   [4]
  "ADV" in subject                  0 - 5
  no Message-Id header            100 - 400  [5]
  invalid header address syntax     5 - 50   [6]
  no valid senders in header       10 - 15   [7]
  rejected by SpamAssassin         20 - 50   [8]
  quarantined by SpamAssassin       5 - 50   [8]

[1] this includes mail accidentally sent to eg. giudo@python.org, but based on scanning the reject logs, I'd say the vast majority is spam. However, such messages are rejected after RCPT TO, so we never see the message itself. Most of the bad recipient addrs are either ancient (ipc6@python.org, grail-feedback@python.org) or fictitious (success@python.org, info@python.org).

[2] sender verification failed, eg. someone tried to claim an envelope sender like foo@bogus.domain. Usually spam, but innocent bystanders can be hit by DNS servers suddenly exploding (hello, comcast.net). This only includes hard failures (DNS "no such domain"), not soft failures (DNS timeout).
[3] I'd be leery of accepting mail that's trying to hijack mail.python.org as an open relay, even though that would be a goldmine of spam. (OTOH, we could reject after the DATA command, and save the message anyways.)

[4] mail.python.org rejects any message with a properly MIME-encoded subject using any of the following charsets: big5, euc-kr, gb2312, ks_c_5601-1987

[5] includes viruses as well as spam (and no doubt some innocent false positives, although I have added exemptions for the MUA/MTA combinations that most commonly result in legit mail reaching mail.python.org without a Message-Id header, eg. KMail/qmail)

[6] eg. "To: all my friends" or "From: <>"

[7] no valid sender address in any header line -- eg. someone gives a valid MAIL FROM address, but then puts "From: blah@bogus.domain" in the headers. Easily defeated with a "Sender" or "Reply-to" header.

[8] any message scoring >= 10.0 is rejected at SMTP time; any message scoring >= 5.0 but < 10 is saved in /var/mail/spam for later review

Executive summary:

* it's a good thing we do all those easy checks before involving SA, or the load on the server would be a lot higher

* give me 10 days of spam-harvesting, and I can equal Bruce Guenter's spam archive for 2002. (Of course, it'll take a couple of days to set the mail server up for the harvesting, and a couple more days to clean through the ~2000 caught messages, but you get the idea.)

> + Mailman added distinctive headers to every message in the ham
> archive, which appear nowhere in the spam archive. A Bayesian
> classifier picks up on that immediately.
>
> + Mailman also adds "[name-of-list]" to every Subject line.

Perhaps that spam-harvesting run should also set aside a random selection of apparently-non-spam messages received at the same time. Then you'd have a corpus of mail sent to the same server, more-or-less to the same addresses, over the same period of time.
Oh, any custom corpus should also include the ~300 false positives and ~600 false negatives gathered since SA started running on mail.python.org in April. Greg From pg@archub.org Wed Aug 28 15:10:54 2002 From: pg@archub.org (Paul Graham) Date: 28 Aug 2002 14:10:54 -0000 Subject: [Python-Dev] Re: The first trustworthy GBayes results Message-ID: <20020828141054.27816.qmail@mail.archub.org> Bayesian filters are pretty robust in the face of corpus contamination, if you have a threshold for the number of occurrences of a word that you'll consider. If you don't do that, then yes, a single legit email in your spam corpus could cause your filters to reject every similar email. A single email could easily contain five to eight words that never occur in any other email. (Username, domain name, server name, street address, etc.) If this got into your spam corpus by mistake, then every succeeding email from the same person would be classified as spam. What this means is that you may want to use slightly different thresholds for occurrences depending on how much you trust the (human) classifier. For an app to be used by end users, you might want to have a high threshold, like 20 occurrences. I find from my own experience that I often misclassify mail. I seem to be more likely to put spam in a legit mail folder than the reverse. But, as you guys found, the first result of testing your filters tends to be to clean up such mistakes. --pg From guido@python.org Wed Aug 28 15:27:30 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 28 Aug 2002 10:27:30 -0400 Subject: [Python-Dev] A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib] In-Reply-To: Your message of "Wed, 28 Aug 2002 01:13:55 EDT." <20020828051355.GB79805@hishome.net> References: <20020820231738.GA21011@thyrsus.com> <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> <20020821025725.GB28198@thyrsus.com> <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com> <20020824163848.C10202@idi.ntnu.no> <200208280126.g7S1Q1U15876@pcp02138704pcs.reston01.va.comcast.net> <20020828051355.GB79805@hishome.net> Message-ID: <200208281427.g7SERU819829@odiug.zope.com> > Special-casing the 2 argument case helps a lot. It brings the performance > within 50% of nested loops which means that if you actually do something > inside the loop the overhead is quite negligible. Hm, I tried that and found no difference. Maybe I didn't benchmark right. > Ahh... re-iterability again...
> > This is a good example of a function that *fails silently* for non > re-iterable arguments. This failure is hardly silent IMO: the results are totally bogus, which is a pretty good clue that something's wrong. > Slurping the tail into a list loses the lazy efficiency of this function. > One of the ways I've used this function is to scan combinations until a > condition is satisfied. The iteration is always terminated before reaching > the end. Reading ahead may waste computation and memory. I don't understand. The Cartesian product calculation has to iterate over the second argument many times (unless you have it iterate over the first argument many times). So a lazy argument won't work. Am I missing something? > All I want is something that will raise an exception if any argument but > the first is not re-iterable (e.g. my reiter() proposal). I'll add list() > to the argument myself if I really want to. Don't try to guess what I > meant. Actually, I don't want to reiterate this debate. --Guido van Rossum (home page: http://www.python.org/~guido/) From oren-py-d@hishome.net Wed Aug 28 15:49:20 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Wed, 28 Aug 2002 10:49:20 -0400 Subject: [Python-Dev] A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib] In-Reply-To: <200208281427.g7SERU819829@odiug.zope.com> References: <20020821025725.GB28198@thyrsus.com> <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com> <20020824163848.C10202@idi.ntnu.no> <200208280126.g7S1Q1U15876@pcp02138704pcs.reston01.va.comcast.net> <20020828051355.GB79805@hishome.net> <200208281427.g7SERU819829@odiug.zope.com> Message-ID: <20020828144920.GA58184@hishome.net> On Wed, Aug 28, 2002 at 10:27:30AM -0400, Guido van Rossum wrote: > > Ahh... re-iterability again... > > > > This is a good example of a function that *fails silently* for non > > re-iterable arguments. 
> > This failure is hardly silent IMO: the results are totally bogus, > which is a pretty good clue that something's wrong. Sure, at the interactive prompt or very shallow code it is obvious. Exceptions are noisy. Anything else is silent. > > Slurping the tail into a list loses the lazy efficiency of this function. > > One of the ways I've used this function is to scan combinations until a > > condition is satisfied. The iteration is always terminated before reaching > > the end. Reading ahead may waste computation and memory. > > I don't understand. The Cartesian product calculation has to iterate > over the second argument many times (unless you have it iterate over > the first argument many times). So a lazy argument won't work. Am I > missing something? Even if all the arguments are re-iterable containers the recursive call produces a lazy generator object - the cartesian product of the tail. I don't want to read it eagerly into a list. Oren From guido@python.org Wed Aug 28 16:02:25 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 28 Aug 2002 11:02:25 -0400 Subject: [Python-Dev] A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib] In-Reply-To: Your message of "Wed, 28 Aug 2002 10:49:20 EDT." <20020828144920.GA58184@hishome.net> References: <20020821025725.GB28198@thyrsus.com> <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com> <20020824163848.C10202@idi.ntnu.no> <200208280126.g7S1Q1U15876@pcp02138704pcs.reston01.va.comcast.net> <20020828051355.GB79805@hishome.net> <200208281427.g7SERU819829@odiug.zope.com> <20020828144920.GA58184@hishome.net> Message-ID: <200208281502.g7SF2P320402@odiug.zope.com> > Even if all the arguments are re-iterable containers the recursive call > produces a lazy generator object - the cartesian product of the tail. I > don't want to read it eagerly into a list. And I wasn't proposing that. 
def cartesian(*sequences):
    if len(sequences) == 0:
        yield []
    else:
        head, tail = sequences[:-1], sequences[-1]
        tail = list(tail)  # <--- This is what I was proposing
        for x in cartesian(*head):
            for y in tail:
                yield x + [y]

--Guido van Rossum (home page: http://www.python.org/~guido/) From pinard@iro.umontreal.ca Wed Aug 28 16:50:29 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: 28 Aug 2002 11:50:29 -0400 Subject: [Python-Dev] Re: A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib] In-Reply-To: <20020828051355.GB79805@hishome.net> References: <20020820231738.GA21011@thyrsus.com> <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> <20020821025725.GB28198@thyrsus.com> <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com> <20020824163848.C10202@idi.ntnu.no> <200208280126.g7S1Q1U15876@pcp02138704pcs.reston01.va.comcast.net> <20020828051355.GB79805@hishome.net> Message-ID: [Oren Tirosh] > My implementation from http://www.tothink.com/python/dataflow/xlazy.py: > [...] Special-casing the 2 argument case helps a lot. Good idea! I added such speed-up code for cartesians, subsets and permutations, and am now seeking how to do it (nicely) for combinations and arrangements as well. > Slurping the tail into a list loses the lazy efficiency of this function. Generators are an elegant way to be lazy. I agree that we are likely to lose something if we attempt to do too much, too soon.
-- François Pinard http://www.iro.umontreal.ca/~pinard From oren-py-d@hishome.net Wed Aug 28 17:10:36 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Wed, 28 Aug 2002 12:10:36 -0400 Subject: [Python-Dev] A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib] In-Reply-To: <200208281502.g7SF2P320402@odiug.zope.com> References: <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com> <20020824163848.C10202@idi.ntnu.no> <200208280126.g7S1Q1U15876@pcp02138704pcs.reston01.va.comcast.net> <20020828051355.GB79805@hishome.net> <200208281427.g7SERU819829@odiug.zope.com> <20020828144920.GA58184@hishome.net> <200208281502.g7SF2P320402@odiug.zope.com> Message-ID: <20020828161036.GA68673@hishome.net> On Wed, Aug 28, 2002 at 11:02:25AM -0400, Guido van Rossum wrote:
> > Even if all the arguments are re-iterable containers the recursive call
> > produces a lazy generator object - the cartesian product of the tail. I
> > don't want to read it eagerly into a list.
>
> And I wasn't proposing that.
>
> def cartesian(*sequences):
>     if len(sequences) == 0:
>         yield []
>     else:
>         head, tail = sequences[:-1], sequences[-1]
>         tail = list(tail)  # <--- This is what I was proposing
>         for x in cartesian(*head):
>             for y in tail:
>                 yield x + [y]

Silly me. Too much LISPthink made me automatically see "head" as the first item and "tail" as the rest. Now I see that the head is all but the last item and the tail is the last. It was really funny - like seeing the cup change into two faces right in front of your eyes...
Oren From pinard@iro.umontreal.ca Wed Aug 28 17:12:41 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: 28 Aug 2002 12:12:41 -0400 Subject: [Python-Dev] Re: A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib] In-Reply-To: <200208280126.g7S1Q1U15876@pcp02138704pcs.reston01.va.comcast.net> References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <20020820231738.GA21011@thyrsus.com> <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> <20020821025725.GB28198@thyrsus.com> <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com> <20020824163848.C10202@idi.ntnu.no> <200208280126.g7S1Q1U15876@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Guido van Rossum]
> > def cartesian(*sequences): [...]
> It occurred to me that this is rather inefficient because it invokes
> itself recursively many times (once for each element in the first
> sequence). This version is much faster, because iterating over a
> built-in sequence (like a list) is much faster than iterating over a
> generator:

Granted, thanks! I postponed optimisations for the first draft of `cogen', but if it looks acceptable overall, we can try to get some speed out of it now.

> def cartesian(*sequences):
>     if len(sequences) == 0:
>         yield []
>     else:
>         head, tail = sequences[:-1], sequences[-1]
>         for x in cartesian(*head):
>             for y in tail:
>                 yield x + [y]

> I also wonder if perhaps ``tail = list(tail)'' should be inserted
> just before the for loop, so that the arguments may be iterators as
> well.

`cogen' does not make any special effort for protecting iterators given as input sequences. `cartesian' is surely not the only place where iterators would create a problem. Solving it as a special case for `cartesian' only is not very nice.
Of course, we might transform all sequences into lists everywhere in `cogen', but `list'-ing a list copies it, and I'm not sure this would be such a good idea in the end. Best may be to let the user explicitly transform input iterators into lists by calling `list' explicitly on those arguments. That might be an acceptable compromise. > I expect that Eric Raymond's powerset is much faster than your recursive > subsets() [...] Very likely, yes. I started `cogen' with algorithms looking a bit all alike, and did not look at speed. OK, I'll switch. I do not have Eric's algorithm handy, but if I remember well, it merely mapped successive integers to subsets by associating each bit with an element. -- François Pinard http://www.iro.umontreal.ca/~pinard From mcherm@destiny.com Wed Aug 28 17:49:51 2002 From: mcherm@destiny.com (Michael Chermside) Date: Wed, 28 Aug 2002 12:49:51 -0400 Subject: [Python-Dev] Re: PEP 282 Implementation Message-ID: <3D6CFF2F.9080702@destiny.com> >> - setRollover(): the explanation isn't 100% clear. I *think* that you >> always write to "app.log", and when that's full, you rename it to >> app.log.1, and app.log.1 gets renamed to app.log.2, and so on, and >> then you start writing to a new app.log, right? > > Yes. The original implementation was different - it just closed the current > file and opened a new file app.log.n. The current implementation is slightly > slower due to the need to rename several files, but the user can tell more > easily which the latest log file is. I will update the setRollover() > docstring to indicate more clearly how it works; I'm assuming that the > current algorithm is deemed good enough. Why not have the current logfile named "app.log", and when it's full rename it as "app.log.n" (for the appropriate value of n)? It's still easy to find the current log file, there's only one file to rename, and the obvious sort order will put the files in chronological order not reverse chronological order.
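[Editorial sketch: Michael's renaming scheme in a few lines. The `rollover` helper and the `app.log` base name are illustrative only, not the PEP 282 implementation; the point is that rollover renames exactly one file, to the first unused `app.log.n`, so higher n means newer.]

```python
import os
import tempfile

def rollover(logdir, base='app.log'):
    # Rename the active file to base.n for the first unused n, then
    # start a fresh, empty file under the plain name. Only one rename
    # happens per rollover, and numbers grow with age of the archive.
    n = 1
    while os.path.exists(os.path.join(logdir, '%s.%d' % (base, n))):
        n += 1
    os.rename(os.path.join(logdir, base),
              os.path.join(logdir, '%s.%d' % (base, n)))
    open(os.path.join(logdir, base), 'w').close()

logdir = tempfile.mkdtemp()
with open(os.path.join(logdir, 'app.log'), 'w') as f:
    f.write('first batch')
rollover(logdir)
print(sorted(os.listdir(logdir)))  # ['app.log', 'app.log.1']
```

This is the trade-off Michael notes: numbers keep increasing across cleanups, but the active file always keeps the plain name.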
The only downside that's obvious to me is that if you just clear out old log files by deleting all but the latest 3, then your numbers will keep increasing over time. But I hardly see that as a problem... if you find that a filename of "app.log.143" really drives you crazy you can just rename the remaining logfiles the next time you clean out the log directory and everything's fine. -- Michael Chermside From guido@python.org Wed Aug 28 18:58:18 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 28 Aug 2002 13:58:18 -0400 Subject: [Python-Dev] Re: A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib] In-Reply-To: Your message of "Wed, 28 Aug 2002 12:12:41 EDT." References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <20020820231738.GA21011@thyrsus.com> <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> <20020821025725.GB28198@thyrsus.com> <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com> <20020824163848.C10202@idi.ntnu.no> <200208280126.g7S1Q1U15876@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <200208281758.g7SHwII01131@odiug.zope.com> > Granted, thanks! I postponed optimisations for the first draft of `cogen', > but if it looks acceptable overall, we can try to get some speed out of it now. I'm not saying that it looks good overall -- I'd like to defer to Tim, who has used and written this kind of utilities for real, and who probably has a lot of useful feedback. Right now, he's felled by some kind of illness though.
> > def cartesian(*sequences):
> >     if len(sequences) == 0:
> >         yield []
> >     else:
> >         head, tail = sequences[:-1], sequences[-1]
> >         for x in cartesian(*head):
> >             for y in tail:
> >                 yield x + [y]

> > I also wonder if perhaps ``tail = list(tail)'' should be inserted
> > just before the for loop, so that the arguments may be iterators as
> > well.
>
> `cogen' does not make any special effort for protecting iterators
> given as input sequences. `cartesian' is surely not the only place
> where iterators would create a problem. Solving it as a special
> case for `cartesian' only is not very nice. Of course, we might
> transform all sequences into lists everywhere in `cogen', but
> `list'-ing a list copies it, and I'm not sure this would be such a
> good idea in the end.

Hm. All but the last will be iterated over many times. In practice
the inputs will be relatively small (I can't imagine using this for
sequences with 100s of 1000s of elements). Or you might sniff the type
and avoid the copy if you know it's a list or tuple. Or you might use
Oren's favorite rule of thumb and listify it when iter(x) is iter(x)
(or iter(x) is x).

> Best may be to let the user explicitly transform input iterators into
> lists by calling `list' on those arguments. That might be an
> acceptable compromise.

Maybe.

> > I expect that Eric Raymond's powerset is much faster than your recursive
> > subsets() [...]
>
> Very likely, yes. I started `cogen' with algorithms looking a bit all
> alike, and did not look at speed. OK, I'll switch. I do not have Eric's
> algorithm handy, but if I remember well, it merely mapped successive
> integers to subsets by associating each bit with an element.
def powerset(base):
    """Powerset of an iterable, yielding lists."""
    pairs = [(2**i, x) for i, x in enumerate(base)]
    for n in xrange(2**len(pairs)):
        yield [x for m, x in pairs if m&n]

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one@comcast.net Wed Aug 28 20:12:53 2002
From: tim.one@comcast.net (Tim Peters)
Date: Wed, 28 Aug 2002 15:12:53 -0400
Subject: [Python-Dev] The first trustworthy GBayes results
In-Reply-To: 
Message-ID: 

FYI. After cleaning the blatant spam identified by the classifier out
of my ham corpus, and replacing it with new random msgs from Barry's
corpus, the reported false positive rate fell to about 0.2% (averaging
8 per batch of 4000 ham test messages). This seems remarkable given
that it's ignoring headers, and just splitting the raw text on
whitespace in total ignorance of HTML & MIME etc. 'FREE' (all caps)
moved into the ranks of best spam indicators. The false negative rate
got reduced by a small amount, but I doubt it's a statistically
significant reduction (I'll compute that stuff later; I'm looking for
Big Things now).

Some of these false positives are almost certainly spam, and at least
one is almost certainly a virus: these are msgs that are 100%
base64-encoded, or maximally obfuscated quoted-printable. That could
almost certainly be fixed by, e.g., decoding encoded text.

The other false positives seem harder to deal with:

+ Brief HTML msgs from newbies. I doubt the headers will help these
  get through, as they're generally first-time posters, and aren't
  replies to earlier msgs. There's little positive content, while all
  elements of raw HTML have high "it's spam" probability. Example:

"""
--------------=_4D4800B7C99C4331D7B8
Content-Description: filename="text1.txt"
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Is there a version of Python with Prolog Extension??
Where can I find it if there is?

Thanks,
Luis.

P.S. Could you please reply to the sender too.
--------------=_4D4800B7C99C4331D7B8 Content-Description: filename="text1.html" Content-Type: text/html Content-Transfer-Encoding: quoted-printable Prolog Extension
Is there a version of Python with Prolog Extension??
Where can I find it if there is?

Thanks,
Luis.

P.S. Could you please reply to the sender too.
--------------=_4D4800B7C99C4331D7B8--""" """ Here's how it got scored: prob = 0.999958816093 prob('') = 0.979284 prob('Prolog') = 0.01 prob('') = 0.97989 prob('Thanks,') = 0.0337316 prob('Prolog') = 0.01 prob('Python') = 0.01 prob('NAME=3D"GENERATOR"') = 0.99 prob('') = 0.99 prob('') = 0.989494 prob('') = 0.987429 prob('Thanks,') = 0.0337316 prob('Python') = 0.01 Note that ' > Can you create and use new files with dbhash.open()? > > Yes. But if I run db_dump on these files, it says "unexpected file type > or format", regardless which db_dump version I use (2.0.77, 3.0.55, > 3.1.17) > It may be that db_dump isn't compatible with version 1.85 databse files. I can't remember. I seem to recall that there was an option to build 1.85 versions of db_dump and db_load. Check the configure options for BerkeleyDB to find out. (Also, while you are there, make sure that BerkeleyDB was built the same on both of your platforms...) > > > Try running db_verify (one of the utilities built > > when you compiled DB) on the file and see what it tells you. > > There is no db_verify among my Berkeley DB utilities. There should have been a bunch of them built when you compiled DB. I've got these: -r-xr-xr-x 1 rd users 343108 Dec 11 12:11 db_archive -r-xr-xr-x 1 rd users 342580 Dec 11 12:11 db_checkpoint -r-xr-xr-x 1 rd users 342388 Dec 11 12:11 db_deadlock -r-xr-xr-x 1 rd users 342964 Dec 11 12:11 db_dump -r-xr-xr-x 1 rd users 349348 Dec 11 12:11 db_load -r-xr-xr-x 1 rd users 340372 Dec 11 12:11 db_printlog -r-xr-xr-x 1 rd users 341076 Dec 11 12:11 db_recover -r-xr-xr-x 1 rd users 353284 Dec 11 12:11 db_stat -r-xr-xr-x 1 rd users 340340 Dec 11 12:11 db_upgrade -r-xr-xr-x 1 rd users 340532 Dec 11 12:11 db_verify -- Robin Dunn Software Craftsman robin@AllDunn.com http://wxPython.org Java give you jitters? http://wxPROs.com Relax with wxPython! """ Looks utterly on-topic! So why did Robin's msg get flagged? 
It's solely due to his Unix name in the ls output(!):

prob = 0.999999999895
prob('Berkeley') = 0.01
prob('configure') = 0.01
prob('remember.') = 0.01
prob('these:') = 0.01
prob('recall') = 0.01
prob('rd') = 0.99
prob('rd') = 0.99
prob('rd') = 0.99
prob('rd') = 0.99
prob('rd') = 0.99
prob('rd') = 0.99
prob('rd') = 0.99
prob('rd') = 0.99
prob('rd') = 0.99
prob('rd') = 0.99

Spammers often generate random "word-like" gibberish at the ends of
msgs, and "rd" is one of the random two-letter combos that appears in
the spam corpus. Perhaps it would be good to ignore "words" with fewer
than W characters (to be determined by experiment).

The other example is long, an off-topic but delightful exchange between
Peter Hansen and Alex Martelli. Here's a "typical" paragraph:

    Since it's important to use very abundant amounts of water when
    cooking pasta, the price of what is still a very cheap dish would
    skyrocket if that abundant water had to be costly bottled mineral
    water.

The scoring:

prob = 0.99
prob('"Peter') = 0.01
prob(':-)') = 0.01
prob('') = 0.01
prob('tasks') = 0.01
prob('drinks') = 0.01
prob('wrote') = 0.01
prob('Hansen"') = 0.01
prob('water') = 0.99
prob('water') = 0.99
prob('skyrocket') = 0.99
prob('water') = 0.99
prob('water') = 0.99
prob('water') = 0.99
prob('water') = 0.99
prob('water') = 0.99

Alex is drowning in his aquatic excess .

I expect that including the headers would have given these much better
chances of getting through, given Robin and Alex's posting histories.
Still, the idea of counting words multiple times is open to question,
and experiments both ways are in order.

+ Brief put-ons, like

"""
HEY DUDEZ !

I WANT TO GET INTO THIS AUTOCODING THING.

ANYONE KNOW WHERE I CAN GET SOME IBM 1401 WAREZ ?
-- MULTICS-MAN
"""

It's not actually things like WAREZ that hurt here, it's more the mere
fact of SCREAMING:

prob = 0.999982095931
prob('AUTOCODING') = 0.2
prob('THING.') = 0.2
prob('DUDEZ') = 0.2
prob('ANYONE') = 0.884211
prob('GET') = 0.847334
prob('GET') = 0.847334
prob('HEY') = 0.2
prob('--') = 0.0974729
prob('KNOW') = 0.969697
prob('THIS') = 0.953191
prob('?') = 0.0490886
prob('WANT') = 0.99
prob('TO') = 0.988829
prob('CAN') = 0.884211
prob('WAREZ') = 0.2

OTOH, a lot of the Python community considered the whole autocoding
thread to be spam, and I personally could have lived without this
contribution to its legacy (alas, the autocoding thread wasn't spam,
just badly off-topic).

+ Msgs top-quoting an earlier spam in its entirety. For example, one
  msg quoted an entire Nigerian scam msg, and added just

    Aw jeez, another one of these Nigerian wire scams. This one has
    been around for 20 years.

What's an acceptable false positive rate? What do we get from
SpamAssassin? I expect we can end up below 0.1% here, and with a
generous meaning for "not spam", but I think *some* of these examples
show that the only way to get a 0% false-positive rate is to recode
spamprob like so:

    def spamprob(self, wordstream, evidence=False):
        return 0.0

That would also allow other simplifications .
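The short-word filter Tim floats above (ignore "words" with fewer than
W characters) could be sketched as a filter on the whitespace
tokenizer (illustrative only -- the name `tokenize` and the default
W=3 are assumptions; W is explicitly "to be determined by experiment"):

```python
def tokenize(text, min_len=3):
    """Split raw text on whitespace, as in the experiments described
    above, but drop "words" shorter than min_len characters so that
    random short gibberish like 'rd' cannot dominate a message's
    score."""
    return [word for word in text.split() if len(word) >= min_len]
```

With min_len=3, the ten 'rd' tokens from Robin's ls output simply
vanish from the word stream before any probabilities are looked up.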
From pinard@iro.umontreal.ca Wed Aug 28 20:34:11 2002
From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard)
Date: 28 Aug 2002 15:34:11 -0400
Subject: [Python-Dev] Re: A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib]
In-Reply-To: <200208281758.g7SHwII01131@odiug.zope.com>
References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net>
 <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com>
 <20020820231738.GA21011@thyrsus.com>
 <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net>
 <20020821025725.GB28198@thyrsus.com>
 <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net>
 <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com>
 <20020824163848.C10202@idi.ntnu.no>
 <200208280126.g7S1Q1U15876@pcp02138704pcs.reston01.va.comcast.net>
 <200208281758.g7SHwII01131@odiug.zope.com>
Message-ID: 

[Guido van Rossum]
> Right now, [Tim] is felled by some kind of illness though.

For the mere crumbs falling off of his project table, he seems to be
working like hell, so no surprise his body asks him to slow down once
in a while. I hope he will soon be healthy and happy again! Life is
not the same when he is away...

> All but the last will be iterated over many times. In practice the inputs
> will be relatively small (I can't imagine using this for sequences with
> 100s of 1000s of elements).

Do not underestimate statisticians! They are well known for burning
oodles and oodles of computer cycles, contemplating or combining data
sets which are not always small, in all ways imaginable. Of course,
even statisticians cannot afford all subsets of sets having 100
elements, or will not scan permutations of 1000 elements. But 100s or
1000s of elements are well within the bounds of cartesian products.

> Or you might use Oren's favorite rule of thumb and listify it when iter(x)
> is iter(x) (or iter(x) is x).
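Oren's rule of thumb, as quoted, amounts to something like this sketch
(`reiterable` is a hypothetical helper name for illustration, not
anything in `cogen'):

```python
def reiterable(seq):
    """Return an object that can safely be iterated more than once.

    Snapshot seq into a list only when it is its own iterator
    (iter(seq) is seq), i.e. when a single pass would exhaust it;
    lists, tuples, and other containers are returned unchanged,
    avoiding a needless copy.
    """
    if iter(seq) is seq:
        return list(seq)
    return seq
```

Note that this is exactly the pattern objected to below: iter(seq) is
called once just to test identity, and for a container that result is
thrown away.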
I'm a bit annoyed by the idea that `iter(x)' might require some
computation for producing an iterator, and that we immediately throw
away the result. Granted that `__iter__(self): return self' is
efficient when an object is an iterator, but nowhere is it said that
`__iter__' has to be efficient when the object is a container, and it
does not shock me that some complex containers require time to produce
their iterator. I much prefer limiting the use of `__iter__' to when
one intends to use the iterator...

> def powerset(base):
>     """Powerset of an iterable, yielding lists."""
>     pairs = [(2**i, x) for i, x in enumerate(base)]
>     for n in xrange(2**len(pairs)):
>         yield [x for m, x in pairs if m&n]

Thanks! Hmph! This does not yield the subsets in "sorted order", like
the other `cogen' methods do, and I would prefer to keep that promise.
Hopefully, the optimisation I added this morning will make both
algorithms more comparable, speed-wise. I should benchmark them to see.

-- 
François Pinard http://www.iro.umontreal.ca/~pinard

From tim.one@comcast.net Wed Aug 28 20:36:57 2002
From: tim.one@comcast.net (Tim Peters)
Date: Wed, 28 Aug 2002 15:36:57 -0400
Subject: [Python-Dev] Re: A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib]
In-Reply-To: <200208281758.g7SHwII01131@odiug.zope.com>
Message-ID: 

This is a multi-part message in MIME format.

--Boundary_(ID_K+WtxWU4+owRK5lyzEc5lw)
Content-type: text/plain; charset=iso-8859-1
Content-transfer-encoding: 7BIT

[Guido]
> I'm not saying that it looks good overall -- I'd like to defer to Tim,
> who has used and written this kind of utilities for real, and who
> probably has a lot of useful feedback. Right now, he's felled by some
> kind of illness though.

I think that's over! I'm very tired, though (couldn't get to sleep
until 11, and woke up at 2 with the last, umm, episode ).

This is a Big Project if done right. I volunteered time for it a few
years ago, but there wasn't enough interest then to keep it going.
I'll attach the last publicly-distributed module I had then, solely
devoted to combinations. It was meant to be the first in a series, all
following some basic design decisions:

+ A Basic class that doesn't compromise on speed, typically by working
  on canonical representatives in Python list-of-int form.

+ A more general class that deals with arbitrary sequences, perhaps at
  great loss of efficiency.

+ Multiple iterators are important: lex order is needed sometimes;
  Gray code order is an enormous help sometimes; random generation is
  vital sometimes.

+ State-of-the-art algorithms. That's a burden for anything that goes
  into the core -- if it's a toy algorithm, users can do just as well
  on their own, and then people submit patch after patch that the
  original author isn't really qualified to judge (else they would
  have done a state-of-the-art thing to begin with).

+ The ability to override the random number generator. Python's
  default WH generator is showing its age as machines get faster; it's
  simply not adequate anymore for long-running programs making heavy
  use of it on a fast box. Combinatorial algorithms in particular do
  tend to make heavy use of it. (Speaking of which, "someone" should
  look into grabbing one of the Mersenne Twister extensions for Python
  -- that's the current state of *that* art).

Ideas not worth taking:

+ Leave the chi-square algorithm out of it. A better implementation
  would be nice to have in a statistics package, but it doesn't belong
  here regardless.

me-i'm-going-back-to-sleep-ly y'rs - tim

--Boundary_(ID_K+WtxWU4+owRK5lyzEc5lw)
Content-type: text/plain; name=combgen.py
Content-transfer-encoding: 7BIT
Content-disposition: attachment; filename=combgen.py

# Module combgen version 0.9.1
# Released to the public domain 18-Dec-1999,
# by Tim Peters (tim_one@email.msn.com).
# Provided as-is; use at your own risk; no warranty; no promises; enjoy!

"""\
CombGen(s, k) supplies methods for generating k-combinations from s.
CombGenBasic(n, k) acts like CombGen(range(n), k) but is more efficient. s is of any sequence type such that s supports catenation (s1 + s2) and slicing (s[i:j]). For example, s can be a list, tuple or string. k is an integer such that 0 <= k <= len(s). A k-combination of s is a subsequence C of s where len(C) = k, and for some k integers i_0, i_1, ..., i_km1 (km1 = k-1) with 0 <= i_0 < i_1 < ... < i_km1 < len(s), C[0] is s[i_0] C[1] is s[i_1] ... C[k-1] is s[i_km1] Note that each k-combination is a sequence of the same type as s. Different methods generate k-combinations in lexicographic index order, a particular "Gray code" order, or at random. The .reset() method can be used to start over. The .set_start(ivector) method can be used to force generation to begin at a particular combination. Module function comb(n, k) returns the number of combinations of n things taken k at a time; n >= k >= 0 required. CAUTIONS + The CombGen constructor saves a reference to (not a copy of) s, so don't mutate s after calling CombGen. + For efficiency, CombGenBasic getlex and getgray return the *same* list each time, mutating it in place. You must not mutate this list; and, if you want to save a combination's value across calls, copy the list. For example, >>> g = CombGenBasic(2, 1) >>> x = g.getlex(); y = g.getlex() >>> x is y # the same! 1 >>> x, y # so these print the same thing ([1], [1]) >>> g.reset() >>> x = g.getlex()[:]; y = g.getlex()[:] >>> x, y # copies work as expected ([0], [1]) >>> In contrast, CombGen methods return a new sequence each time -- but they're slower. GETLEX -- LEXICOGRAPHIC GENERATION Each invocation of .getlex() returns a new k-combination of s. The combinations are generated in lexicographic index order (for CombGenBasic, the k-combinations themselves are in lexicographic order). 
That is, the first k-combination consists of s[0], s[1], ..., s[k-1] in that order; the next of s[0], s[1], ..., s[k] and so on until reaching s[len(s)-k], s[len(s)-k+1], ..., s[len(s)-1] After all k-combinations have been generated, .getlex() returns None. Examples: >>> g = CombGen("abc", 0).getlex >>> g(), g() ('', None) >>> g = CombGen("abc", 1).getlex >>> g(), g(), g(), g() ('a', 'b', 'c', None) >>> g = CombGenBasic(3, 2).getlex >>> print g(), g(), g(), g() [0, 1] [0, 2] [1, 2] None >>> g = CombGen((0, 1, 2), 3).getlex >>> print g(), g(), g() (0, 1, 2) None None >>> p = CombGenBasic(4, 2) >>> g = p.getlex >>> print g(), g(), g(), g(), g(), g(), g(), g() [0, 1] [0, 2] [0, 3] [1, 2] [1, 3] [2, 3] None None >>> p.reset() >>> print g(), g(), g(), g(), g(), g(), g(), g() [0, 1] [0, 2] [0, 3] [1, 2] [1, 3] [2, 3] None None >>> GETGRAY -- GRAY CODE GENERATION Each invocation of .getgray() returns a triple C, tossed, added where C is the next k-combination of s tossed is the element of s removed from the last k-combination added is the element of s added to the last k-combination tossed and added are None for the first call. Consecutive combinations returned by .getgray() differ by two elements (one removed, one added). If you invoke getgray() more than comb(n,k) times, it "wraps around" and generates the same sequence again. Note that the last combination in the return sequence also differs by two elements from the first combination in the return sequence. Gray code ordering can be very useful when you're computing an expensive function on each combination: that exactly one element is added and exactly one removed can often be exploited to save recomputation for the k-2 common elements. >>> o = CombGen("abcd", 2) >>> for i in range(7): # note that this wraps around ... 
print o.getgray() ('ab', None, None) ('bd', 'a', 'd') ('bc', 'd', 'c') ('cd', 'b', 'd') ('ad', 'c', 'a') ('ac', 'd', 'c') ('ab', 'c', 'b') >>> GETRAND -- RANDOM GENERATION Each invocation of .getrand() returns a random k-combination. >>> o = CombGenBasic(1000, 6) >>> import random >>> random.seed(87654) >>> o.getrand() [69, 223, 437, 573, 722, 778] >>> o.getrand() [409, 542, 666, 703, 732, 847] >>> CombGenBasic(1000000, 4).getrand() [199449, 439831, 606885, 874530] >>> """ # 0,0,1 09-Dec-1999 # initial version # 0,0,2 10-Dec-1999 # Sped CombGenBasic.{getlex, getgray} substantially by no longer # making copies of the indices; getgray is truly O(1) now. # A bad aspect is that they return the same list object each time # now, which can be confusing; e.g., had to change some examples. # Use CombGen instead if this bothers you -- CombGenBasic's # purpose in life is to be lean & mean. # Removed the restriction on mixing calls to CombGenBasic's # getlex and getgray; not sure it's useful, but it was irksome. # Changed __findj to return a simpler result. This is less useful # for getgray, but now getlex can exploit it too (there are no # longer any Python-level loops in CombGenBasic's getlex; there's # an implied C-level loop (via "range"), and it's in the nature of # lex order that this can't be removed). # Added some exhaustive tests for getlex, and finger verification. # 0,9,1 18-Dec-1999 # Changed _testrand to compute and print chi-square statistics, # and probabilities, because one of _testrand's outputs didn't # "look random" to me. Indeed, it's got a poor chi-square value! # But sometimes that *should* happen, and it does not appear to # be happening more often than expected. __version__ = 0, 9, 1 def _chop(n): """n -> int if it fits, else long.""" try: return int(n) except OverflowError: return n def comb(n, k): """n, k -> number of combinations of n items, k at a time. n >= k >= 0 required. >>> for i in range(7): ... 
print "comb(6, %d) ==" % i, comb(6, i) comb(6, 0) == 1 comb(6, 1) == 6 comb(6, 2) == 15 comb(6, 3) == 20 comb(6, 4) == 15 comb(6, 5) == 6 comb(6, 6) == 1 >>> comb(52, 5) # number of poker hands 2598960 >>> comb(52, 13) # number of bridge hands 635013559600L """ if not n >= k >= 0: raise ValueError("n >= k >= 0 required: " + `n, k`) if k > (n >> 1): k = n-k if k == 0: return 1 result = long(n) i = 2 n, k = n-1, k-1 while k: # assert (result * n) % i == 0 result = result * n / i i = i+1 k = k-1 n = n-1 return _chop(result) import random class CombGenBasic: def __init__(self, n, k): self.n, self.k = n, k if not n >= k >= 0: raise ValueError("n >= k >= 0 required:" + `n, k`) self.reset() def reset(self): """Restore state to that immediately after construction.""" # The first result is the same for either lexicographic or # Gray code generation. self.set_start(range(self.k)) # __findj is used only to initialize self.j for getlex and # getgray. It returns the largest j such that slot j has # "breathing room"; that is, such that slot j isn't at its largest # possible value (n-k+j). j is -1 if no such index exists. # After initialization, getlex and getgray incrementally update # this more efficiently. def __findj(self, v): n, k = self.n, self.k assert len(v) == k j = k-1 while j >= 0 and v[j] == n-k+j: # v[j] is at its largest possible value j = j-1 return j def getlex(self): """Return next (in lexicographic order) k-combination. Return None if all possibilities have been generated. Caution: getlex returns the *same* list each time, mutated in place. Don't mutate it yourself, or save a reference to it (the next call will mutate its contents; make a copy if you need to save the value across calls). 
""" indices, n, k, j = self.indices, self.n, self.k, self.j if self.firstcall: self.firstcall = 0 return indices if j < 0: return None new = indices[j] = indices[j] + 1 if j+1 == k: if new + 1 == n: j = j-1 else: if new + 1 < indices[j+1]: indices[j:] = range(new, new + k - j) j = k-1 else: j = j-1 self.j = j # assert j == self.__findj(indices) return indices def getgray(self): """Return next (c, tossed, added) triple. c is the next k-combination in a particular Gray code order. tossed is the element of range(n) removed from the last combination. added is the element of range(n) added to the last combination. tossed and added are None if this is the first call, or on every call if there is only one k-combination. Else tossed != added, and neither is None. Caution: getgray wraps around if you invoke it more than comb(n, k) times. Caution: getgray returns the *same* list each time, mutated in place. Don't mutate it yourself, or save a reference to it (the next call will mutate its contents; make a copy if you need to save the value across calls). """ # The popular routine in Nijenhuis & Wilf's "Combinatorial # Algorithms" is exceedingly complicated (although trivial # to program with recursive generators!). # # Instead I'm using a variation of Algorithm A3 from the paper # "Loopless Gray Code Algorithms", by T.A. Jenkyns (Brock # University, Ontario). The code is much simpler, and, # because it's loop-free, takes O(1) time on each call (not # just amortized over the whole sequence). # # Because the paper doesn't yet seem to be well known, here's # the idea: Modify the definition of lexicographic ordering # in a funky way: in the element comparisons, replace "<" by # ">" in every other element position starting at the 2nd. # IOW, and skipping end cases, sequence s is "less than" # sequence t iff their elements are equal up until index # position i, and then s[i] < t[i] if i is even, or s[i] > # t[i] if i is odd. Jenkyns calls this "alternating # lexicographic" order. 
It's clear that this defines a total # ordering. What isn't obvious is that it's also a Gray code # ordering! Very pretty. # # Modifications made here to A3 are minor, and include # switching from 1-based to 0-based; allowing for trivial # sequences; allowing for wrap-around; returning the "tossed" # and "added" elements; starting the generation at an # arbitrary k-combination; and sharing a finger (self.j) with # the getlex method. indices, n, k, j = self.indices, self.n, self.k, self.j if self.firstcall: self.firstcall = 0 return indices, None, None # Slide over to first slot that *may* be able to move down. # Note that this leaves odd j alone (including -1!), and may # make j equal to k. j = j | 1 if j == k: # k is odd and so indices[-1] "wants to move up", and # indices[-1] < n-1 so it *can* move up. tossed = indices[-1] added = indices[-1] = tossed + 1 j = j-1 if added == n-1: j = j-1 elif j < 0: # indices has the last value in alt-lex order, e.g. # [4, 5, 6, 7]; wrap around to the first value, e.g. # [0, 5, 6, 7]. assert indices == range(n-k, n) if k and indices[0]: tossed = indices[0] added = indices[0] = 0 j = 0 else: # comb(n, k) is 1 -- this is a trivial sequence. tossed = added = None else: # 0 < j < k (note that 0 < j because j is odd). # Want to move this slot down (again because j is odd). atj = indices[j] if indices[j-1] + 1 == atj: # can't move it down; move preceding up tossed = atj - 1 # the value in indices[j-1] indices[j-1] = atj added = indices[j] = n-k+j j = j-1 if atj + 1 == added: j = j-1 else: # can move it down tossed = atj added = indices[j] = atj - 1 if j+1 < k: tossed = indices[j+1] indices[j+1] = atj j = j+1 self.j = j # assert j == self.__findj(indices) return indices, tossed, added def set_start(self, start): """Force .getlex() or .getgray() to start at given value. start is a vector of k unique integers in range(n), where k and n were passed to the CombGenBasic constructor. 
The vector is sorted in increasing order, and is used as the the next k-combination to be returned by .getlex() or .getgray(). >>> gen = CombGenBasic(3, 2) >>> for i in range(4): ... print gen.getgray() ([0, 1], None, None) ([1, 2], 0, 2) ([0, 2], 1, 0) ([0, 1], 2, 1) >>> gen.set_start([0, 2]) >>> for i in range(4): ... print gen.getgray() ([0, 2], None, None) ([0, 1], 2, 1) ([1, 2], 0, 2) ([0, 2], 1, 0) """ if len(start) != self.k: raise ValueError("start vector not of length " + `k`) indices = start[:] indices.sort() seen = {} # Verify the vector makes sense. for i in indices: if not 0 <= i < self.n: raise ValueError("start vector contains element " "not in 0.." + `self.n-1` + ": " + `i`) if seen.has_key(i): raise ValueError("start vector contains duplicate " "element: " + `i`) seen[i] = 1 self.indices = indices self.j = self.__findj(indices) self.firstcall = 1 def getrand(self, random=random.random): """Return a k-combination at random. Optional arg random specifies a no-argument function that returns a random float in [0., 1.). By default, random.random is used. """ # The trap to avoid is doing O(n) work when k is much less # than n. Letting m = min(k, n-k), we actually do Python work # of O(m), and C-level work of O(m log m) for a sort. In # addition, O(k) work is required to build the final result, # but at worst O(m) of that work is done at Python speed. n, k = self.n, self.k complement = 0 if k > n/2: # Generate the values *not* in the combination. complement = 1 k = n-k # Generate k distinct random values. result = {} for i in xrange(k): # The expected # of times thru the next loop is n/(n-i). # Since i < k <= n/2, n-i > n/2, so n/(n-i) < 2 and is # usually closer to 1: on average, this succeeds very # quickly! while 1: candidate = int(random() * n) if not result.has_key(candidate): result[candidate] = 1 break result = result.keys() result.sort() if complement: # We want everything in range(n) that's *not* in result. 
avoid = result avoid.append(n) result = [] start = 0 for limit in avoid: result.extend(range(start, limit)) start = limit + 1 return result class CombGen: def __init__(self, seq, k): n = len(seq) if not 0 <= k <= n: raise ValueError("k must be in 0.." + `n` + ": " + `k`) self.seq = seq self.base = CombGenBasic(n, k) def reset(self): """Restore state to that immediately after construction.""" self.base.reset() def getlex(self): """Return next (in lexicographic index order) k-combination. Return None if all possibilities have been generated. """ indices = self.base.getlex() if indices is None: return None else: return self.__indices2seq(indices) def getgray(self): """Return next (c, tossed, added) triple. c is the next k-combination in a particular Gray code order. tossed is the element of s removed from the last combination. added is the element of s added to the last combination. Caution: getgray wraps around if you invoke it more than comb(len(s), k) times. """ indices, tossed, added = self.base.getgray() if tossed is None: return (self.__indices2seq(indices), None, None) else: return (self.__indices2seq(indices), self.seq[tossed], self.seq[added]) def set_start(self, start): """Force .getlex() or .getgray() to start at given value. start is a vector of k unique integers in range(len(s)), where k and s were passed to the CombGen constructor. The vector is sorted in increasing order, and is used as a vector of indices (into s) for the next k-combination to be returned by .getlex() or .getgray(). >>> gen = CombGen("abc", 2) >>> for i in range(4): ... print gen.getgray() ('ab', None, None) ('bc', 'a', 'c') ('ac', 'b', 'a') ('ab', 'c', 'b') >>> gen.set_start([0, 2]) # start with "ac" >>> for i in range(4): ... 
print gen.getgray() ('ac', None, None) ('ab', 'c', 'b') ('bc', 'a', 'c') ('ac', 'b', 'a') >>> gen.set_start([0, 2]) # ditto >>> print gen.getlex(), gen.getlex(), gen.getlex() ac bc None """ self.base.set_start(start) def getrand(self, random=random.random): """Return a k-combination at random. Optional arg random specifies a no-argument function that returns a random float in [0., 1.). By default, random.random is used. """ return self.__indices2seq(self.base.getrand(random)) def __indices2seq(self, ivec): assert len(ivec) == self.base.k, "else internal error" seq = self.seq result = seq[0:0] # an empty sequence of the proper type for i in ivec: result = result + seq[i:i+1] return result del random ##################################################################### # Testing. ##################################################################### def _verifycomb(n, k, comb, inbase, baseobj=None): if len(comb) != k: print "OUCH!", this, "should have length", k # verify it's an increasing sequence of baseseq elements lastelt = None for elt in comb: if not inbase(elt): print "OUCH!", elt, "not in base seqeuence", n, k, comb if not lastelt < elt: print "OUCH!", elt, ">=", lastelt, n, k, comb lastelt = elt if baseobj: # verify search finger is correct cachedj = baseobj.j truej = baseobj._CombGenBasic__findj(baseobj.indices) if cachedj != truej: print "OUCH! cached j", cachedj, "!= true j", truej, \ n, k, comb def _testnk_gray(n, k): start = "abcdefghijklmnopqrstuvwxyz"[:n] def inbase(elt, start=start): return elt in start o = CombGen(start, k) c = comb(n, k) seen = {} last, lastlist = None, None for i in xrange(c+1): this, tossed, added = o.getgray() _verifycomb(n, k, this, inbase, o.base) if seen.has_key(this) and i < c: print "OUCH!", this, "seen before at", seen[this], n, k seen[this] = i thislist = list(this) if (tossed is None) != (added is None): print "OUCH! 
tossed and added None clash", tossed, \ added, n, k, last, this if last is None: last, lastlist = this, thislist continue if tossed is not None: if tossed == added: print "OUCH! tossed == added", tossed, added, \ n, k, last, this lastlist.remove(tossed) lastlist.append(added) lastlist.sort() elif c != 1: print "OUCH! tossed None but comb(n, k) not 1", \ c, tossed, added, n, k, last, this if lastlist != thislist: print "OUCH! does not compute", n, k, tossed, added, \ last, this last, lastlist = this, thislist if last != start[:k]: print "OUCH! didn't wrap around", n, k, last, this # getgray is especially delicate, so hammer on it. def _testgray(): """ >>> _testgray() testing getgray 0 testing getgray 1 testing getgray 2 testing getgray 3 testing getgray 4 testing getgray 5 testing getgray 6 testing getgray 7 testing getgray 8 testing getgray 9 testing getgray 10 testing getgray 11 testing getgray 12 """ for n in range(13): print "testing getgray", n for k in range(n+1): _testnk_gray(n, k) # getlex is easier. def _testnk_lex(n, k): start = "abcdefghijklmnopqrstuvwxyz"[:n] def inbase(elt, start=start): return elt in start o = CombGen(start, k) c = comb(n, k) last = None for i in xrange(c): this = o.getlex() _verifycomb(n, k, this, inbase, o.base) if not last < this: print "OUCH! not lexicographic", last, this, n, k last = this this = o.getlex() if this is not None: print "OUCH! should have returned None", n, k, this def _testlex(): """ >>> _testlex() testing getlex 0 testing getlex 1 testing getlex 2 testing getlex 3 testing getlex 4 testing getlex 5 testing getlex 6 testing getlex 7 testing getlex 8 """ for n in range(9): print "testing getlex", n for k in range(n+1): _testnk_lex(n, k) import math _math = math del math # This is a half-assed implementation, prone to overflow and/or # underflow given "large" x or v. If they're both <= a few hundred, # though, it's quite accurate. The main advantage is that it's # self-contained. 
def _chi_square_distrib(x, v): """x, v -> return probability that chi-square statistic <= x. v is the number of degrees of freedom, an integer >= 1. x is a non-negative float or int. """ if x < 0: raise ValueError("x must be >= 0: " + `x`) if v < 1: raise ValueError("v must be >= 1: " + `v`) if v != int(v): raise TypeError("v must be an integer: " + `v`) if x == 0: return 0.0 # (x/2)**(v/2) / gamma((v+2)/2) * exp(-x/2) * # (1 + sum(i=1 to inf, x**i/prod(j=1 to i, v+2*j))) # Alas, for even moderately large x or v, this is numerically # intractable. But the mean of the distribution is v, so in # practice v will likely be "close to" x. Rewrite the first # line as # (x/2/e)**(v/2) / gamma((v+2)/2) * exp(v/2-x/2) # Now exp is much less likely to over or underflow. The power is # still a problem, though, so we compute # (x/2/e)**(v/2) / gamma((v+2)/2) # via repeated multiplication. x = float(x) a = x / 2 / _math.exp(1) v = float(v) v2 = v/2 if int(v2) * 2 == v: # v is even base = 1.0 i = 1.0 else: # v is odd, so the gamma bottoms out at gamma(.5) = sqrt(pi), # and we need to get a sqrt(a) factor into the numerator # (since v2 "ends with" .5). base = 1.0 / _math.sqrt(a * _math.pi) i = 0.5 while i <= v2: base = base * (a / i) i = i + 1.0 base = base * _math.exp(v2 - x/2) # Now do the infinite sum. 
oldsum = None sum = base while oldsum != sum: oldsum = sum v = v + 2.0 base = base * (x / v) sum = sum + base return sum def _chisq(observed, expected): n = len(observed) assert n == len(expected) sum = 0.0 for i in range(n): e = float(expected[i]) sum = sum + (observed[i] - e)**2 / e return sum, _chi_square_distrib(sum, n-1) def _testrand(): """ >>> _testrand() random 0 combs of abcde 100 random 1 combs of abcde a 99 b 106 c 98 d 99 e 98 probability[chisq <= 0.46] = 0.0227 random 2 combs of abcde ab 100 ac 115 ad 111 ae 98 bc 98 bd 103 be 95 cd 84 ce 100 de 96 probability[chisq <= 6.6] = 0.321 random 3 combs of abcde abc 83 abd 119 abe 86 acd 88 ace 103 ade 94 bcd 107 bce 101 bde 112 cde 107 probability[chisq <= 12.78] = 0.827 random 4 combs of abcde abcd 86 abce 99 abde 113 acde 101 bcde 101 probability[chisq <= 3.68] = 0.549 random 5 combs of abcde abcde 100 """ def drive(s, k): print "random", k, "combs of", s o = CombGen(s, k) g = o.getrand n = len(s) def inbase(elt, s=s): return elt in s count = {} c = comb(len(s), k) for i in xrange(100 * c): x = g() _verifycomb(n, k, x, inbase) count[x] = count.get(x, 0) + 1 items = count.items() items.sort() for x, i in items: print x, i if c > 1: observed = count.values() if len(observed) < c: observed.extend([0] * (c - len(observed))) x, p = _chisq(observed, [100]*c) print "probability[chisq <= %g] = %.3g" % (x, p) for k in range(6): drive("abcde", k) __test__ = {"_testgray": _testgray, "_testlex": _testlex, "_testrand": _testrand} def _test(): import doctest, combgen doctest.testmod(combgen) if __name__ == "__main__": _test() --Boundary_(ID_K+WtxWU4+owRK5lyzEc5lw)-- From guido@python.org Wed Aug 28 20:41:00 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 28 Aug 2002 15:41:00 -0400 Subject: [Python-Dev] Re: A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib] In-Reply-To: Your message of "Wed, 28 Aug 2002 15:34:11 EDT." 
References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <20020820231738.GA21011@thyrsus.com> <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> <20020821025725.GB28198@thyrsus.com> <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com> <20020824163848.C10202@idi.ntnu.no> <200208280126.g7S1Q1U15876@pcp02138704pcs.reston01.va.comcast.net> <200208281758.g7SHwII01131@odiug.zope.com> Message-ID: <200208281941.g7SJf0V04475@odiug.zope.com> > Of course, even statisticians cannot afford all subsets of sets having > 100 elements, or will not scan permutations of 1000 elements. But 100s > or 1000s elements are well within the bounds of cartesian products. Yes, and there the cost of an extra list() copy is negligible (allocate and copy 4K bytes). > > Or you might use Oren's favorite rule of thumb and listify it when iter(x) > > is iter(x) (or iter(x) is x). > > I'm a bit annoyed by the idea that `iter(x)' might require some computation > for producing an iterator, and that we immediately throw away the result. > Granted that `__iter__(self): return self' is efficient when an object is > an iterator, but nowhere it is said that `__iter__' has to be efficient > when the object is a container, and it does not shock me that some complex > containers require time to produce their iterator. I much prefer limiting > the use of `__iter__' for when one intends to use the iterator... Yes, that's why I prefer to just make a list() copy. > > def powerset(base): > > """Powerset of an iterable, yielding lists.""" > > pairs = [(2**i, x) for i, x in enumerate(base)] > > for n in xrange(2**len(pairs)): > > yield [x for m, x in pairs if m&n] > > Thanks! Hmph! This does not yield the subsets in "sorted order", like > the other `cogen' methods do, and I would prefer to keep that promise.
That may be a matter of permuting the bits? > Hopefully, the optimisation I added this morning will make both algorithms > more comparable, speed-wise. I should benchmark them to see. Yes! Nothing beats a benchmark as an eye-opener. --Guido van Rossum (home page: http://www.python.org/~guido/) From gward@mems-exchange.org Wed Aug 28 20:42:48 2002 From: gward@mems-exchange.org (Greg Ward) Date: Wed, 28 Aug 2002 15:42:48 -0400 Subject: [Python-Dev] The first trustworthy GBayes results In-Reply-To: References: Message-ID: <20020828194248.GA16407@cthulhu.gerg.ca> On 28 August 2002, Tim Peters said: > What's an acceptable false positive rate? Speaking as one of the people who reviews suspected spam for python.org and rescues false positives, I would say that the more relevant figure is: how much suspected spam do I have to review every morning? < 10 messages would be peachy; right now it's around 5-20 messages per day. Currently there are probably 1-3 FPs per day, although on a bad day there can be 5-10. (Eg. on 2002-08-21, six mailman-users posts from the same guy were all caught, mainly because his ISP added X-AntiAbuse, and his messages were multipart/alternative with unwrapped plain text. This is a perfect example of SpamAssassin screwing up royally.) 1-3 FPs/day I can live with, but the real burden is the manual review: I'd much rather have 5 FPs in a pool of 10 suspects than 1 FP out of 100 suspects. > What do we get from SpamAssassin? Recall the stats I posted this morning; the bulk of spam is in Chinese or Korean, and I have things setup so SpamAssassin never even sees it. I think the only way to meaningfully answer this question is to stash *everything* mail.python.org receives for a day or 10, spam and otherwise, and run it all through SA. 
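Greg's preference (5 FPs in a pool of 10 suspects over 1 FP in 100) is really a claim about review time, and the arithmetic is easy to sketch. All numbers below, including the per-message review speed, are hypothetical illustrations, not python.org measurements:

```python
def review_burden(suspects_per_day, false_positives, secs_per_msg=20):
    """Return (minutes of manual review per day, FP fraction of the pool).

    secs_per_msg is a guessed review speed; every number here is made up.
    """
    minutes = suspects_per_day * secs_per_msg / 60.0
    fp_fraction = false_positives / float(suspects_per_day)
    return minutes, fp_fraction

# 5 FPs in a pool of 10 suspects beats 1 FP in a pool of 100:
# the reviewer's daily cost scales with the pool size, not with
# the number of false positives hiding in it.
small_pool = review_burden(10, 5)    # about 3.3 minutes/day, half FPs
large_pool = review_burden(100, 1)   # about 33 minutes/day, 1% FPs
```

Whatever the assumed review speed, the ranking between the two pools stays the same, which is the point Greg is making.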
Greg From pg@archub.org Wed Aug 28 20:47:44 2002 From: pg@archub.org (Paul Graham) Date: 28 Aug 2002 19:47:44 -0000 Subject: [Python-Dev] The first trustworthy GBayes results Message-ID: <20020828194744.35268.qmail@mail.archub.org> Don't count words multiple times, and you'll probably get fewer false positives. That's the main reason I don't do it-- because it magnifies the effect of some random word like water happening to have a big spam probability. (Incidentally, why so high? In my db it's only 0.3930784.) --pg --Tim Peters wrote: > FYI. After cleaning the blatant spam identified by the classifier out of my > ham corpus, and replacing it with new random msgs from Barry's corpus, the > reported false positive rate fell to about 0.2% (averaging 8 per each batch > of 4000 ham test messages). This seems remarkable given that it's ignoring > headers, and just splitting the raw text on whitespace in total ignorance of > HTML & MIME etc. > > 'FREE' (all caps) moved into the ranks of best spam indicators. The false > negative rate got reduced by a small amount, but I doubt it's a > statistically significant reduction (I'll compute that stuff later; I'm > looking for Big Things now). > > Some of these false positives are almost certainly spam, and at least one is > almost certainly a virus: these are msgs that are 100% base64-encoded, or > maximally obfuscated quoted-printable. That could almost certainly be fixed > by, e.g., decoding encoded text. > > The other false positives seem harder to deal with: > > + Brief HTML msgs from newbies. I doubt the headers will help these > get through, as they're generally first-time posters, and aren't > replies to earlier msgs. There's little positive content, while > all elements of raw HTML have high "it's spam" probability.
> 
>   Example:
> 
>     """
>     --------------=_4D4800B7C99C4331D7B8
>     Content-Description: filename="text1.txt"
>     Content-Type: text/plain; charset=ISO-8859-1
>     Content-Transfer-Encoding: quoted-printable
> 
>     Is there a version of Python with Prolog Extension??
>     Where can I find it if there is?
> 
>     Thanks,
>     Luis.
> 
>     P.S. Could you please reply to the sender too.
> 
> 
>     --------------=_4D4800B7C99C4331D7B8
>     Content-Description: filename="text1.html"
>     Content-Type: text/html
>     Content-Transfer-Encoding: quoted-printable
> 
>     [The HTML alternative repeated the same short message, under the
>     title "Prolog Extension"; its raw markup was eaten when this
>     archive was rendered as HTML.]
> 
>     --------------=_4D4800B7C99C4331D7B8--
>     """
> 
> Here's how it got scored:
> 
>     prob = 0.999958816093
>     prob('<...>') = 0.979284
>     prob('Prolog') = 0.01
>     prob('<...>') = 0.97989
>     prob('Thanks,') = 0.0337316
>     prob('Prolog') = 0.01
>     prob('Python') = 0.01
>     prob('NAME=3D"GENERATOR"') = 0.99
>     prob('<...>') = 0.99
>     prob('<...>') = 0.989494
>     prob('<...>') = 0.987429
>     prob('Thanks,') = 0.0337316
>     prob('Python') = 0.01
> 
>     [Entries shown as '<...>' were raw HTML-tag tokens eaten by the
>     archive's HTML rendering; a few further entries, and a closing
>     remark beginning "Note that '...", were truncated entirely.]
> 
> + Msgs talking *about* HTML, and including HTML in examples.  This one
>   may be troublesome, but there are mercifully few of them.
> 
> + Brief msgs with obnoxious employer-generated signatures.  Example:
> 
>     """
>     Hi there,
> 
>     I am looking for you recommendations on training courses available
>     in the UK on Python.  Can you help?
> 
>     Thanks,
> 
>     Vickie Mills
>     IS Training Analyst
> 
>     Tel: 0131 245 1127
>     Fax: 0131 245 1550
>     E-mail: vickie_mills@standardlife.com
> 
>     For more information on Standard Life, visit our website
>     http://www.standardlife.com/  The Standard Life Assurance Company,
>     Standard Life House, 30 Lothian Road, Edinburgh EH1 2DH, is
>     registered in Scotland (No SZ4) and regulated by the Personal
>     Investment Authority.  Tel: 0131 225 2552 - calls may be recorded
>     or monitored.  This confidential e-mail is for the addressee only.
>     If received in error, do not retain/copy/disclose it without our
>     consent and please return it to us.  We virus scan all e-mails but
>     are not responsible for any damage caused by a virus or alteration
>     by a third party after it is sent.
> """ > > The scoring: > > prob = 0.98654879055 > prob('our') = 0.928936 > prob('sent.') = 0.939891 > prob('Tel:') = 0.0620155 > prob('Thanks,') = 0.0337316 > prob('received') = 0.940256 > prob('Tel:') = 0.0620155 > prob('Hi') = 0.0533333 > prob('help?') = 0.01 > prob('Personal') = 0.970976 > prob('regulated') = 0.99 > prob('Road,') = 0.01 > prob('Training') = 0.99 > prob('e-mails') = 0.987542 > prob('Python.') = 0.01 > prob('Investment') = 0.99 > > The brief human-written part is fine, but the longer boilerplate sig is > indistinguishable from spam. > > + The occasional non-Python conference announcement(!). These are > long, so I'll skip an example. In effect, it's automated bulk email > trying to sell you a conference, so is prone to use the language and > artifacts of advertising. Here's typical scoring, for the TOOLS > Europe '99 conference announcement: > > prob = 0.983583974285 > prob('THE') = 0.983584 > prob('Object') = 0.01 > prob('Bell') = 0.01 > prob('Object-Oriented') = 0.01 > prob('**************************************************************') = > 0.99 > prob('Bertrand') = 0.01 > prob('Rational') = 0.01 > prob('object-oriented') = 0.01 > prob('CONTACT') = 0.99 > prob('**************************************************************') = > 0.99 > prob('innovative') = 0.99 > prob('**************************************************************') = > 0.99 > prob('Olivier') = 0.01 > prob('VISIT') = 0.99 > prob('OUR') = 0.99 > > Note the repeated penalty for the lines of asterisks. That segues into the > next one: > > + Artifacts of the fact that the algorithm counts multiple instances > of "a word" multiple times. These are baffling at first sight! The > two clearest examples: > > """ > > > Can you create and use new files with dbhash.open()? > > > > Yes.
But if I run db_dump on these files, it says "unexpected file type > > or format", regardless of which db_dump version I use (2.0.77, 3.0.55, > > 3.1.17) > > > > It may be that db_dump isn't compatible with version 1.85 database files. I > can't remember. I seem to recall that there was an option to build 1.85 > versions of db_dump and db_load. Check the configure options for > BerkeleyDB to find out. (Also, while you are there, make sure that > BerkeleyDB was built the same on both of your platforms...) > > > > > > > Try running db_verify (one of the utilities built > > > when you compiled DB) on the file and see what it tells you. > > > > There is no db_verify among my Berkeley DB utilities. > > There should have been a bunch of them built when you compiled DB. I've got > these: > From oren-py-d@hishome.net Wed Aug 28 21:11:39 2002 From: oren-py-d@hishome.net (Oren Tirosh) Date: Wed, 28 Aug 2002 16:11:39 -0400 Subject: [Python-Dev] Re: A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib] In-Reply-To: <200208281941.g7SJf0V04475@odiug.zope.com> References: <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com> <20020824163848.C10202@idi.ntnu.no> <200208280126.g7S1Q1U15876@pcp02138704pcs.reston01.va.comcast.net> <200208281758.g7SHwII01131@odiug.zope.com> <200208281941.g7SJf0V04475@odiug.zope.com> Message-ID: <20020828201139.GA96403@hishome.net> On Wed, Aug 28, 2002 at 03:41:00PM -0400, Guido van Rossum wrote: > > > Or you might use Oren's favorite rule of thumb and listify it when iter(x) > > > is iter(x) (or iter(x) is x). > > > > I'm a bit annoyed by the idea that `iter(x)' might require some computation > > for producing an iterator, and that we immediately throw away the result.
> > Granted that `__iter__(self): return self' is efficient when an object is > > an iterator, but nowhere it is said that `__iter__' has to be efficient > > when the object is a container, and it does not shock me that some complex > > containers require time to produce their iterator. I much prefer limiting > > the use of `__iter__' for when one intends to use the iterator... > > Yes, that's why I prefer to just make a list() copy. Oh, come on you two... stop beating up the poor strawman. The reiter() function doesn't make a single redundant call to __iter__. It's just like iter() but ensures in passing that the result is really a fresh iterator. Oren From guido@python.org Wed Aug 28 21:19:11 2002 From: guido@python.org (Guido van Rossum) Date: Wed, 28 Aug 2002 16:19:11 -0400 Subject: [Python-Dev] Re: A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib] In-Reply-To: Your message of "Wed, 28 Aug 2002 16:11:39 EDT." <20020828201139.GA96403@hishome.net> References: <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com> <20020824163848.C10202@idi.ntnu.no> <200208280126.g7S1Q1U15876@pcp02138704pcs.reston01.va.comcast.net> <200208281758.g7SHwII01131@odiug.zope.com> <200208281941.g7SJf0V04475@odiug.zope.com> <20020828201139.GA96403@hishome.net> Message-ID: <200208282019.g7SKJBQ06414@odiug.zope.com> > Oh, come on you two... stop beating up the poor strawman. The reiter() > function doesn't make a single redundant call to __iter__. It's just > like iter() but ensures in passing that the result is really a fresh > iterator. I'm really sorry. I had forgotten what exactly your proposal was. It is actually very reasonable: Proposal: new built-in function reiter() def reiter(obj): """reiter(obj) -> iterator Get an iterator from an object. If the object is already an iterator a TypeError exception will be raised. 
For all Python built-in types it is guaranteed that if this function succeeds the next call to reiter() will return a new iterator that produces the same items unless the object is modified. Non-builtin iterable objects which are not iterators SHOULD support multiple iteration returning the same items.""" it = iter(obj) if it is obj: raise TypeError('Object is not re-iterable') return it --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy@alum.mit.edu Wed Aug 28 21:30:11 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Wed, 28 Aug 2002 16:30:11 -0400 Subject: [Python-Dev] Re: A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib] In-Reply-To: References: <200208281758.g7SHwII01131@odiug.zope.com> Message-ID: <15725.13011.164341.365821@slothrop.zope.com> --rquOkQCFkL Content-Type: text/plain; charset=us-ascii Content-Description: message body text Content-Transfer-Encoding: 7bit TP> + The ability to override the random number generator. Python's TP> default WH generator is showing its age as machines get TP> faster; it's simply not adequate anymore for long-running TP> programs making heavy use of it on a fast box. Combinatorial TP> algorithms in particular do tend to make heavy use of it. TP> (Speaking of which, "someone" should look into grabbing one of TP> the Mersenne Twister extensions for Python -- that's the TP> current state of *that* art). The last time we talked about random number generation, I remember finding a tiny algorithm by Pierre L'Ecuyer based on a recommendation from Luc Devroye. (That's a good pedigree!) Here's an almost equally tiny C extension that wraps up the algorithm. We should do a real test of it. Last time I checked, it wasn't obvious how to actually run the DIEHARD tests. 
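Jeremy's attachment arrives base64-encoded, so for readers following along in the archive: the algorithm it wraps is L'Ecuyer's 1988 two-stream combined linear congruential generator, stepped with Schrage's decomposition so the products never overflow a signed 32-bit int. The constants, the combination step, and the default seeds (55555, 99999) mirror the attached C; the class name and method shape here are mine:

```python
class CombinedLCG(object):
    """L'Ecuyer's combined LCG; random() returns floats in (0.0, 1.0).

    A sketch of the algorithm in Jeremy's attached C extension;
    the seed defaults are the arbitrary ones the C module uses.
    """
    def __init__(self, s1=55555, s2=99999):
        self.s1, self.s2 = s1, s2

    def random(self):
        # Stream 1: x <- 40014*x mod 2147483563, via Schrage's method.
        k = self.s1 // 53668
        self.s1 = 40014 * (self.s1 % 53668) - k * 12211
        if self.s1 < 0:
            self.s1 += 2147483563
        # Stream 2: x <- 40692*x mod 2147483399, likewise.
        k = self.s2 // 52774
        self.s2 = 40692 * (self.s2 % 52774) - k * 3791
        if self.s2 < 0:
            self.s2 += 2147483399
        # Combine the two streams into [1, 2147483562], then scale.
        z = (self.s1 - 2147483563) + self.s2
        if z < 1:
            z += 2147483562
        return z * (1.0 / 2147483563.0)
```

Seeding with the same pair of integers reproduces the same stream, which makes the "real test" Jeremy asks for straightforward to script.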
Jeremy --rquOkQCFkL Content-Type: application/octet-stream Content-Disposition: attachment; filename="plerandommodule.c" Content-Transfer-Encoding: base64 I2luY2x1ZGUgIlB5dGhvbi5oIgoKUHlEb2NfU1RSVkFSKHBsZXJhbmRvbV9kb2MsCiJBIHVu aWZvcm0gWzAsIDFdIHJhbmRvbSBudW1iZXIgZ2VuZXJhdG9yLlxuIgoiXG4iCiJUaGUgYWxn b3JpdGhtIHdhcyBkZXZlbG9wZWQgYnkgUGllcnJlIExlY3V5ZXIgYmFzZWQgb24gYSBjbGV2 ZXJcbiIKImFuZCB0ZXN0ZWQgY29tYmluYXRpb24gb2YgdHdvIGxpbmVhciBjb25ncnVlbnRp YWwgc2VxdWVuY2VzLlxuIgoiXG4iCiJMdWMgRGV2cm95ZSB3cml0ZXM6IEkgZ2V0IGZyZXF1 ZW50bHkgYXNrZWQgZm9yIGEgZ29vZCByZWxpYWJsZVxuIgoidW5pZm9ybSByYW5kb20gbnVt YmVyIGdlbmVyYXRvci4gVGhlcmUgaXMgbm8gc3VjaCB1bml2ZXJzYWwgYmVhc3QsXG4iCiJi dXQgdGhlIGxpbmsgW3RvIGEgc21hbGwgQyBpbXBsZW1lbnRhdGlvbl0gYWJvdmUgbGV0cyB5 b3UgZG93bmxvYWRcbiIKImEgaGlnaCBxdWFsaXR5IGdlbmVyYXRvci5cbiIKIlxuIgoiaHR0 cDovL3d3dy1jZ3JsLmNzLm1jZ2lsbC5jYS9+bHVjL2xlY3V5ZXIuYyAgICBcbiIKIlxuIik7 Cgp0eXBlZGVmIHN0cnVjdCB7CiAgICAgICAgUHlPYmplY3RfSEVBRAogICAgICAgIGxvbmcg czEsIHMyOwp9IEdlbmVyYXRvck9iamVjdDsKClB5RG9jX1NUUlZBUihwbGVyYW5kb21fcmFu ZG9tX2RvYywKIkdldCB0aGUgbmV4dCByYW5kb20gbnVtYmVyIGluIHRoZSByYW5nZSBbMC4w LCAxLjApLlxuIik7CgovKiBXZSBkb24ndCBuZWVkIGEgbG9jayBmb3IgcmFuZG9tKCkgdGhl IHdheSB3aHJhbmRvbSBkb2VzLCBiZWNhdXNlIHRoaXMKICAgaXNuJ3QgZXhlY3V0aW5nIGFu eSBQeXRob24gY29kZS4gIEdJTCB0byB0aGUgcmVzY3VlIGFnYWluLgoqLwoKc3RhdGljIFB5 T2JqZWN0ICoKcGxlcmFuZG9tX3JhbmRvbShQeU9iamVjdCAqc2VsZikKewoJc3RhdGljIGRv dWJsZSBmYWN0b3IgPSAxLjAvMjE0NzQ4MzU2My4wOwoJcmVnaXN0ZXIgbG9uZyBrLHo7CglH ZW5lcmF0b3JPYmplY3QgKmcgPSAoR2VuZXJhdG9yT2JqZWN0ICopc2VsZjsKCWsgPSBnLT5z MSAvIDUzNjY4OwoJZy0+czEgPSA0MDAxNCAqIChnLT5zMSAlIDUzNjY4KSAtIGsgKiAxMjIx MTsKCWlmIChnLT5zMSA8IDApIAoJCWctPnMxICs9IDIxNDc0ODM1NjM7CglrID0gZy0+czIg LyA1Mjc3NDsKCWctPnMyID0gNDA2OTIgKiAoZy0+czIgJSA1Mjc3NCkgLSBrICogMzc5MTsK CWlmIChnLT5zMiA8IDApIAoJCWctPnMyICs9IDIxNDc0ODMzOTk7CgoJLyoKCXogPSBhYnMo czEgXiBzMik7CgkqLwoJeiA9IChnLT5zMSAtIDIxNDc0ODM1NjMpICsgZy0+czI7CglpZiAo eiA8IDEpIAoJCXogKz0gMjE0NzQ4MzU2MjsKCQkKCXJldHVybiBQeUZsb2F0X0Zyb21Eb3Vi 
bGUoKChkb3VibGUpKHopKSpmYWN0b3IpOwp9CgpQeURvY19TVFJWQVIocGxlcmFuZG9tX3Nl ZWRfZG9jLAoiU2V0IHRoZSBzZWVkLlxuIik7CgpzdGF0aWMgUHlPYmplY3QgKgpwbGVyYW5k b21fc2VlZChQeU9iamVjdCAqc2VsZiwgUHlPYmplY3QgKmFyZ3MpCnsKICAgICAgICBpbnQg czE7CiAgICAgICAgaW50IHMyOwogICAgICAgIGlmICghUHlBcmdfUGFyc2VUdXBsZShhcmdz LCAiaWk6c2VlZCIsCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICZzMSwgJnMyKSkK ICAgICAgICAgICAgICAgIHJldHVybiBOVUxMOwoKCSgoR2VuZXJhdG9yT2JqZWN0ICopc2Vs ZiktPnMxID0gczE7CgkoKEdlbmVyYXRvck9iamVjdCAqKXNlbGYpLT5zMSA9IHMyOwoKCVB5 X0lOQ1JFRihQeV9Ob25lKTsKCXJldHVybiBQeV9Ob25lOwp9CgpzdGF0aWMgaW50CnBsZXJh bmRvbV9pbml0KFB5T2JqZWN0ICpzZWxmLCBQeU9iamVjdCAqYXJncykKewoJR2VuZXJhdG9y T2JqZWN0ICpnID0gKEdlbmVyYXRvck9iamVjdCAqKXNlbGY7Cglsb25nIHMxID0gNTU1NTUs IHMyID0gOTk5OTk7CgkKCWlmICghUHlBcmdfUGFyc2VUdXBsZShhcmdzLCAifGlpOnBsZXJh bmRvbSIsIHMxLCBzMikpCgkJcmV0dXJuIDA7CglnLT5zMSA9IHMyOwoJZy0+czIgPSBzMjsK CXJldHVybiAxOwp9CgpzdGF0aWMgc3RydWN0IFB5TWV0aG9kRGVmIHBsZXJhbmRvbV9tZXRo b2RzW10gPSB7Cgl7InJhbmRvbSIsIChQeUNGdW5jdGlvbilwbGVyYW5kb21fcmFuZG9tLCBN RVRIX05PQVJHUywKICAgICAgICAgcGxlcmFuZG9tX3JhbmRvbV9kb2N9LAoJeyJzZWVkIiwg KFB5Q0Z1bmN0aW9uKXBsZXJhbmRvbV9zZWVkLCBNRVRIX1ZBUkFSR1MsCiAgICAgICAgIHBs ZXJhbmRvbV9zZWVkX2RvY30sCgl7TlVMTCwgTlVMTH0KfTsKCnN0YXRpYyBQeVR5cGVPYmpl Y3QgUHlwbGVyYW5kb21fVHlwZSA9IHsKICAgICAgICBQeU9iamVjdF9IRUFEX0lOSVQoMCkK ICAgICAgICAwLAkJCQkvKiBvYl9zaXplICovCiAgICAgICAgInBsZXJhbmRvbS5wbGVyYW5k b20iLAkvKiB0cF9uYW1lICovCiAgICAgICAgc2l6ZW9mKEdlbmVyYXRvck9iamVjdCksCS8q IHRwX2Jhc2ljc2l6ZSAqLwogICAgICAgIDAsCQkJCS8qIHRwX2l0ZW1zaXplICovCiAgICAg ICAgMCwJCQkJLyogdHBfZGVhbGxvYyAqLwogICAgICAgIDAsCQkJCS8qIHRwX3ByaW50ICov CiAgICAgICAgMCwJCQkJLyogdHBfZ2V0YXR0ciAqLwogICAgICAgIDAsCQkJCS8qIHRwX3Nl dGF0dHIgKi8KICAgICAgICAwLAkJCQkvKiB0cF9jb21wYXJlICovCiAgICAgICAgMCwJCQkJ LyogdHBfcmVwciAqLwogICAgICAgIDAsCQkJCS8qIHRwX2FzX251bWJlciAqLwogICAgICAg IDAsCQkJCS8qIHRwX2FzX3NlcXVlbmNlICovCiAgICAgICAgMCwJCQkJLyogdHBfYXNfbWFw cGluZyAqLwogICAgICAgIDAsCQkJCS8qIHRwX2hhc2ggKi8KICAgICAgICAwLAkJCQkvKiB0 
cF9jYWxsICovCiAgICAgICAgMCwJCQkJLyogdHBfc3RyICovCiAgICAgICAgUHlPYmplY3Rf R2VuZXJpY0dldEF0dHIsCS8qIHRwX2dldGF0dHJvICovCiAgICAgICAgMCwJCQkJLyogdHBf c2V0YXR0cm8gKi8KICAgICAgICAwLAkJCQkvKiB0cF9hc19idWZmZXIgKi8KICAgICAgICBQ eV9UUEZMQUdTX0RFRkFVTFQgfCBQeV9UUEZMQUdTX0JBU0VUWVBFLAkJLyogdHBfZmxhZ3Mg Ki8KICAgICAgICAwLAkJCQkvKiB0cF9kb2MgKi8KICAgICAgICAwLAkJCQkvKiB0cF90cmF2 ZXJzZSAqLwogICAgICAgIDAsCQkJCS8qIHRwX2NsZWFyICovCiAgICAgICAgMCwJCQkJLyog dHBfcmljaGNvbXBhcmUgKi8KICAgICAgICAwLAkJCQkvKiB0cF93ZWFrbGlzdG9mZnNldCAq LwogICAgICAgIDAsCQkJCS8qIHRwX2l0ZXIgKi8KICAgICAgICAwLAkJCQkvKiB0cF9pdGVy bmV4dCAqLwogICAgICAgIHBsZXJhbmRvbV9tZXRob2RzLAkJLyogdHBfbWV0aG9kcyAqLwog ICAgICAgIDAsCQkJCS8qIHRwX21lbWJlcnMgKi8KICAgICAgICAwLAkJCQkvKiB0cF9nZXRz ZXQgKi8KICAgICAgICAwLAkJCQkvKiB0cF9iYXNlICovCiAgICAgICAgMCwJCQkJLyogdHBf ZGljdCAqLwogICAgICAgIDAsCQkJCS8qIHRwX2Rlc2NyX2dldCAqLwogICAgICAgIDAsCQkJ CS8qIHRwX2Rlc2NyX3NldCAqLwogICAgICAgIDAsCQkJCS8qIHRwX2RpY3RvZmZzZXQgKi8K ICAgICAgICAoaW5pdHByb2MpcGxlcmFuZG9tX2luaXQsCS8qIHRwX2luaXQgKi8KICAgICAg ICBQeVR5cGVfR2VuZXJpY0FsbG9jLAkJLyogdHBfYWxsb2MgKi8KICAgICAgICBQeVR5cGVf R2VuZXJpY05ldywJCS8qIHRwX25ldyAqLwogICAgICAgIDAsCQkJCS8qIHRwX2ZyZWUgKi8K ICAgICAgICAwLAkJCQkvKiB0cF9pc19nYyAqLwp9OwoKUHlNT0RJTklUX0ZVTkMKaW5pdHBs ZXJhbmRvbSh2b2lkKQp7CiAgICAgICAgUHlPYmplY3QgKm1vZDsKCiAgICAgICAgbW9kID0g UHlfSW5pdE1vZHVsZTMoInBsZXJhbmRvbSIsIHBsZXJhbmRvbV9tZXRob2RzLAogICAgICAg ICAgICAgICAgICAgICAgICAgICAgIHBsZXJhbmRvbV9kb2MpOwogICAgICAgIGlmIChtb2Qg PT0gTlVMTCkKICAgICAgICAgICAgICAgIHJldHVybjsKCiAgICAgICAgUHlwbGVyYW5kb21f VHlwZS5vYl90eXBlID0gJlB5VHlwZV9UeXBlOwogICAgICAgIFB5cGxlcmFuZG9tX1R5cGUu dHBfZGVhbGxvYyA9IChkZXN0cnVjdG9yKSZQeU9iamVjdF9EZWw7CiAgICAgICAgaWYgKCFQ eU9iamVjdF9TZXRBdHRyU3RyaW5nKG1vZCwgInBsZXJhbmRvbSIsCiAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgIChQeU9iamVjdCAqKSZQeXBsZXJhbmRvbV9UeXBlKSkK ICAgICAgICAgICAgICAgIHJldHVybjsKCn0K --rquOkQCFkL Content-Type: application/octet-stream Content-Disposition: attachment; filename="setup.py" Content-Transfer-Encoding: 
base64 ZnJvbSBkaXN0dXRpbHMuZXh0ZW5zaW9uIGltcG9ydCBFeHRlbnNpb24KZnJvbSBkaXN0dXRp bHMuY29yZSBpbXBvcnQgc2V0dXAKCnNldHVwKG5hbWU9InBsZXJhbmRvbSIsCiAgICAgIGRl c2NyaXB0aW9uPSJBIGhpZ2ggcXVhbGl0eSB1bmlmb3JtIHJhbmRvbSBudW1iZXIgZ2VuZXJh dG9yLiIsCiAgICAgIGV4dF9tb2R1bGVzID0gW0V4dGVuc2lvbihuYW1lPSJwbGVyYW5kb20i LAogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgc291cmNlcz1bInBsZXJhbmRvbW1v ZHVsZS5jIl0pXQogICAgICApCgppbXBvcnQgcGxlcmFuZG9tCnIgPSBwbGVyYW5kb20ucGxl cmFuZG9tKCkKcHJpbnQgci5yYW5kb20oKQpwcmludCByLnJhbmRvbSgpCg== --rquOkQCFkL-- From pinard@iro.umontreal.ca Wed Aug 28 21:45:14 2002 From: pinard@iro.umontreal.ca (=?iso-8859-1?q?Fran=E7ois?= Pinard) Date: 28 Aug 2002 16:45:14 -0400 Subject: [Python-Dev] Re: A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib] In-Reply-To: <200208281941.g7SJf0V04475@odiug.zope.com> References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <20020820231738.GA21011@thyrsus.com> <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> <20020821025725.GB28198@thyrsus.com> <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com> <20020824163848.C10202@idi.ntnu.no> <200208280126.g7S1Q1U15876@pcp02138704pcs.reston01.va.comcast.net> <200208281758.g7SHwII01131@odiug.zope.com> <200208281941.g7SJf0V04475@odiug.zope.com> Message-ID: [Guido van Rossum] > > > def powerset(base): > > This does not yield the subsets in "sorted order", like the other > > `cogen' methods do, and I would prefer to keep that promise. > That may be a matter of permuting the bits? I looked at them: nothing evident pops up. I tried inverting, inversing, and both, in hope to trigger the sight of some magic property, to no avail. 
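The "sorted order" problem under discussion is easy to see on a tiny base. Here is a standalone sketch of the bitmask powerset Guido posted earlier in the thread, adapted only to use `range` and to join each subset into a string for display:

```python
def powerset_bitmask(base):
    """Powerset via bitmasks, as in the snippet quoted earlier; yields lists."""
    pairs = [(2 ** i, x) for i, x in enumerate(base)]
    for n in range(2 ** len(pairs)):
        # bit i of n decides whether base[i] belongs to this subset
        yield [x for m, x in pairs if m & n]

subsets = [''.join(s) for s in powerset_bitmask('abc')]
# Counting n from 0 to 7 brings in 'c' (the high bit) only for the
# second half, so subsets come out in binary-counting order:
#   ['', 'a', 'b', 'ab', 'c', 'ac', 'bc', 'abc']
# whereas lexicographic order would be
#   ['', 'a', 'ab', 'abc', 'ac', 'b', 'bc', 'c']
```

No reordering of the bits of `n` changes which half of the count contains the high element, which is consistent with no "magic property" popping up.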
-- François Pinard http://www.iro.umontreal.ca/~pinard From tim.one@comcast.net Wed Aug 28 21:59:39 2002 From: tim.one@comcast.net (Tim Peters) Date: Wed, 28 Aug 2002 16:59:39 -0400 Subject: [Python-Dev] The first trustworthy GBayes results In-Reply-To: <20020828194744.35268.qmail@mail.archub.org> Message-ID: [Paul Graham] > Don't count words multiple times, and you'll probably > get fewer false positives. That's the main reason I > don't do it-- because it magnifies the effect of some > random word like water happening to have a big spam > probability. Yes, that makes sense, but I'm trained not to think <wink>. Experiment will decide it (although I *expect* it's a good change, and counting multiple occurrences was obviously a factor in several of the rare false positives). If spam really is different, it should be different in several distinct ways. > (Incidentally, why so high? In my db it's only 0.3930784.) --pg I expect it's because this tokenizer *only* split on whitespace. Punctuation was left intact. So, e.g., on the Python discussion list stuff like The new approach blows it out of the water: and This is very deep water; and Then you'll take to Python like a duck takes to water! are counted as "water:" and "water;" and "water!", not as "water". The spam corpus is chock full o' "water", though: + Porn sites advertising water sports. + Assorted bottled water pitches. + Assorted "oxygenated water" pitches. + Claims of environmental friendliness explicated via stuff like "no harmful chlorine to pollute the water or air!". + Pitches for weight-loss gimmicks emphasizing that you'll really lose fat, not just reduce water retention. + Pitches for weight-loss gimmicks emphasizing that you'll reduce water retention as well as lose fat. + One repeated bizarre analogy for how a breast enlargement cream works in the way "a sponge absorbs water". + This revolutionary new flat garden hose will really cut your water bills.
+ Ditto this miracle new laundry tablet lets you use a fraction of the water needed by old-fashioned detergents. + Survivalist pitches often mention water in the same sentence as air and medical care. I got tired then <wink>. From pg@archub.org Wed Aug 28 22:04:46 2002 From: pg@archub.org (Paul Graham) Date: 28 Aug 2002 21:04:46 -0000 Subject: [Python-Dev] The first trustworthy GBayes results Message-ID: <20020828210446.36987.qmail@mail.archub.org> I see, if you count the punctuation as part of the token, you end up with undersized-corpus effects. Esp if you are case-sensitive too. If I were you I'd map your input down into a narrower set of tokens, or you'll get too many errors. --pg --Tim Peters wrote: > [Paul Graham] > > Don't count words multiple times, and you'll probably > > get fewer false positives. That's the main reason I > > don't do it-- because it magnifies the effect of some > > random word like water happening to have a big spam > > probability. > > Yes, that makes sense, but I'm trained not to think <wink>. Experiment will > decide it (although I *expect* it's a good change, and counting multiple > occurrences was obviously a factor in several of the rare false positives). > If spam really is different, it should be different in several distinct > ways. > > > (Incidentally, why so high? In my db it's only 0.3930784.) --pg > > I expect it's because this tokenizer *only* split on whitespace. > Punctuation was left intact. So, e.g., on the Python discussion list stuff > like > > The new approach blows it out of the water: > and > This is very deep water; > and > Then you'll take to Python like a duck takes to water! > > are counted as "water:" and "water;" and "water!", not as "water". > > The spam corpus is chock full o' "water", though: > > + Porn sites advertising water sports. > + Assorted bottled water pitches. > + Assorted "oxygenated water" pitches.
> + Claims of environmental friendliness explicated via stuff like > "no harmful chlorine to pollute the water or air!". > + Pitches for weight-loss gimmicks emphasizing that you'll really > lose fat, not just reduce water retention. > + Pitches for weight-loss gimmicks emphasizing that you'll reduce > water retention as well as lose fat. > + One repeated bizarre analogy for how a breast enlargement cream > works in the way "a sponge absorbs water". > + This revolutionary new flat garden hose will really cut your water > bills. > + Ditto this miracle new laundry tablet lets you use a fraction of > the water needed by old-fashioned detergents. > + Survivalist pitches often mention water in the same sentence as > air and medical care. > > I got tired then <wink>. > From tim.one@comcast.net Wed Aug 28 22:20:02 2002 From: tim.one@comcast.net (Tim Peters) Date: Wed, 28 Aug 2002 17:20:02 -0400 Subject: [Python-Dev] Re: A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib] In-Reply-To: <15725.13011.164341.365821@slothrop.zope.com> Message-ID: [Jeremy Hylton] > The last time we talked about random number generation, I remember > finding a tiny algorithm by Pierre L'Ecuyer based on a recommendation > from Luc Devroye. (That's a good pedigree!) Here's an almost equally > tiny C extension that wraps up the algorithm. > > We should do a real test of it. Last time I checked, it wasn't > obvious how to actually run the DIEHARD tests. It still isn't, but DIEHARD is likely obsolete now. Testing for randomness has become a full-blown science in its own right.
Your government is happy to give you a bunch of more modern randomness tests developed on a Sun, complete with a multi-hundred page testing manual every word of which is vitally important <0.8 wink>: http://csrc.nist.gov/rng/ Note that the Mersenne Twister is likely substantially faster than the little C program (e.g., it doesn't need division, and on some platforms is reported to be faster than the uselessly simple-minded C rand()), is provably equi-distributed through 623 dimensions (linear congruential generators are damned lucky to get 6), has a period of nearly 2**20000, and is probably the most widely tested generator in existence now. Knuth was reported as saying "well, I guess that about wraps it up for random number generation!", although I'd be more likely to believe L'Ecuyer or Marsaglia on this particular topic <wink>. From tim.one@comcast.net Wed Aug 28 22:41:43 2002 From: tim.one@comcast.net (Tim Peters) Date: Wed, 28 Aug 2002 17:41:43 -0400 Subject: [Python-Dev] The first trustworthy GBayes results In-Reply-To: <20020828210446.36987.qmail@mail.archub.org> Message-ID: [Paul Graham] > I see, if you count the punctuation as part of the > token, you end up with undersized-corpus effects. > Esp if you are case-sensitive too. If I were you > I'd map your input down into a narrower set of tokens, > or you'll get too many errors. --pg Possibly, but that's for experiment to decide (along with many other variations). The initial tokenization method was chosen merely for speed. Still, I looked at every false positive across 80,000 presumed non-spam test inputs, and posted the results earlier: it's hard to imagine that ignoring punctuation and/or case would have stopped any of them except for this one (which is darned hard to care about <wink>): """ HEY DUDEZ ! I WANT TO GET INTO THIS AUTOCODING THING. ANYONE KNOW WHERE I CAN GET SOME IBM 1401 WAREZ ?
-- MULTICS-MAN
"""

prob = 0.999982095931
prob('AUTOCODING') = 0.2
prob('THING.') = 0.2
prob('DUDEZ') = 0.2
prob('ANYONE') = 0.884211
prob('GET') = 0.847334
prob('GET') = 0.847334
prob('HEY') = 0.2
prob('--') = 0.0974729
prob('KNOW') = 0.969697
prob('THIS') = 0.953191
prob('?') = 0.0490886
prob('WANT') = 0.99
prob('TO') = 0.988829
prob('CAN') = 0.884211
prob('WAREZ') = 0.2

I also noted earlier that FREE (all caps) is now one of the 15 words
that most often makes it into the scorer's best-15 list, and cutting
the legs off a clue like that is unattractive on the face of it.  So
I'm loath to fold case unless experiment proves that's an improvement,
and it just doesn't look likely to do so.  For smaller corpora, some
other conclusion may well be justified; but experimenting on smaller
corpora isn't on my near-term agenda, so that will have to wait (we've
got a specific application in mind right now for which the corpora
size I'm using is actually tiny -- python.org hosts some very
high-volume mailing lists).

From Jack.Jansen@oratrix.com Wed Aug 28 23:13:18 2002
From: Jack.Jansen@oratrix.com (Jack Jansen)
Date: Thu, 29 Aug 2002 00:13:18 +0200
Subject: [Python-Dev] Sourceforge CVS repository behaving strange?
Message-ID: <5B6E3068-BAD3-11D6-8D51-003065517236@oratrix.com>

Either my brain needs to be pushed into gear or the CVS repository is
behaving very strange.

About half an hour ago I added and checked in a file
Mac/OSX/setupDocs.py, but after I did another cvs update the file
immediately disappeared with the message that it was no longer in the
repository.  And if I check the repository over the web (at
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Mac/OSX/)
the file is indeed gone.  Moreover, if I look there the other files
(such as the Makefile) *also* seem to be gone.  But cvs update doesn't
think they're gone, and doing a fresh checkout also makes them appear
as they should (but it does not revive setupDocs.py:-().
Hmm, together with checking in setupDocs.py I also added an entry to
Misc/ACKS, lemme check...  No Misc/ACKS in viewcvs...  Actually, there
are no files at all in viewcvs, just directories!

I am now utterly confused; can someone shed light on this?
--
- Jack Jansen        http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From Jack.Jansen@oratrix.com Wed Aug 28 23:22:57 2002
From: Jack.Jansen@oratrix.com (Jack Jansen)
Date: Thu, 29 Aug 2002 00:22:57 +0200
Subject: [Python-Dev] Sourceforge CVS repository behaving strange?
In-Reply-To: <5B6E3068-BAD3-11D6-8D51-003065517236@oratrix.com>
Message-ID:

On donderdag, augustus 29, 2002, at 12:13 , Jack Jansen wrote:
> Either my brain needs to be pushed into gear or the CVS
> repository is behaving very strange.
>
> About half an hour ago I added and checked in a file
> Mac/OSX/setupDocs.py, but after I did another cvs update the
> file immediately disappeared with the message that it was no
> longer in the repository.

When I re-checked my console output I noticed what seems to have
actually happened, but I'm still at a loss for an explanation.  It
seems cvs added the file in the wrong location.  In Mac/OSX, I did

    cvs add setupDocs.py
    cvs commit ../../Misc/ACKS setupDocs.py

The commit message produced the following output:

    Checking in ../../Misc/ACKS;
    /cvsroot/python/python/dist/src/Misc/ACKS,v  <--  ACKS
    new revision: 1.199; previous revision: 1.198
    done
    RCS file: /cvsroot/python/python/setupDocs.py,v
    done
    Checking in setupDocs.py;
    /cvsroot/python/python/setupDocs.py,v  <--  setupDocs.py
    initial revision: 1.1
    done

So, it somehow added my file at the root of the repository!

Anyway, I've fixed it, so the only thing that remains is a baffled
look on my face as to why it did this, and also as to why viewcvs acts
funny.
--
- Jack Jansen        http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -

From tim.one@comcast.net Thu Aug 29 01:39:36 2002
From: tim.one@comcast.net (Tim Peters)
Date: Wed, 28 Aug 2002 20:39:36 -0400
Subject: [Python-Dev] Sourceforge CVS repository behaving strange?
In-Reply-To:
Message-ID:

[Jack Jansen]
> ...
> Anyway, I've fixed it, so the only thing that remains is a
> baffled look on my face as to why it did this, and also as to
> why viewcvs acts funny.

Check the state of the "Show only files with tag" dropdown box at the
bottom of the ViewCVS page?  If that's set to a funny value, you'll
see funny things.

From tim.one@comcast.net Thu Aug 29 02:19:38 2002
From: tim.one@comcast.net (Tim Peters)
Date: Wed, 28 Aug 2002 21:19:38 -0400
Subject: [Python-Dev] RE: The first trustworthy GBayes results
In-Reply-To: <20020828132716.GA12475@cthulhu.gerg.ca>
Message-ID:

[Greg Ward]
> One of the other perennial-seeming topics on spamassassin-devel (a list
> that I follow only sporadically) is that careful manual cleaning of your
> corpus is *essential*.  The concern of the main SA developers is that
> spam in your non-spam folder (and vice-versa) will prejudice the genetic
> algorithm that evolves SA's scores in the wrong direction.  Gut instinct
> tells me the Bayesian approach ought to be more robust against this sort
> of thing, but even it must have a breaking point at which misclassified
> messages throw off the probabilities.

Like all other questions, this can be quantified if someone is willing
to do the grunt work of setting up, running, and analyzing appropriate
experiments.

This kind of algorithm is generally quite robust against disaster, but
note that even tiny changes in accuracy rates can have a large effect
on *you*: say that 99% of the time the system says a thing is spam, it
really is.  Then say that degrades by a measly 1%: 99% falls to 98%.
From *your* POV this is huge, because the error rate has actually
doubled (from 1% wrong to 2% wrong: you've got twice as many false
positives to deal with).

So the scheme has an ongoing need for accurate human training (spam
changes, list topics change, list members change, etc.; the system
needs an ongoing random sample of both new spam and new non-spam to
adapt).

> ...
> One possibility occurs to me: we could build our own corpus by
> collecting spam on python.org for a few weeks.

Simpler is better: as you suggested later, capture everything for a
while, and without injecting Mailman or SpamAssassin headers.  That
won't be a particularly good corpus for the lists in general, because
over any brief period a small number of topics and posters dominate.
But it will be a fair test for how systems do over exactly that brief
period.

> Here's a rough breakdown of mail rejected by mail.python.org over the
> last 10 days, eyeball-estimated messages per day:
>
>   bad RCPT                        150 - 300  [1]
>   bad sender                       50 - 190  [2]
>   relay denied                     20 - 180  [3]
>   known spammer addr/domain        15 - 60
>   8-bit chars in subject          130 - 200
>   8-bit chars in header addrs      10 - 60
>   banned charset in subject         5 - 50   [4]
>   "ADV" in subject                  0 - 5
>   no Message-Id header            100 - 400  [5]
>   invalid header address syntax     5 - 50   [6]
>   no valid senders in header       10 - 15   [7]
>   rejected by SpamAssassin         20 - 50   [8]
>   quarantined by SpamAssassin       5 - 50   [8]

We should start another category, "Messages from Tim rejected for
bogus reasons".

> [1] this includes mail accidentally sent to eg. giudo@python.org,
>     but based on scanning the reject logs, I'd say the vast majority
>     is spam.  However, such messages are rejected after RCPT TO,
>     so we never see the message itself.  Most of the bad recipient
>     addrs are either ancient (ipc6@python.org,
>     grail-feedback@python.org) or fictitious (success@python.org,
>     info@python.org).
>
> [2] sender verification failed, eg. someone tried to claim an
>     envelope sender like foo@bogus.domain.
Usually spam, but innocent > bystanders can be hit by DNS servers suddenly exploding (hello, > comcast.net). This only includes hard failures (DNS "no such > domain"), not soft failures (DNS timeout). > > [3] I'd be leery of accepting mail that's trying to hijack > mail.python.org as an open relay, even though that would > be a goldmine of spam. (OTOH, we could reject after the > DATA command, and save the message anyways.) > > [4] mail.python.org rejects any message with a properly MIME-encoded > subject using any of the following charsets: > big5, euc-kr, gb2312, ks_c_5601-1987 > > [5] includes viruses as well as spam (and no doubt some innocent > false positives, although I have added exemptions for the MUA/MTA > combinations that most commonly result in legit mail reaching > mail.python.org without a Message-Id header, eg. KMail/qmail) > > [6] eg. "To: all my friends" or "From: <>" > > [7] no valid sender address in any header line -- eg. someone gives a > valid MAIL FROM address, but then puts "From: blah@bogus.domain" > in the headers. Easily defeated with a "Sender" or "Reply-to" > header. > > [8] any message scoring >= 10.0 is rejected at SMTP time; any > message scoring >= 5.0 but < 10 is saved in /var/mail/spam > for later review Greg, you show signs of enjoying this job too much . > Executive summary: > > * it's a good thing we do all those easy checks before involving > SA, or the load on the server would be a lot higher So long as easy checks don't block legitimate email, I can't complain about that. > * give me 10 days of spam-harvesting, and I can equal Bruce > Guenter's spam archive for 2002. (Of course, it'll take a couple > of days to set the mail server up for the harvesting, and a couple > more days to clean through the ~2000 caught messages, but you get > the idea.) 
If it would be helpful for me to do research on corpora that include the headers, then the point would be to collect both spam and non-spam messages, so that they can be compared directly to each other. Those should be as close to the bytes coming off the pipe as possible (e.g., before injecting new headers of our own). As is, I've had to throw the headers away in both corpora, so am, in effect, working with a crippled version of the algorithm. Or if someone else is doing research on how best to tokenize and tag headers, I'm not terribly concerned about merging the approaches untested. If the approach is valuable enough to deploy, we'll eventually see exactly how well it works in real life. > ... > Perhaps that spam-harvesting run should also set aside a random > selection of apparently-non-spam messages received at the same time. > Then you'd have a corpus of mail sent to the same server, more-or-less > to the same addresses, over the same period of time. Yes, it wants something as close to a slice of real life as possible, in all conceivable respects, including ratio of spam to not spam, arrival times, and so on. > Oh, any custom corpus should also include the ~300 false positives and > ~600 false negatives gathered since SA started running on > mail.python.org in April. Definitely not. That's not a slice of real life, it's a distortion based on how some *other* system screwed up. Train it systematically on that, and you're not training it for real life. The urge to be clever is strong, but must be resisted <0.3 wink>. What would be perfectly reasonable is to run (not train) the system against those corpora to see how it does. BTW, Barry said the good-message archives he put together were composed of msgs archived after SpamAssassin was enabled. Since about 80% of the 1% "false positive" rate I first saw turned out to be blatant spam in the ham corpus, this suggests SpamAssassin let about 160000 * 1% * 80% = 12800 spams through to the python-list archive alone. 
That doesn't jibe with "600 false negatives" at all.  I don't want to
argue about it, it's just fair warning that I don't believe much that
I hear.  In particular, in *this* case I don't believe python-list
actually got 160000 messages since April, unless we're talking about
April of 2000.

From skip@pobox.com Thu Aug 29 04:15:11 2002
From: skip@pobox.com (Skip Montanaro)
Date: Wed, 28 Aug 2002 22:15:11 -0500
Subject: [Python-Dev] The first trustworthy GBayes results
In-Reply-To:
References:
Message-ID: <15725.37311.970263.211518@12-248-11-90.client.attbi.com>

[ lots of interesting stuff elided ]

    Tim> What's an acceptable false positive rate?  What do we get from
    Tim> SpamAssassin?  I expect we can end up below 0.1% here, and with a
    Tim> generous meaning for "not spam", but I think *some* of these
    Tim> examples show that the only way to get a 0% false-positive rate is
    Tim> to recode spamprob like so:

I don't know what an acceptable false positive rate is.  I guess it
depends on how important those falsies are. ;-)

One thing I think would be worthwhile would be to run GBayes first,
then only run stuff it thought was spam through SpamAssassin.  Only
messages that both systems categorized as spam would drop into the
spam folder.  This has a couple benefits over running one or the other
in isolation:

    * The training set for GBayes probably doesn't need to be as big

    * The two systems use substantially different approaches to
      identifying spam, so I suspect your false positive rate would go
      way down.  False negatives would go up, but only testing can
      suggest by how much.

    * Since SA is dog slow most of the time, SA users get a big
      speedup, since a substantially smaller fraction of your messages
      get run through it.

This sort of chaining is pretty trivial to set up with procmail.
Dunno what the Windows set will do though.
Skip From tim.one@comcast.net Thu Aug 29 05:18:04 2002 From: tim.one@comcast.net (Tim Peters) Date: Thu, 29 Aug 2002 00:18:04 -0400 Subject: [Python-Dev] The first trustworthy GBayes results In-Reply-To: Message-ID: FYI, about counting multiple instances of a word multiple times, or only once, when scoring. Changing it to count words only once did fix the specific false positive examples I mentioned. However, across 20 test runs (training on one of five pairs of corpora, and then for each such training pair running predictions across the remaining four pairs), it was a mixed bag. On some runs it appeared to be a real improvement, on others a real regression. Overall, the results didn't support concluding it made a significant difference to the false positive rate, but weakly supported concluding that it increased the false negative rate. That's very tentative -- I didn't stare at the actual misclassifications, I just ran it while sleeping off a flu, then woke up and crunched the numbers. This ignorant-of-MIME tokenization scheme is ridiculously bad for the false negative rate anyway (an entire line of base64 or obfuscated quoted-printable looks like a ham-favoring single "unknown word" to it), so there are bigger fish to fry first. From python@rcn.com Thu Aug 29 06:23:07 2002 From: python@rcn.com (Raymond Hettinger) Date: Thu, 29 Aug 2002 01:23:07 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net><200208201310.18625.mclay@nist.gov><200208201722.g7KHMX122261@odiug.zope.com><20020820231738.GA21011@thyrsus.com><200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net><20020821025725.GB28198@thyrsus.com><200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net><20020822162432.E9248@idi.ntnu.no><200208221512.g7MFCvI27671@odiug.zope.com><20020824163848.C10202@idi.ntnu.no> Message-ID: <007501c24f1c$29565ec0$2261accf@othello> Ideas for the day: 1. 
Optimize BaseSet._update(iterable) by checking for two special cases
where a C-speed update method is already available and the entries are
known in advance to be immutable:

    . . .
    if isinstance(iterable, BaseSet):
        self._data.update(iterable._data)
        return
    if isinstance(iterable, dict):
        self._data.update(iterable)
        return
    . . .

2. Eliminate the binary sanity checks which verify for operators that
'other' is a BaseSet.  If 'other' isn't a BaseSet, try using it,
directly or by coercing to a set, as an iterable:

    >>> Set('abracadabra') | 'alacazam'
    Set(['a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'])

This improves usability because the second argument did not have to be
pre-wrapped with Set.  It improves speed, for some operations, by
using the iterable directly and not having to build an equivalent
dictionary.

3. Have ImmutableSet keep a reference to the original iterable.  Add
an ImmutableSet.refresh() method that rebuilds ._data from the
iterable.  Add a Set.refresh() method that triggers
ImmutableSet.refresh() where possible.  The goal is to improve the
usability of sets of sets where the inner sets have been updated after
the outer set was created.

    >>> inner = Set('abracadabra')
    >>> outer = Set([inner])
    >>> inner.add('z')      # now the outer set is out-of-date
    >>> outer.refresh()     # now it is current
    >>> outer
    Set(['a', 'c', 'r', 'z', 'b', 'd'])

This would only work for restartable iterables -- a file object would
not be so easily refreshed.
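Idea (1) amounts to a type dispatch at the top of _update().  A
stripped-down stand-in (call it MiniSet -- not the real sets.py class)
shows the shape:

```python
class MiniSet:
    """Toy stand-in for sets.py's BaseSet, just to show the dispatch."""

    def __init__(self, iterable=None):
        self._data = {}                 # keys are elements, values are True
        if iterable is not None:
            self._update(iterable)

    def _update(self, iterable):
        # Fast paths: another set's elements (or a dict's keys) are
        # known to be hashable already, so dict.update can absorb them
        # at C speed.
        if isinstance(iterable, MiniSet):
            self._data.update(iterable._data)
            return
        if isinstance(iterable, dict):
            self._data.update(dict.fromkeys(iterable, True))
            return
        # Generic path: one element at a time.
        for element in iterable:
            self._data[element] = True

s = MiniSet('abracadabra')
s._update(MiniSet('alacazam'))
assert sorted(s._data) == ['a', 'b', 'c', 'd', 'l', 'm', 'r', 'z']
```

The generic loop at the bottom is what the fast paths bypass; for
large inputs the two dict.update calls run entirely inside the C
implementation.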
Raymond Hettinger

From python@rcn.com Thu Aug 29 06:45:52 2002
From: python@rcn.com (Raymond Hettinger)
Date: Thu, 29 Aug 2002 01:45:52 -0400
Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib
References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net><200208201310.18625.mclay@nist.gov><200208201722.g7KHMX122261@odiug.zope.com><20020820231738.GA21011@thyrsus.com><200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net><20020821025725.GB28198@thyrsus.com><200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net><20020822162432.E9248@idi.ntnu.no><200208221512.g7MFCvI27671@odiug.zope.com><20020824163848.C10202@idi.ntnu.no> <007501c24f1c$29565ec0$2261accf@othello>
Message-ID: <007d01c24f1f$5713efa0$2261accf@othello>

> 3. Have ImmutableSet keep a reference to the original iterable.  Add
> an ImmutableSet.refresh() method that rebuilds ._data from the
> iterable.  Add a Set.refresh() method that triggers
> ImmutableSet.refresh() where possible.  The goal is to improve the
> usability of sets of sets where the inner sets have been updated
> after the outer set was created.
>
> >>> inner = Set('abracadabra')
> >>> outer = Set([inner])
> >>> inner.add('z')      # now the outer set is out-of-date
> >>> outer.refresh()     # now it is current
> >>> outer
> Set(['a', 'c', 'r', 'z', 'b', 'd'])

Make that:

    Set([ImmutableSet(['a', 'c', 'r', 'z', 'b', 'd'])])

From skip@pobox.com Thu Aug 29 06:49:56 2002
From: skip@pobox.com (Skip Montanaro)
Date: Thu, 29 Aug 2002 00:49:56 -0500
Subject: [Python-Dev] lots of test failures...
Message-ID: <15725.46596.268649.918014@12-248-11-90.client.attbi.com> Just cvs up'd and got a bunch of test failures on my Linux box: 12 tests failed: test___all__ test_cookie test_descrtut test_difflib test_doctest test_doctest2 test_generators test_grammar test_inspect test_pyclbr test_sundry test_tokenize Several failed looking for a missing attribute "testmod", e.g.: test test_generators crashed -- exceptions.AttributeError: 'module' object has no attribute 'testmod' Here's the test_tokenize output: test test_tokenize produced unexpected output: ********************************************************************** *** mismatch between lines 127-138 of expected output and lines 127-138 of actual output: - 42,17-42,29: NUMBER '020000000000' ? ^^ + 42,17-42,30: NUMBER '020000000000L' ? ^^ + - 42,29-42,30: NEWLINE '\n' ? ^^ ^ + 42,30-42,31: NEWLINE '\n' ? ^^ ^ - 43,0-43,12: NUMBER '037777777777' ? ^ + 43,0-43,13: NUMBER '037777777777L' ? ^ + - 43,13-43,15: OP '!=' ? ^ ^ + 43,14-43,16: OP '!=' ? ^ ^ - 43,16-43,17: OP '-' ? ^ ^ + 43,17-43,18: OP '-' ? ^ ^ - 43,17-43,18: NUMBER '1' ? ^ ^ + 43,18-43,19: NUMBER '1' ? ^ ^ - 43,18-43,19: NEWLINE '\n' ? ------ + 43,19-43,20: NEWLINE '\n' ? ++++++ - 44,0-44,10: NUMBER '0xffffffff' ? ^ + 44,0-44,11: NUMBER '0xffffffffL' ? ^ + - 44,11-44,13: OP '!=' ? ^ ^ + 44,12-44,14: OP '!=' ? ^ ^ - 44,14-44,15: OP '-' ? ^ ^ + 44,15-44,16: OP '-' ? ^ ^ - 44,15-44,16: NUMBER '1' ? ^ ^ + 44,16-44,17: NUMBER '1' ? ^ ^ - 44,16-44,17: NEWLINE '\n' ? ^ ^ + 44,17-44,18: NEWLINE '\n' ? ^ ^ ********************************************************************** I'll look more closely in the morning. It's too late to investigate now. Skip From David Abrahams" <15725.13011.164341.365821@slothrop.zope.com> Message-ID: <02c401c24f21$ceed79e0$1c86db41@boostconsulting.com> > TP> + The ability to override the random number generator. 
Python's > TP> default WH generator is showing its age as machines get > TP> faster; it's simply not adequate anymore for long-running > TP> programs making heavy use of it on a fast box. Combinatorial > TP> algorithms in particular do tend to make heavy use of it. > TP> (Speaking of which, "someone" should look into grabbing one of > TP> the Mersenne Twister extensions for Python -- that's the > TP> current state of *that* art). FWIW, in case "someone" cares: http://www.boost.org/libs/random/index.html It's a nice library architecture, designed and implemented by people who know the domain, and I think it should be applicable to Python. ----------------------------------------------------------- David Abrahams * Boost Consulting dave@boost-consulting.com * http://www.boost-consulting.com From Anthony Baxter Thu Aug 29 08:04:32 2002 From: Anthony Baxter (Anthony Baxter) Date: Thu, 29 Aug 2002 17:04:32 +1000 Subject: [Python-Dev] The first trustworthy GBayes results In-Reply-To: Message-ID: <200208290704.g7T74WA22198@localhost.localdomain> This is a multipart MIME message. --==_Exmh_20690933280 Content-Type: text/plain; charset=us-ascii For what it's worth, the attached (simple) script will 'de-spamassassin' an email message. I use it on my 'spam' folder to get test messages of various ugly MIME things that spam and viruses let through... It's not pretty, but it does the job (for me, anyway) -- Anthony Baxter It's never too late to have a happy childhood. 
--==_Exmh_20690933280
Content-Type: text/plain ; name="deSA.py"; charset=us-ascii
Content-Description: deSA.py
Content-Disposition: attachment; filename="deSA.py"

def deSA(fp):
    import email, re
    m = email.message_from_string(fp.read())
    if m['X-Spam-Status']:
        if m['X-Spam-Status'].startswith('No'):
            del m['X-Spam-Status']
            del m['X-Spam-Level']
        else:
            del m['X-Spam-Status']
            del m['X-Spam-Level']
            del m['X-Spam-Flag']
            del m['X-Spam-Checker-Version']
            pct = m['X-Spam-Prev-Content-Type']
            if pct:
                del m['X-Spam-Prev-Content-Type']
                m['Content-Type'] = pct
            pcte = m['X-Spam-Prev-Content-Transfer-Encoding']
            if pcte:
                del m['Content-Transfer-Encoding']
                m['Content-Transfer-Encoding'] = pcte
                del m['X-Spam-Prev-Content-Transfer-Encoding']
            body = m.get_payload()
            subj = m['Subject']
            del m['Subject']
            m['Subject'] = re.sub(r'\*\*\*\*\*SPAM\*\*\*\*\* ', '', subj)
            newbody = []
            at_start = 1
            for line in body.splitlines():
                if at_start and line.startswith('SPAM: '):
                    continue
                elif at_start:
                    at_start = 0
                else:
                    newbody.append(line)
            m.set_payload("\n".join(newbody))
    return m

if __name__ == "__main__":
    import sys
    print deSA(open(sys.argv[1]))

--==_Exmh_20690933280--

From mwh@python.net Thu Aug 29 10:00:58 2002
From: mwh@python.net (Michael Hudson)
Date: 29 Aug 2002 10:00:58 +0100
Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib/test test_grammar.py,1.41,1.42 tokenize_tests.py,1.3,1.4
In-Reply-To: bwarsaw@users.sourceforge.net's message of "Wed, 28 Aug 2002 09:36:13 -0700"
References:
Message-ID: <2mk7ma3uph.fsf@starship.python.net>

bwarsaw@users.sourceforge.net writes:

> Update of /cvsroot/python/python/dist/src/Lib/test
> In directory usw-pr-cvs1:/tmp/cvs-serv31010/Lib/test
>
> Modified Files:
>     test_grammar.py tokenize_tests.py
> Log Message:
> Quite down some FutureWarnings.

Barry, is this why these tests have started to fail?

Cheers,
M.

--
  I love the way Microsoft follows standards.  In much the same
  manner that fish follow migrating caribou.
-- Paul Tomblin -- http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html From barry@python.org Thu Aug 29 13:38:02 2002 From: barry@python.org (Barry A. Warsaw) Date: Thu, 29 Aug 2002 08:38:02 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib/test test_grammar.py,1.41,1.42 tokenize_tests.py,1.3,1.4 References: <2mk7ma3uph.fsf@starship.python.net> Message-ID: <15726.5546.938683.657630@anthem.wooz.org> >>>>> "SM" == Skip Montanaro writes: SM> 12 tests failed: test___all__ test_cookie test_descrtut SM> test_difflib test_doctest test_doctest2 test_generators SM> test_grammar test_inspect test_pyclbr test_sundry SM> test_tokenize SM> Several failed looking for a missing attribute "testmod", SM> e.g.: >>>>> "MH" == Michael Hudson writes: >> Update of /cvsroot/python/python/dist/src/Lib/test In directory >> usw-pr-cvs1:/tmp/cvs-serv31010/Lib/test Modified Files: >> test_grammar.py tokenize_tests.py Log Message: Quite down some >> FutureWarnings. MH> Barry, is this why these tests have started to fail? Anything's possible. They weren't failing for me yesterday before I checked them in, but on my home machine now I see failures for test_grammar, test_strptime, and test_tokenize (and test_linuxaudiodev but that's always failed for me). I definitely don't see the other failures that Skip reports. I'll investigate. -Barry From barry@python.org Thu Aug 29 14:18:11 2002 From: barry@python.org (Barry A. Warsaw) Date: Thu, 29 Aug 2002 09:18:11 -0400 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib/test test_grammar.py,1.41,1.42 tokenize_tests.py,1.3,1.4 References: <2mk7ma3uph.fsf@starship.python.net> <15726.5546.938683.657630@anthem.wooz.org> Message-ID: <15726.7955.100988.418085@anthem.wooz.org> >>>>> "BAW" == Barry A Warsaw writes: BAW> before I checked them in, but on my home machine now I see BAW> failures for test_grammar, test_strptime, and test_tokenize BAW> (and test_linuxaudiodev but that's always failed for me). 
I
BAW> definitely don't see the other failures that Skip reports.

The test_tokenize failure was easy: the output file had changed.

The test_grammar failures make sense given the changing semantics of
those constants.  I've checked in a change that basically commented
out the hex -1 and oct -1 tests since those seem to be testing
something that won't be true.  Someone should double check that this
is the right fix.

The test_strptime failure has "gone away".  It was:

test test_strptime failed -- Traceback (most recent call last):
  File "/home/barry/projects/python/Lib/test/test_strptime.py", line 176, in test_hour
    self.failUnless(strp_output[3] == self.time_tuple[3],
                    "testing of '%%I %%p' directive failed; '%s' -> %s != %s" %
                    (strf_output, strp_output[3], self.time_tuple[3]))
  File "/home/barry/projects/python/Lib/unittest.py", line 268, in failUnless
    if not expr: raise self.failureException, msg
AssertionError: testing of '%I %p' directive failed; '12 PM' -> 24 != 12

I don't get why this was failing and now is not, but don't have time
right now to look at it.

I now have no unexpected skips or failures.

-Barry

From python@rcn.com Thu Aug 29 14:20:00 2002
From: python@rcn.com (Raymond Hettinger)
Date: Thu, 29 Aug 2002 09:20:00 -0400
Subject: [Python-Dev] Re: A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib]
References: <200208281758.g7SHwII01131@odiug.zope.com> <15725.13011.164341.365821@slothrop.zope.com> <02c401c24f21$ceed79e0$1c86db41@boostconsulting.com>
Message-ID: <007e01c24f5e$c84e73e0$5fb63bd0@othello>

From: "David Abrahams"
> > TP> + The ability to override the random number generator.  Python's
> > TP> default WH generator is showing its age as machines get
> > TP> faster; it's simply not adequate anymore for long-running
> > TP> programs making heavy use of it on a fast box.  Combinatorial
> > TP> (Speaking of which, "someone" should look into grabbing one of > > TP> the Mersenne Twister extensions for Python -- that's the > > TP> current state of *that* art). > > FWIW, in case "someone" cares: http://www.boost.org/libs/random/index.html > It's a nice library architecture, designed and implemented by people who > know the domain, and I think it should be applicable to Python. I'm willing to implement this one. Raymond From guido@python.org Thu Aug 29 15:06:53 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 29 Aug 2002 10:06:53 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Thu, 29 Aug 2002 01:23:07 EDT." <007501c24f1c$29565ec0$2261accf@othello> References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <20020820231738.GA21011@thyrsus.com> <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> <20020821025725.GB28198@thyrsus.com> <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com> <20020824163848.C10202@idi.ntnu.no> <007501c24f1c$29565ec0$2261accf@othello> Message-ID: <200208291406.g7TE6rv09318@odiug.zope.com> > 1. Optimize BaseSet._update(iterable) by checking for two special cases where a C-speed update method is already available and the > entries are known in advance to be immutable: > > . . . > if isinstance(iterable, BaseSet): > self._data.update(iterable._data) > return > if isinstance(iterable, dict): > self._data.update(iterable) > return > . . . Yes. > 2. Eliminate the binary sanity checks which verify for operators that 'other' is a BaseSet. 
If 'other' isn't a BaseSet, try using > it, directly or by coercing to a set, as an iterable: > > >>> Set('abracadabra') | 'alacazam' > Set(['a', 'c', 'r', 'd', 'b', 'm', 'z', 'l']) > > This improves usability because the second argument did not have to be pre-wrapped with Set. It improves speed, for some > operations, by using the iterable directly and not having to build an equivalent dictionary. No. This has been proposed before. I think it's a bad idea, just as [1,2,3] + "abc" is a bad idea. If you want this, it's easy enough to do s = Set('abracadabra') s.update('alacazam') > 3. Have ImmutableSet keep a reference to the original iterable. Add an ImmutableSet.refresh() method that rebuilds ._data from > the iterable. Add a Set.refresh() method that triggers ImmutableSet.refresh() where possible. The goal is to improve the > usability of sets of sets where the inner sets have been updated after the outer set was created. > > >>> inner = Set('abracadabra') > >>> outer = Set([inner]) > >>> inner.add('z') # now the outer set is out-of-date > >>> outer.refresh() # now it is current > >>> outer > Set([ImmutableSet(['a', 'c', 'r', 'z', 'b', 'd'])]) > > This would only work for restartable iterables -- a file object would not be so easily refreshed. This *appears* to be messing with the immutability. If I wrote: a = range(3) s1 = ImmutableSet(a) s2 = Set([s1]) a.append(4) s2.refresh() What would the value of s1 be? I think I understand your use case (the example in the docs, where an employee is added), but I think we should think harder about what to do about that. Possibly it's not a good example of how sets are used (even if it's a good example of how sets work). 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Aug 29 15:25:50 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 29 Aug 2002 10:25:50 -0400 Subject: [Python-Dev] Re: A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib] In-Reply-To: Your message of "Thu, 29 Aug 2002 09:20:00 EDT." <007e01c24f5e$c84e73e0$5fb63bd0@othello> References: <200208281758.g7SHwII01131@odiug.zope.com> <15725.13011.164341.365821@slothrop.zope.com> <02c401c24f21$ceed79e0$1c86db41@boostconsulting.com> <007e01c24f5e$c84e73e0$5fb63bd0@othello> Message-ID: <200208291425.g7TEPog10317@odiug.zope.com> > > FWIW, in case "someone" cares: http://www.boost.org/libs/random/index.html > > It's a nice library architecture, designed and implemented by people who > > know the domain, and I think it should be applicable to Python. > > I'm willing to implement this one. Please do! (Have you got much experience with random number generation?) --Guido van Rossum (home page: http://www.python.org/~guido/) From python@rcn.com Thu Aug 29 15:36:18 2002 From: python@rcn.com (Raymond Hettinger) Date: Thu, 29 Aug 2002 10:36:18 -0400 Subject: [Python-Dev] Re: A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib] References: <200208281758.g7SHwII01131@odiug.zope.com> <15725.13011.164341.365821@slothrop.zope.com> <02c401c24f21$ceed79e0$1c86db41@boostconsulting.com> <007e01c24f5e$c84e73e0$5fb63bd0@othello> <200208291425.g7TEPog10317@odiug.zope.com> Message-ID: <00a301c24f69$709a0e60$5fb63bd0@othello> > > > FWIW, in case "someone" cares: http://www.boost.org/libs/random/index.html > > > It's a nice library architecture, designed and implemented by people who > > > know the domain, and I think it should be applicable to Python. > > > > I'm willing to implement this one. > > Please do! (Have you got much experience with random number > generation?) Yes, but my experience is out-of-date. 
I've read Knuth (esp the part on testing generators), done numerical analysis, written simulations and high-end crypto, etc. The Mersenne Twister algorithm is new to me -- studying it is part of my motivation to volunteer to implement it. From guido@python.org Thu Aug 29 15:42:49 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 29 Aug 2002 10:42:49 -0400 Subject: [Python-Dev] Re: A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib] In-Reply-To: Your message of "Thu, 29 Aug 2002 10:36:18 EDT." <00a301c24f69$709a0e60$5fb63bd0@othello> References: <200208281758.g7SHwII01131@odiug.zope.com> <15725.13011.164341.365821@slothrop.zope.com> <02c401c24f21$ceed79e0$1c86db41@boostconsulting.com> <007e01c24f5e$c84e73e0$5fb63bd0@othello> <200208291425.g7TEPog10317@odiug.zope.com> <00a301c24f69$709a0e60$5fb63bd0@othello> Message-ID: <200208291442.g7TEgnO11038@odiug.zope.com> > > > > FWIW, in case "someone" cares: > > > > http://www.boost.org/libs/random/index.html It's a nice > > > > library architecture, designed and implemented by people who > > > > know the domain, and I think it should be applicable to > > > > Python. > > > > > > I'm willing to implement this one. > > > > Please do! (Have you got much experience with random number > > generation?) > > Yes, but my experience is out-of-date. I've read Knuth (esp the > part on testing generators), done numerical analysis, written > simulations and high-end crypto, etc. The Mersenne Twister > algorithm is new to me -- studying it is part of my motivation to > volunteer to implement it. Cool! You & Tim will have something to talk about. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com Thu Aug 29 15:43:09 2002 From: skip@pobox.com (Skip Montanaro) Date: Thu, 29 Aug 2002 09:43:09 -0500 Subject: [Python-Dev] The first trustworthy GBayes results In-Reply-To: <200208290704.g7T74WA22198@localhost.localdomain> References: <200208290704.g7T74WA22198@localhost.localdomain> Message-ID: <15726.13053.111171.335483@12-248-11-90.client.attbi.com> (trimming the cc list a bit, since this is drifting a bit away from strictly discussing the current algorithm.) Anthony> For what it's worth, the attached (simple) script will Anthony> 'de-spamassassin' an email message. I use it on my 'spam' Anthony> folder to get test messages of various ugly MIME things that Anthony> spam and viruses let through... Thanks, that helps me as well, as I need to delete the X-VM-* headers Emacs's VM mail package inserts. While spamassassin -d does what you are doing, it can be easily extended to elide other headers as well. One thing worth noting before everybody starts using it to massage their mailboxes is that the email package contains a bug which causes it to occasionally delete whitespace when reformatting headers. For example, in one example, the header went from Received: from rogers.com ([24.43.65.252]) by fep02-mail.bloor.is.net.cable.rogers.com (InterMail vM.5.01.05.06 201-253-122-126-106-20020509) with ESMTP id <20020820205424.DFHH4777.fep02-mail.bloor.is.net.cable.rogers.com@rogers.com>; Tue, 20 Aug 2002 16:54:24 -0400 to Received: from rogers.com ([24.43.65.252]) by fep02-mail.bloor.is.net.cable.rogers.com (InterMail vM.5.01.05.06 201-253-122-126-106-20020509) with ESMTPid <20020820205424.DFHH4777.fep02-mail.bloor.is.net.cable.rogers.com@rogers.com>; Tue, 20 Aug 2002 16:54:24 -0400 Note that in the second version there is no space between "ESMTP" and "id", which had previously been separated by a newline and several spaces. 
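The whitespace-eating behavior Skip describes violates the RFC 2822 folding rule: a long header is folded by inserting a line break before existing whitespace, so unfolding only has to delete the line break to recover the original text. A minimal sketch of correct unfolding — a hypothetical helper for illustration, not the email package's actual code:

```python
import re

def unfold(header):
    # RFC 2822 unfolding: drop the line break but keep the whitespace
    # that follows it (collapsed here to a single space for display).
    return re.sub(r'\r?\n[ \t]+', ' ', header)

# Shortened, illustrative version of the header from Skip's report.
folded = ('Received: from rogers.com ([24.43.65.252]) with ESMTP\n'
          '    id <20020820205424.DFHH4777@rogers.com>')
print(unfold(folded))
# "ESMTP" and "id" stay separated by a space -- the reported bug
# glued them together as "ESMTPid".
```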
I filed a bug report about it a few days ago: http://python.org/sf/594893 Skip From python@rcn.com Thu Aug 29 15:53:07 2002 From: python@rcn.com (Raymond Hettinger) Date: Thu, 29 Aug 2002 10:53:07 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <20020820231738.GA21011@thyrsus.com> <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> <20020821025725.GB28198@thyrsus.com> <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com> <20020824163848.C10202@idi.ntnu.no> <007501c24f1c$29565ec0$2261accf@othello> <200208291406.g7TE6rv09318@odiug.zope.com> Message-ID: <00b101c24f6b$c9e7c5a0$5fb63bd0@othello> > > 2. Eliminate the binary sanity checks which verify for operators that 'other' is a BaseSet. If 'other' isn't a BaseSet, try using > > it, directly or by coercing to a set, as an iterable: > > > > >>> Set('abracadabra') | 'alacazam' > > Set(['a', 'c', 'r', 'd', 'b', 'm', 'z', 'l']) > > > > This improves usability because the second argument did not have to be pre-wrapped with Set. It improves speed, for some > > operations, by using the iterable directly and not having to build an equivalent dictionary. > > No. This has been proposed before. I think it's a bad idea, just as > > [1,2,3] + "abc" > > is a bad idea. I see the wisdom in preventing weirdness. The real motivation was to get sets.py to play nicely with other set implementations. Right now, it can only interact with instances of BaseClass. And, even if someone subclasses BaseClass, they currently *must* have a self._data attribute that is a dictionary. This prevents non-dictionary based extensions. > > > 3. Have ImmutableSet keep a reference to the original iterable. 
Add an ImmutableSet.refresh() method that rebuilds ._data from > > the iterable. Add a Set.refresh() method that triggers ImmutableSet.refresh() where possible. The goal is to improve the > > usability of sets of sets where the inner sets have been updated after the outer set was created. > > > > >>> inner = Set('abracadabra') > > >>> outer = Set([inner]) > > >>> inner.add('z') # now the outer set is out-of-date > > >>> outer.refresh() # now it is current > > >>> outer > > Set([ImmutableSet(['a', 'c', 'r', 'z', 'b', 'd'])]) > > > > This would only work for restartable iterables -- a file object would not be so easily refreshed. > > This *appears* to be messing with the immutability. If I wrote: > > a = range(3) > s1 = ImmutableSet(a) > s2 = Set([s1]) > a.append(4) > s2.refresh() > > What would the value of s1 be? Hmm, I intended to have s1.refresh() return a new object for use in s2 while leaving s1 alone (being immutable and all). Now, I wonder if that was the right thing to do. The answer lies in use cases for algorithms that need sets of sets. If anyone knows off the top of their head that would be great; otherwise, I seem to remember that some of that business was found in compiler algorithms and graph packages. From skip@pobox.com Thu Aug 29 16:29:40 2002 From: skip@pobox.com (Skip Montanaro) Date: Thu, 29 Aug 2002 10:29:40 -0500 Subject: [Python-Dev] Re: [Python-checkins] python/dist/src/Lib/test test_grammar.py,1.41,1.42 tokenize_tests.py,1.3,1.4 In-Reply-To: <2mk7ma3uph.fsf@starship.python.net> References: <2mk7ma3uph.fsf@starship.python.net> Message-ID: <15726.15844.914465.919976@12-248-11-90.client.attbi.com> >> Modified Files: >> test_grammar.py tokenize_tests.py >> Log Message: >> Quite down some FutureWarnings. Michael> Barry, is this why these tests have started to fail? Whatever Barry and Guido did fixed the problems with those files. The other failures were all caused by a cvs conflict in my locally modified version of dis.py. 
Oddly enough, "cvs up" didn't report a "C", just an "M", so I didn't even think to look there for problems. Skip From guido@python.org Thu Aug 29 16:26:51 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 29 Aug 2002 11:26:51 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: Your message of "Thu, 29 Aug 2002 10:53:07 EDT." <00b101c24f6b$c9e7c5a0$5fb63bd0@othello> References: <200208192038.g7JKcfV22417@pcp02138704pcs.reston01.va.comcast.net> <200208201310.18625.mclay@nist.gov> <200208201722.g7KHMX122261@odiug.zope.com> <20020820231738.GA21011@thyrsus.com> <200208210225.g7L2PIe31201@pcp02138704pcs.reston01.va.comcast.net> <20020821025725.GB28198@thyrsus.com> <200208210347.g7L3lCT31814@pcp02138704pcs.reston01.va.comcast.net> <20020822162432.E9248@idi.ntnu.no> <200208221512.g7MFCvI27671@odiug.zope.com> <20020824163848.C10202@idi.ntnu.no> <007501c24f1c$29565ec0$2261accf@othello> <200208291406.g7TE6rv09318@odiug.zope.com> <00b101c24f6b$c9e7c5a0$5fb63bd0@othello> Message-ID: <200208291526.g7TFQp513030@odiug.zope.com> (How about limiting our lines to 72 characters?) > > > Eliminate the binary sanity checks which verify for operators > > > that 'other' is a BaseSet. If 'other' isn't a BaseSet, try using > > > it, directly or by coercing to a set, as an iterable: > > > > > > >>> Set('abracadabra') | 'alacazam' > > > Set(['a', 'c', 'r', 'd', 'b', 'm', 'z', 'l']) > > > > > > This improves usability because the second argument did not have > > > to be pre-wrapped with Set. It improves speed, for some > > > operations, by using the iterable directly and not having to > > > build an equivalent dictionary. > > > > No. This has been proposed before. I think it's a bad idea, just as > > > > [1,2,3] + "abc" > > > > is a bad idea. > > I see the wisdom in preventing weirdness. The real motivation was > to get sets.py to play nicely with other set implementations. Right > now, it can only interact with instances of BaseClass. 
And, even if
> someone subclasses BaseClass, they currently *must* have a
> self._data attribute that is a dictionary. This prevents
> non-dictionary based extensions.

I've thought of (and I think I even posted) a different way to accomplish the latter, *if* *and* *when* it becomes necessary. Here it is again:

- BaseSet becomes a true abstract class. I don't care if it has dummy methods that raise NotImplementedError, but the set of operations it stands for should be documented. I propose that it should stand for only the published operations of ImmutableSet. Other set implementations can then derive from BaseSet.

- The implementation currently in BaseSet is moved to a new internal class, e.g. _CoreSet, which derives from BaseSet.

- Set and ImmutableSet derive from _CoreSet.

- The binary operators (and sundry other places as needed) make *two* checks for the 'other' argument:

  - If it is a _CoreSet instance, do what's currently done, taking a shortcut knowing the implementation.

  - Otherwise, if it is a BaseSet instance, implement the operation using only the published set API.

Example:

    def __or__(self, other):
        if isinstance(other, _CoreSet):
            result = self.__class__(self._data)
            result._data.update(other._data)
            return result
        if isinstance(other, BaseSet):
            result = self.__class__(self._data)
            result.update(other)
            return result
        return NotImplemented

This effectively makes BaseSet a protocol. I realize that there is some resistance to using inheritance from a designated abstract base class as a way to indicate that a class implements a given protocol; but since we don't have other solutions in place, I think this is a reasonable solution. Trying to sniff whether the other argument implements a set protocol by testing the presence of specific APIs seems awkward, especially since most set APIs (__or__ etc.) are heavily overloaded by types that aren't sets at all.

> > > Have ImmutableSet keep a reference to the original iterable.
> > > Add an ImmutableSet.refresh() method that rebuilds ._data from > > > the iterable. Add a Set.refresh() method that triggers > > > ImmutableSet.refresh() where possible. The goal is to improve > > > the usability of sets of sets where the inner sets have been > > > updated after the outer set was created. > > > > > > >>> inner = Set('abracadabra') > > > >>> outer = Set([inner]) > > > >>> inner.add('z') # now the outer set is out-of-date > > > >>> outer.refresh() # now it is current > > > >>> outer > > > Set([ImmutableSet(['a', 'c', 'r', 'z', 'b', 'd'])]) > > > > > > This would only work for restartable iterables -- a file object would not be so easily refreshed. > > > > This *appears* to be messing with the immutability. If I wrote: > > > > a = range(3) > > s1 = ImmutableSet(a) > > s2 = Set([s1]) > > a.append(4) > > s2.refresh() > > > > What would the value of s1 be? > > Hmm, I intended to have s1.refresh() return a new object for use in > s2 while leaving s1 alone (being immutable and all). Now, I wonder > if that was the right thing to do. The answer lies in use cases for > algorithms that need sets of sets. If anyone knows off the top of > their head that would be great; otherwise, I seem to remember that > some of that business was found in compiler algorithms and graph > packages. Let's call YAGNI on this one. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@comcast.net Thu Aug 29 17:21:38 2002 From: tim.one@comcast.net (Tim Peters) Date: Thu, 29 Aug 2002 12:21:38 -0400 Subject: [Python-Dev] Re: PEP 218 (sets); moving set.py to Lib In-Reply-To: <00b101c24f6b$c9e7c5a0$5fb63bd0@othello> Message-ID: [Raymond Hettinger] > ... > Hmm, I intended to have s1.refresh() return a new object for use > in s2 while leaving s1 alone (being immutable and all). Now, I > wonder if that was the right thing to do. The answer lies in use > cases for algorithms that need sets of sets. 
If anyone knows > off the top of their head that would be great; otherwise, I seem > to remember that some of that business was found in compiler > algorithms and graph packages. There's no real use case I know of for having a mutation of a set element propagate to the set containing it. Sets in Python are collections of values, not collections of object ids (sets in Icon are collections of object ids, and, e.g., Set([[], []]) in Icon is a set with two elements). Value semantics darned near require copying, or fancier copy on write, under the covers, and value semantics are most useful for sets of sets. Once the value has been established, you want to guarantee it never changes, not make it easy to change it by accident . From Anthony Baxter Thu Aug 29 17:31:42 2002 From: Anthony Baxter (Anthony Baxter) Date: Fri, 30 Aug 2002 02:31:42 +1000 Subject: [Python-Dev] The first trustworthy GBayes results In-Reply-To: <15726.13053.111171.335483@12-248-11-90.client.attbi.com> Message-ID: <200208291631.g7TGVgd28718@localhost.localdomain> >>> Skip Montanaro wrote > One thing worth noting before everybody starts using it to massage their > mailboxes is that the email package contains a bug which causes it to > occasionally delete whitespace when reformatting headers. There's one other known problem - seriously misformatted MIME (as seen in spam, and email from Microsoft Entourage) causes the email package to barf out. I plan, at some point, to try and make a "if it fails, just leave the body as one chunk of text" mode, but it's a long long way down my list of priorities. -- Anthony Baxter It's never too late to have a happy childhood. From esr@thyrsus.com Thu Aug 29 18:13:07 2002 From: esr@thyrsus.com (Eric S. 
Raymond) Date: Thu, 29 Aug 2002 13:13:07 -0400 Subject: [Python-Dev] The first trustworthy GBayes results In-Reply-To: References: Message-ID: <20020829171307.GA5823@thyrsus.com> Tim Peters : > Spammers often generate random "word-like" gibberish at the ends of msgs, > and "rd" is one of the random two-letter combos that appears in the spam > corpus. Perhaps it would be good to ignore "words" with fewer than W > characters (to be determined by experiment). Bogofilter throws out words of length one and two. > I expect that including the headers would have given these much better > chances of getting through, given Robin and Alex's posting histories. > Still, the idea of counting words multiple times is open to question, and > experiments both ways are in order. And bogofilter includes the headers. This is important, since otherwise you don't rate things like spamhaus addresses and sender names. -- Eric S. Raymond From tim@zope.com Thu Aug 29 18:54:30 2002 From: tim@zope.com (Tim Peters) Date: Thu, 29 Aug 2002 13:54:30 -0400 Subject: [Python-Dev] The first trustworthy GBayes results In-Reply-To: <20020829171307.GA5823@thyrsus.com> Message-ID: [Eric S. Raymond] > Bogofilter throws out words of length one and two. Right, I saw that. It's something I'll run experiments against later. I'm running a 5x5 test grid (skipping the diagonal), and as was also true in speech recognition, if I had been running against just one spam+ham training corpora and just one spam+ham prediction set, I would have erroneously concluded that various things either are improvements, are regressions, or don't matter. But some ideas obtained from staring at mistakes from one test run turn out to be irrelevant, or even counter-productive, if applied to other test runs. The idea that some notion of "word" is important seems highly defensible , but beyond that I discount claims that aren't derived from a similarly paranoid testing setup. > ... > And bogofilter includes the headers. 
This is important, since > otherwise you don't rate things like spamhaus addresses and sender > names. Of course -- the reasons I'm not using headers in these particular tests have been spelled out several times. They'll get added later, but for now I don't have a large enough test set where doing so doesn't render the classifier's job trivial. From python@rcn.com Thu Aug 29 19:26:59 2002 From: python@rcn.com (Raymond Hettinger) Date: Thu, 29 Aug 2002 14:26:59 -0400 Subject: [Python-Dev] Mersenne Twister References: <200208281758.g7SHwII01131@odiug.zope.com> <15725.13011.164341.365821@slothrop.zope.com> <02c401c24f21$ceed79e0$1c86db41@boostconsulting.com> <007e01c24f5e$c84e73e0$5fb63bd0@othello> <200208291425.g7TEPog10317@odiug.zope.com> <00a301c24f69$709a0e60$5fb63bd0@othello> <200208291442.g7TEgnO11038@odiug.zope.com> Message-ID: <003b01c24f89$aa62a2e0$3ad8accf@othello> I'm sketching out an approach to the Mersenne Twister and wanted to make sure it is in line with what you want. -- Write it in pure python as a drop-in replacement for Wichman-Hill. -- Add a version number argument to Random() which defaults to two. If set to one, use the old generator so that it is possible to recreate sequences from earlier versions of Python. Note, the code is much shorter if we drop this requirement. On the plus side, it gives more than backwards compatability, it gives the ability to re-run a simulation with another generator to assure that the result isn't a fluke related to a generator design flaw. -- Document David Abrahams's link to http://www.boost.org/boost/random/mersenne_twister.hpp as the reference implementation and http://www.math.keio.ac.jp/matumoto/emt.html as a place for more information. Key-off of the MT19337 version as the most recent stable evolution. -- Move the existing in-module test-suite into a unittest. 
Add a new, separate unittest suite with tests specific to MT (recreating a few sequences produced by reference implementations) and with a battery of Knuth style tests. The validation results are at: http://www.math.keio.ac.jp/matumoto/ver991029.html -- When we're done, have a python link put on the Mersenne Twister Home Page (the second link above). -- Write, test and document the generator first. Afterwards, explore techniques for creating multiple independent streams: http://www.math.h.kyoto-u.ac.jp/~matumoto/RAND/DC/dc.html Raymond ----- Original Message ----- From: "Guido van Rossum" To: "Raymond Hettinger" Cc: Sent: Thursday, August 29, 2002 10:42 AM Subject: Re: [Python-Dev] Re: A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib] > > > > > FWIW, in case "someone" cares: > > > > > http://www.boost.org/libs/random/index.html It's a nice > > > > > library architecture, designed and implemented by people who > > > > > know the domain, and I think it should be applicable to > > > > > Python. > > > > > > > > I'm willing to implement this one. > > > > > > Please do! (Have you got much experience with random number > > > generation?) > > > > Yes, but my experience is out-of-date. I've read Knuth (esp the > > part on testing generators), done numerical analysis, written > > simulations and high-end crypto, etc. The Mersenne Twister > > algorithm is new to me -- studying it is part of my motivation to > > volunteer to implement it. > > Cool! You & Tim will have something to talk about. 
> > --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy@alum.mit.edu Thu Aug 29 19:32:16 2002 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Thu, 29 Aug 2002 14:32:16 -0400 Subject: [Python-Dev] Mersenne Twister In-Reply-To: <003b01c24f89$aa62a2e0$3ad8accf@othello> References: <200208281758.g7SHwII01131@odiug.zope.com> <15725.13011.164341.365821@slothrop.zope.com> <02c401c24f21$ceed79e0$1c86db41@boostconsulting.com> <007e01c24f5e$c84e73e0$5fb63bd0@othello> <200208291425.g7TEPog10317@odiug.zope.com> <00a301c24f69$709a0e60$5fb63bd0@othello> <200208291442.g7TEgnO11038@odiug.zope.com> <003b01c24f89$aa62a2e0$3ad8accf@othello> Message-ID: <15726.26800.163068.274448@slothrop.zope.com> Why not wrap the existing C implementation? I think a wrapper has two advantages. We get to reuse the existing implementation, without worry for transliteration errors. We also get better performance. Jeremy From guido@python.org Thu Aug 29 19:46:42 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 29 Aug 2002 14:46:42 -0400 Subject: [Python-Dev] Re: Mersenne Twister In-Reply-To: Your message of "Thu, 29 Aug 2002 14:26:59 EDT." <003b01c24f89$aa62a2e0$3ad8accf@othello> References: <200208281758.g7SHwII01131@odiug.zope.com> <15725.13011.164341.365821@slothrop.zope.com> <02c401c24f21$ceed79e0$1c86db41@boostconsulting.com> <007e01c24f5e$c84e73e0$5fb63bd0@othello> <200208291425.g7TEPog10317@odiug.zope.com> <00a301c24f69$709a0e60$5fb63bd0@othello> <200208291442.g7TEgnO11038@odiug.zope.com> <003b01c24f89$aa62a2e0$3ad8accf@othello> Message-ID: <200208291846.g7TIkgl01270@odiug.zope.com> > -- Write it in pure python as a drop-in replacement for Wichman-Hill. Yup. I think the seed arguments are different though -- MT takes a single int, while whrandom takes three ints in range(256). > -- Add a version number argument to Random() which defaults to two. > If set to one, use the old generator so that it is possible to recreate > sequences from earlier versions of Python. 
Note, the code is much
> shorter if we drop this requirement. On the plus side, it gives more
> than backwards compatibility, it gives the ability to re-run a
> simulation with another generator to assure that the result isn't
> a fluke related to a generator design flaw.

I think this is useful. But I'd like to hear what Tim has to say.

> -- Document David Abrahams's link to
> http://www.boost.org/boost/random/mersenne_twister.hpp as the
> reference implementation and

Hm. What part of that file contains the actual algorithm? I guess the function

    void mersenne_twister::twist()

???

> http://www.math.keio.ac.jp/matumoto/emt.html as a place for
> more information. Key off of the MT19937 version as the most
> recent stable evolution.

Sure. It would be nice to have at least *some* documentation in-line in case those links disappear. Maybe you can quote the relevant C++ code from the Boost version (with attribution) in a comment.

> -- Move the existing in-module test-suite into a unittest. Add a new,
> separate unittest suite with tests specific to MT (recreating a few
> sequences produced by reference implementations) and with a battery
> of Knuth style tests. The validation results are at:
> http://www.math.keio.ac.jp/matumoto/ver991029.html

It might be fun to have some heavy duty tests (which take hours or days to run) checked in but not run by default. We usually do this by not naming the test file test_foo.py; it can then be run manually.

> -- When we're done, have a python link put on the Mersenne Twister
> Home Page (the second link above).

Sounds like they would be only too eager to comply. :-)

> -- Write, test and document the generator first. Afterwards, explore
> techniques for creating multiple independent streams:
> http://www.math.h.kyoto-u.ac.jp/~matumoto/RAND/DC/dc.html

Isn't that trivial if you follow the WH implementation strategy which stores all the state in a class instance?
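Guido's closing point -- that the Wichmann-Hill strategy keeps all generator state on the instance -- is exactly what makes multiple independent streams nearly free. A quick sketch with the random.Random class; instance state plus getstate()/setstate() are part of its real API:

```python
import random

# Each Random instance carries its own generator state, so independent
# streams are just independent instances.
a = random.Random(42)
b = random.Random(42)
c = random.Random(7)

# Same seed, same stream; c advances on its own.
assert [a.random() for _ in range(5)] == [b.random() for _ in range(5)]

# getstate()/setstate() snapshot and restore a single stream without
# touching the others.
state = c.getstate()
first = [c.random() for _ in range(3)]
c.setstate(state)
assert [c.random() for _ in range(3)] == first
```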
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one@comcast.net Thu Aug 29 19:59:25 2002 From: tim.one@comcast.net (Tim Peters) Date: Thu, 29 Aug 2002 14:59:25 -0400 Subject: [Python-Dev] Re: A `cogen' module [was: Re: PEP 218 (sets); moving set.py to Lib] In-Reply-To: <00a301c24f69$709a0e60$5fb63bd0@othello> Message-ID:

[Raymond Hettinger]
> ... but my experience is out-of-date. I've read Knuth (esp the
> part on testing generators),

Knuth is behind the times here. Better: But if you're folding in the Twister, you don't have to test ab initio -- you just have to make sure the test vector they supply produces the results they say it should.

> done numerical analysis, written simulations and high-end crypto, etc.
> The Mersenne Twister algorithm is new to me -- studying it is part of my
> motivation to volunteer to implement it.

It's been implemented for Python several times already (in Python code, and as a C extension). This is more of an integration and API task than a write-code-from-scratch task. Do visit the authors' home page: http://www.math.keio.ac.jp/~matumoto/emt.html Indeed, the authors have been reduced to announcing "yet another Python module ...".

Note that a subtle weakness was discovered in the seed initialization early this year, so that implementations older than that are suspect on this count. BTW, Knuth's lagged Fibonacci generator turned out to have the same kind of initialization weakness, and was corrected in the ninth printing of Vol 2: http://www-cs-faculty.stanford.edu/~knuth/news.html

If this stuff interests you, Ivan Frohne (a statistician) wrote a wonderful pure-Python random-number package several years ago, including a pure Python implementation of the Twister, and several other stronger-than-WH (0, 1) base generators. It's hard to keep track of that package -- and of Ivan.
This may be the most recent version: http://www.frohne.westhost.com/

From python@rcn.com Thu Aug 29 20:00:03 2002 From: python@rcn.com (Raymond Hettinger) Date: Thu, 29 Aug 2002 15:00:03 -0400 Subject: [Python-Dev] Mersenne Twister References: <200208281758.g7SHwII01131@odiug.zope.com><15725.13011.164341.365821@slothrop.zope.com><02c401c24f21$ceed79e0$1c86db41@boostconsulting.com><007e01c24f5e$c84e73e0$5fb63bd0@othello><200208291425.g7TEPog10317@odiug.zope.com><00a301c24f69$709a0e60$5fb63bd0@othello><200208291442.g7TEgnO11038@odiug.zope.com><003b01c24f89$aa62a2e0$3ad8accf@othello> <15726.26800.163068.274448@slothrop.zope.com> Message-ID: <002001c24f8e$48d27e60$83ec7ad1@othello>

[Jeremy]
> Why not wrap the existing C implementation? I think a wrapper has two
> advantages. We get to reuse the existing implementation, without
> worry for transliteration errors. We also get better performance.

On the plus side, it gives a chance to write a pure C helper function for creating many random numbers at a time. On the minus side, random number generation is a much disputed topic, occasionally requiring full disclosure of seeds and source. Having the code in random.py makes it more visible than burying it in the C code. The C code I saw is covered by a BSD license -- I don't know if that's an issue or not. As for implementation difficulty or accuracy, the code is so short and clear that there isn't a savings from re-using the C code.

Raymond

From bsder@mail.allcaps.org Thu Aug 29 20:14:30 2002 From: bsder@mail.allcaps.org (Andrew P. Lentvorski) Date: Thu, 29 Aug 2002 12:14:30 -0700 (PDT) Subject: [Python-Dev] Mersenne Twister In-Reply-To: <003b01c24f89$aa62a2e0$3ad8accf@othello> Message-ID: <20020829115802.J52397-100000@mail.allcaps.org>

On Thu, 29 Aug 2002, Raymond Hettinger wrote:
> -- Add a version number argument to Random() which defaults to two.

Why not have a WHRandom() and a MersenneRandom() instance inside module random?
That way you can even give a future behavior warning that Random() is about to change and people can either choose the particular generator they want or accept the default. To my mind, this is a case of explicit (actually naming the generator types) is better than implicit (version number? Where's my documentation? Which generator is which version?) Maybe this isn't a big deal now, but I can believe that we might accumulate another RNG or two (there are some good reasons to want *weaker* or correlated RNGs) and having a weaker generator with a *later* version number is just bound to cause havoc. -a From python@rcn.com Thu Aug 29 20:21:46 2002 From: python@rcn.com (Raymond Hettinger) Date: Thu, 29 Aug 2002 15:21:46 -0400 Subject: [Python-Dev] Rehashing in PyDict_Copy Message-ID: <004701c24f91$51ee0200$83ec7ad1@othello> Is there a reason that dict.copy() runs like an update()? It creates a new dict object, then re-hashes and inserts every element one-by-one, complete with collisions. I would have expected a single pass to update refcounts, an allocation for identical size, and a memcpy to polish it off. Raymond From guido@python.org Thu Aug 29 20:29:21 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 29 Aug 2002 15:29:21 -0400 Subject: [Python-Dev] Mersenne Twister In-Reply-To: Your message of "Thu, 29 Aug 2002 12:14:30 PDT." <20020829115802.J52397-100000@mail.allcaps.org> References: <20020829115802.J52397-100000@mail.allcaps.org> Message-ID: <200208291929.g7TJTL702611@odiug.zope.com> > > -- Add a version number argument to Random() which defaults to two. > > Why not have a WHRandom() and a MersenneRandom() instance inside module > random? That way you can even give a future behavior warning that > Random() is about to change and people can either choose the particular > generator they want or accept the default. > > To my mind, this is a case of explicit (actually naming the generator > types) is better than implicit (version number? 
Where's my documentation? > Which generator is which version?) Maybe this isn't a big deal now, but I > can believe that we might accumulate another RNG or two (there are some > good reasons to want *weaker* or correlated RNGs) and having a weaker > generator with a *later* version number is just bound to cause havoc. Hm, I hadn't realized that the random.Random class doesn't import the whrandom module but simply reimplements it. Here's an idea. class BaseRandom: implements the end user methods: randrange(), choice(), normalvariate(), etc., except random(), which is an abstract method, raising NotImplementedError. class WHRandom and class MersenneRandom: add the specific random number generator implementation, as random(). Random is an alias for the random generator class of the day, currently MersenneRandom. Details: can MersenneRandom support jumpahead()? Should it support whseed(), which is provided only for backwards compatibility? If someone pickles a Random instance with Python 2.2 and tries to unpickle it with Python 2.3, this will fail, because (presumably) the state for MersenneRandom is different from the state for WHRandom. Perhaps there should be a module-level call to make Random an alias for WHRandom rather than for MersenneRandom. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Thu Aug 29 20:30:52 2002 From: guido@python.org (Guido van Rossum) Date: Thu, 29 Aug 2002 15:30:52 -0400 Subject: [Python-Dev] Rehashing in PyDict_Copy In-Reply-To: Your message of "Thu, 29 Aug 2002 15:21:46 EDT." <004701c24f91$51ee0200$83ec7ad1@othello> References: <004701c24f91$51ee0200$83ec7ad1@othello> Message-ID: <200208291930.g7TJUqf02625@odiug.zope.com> > Is there a reason that dict.copy() runs like an update()? > It creates a new dict object, then re-hashes and inserts > every element one-by-one, complete with collisions. 
> > I would have expected a single pass to update refcounts,
> > an allocation for identical size, and a memcpy to polish
> > it off.

After you've inserted and removed many elements into a dict, the elements may not be in the best order, and there may be many "deleted" markers. The update() strategy avoids copying such cruft.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one@comcast.net Thu Aug 29 20:45:41 2002 From: tim.one@comcast.net (Tim Peters) Date: Thu, 29 Aug 2002 15:45:41 -0400 Subject: [Python-Dev] Mersenne Twister In-Reply-To: <003b01c24f89$aa62a2e0$3ad8accf@othello> Message-ID:

[Raymond Hettinger]
> I'm sketching out an approach to the Mersenne Twister and
> wanted to make sure it is in line with what you want.
>
> -- Write it in pure python as a drop-in replacement for Wichmann-Hill.

I'd rather the Twister were in C -- it's low-level bit-fiddling, and Python isn't well-suited to high-throughput bit fiddling. IIRC, Ivan Frohne eventually got his pure-Python Twister implementation (see earlier msg) to within 15% of whrandom's speed, but it *could* be 10x faster w/o even trying.

> -- Add a version number argument to Random() which defaults to two.
> If set to one, use the old generator so that it is possible
> to recreate sequences from earlier versions of Python. Note, the code
> is much shorter if we drop this requirement. On the plus side, it gives
> more than backwards compatibility,

Backwards compatibility is essential, at least in the sense that there's *some* way to exactly reproduce results from earlier Python releases. The seed function in whrandom.py was very weak (for reasons explained in random.py), and I did make that incompatible when improving it, but also introduced whseed so that there was *some* way to reproduce older results bit-for-bit. I've gotten exactly one complaint about the incompatibility since then.
Note that new generators are intended to be introduced via subclassing
of Random:

    # Specific to Wichmann-Hill generator.  Subclasses wishing to use a
    # different core generator should override the seed(), random(),
    # getstate(), setstate() and jumpahead() methods.

I don't believe the jumpahead() method can be usefully implemented for
the Twister.

> it gives the ability to re-run a simulation with another generator to
> assure that the result isn't a fluke related to a generator design flaw.

That's very important in real life.  Ivan Frohne's design (and
alternative generators) should be considered here.

> -- Document David Abrahams's link to
> http://www.boost.org/boost/random/mersenne_twister.hpp as the
> reference implementation and
> http://www.math.keio.ac.jp/matumoto/emt.html as a place for
> more information.  Key off of the MT19937 version as the most
> recent stable evolution.

I'd simply use the authors' source code.  You don't get bonus points
for ripping out layers of C++ templates someone else gilded the lily
under.

> -- Move the existing in-module test-suite into a unittest.

Easier said than done.  The test suite doesn't do anything except
print results -- it has no intelligence about whether the results are
good or bad.  It was expected that a bona fide expert would stare at
the output.

> Add a new, separate unittest suite with tests specific to MT (recreating
> a few sequences produced by reference implementations)

I doubt you need a distinct test file for that.

> and with a battery of Knuth style tests.

They're far behind the current state of the art, and, as Knuth
mentions in an exercise, it's "term project" level effort to implement
them even so.

> The validation results are at:
> http://www.math.keio.ac.jp/matumoto/ver991029.html
>
> -- When we're done, have a python link put on the Mersenne Twister
> Home Page (the second link above).

Yes!  Cool idea.

> -- Write, test and document the generator first.

As opposed to what, eating dinner first?
> Afterwards, explore
> techniques for creating multiple independent streams:
> http://www.math.h.kyoto-u.ac.jp/~matumoto/RAND/DC/dc.html

Agreed that should be delayed.

From tim.one@comcast.net Thu Aug 29 22:23:37 2002
From: tim.one@comcast.net (Tim Peters)
Date: Thu, 29 Aug 2002 17:23:37 -0400
Subject: [Python-Dev] Mersenne Twister
In-Reply-To: <200208291929.g7TJTL702611@odiug.zope.com>
Message-ID: 

[Guido]
> Here's an idea.
>
> class BaseRandom: implements the end user methods: randrange(),
> choice(), normalvariate(), etc., except random(), which is an abstract
> method, raising NotImplementedError.

That's fine, and close to the intended way to extend this.  BaseRandom
should also leave as abstract seed(), getstate(), setstate() (but
should implement __getstate__ and __setstate__ -- Random already does
this correctly), and jumpahead().

> class WHRandom and class MersenneRandom: add the specific random
> number generator implementation, as random().

seed, getstate, setstate and jumpahead are also intimately connected
to the details of the base generator, so they also need to be supplied
by subclasses.

> Random is an alias for the random generator class of the day,
> currently MersenneRandom.

I like that better than what we've got now -- I would like to say that
Random may vary across releases, as the state of the art advances, so
that naive users are far less likely to fool themselves.  But users
also need to force use of a specific generator at times, and this
scheme caters to both.

> Details: can MersenneRandom support jumpahead()?

Yes, but I don't think *usefully*.  That is, I doubt it could do it
faster than calling the base random() N times.  jumpahead() is easy to
implement efficiently for linear congruential generators, and possible
to implement efficiently for pure lagged Fibonacci generators, but
that's about it.

> Should it support whseed(), which is provided only for backwards
> compatibility?

It should not.  whseed() shouldn't even be used by WHRandom users!
It's solely for bit-for-bit reproducibility of an older and weaker
scheme.

> If someone pickles a Random instance with Python 2.2 and tries to
> unpickle it with Python 2.3, this will fail, because (presumably) the
> state for MersenneRandom is different from the state for WHRandom.

That's in part why the concrete classes have to implement getstate()
and setstate() appropriately.  Note that every 2.2 Random pickle
starts with the integer 1 (the then -- and now -- value of
Random.VERSION).  That's enough of a clue so that all future Python
versions can know which flavor of random pickle they're looking at.  A
Twister pickle should start with some non-1 integer.

> Perhaps there should be a module-level call to make Random an alias
> for WHRandom rather than for MersenneRandom.

I suppose it would also have to replace all the other module globals
appropriately.  I'm thinking of the

    _inst = Random()
    seed = _inst.seed
    random = _inst.random
    uniform = _inst.uniform
    randint = _inst.randint
    choice = _inst.choice
    randrange = _inst.randrange
    shuffle = _inst.shuffle
    normalvariate = _inst.normalvariate
    lognormvariate = _inst.lognormvariate
    cunifvariate = _inst.cunifvariate
    expovariate = _inst.expovariate
    vonmisesvariate = _inst.vonmisesvariate
    gammavariate = _inst.gammavariate
    stdgamma = _inst.stdgamma
    gauss = _inst.gauss
    betavariate = _inst.betavariate
    paretovariate = _inst.paretovariate
    weibullvariate = _inst.weibullvariate
    getstate = _inst.getstate
    setstate = _inst.setstate
    jumpahead = _inst.jumpahead
    whseed = _inst.whseed

cruft at the end here.

From tim.one@comcast.net Thu Aug 29 22:39:52 2002
From: tim.one@comcast.net (Tim Peters)
Date: Thu, 29 Aug 2002 17:39:52 -0400
Subject: [Python-Dev] Re: Mersenne Twister
In-Reply-To: <200208291846.g7TIkgl01270@odiug.zope.com>
Message-ID: 

[Raymond]
> -- Write, test and document the generator first.
Afterwards, explore techniques for creating multiple independent
streams: http://www.math.h.kyoto-u.ac.jp/~matumoto/RAND/DC/dc.html

[Guido]
> Isn't that trivial if you follow the WH implementation strategy which
> stores all the state in a class instance?

"Independent" has more than one meaning.  The implementation meanings
you have in mind (two instances of the generator don't share state,
and nothing one does affects what the other does) are indeed trivially
achieved via attaching all state to an instance.

A different meaning of "independent" is "statistically uncorrelated",
and that's more what the link is aiming at.  It's never easy to get
multiple, statistically independent streams.  For example, using WH's
jumpahead, it's possible that you pick a large number N, jumpahead(N)
in one instance of a WH generator, and then the two streams turn out
to be perfectly correlated.  That's trivially so if you pick N to be a
multiple of WH's period, but there are smaller values of N that also
suffer.

One reason to run a simulation multiple times with distinct generators
is that it's pragmatically impossible to outguess all this stuff.  Two
generators are a sanity check; three can break the impasse when the
first two deliver significantly different results; four is clearly
excessive.

From jeremy@alum.mit.edu Fri Aug 30 00:28:36 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Thu, 29 Aug 2002 19:28:36 -0400
Subject: [Python-Dev] tiny optimization in ceval mainloop
Message-ID: <15726.44580.778481.454047@slothrop.zope.com>

I noticed that one frequently executed line in the mainloop is testing
whether either the ticker has dropped to 0 or if there are
things_to_do.  Would it be kosher to just drop the ticker to 0 whenever
things_to_do is set to true?  Then you'd only need to check the ticker
each time through.
Jeremy

Index: ceval.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Python/ceval.c,v
retrieving revision 2.332
diff -c -c -r2.332 ceval.c
*** ceval.c	23 Aug 2002 14:11:35 -0000	2.332
--- ceval.c	29 Aug 2002 23:19:18 -0000
***************
*** 378,383 ****
--- 378,384 ----
  int
  Py_AddPendingCall(int (*func)(void *), void *arg)
  {
+ 	PyThreadState *tstate;
  	static int busy = 0;
  	int i, j;
  	/* XXX Begin critical section */
***************
*** 395,400 ****
--- 396,404 ----
  	pendingcalls[i].func = func;
  	pendingcalls[i].arg = arg;
  	pendinglast = j;
+ 
+ 	tstate = PyThreadState_GET();
+ 	tstate->ticker = 0;
  	things_to_do = 1; /* Signal main loop */
  	busy = 0;
  	/* XXX End critical section */
***************
*** 669,675 ****
  	   async I/O handler); see Py_AddPendingCall() and
  	   Py_MakePendingCalls() above. */
! 	if (things_to_do || --tstate->ticker < 0) {
  		tstate->ticker = tstate->interp->checkinterval;
  		if (things_to_do) {
  			if (Py_MakePendingCalls() < 0) {
--- 673,679 ----
  	   async I/O handler); see Py_AddPendingCall() and
  	   Py_MakePendingCalls() above. */
! 	if (--tstate->ticker < 0) {
  		tstate->ticker = tstate->interp->checkinterval;
  		if (things_to_do) {
  			if (Py_MakePendingCalls() < 0) {

From guido@python.org Fri Aug 30 01:06:09 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 29 Aug 2002 20:06:09 -0400
Subject: [Python-Dev] tiny optimization in ceval mainloop
In-Reply-To: Your message of "Thu, 29 Aug 2002 19:28:36 EDT." <15726.44580.778481.454047@slothrop.zope.com>
References: <15726.44580.778481.454047@slothrop.zope.com>
Message-ID: <200208300006.g7U069O05891@pcp02138704pcs.reston01.va.comcast.net>

> I noticed that one frequently executed line in the mainloop is testing
> whether either the ticker has dropped to 0 or if there are
> things_to_do.  Would it be kosher to just drop the ticker to 0 whenever
> things_to_do is set to true?  Then you'd only need to check the ticker
> each time through.
I think not -- Py_AddPendingCall() is supposed to be called from
interrupts and other low-level stuff, where you don't know whose
thread state you would get.  Too bad, it was a nice idea.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg@cosc.canterbury.ac.nz Fri Aug 30 02:23:54 2002
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 30 Aug 2002 13:23:54 +1200 (NZST)
Subject: [Python-Dev] tiny optimization in ceval mainloop
In-Reply-To: <200208300006.g7U069O05891@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <200208300123.g7U1NsAr022995@kuku.cosc.canterbury.ac.nz>

> > Would it be kosher to just drop the ticker to 0 whenever
> > things_to_do is set to true?
>
> I think not -- Py_AddPendingCall() is supposed to be called from
> interrupts and other low-level stuff, where you don't know whose
> thread state you would get.

Could you have just one ticker, instead of one per thread?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From skip@pobox.com Fri Aug 30 02:37:29 2002
From: skip@pobox.com (Skip Montanaro)
Date: Thu, 29 Aug 2002 20:37:29 -0500
Subject: [Python-Dev] tiny optimization in ceval mainloop
In-Reply-To: <200208300123.g7U1NsAr022995@kuku.cosc.canterbury.ac.nz>
References: <200208300006.g7U069O05891@pcp02138704pcs.reston01.va.comcast.net> <200208300123.g7U1NsAr022995@kuku.cosc.canterbury.ac.nz>
Message-ID: <15726.52313.734491.272985@gargle.gargle.HOWL>

    Greg> Could you have just one ticker, instead of one per thread?

That would make ticker really count down checkinterval ticks.  Also,
of possible interest is this declaration and comment in longobject.c:

    static int ticker;	/* XXX Could be shared with ceval? */

Any time C code would want to read or update ticker, it would have the
GIL, right?
Sounds like you could get away with a single ticker.  The long int
implementation appears to do just fine with only one...

Skip

From tim.one@comcast.net Fri Aug 30 02:51:20 2002
From: tim.one@comcast.net (Tim Peters)
Date: Thu, 29 Aug 2002 21:51:20 -0400
Subject: [Python-Dev] tiny optimization in ceval mainloop
In-Reply-To: <200208300006.g7U069O05891@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: 

[Jeremy]
> I noticed that one frequently executed line in the mainloop is testing
> whether either the ticker has dropped to 0 or if there are
> things_to_do.  Would it be kosher to just drop the ticker to 0 whenever
> things_to_do is set to true?  Then you'd only need to check the ticker
> each time through.

[Guido]
> I think not -- Py_AddPendingCall() is supposed to be called from
> interrupts and other low-level stuff, where you don't know whose
> thread state you would get.  Too bad, it was a nice idea.

Well ... does each tstate really need its own ticker?  If that were a
property of the interpreter instead ("number of ticks until it's time
for this interpreter to switch threads"), shared across all threads
running in that interpreter, then I bet the visible semantics would be
much the same.  The GIL is always held when the ticker is decremented
or reset, so there's nothing non-deterministic in sharing it.

From skip@pobox.com Fri Aug 30 03:02:43 2002
From: skip@pobox.com (Skip Montanaro)
Date: Thu, 29 Aug 2002 21:02:43 -0500
Subject: [Python-Dev] single ticker patch
Message-ID: <15726.53827.402861.875268@gargle.gargle.HOWL>

Here's a single ticker patch:

    http://python.org/sf/602191

It also gets rid of the private ticker in longobject.c.

Skip

From tim.one@comcast.net Fri Aug 30 03:43:28 2002
From: tim.one@comcast.net (Tim Peters)
Date: Thu, 29 Aug 2002 22:43:28 -0400
Subject: [Python-Dev] Mersenne Twister
In-Reply-To: <002001c24f8e$48d27e60$83ec7ad1@othello>
Message-ID: 

[Raymond Hettinger]
> ...
> The C code I saw is covered by a BSD license -- I don't
> know if that's an issue or not.

That's fine, provided it doesn't have the dreaded "advertising
clause".  I personally don't care whether it does -- it's the FSF that
has a bug up their butt about that one.  I expect we'd have to
reproduce their copyright notice in the docs somewhere; yup:

    2. Redistributions in binary form must reproduce the above
       copyright notice, this list of conditions and the following
       disclaimer in the documentation and/or other materials provided
       with the distribution.

I think we *ought* to perform a similar courtesy for, e.g., the Tcl/Tk
and zlib components shipped with the Python Windows installer too.

> As for implementation difficulty or accuracy, the code is so short
> and clear that there isn't a savings from re-using the C code.

That isn't the point here.  If you use Nishimura and Matsumoto's code
as close to verbatim as possible, then that's the perfect answer to
your earlier point:

> On the minus side, random number generation is a much disputed
> topic, occasionally requiring full disclosure of seeds and source.

Nothing *could* be more fully disclosed than their source code: it's
extremely well known to every worker in the field, and has gotten
critical review from the smartest eyeballs in the world.

From goodger@users.sourceforge.net Fri Aug 30 05:23:15 2002
From: goodger@users.sourceforge.net (David Goodger)
Date: Fri, 30 Aug 2002 00:23:15 -0400
Subject: [Python-Dev] ANN: New PEP Format: reStructuredText
Message-ID: 

With many thanks to Barry Warsaw for his help and patience, I am
pleased to announce that a new format for PEPs (Python Enhancement
Proposals) has been deployed.  The new format is reStructuredText, a
lightweight what-you-see-is-what-you-get plaintext markup syntax and
parser component of the Docutils project.
From the new PEP 12:

    ReStructuredText is offered as an alternative to plaintext PEPs,
    to allow PEP authors more functionality and expressivity, while
    maintaining easy readability in the source text.  The processed
    HTML form makes the functionality accessible to readers: live
    hyperlinks, styled text, tables, images, and automatic tables of
    contents, among other advantages.

The following PEPs have been marked up with reStructuredText:

- PEP 12 -- Sample reStructuredText PEP Template
  (http://www.python.org/peps/pep-0012.html)

- PEP 256 -- Docstring Processing System Framework
  (http://www.python.org/peps/pep-0256.html)

- PEP 257 -- Docstring Conventions
  (http://www.python.org/peps/pep-0257.html)

- PEP 258 -- Docutils Design Specification
  (http://www.python.org/peps/pep-0258.html)

- PEP 287 -- reStructuredText Docstring Format
  (http://www.python.org/peps/pep-0287.html)

- PEP 290 -- Code Migration and Modernization
  (http://www.python.org/peps/pep-0290.html)

In addition, the text of PEP 1 and PEP 9 has been revised.

Authors of new PEPs are invited to consider using the new format, and
authors of existing PEPs are invited to convert their PEPs to
reStructuredText to take advantage of the many enhancements over the
plaintext format.  I, along with the other Docutils developers and
users, will be happy to assist.  Please send questions to:

    docutils-users@lists.sourceforge.net

The latest project snapshot can always be downloaded from:

    http://docutils.sourceforge.net/docutils-snapshot.tgz

(This is required to process the PEP source into HTML.  It requires at
least Python 2.0; Python 2.1 or later is recommended.)

Docutils and reStructuredText are under active development.  Input is
very welcome, especially HTML rendering/stylesheet issues with
different browsers.  We welcome new contributors.
If you'd like to get involved, please visit:

    http://docutils.sourceforge.net/

--
David Goodger

Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
  (includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/

From goodger@users.sourceforge.net Fri Aug 30 05:23:42 2002
From: goodger@users.sourceforge.net (David Goodger)
Date: Fri, 30 Aug 2002 00:23:42 -0400
Subject: [Python-Dev] PEP 12 -- Sample reStructuredText PEP Template
Message-ID: 

This PEP presents an alternative format to that of PEP 9.  Feedback is
welcome.

--
David Goodger

PEP: 12
Title: Sample reStructuredText PEP Template
Version: $Revision: 1.3 $
Last-Modified: $Date: 2002/08/30 04:11:20 $
Author: David Goodger , Barry A. Warsaw
Status: Active
Type: Informational
Content-Type: text/x-rst
Created: 05-Aug-2002
Post-History: 30-Aug-2002


Abstract
========

This PEP provides a boilerplate or sample template for creating your
own reStructuredText PEPs.  In conjunction with the content guidelines
in PEP 1 [1]_, this should make it easy for you to conform your own
PEPs to the format outlined below.

Note: if you are reading this PEP via the web, you should first grab
the text (reStructuredText) source of this PEP in order to complete
the steps below.  **DO NOT USE THE HTML FILE AS YOUR TEMPLATE!**  To
get the source of this (or any) PEP, look at the top of the HTML page
and click on the link titled "PEP Source".

If you would prefer not to use markup in your PEP, please see PEP 9,
"Sample Plaintext PEP Template" [2]_.


Rationale
=========

PEP submissions come in a wide variety of forms, not all adhering to
the format guidelines set forth below.  Use this template, in
conjunction with the format guidelines below, to ensure that your PEP
submission won't get automatically rejected because of form.
ReStructuredText is offered as an alternative to plaintext PEPs, to
allow PEP authors more functionality and expressivity, while
maintaining easy readability in the source text.  The processed HTML
form makes the functionality accessible to readers: live hyperlinks,
styled text, tables, images, and automatic tables of contents, among
other advantages.  For an example of a PEP marked up with
reStructuredText, see PEP 287.


How to Use This Template
========================

To use this template you must first decide whether your PEP is going
to be an Informational or Standards Track PEP.  Most PEPs are
Standards Track because they propose a new feature for the Python
language or standard library.  When in doubt, read PEP 1 for details
or contact the PEP editors.

Once you've decided which type of PEP yours is going to be, follow the
directions below.

- Make a copy of this file (``.txt`` file, **not** HTML!) and perform
  the following edits.

- Replace the "PEP: 9" header with "PEP: XXX" since you don't yet have
  a PEP number assignment.

- Change the Title header to the title of your PEP.

- Leave the Version and Last-Modified headers alone; we'll take care
  of those when we check your PEP into CVS.

- Change the Author header to include your name, and optionally your
  email address.  Be sure to follow the format carefully: your name
  must appear first, and it must not be contained in parentheses.
  Your email address may appear second (or it can be omitted) and if
  it appears, it must appear in angle brackets.  It is okay to
  obfuscate your email address.

- If there is a mailing list for discussion of your new feature, add a
  Discussions-To header right after the Author header.  You should not
  add a Discussions-To header if the mailing list to be used is either
  python-list@python.org or python-dev@python.org, or if discussions
  should be sent to you directly.  Most Informational PEPs don't have
  a Discussions-To header.

- Change the Status header to "Draft".
- For Standards Track PEPs, change the Type header to "Standards
  Track".

- For Informational PEPs, change the Type header to "Informational".

- For Standards Track PEPs, if your feature depends on the acceptance
  of some other currently in-development PEP, add a Requires header
  right after the Type header.  The value should be the PEP number of
  the PEP yours depends on.  Don't add this header if your dependent
  feature is described in a Final PEP.

- Change the Created header to today's date.  Be sure to follow the
  format carefully: it must be in ``dd-mmm-yyyy`` format, where the
  ``mmm`` is the 3 English letter month abbreviation, i.e. one of Jan,
  Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec.

- For Standards Track PEPs, after the Created header, add a
  Python-Version header and set the value to the next planned version
  of Python, i.e. the one your new feature will hopefully make its
  first appearance in.  Do not use an alpha or beta release
  designation here.  Thus, if the last version of Python was 2.2 alpha
  1 and you're hoping to get your new feature into Python 2.2, set the
  header to::

      Python-Version: 2.2

- Leave Post-History alone for now; you'll add dates to this header
  each time you post your PEP to python-list@python.org or
  python-dev@python.org.  If you posted your PEP to the lists on
  August 14, 2001 and September 3, 2001, the Post-History header would
  look like::

      Post-History: 14-Aug-2001, 03-Sept-2001

  You must manually add new dates and check them in.  If you don't
  have check-in privileges, send your changes to the PEP editors.

- Add a Replaces header if your PEP obsoletes an earlier PEP.  The
  value of this header is the number of the PEP that your new PEP is
  replacing.  Only add this header if the older PEP is in "final"
  form, i.e. is either Accepted, Final, or Rejected.  You aren't
  replacing an older open PEP if you're submitting a competing idea.
- Now write your Abstract, Rationale, and other content for your PEP,
  replacing all this gobbledygook with your own text.  Be sure to
  adhere to the format guidelines below, specifically on the
  prohibition of tab characters and the indentation requirements.

- Update your References and Copyright section.  Usually you'll place
  your PEP into the public domain, in which case just leave the
  Copyright section alone.  Alternatively, you can use the `Open
  Publication License`__, but public domain is still strongly
  preferred.

  __ http://www.opencontent.org/openpub/

- Leave the Emacs stanza at the end of this file alone, including the
  formfeed character ("^L", or ``\f``).

- Send your PEP submission to the PEP editors at peps@python.org.


ReStructuredText PEP Formatting Requirements
============================================

The following is a PEP-specific summary of reStructuredText syntax.
For the sake of simplicity and brevity, much detail is omitted.  For
more detail, see `Resources`_ below.  `Literal blocks`_ (in which no
markup processing is done) are used for examples throughout, to
illustrate the plaintext markup.


General
-------

You must adhere to the Emacs convention of adding two spaces at the
end of every sentence.  You should fill your paragraphs to column 70,
but under no circumstances should your lines extend past column 79.
If your code samples spill over column 79, you should rewrite them.

Tab characters must never appear in the document at all.  A PEP should
include the standard Emacs stanza included by example at the bottom of
this PEP.


Section Headings
----------------

PEP headings must begin in column zero and the initial letter of each
word must be capitalized as in book titles.  Acronyms should be in all
capitals.  Section titles must be adorned with an underline, a single
repeated punctuation character, which begins in column zero and must
extend at least as far as the right edge of the title text (4
characters minimum).
First-level section titles are underlined with "=" (equals signs),
second-level section titles with "-" (hyphens), and third-level
section titles with "'" (single quotes or apostrophes).  For
example::

    First-Level Title
    =================

    Second-Level Title
    ------------------

    Third-Level Title
    '''''''''''''''''

If there are more than three levels of sections in your PEP, you may
insert overline/underline-adorned titles for the first and second
levels as follows::

    ============================
    First-Level Title (optional)
    ============================

    -----------------------------
    Second-Level Title (optional)
    -----------------------------

    Third-Level Title
    =================

    Fourth-Level Title
    ------------------

    Fifth-Level Title
    '''''''''''''''''

You shouldn't have more than five levels of sections in your PEP.  If
you do, you should consider rewriting it.

You must use two blank lines between the last line of a section's body
and the next section heading.  If a subsection heading immediately
follows a section heading, a single blank line in-between is
sufficient.

The body of each section is not normally indented, although some
constructs do use indentation, as described below.  Blank lines are
used to separate constructs.


Paragraphs
----------

Paragraphs are left-aligned text blocks separated by blank lines.
Paragraphs are not indented unless they are part of an indented
construct (such as a block quote or a list item).


Inline Markup
-------------

Portions of text within paragraphs and other text blocks may be
styled.  For example::

    Text may be marked as *emphasized* (single asterisk markup,
    typically shown in italics) or **strongly emphasized** (double
    asterisks, typically boldface).  ``Inline literals`` (using double
    backquotes) are typically rendered in a monospaced typeface.  No
    further markup recognition is done within the double backquotes,
    so they're safe for any kind of code snippets.


Block Quotes
------------

Block quotes consist of indented body elements.
For example::

    This is a paragraph.

        This is a block quote.

        A block quote may contain many paragraphs.

Block quotes are used to quote extended passages from other sources.
Block quotes may be nested inside other body elements.  Use 4 spaces
per indent level.


Literal Blocks
--------------

.. In the text below, double backquotes are used to denote inline
   literals.  "``::``" is written so that the colons will appear in a
   monospaced font; the backquotes (``) are markup, not part of the
   text.  See "Inline Markup" above.

   By the way, this is a comment, described in "Comments" below.

Literal blocks are used for code samples or preformatted ASCII art.
To indicate a literal block, preface the indented text block with
"``::``" (two colons).  The literal block continues until the end of
the indentation.  Indent the text block by 4 spaces.  For example::

    This is a typical paragraph.  A literal block follows.

    ::

        for a in [5,4,3,2,1]:   # this is program code, shown as-is
            print a
        print "it's..."
        # a literal block continues until the indentation ends

The paragraph containing only "``::``" will be completely removed from
the output; no empty paragraph will remain.

"``::``" is also recognized at the end of any paragraph.  If
immediately preceded by whitespace, both colons will be removed from
the output.  When text immediately precedes the "``::``", *one* colon
will be removed from the output, leaving only one colon visible (i.e.,
"``::``" will be replaced by "``:``").  For example, one colon will
remain visible here::

    Paragraph::

        Literal block


Lists
-----

Bullet list items begin with one of "-", "*", or "+" (hyphen,
asterisk, or plus sign), followed by whitespace and the list item
body.  List item bodies must be left-aligned and indented relative to
the bullet; the text immediately after the bullet determines the
indentation.  For example::
The blank line above the first list item is required; blank lines between list items (such as below this paragraph) are optional. * This is the first paragraph in the second item in the list. This is the second paragraph in the second item in the list. The blank line above this paragraph is required. The left edge of this paragraph lines up with the paragraph above, both indented relative to the bullet. - This is a sublist. The bullet lines up with the left edge of the text blocks above. A sublist is a new list so requires a blank line above and below. * This is the third item of the main list. This paragraph is not part of the list. Enumerated (numbered) list items are similar, but use an enumerator instead of a bullet. Enumerators are numbers (1, 2, 3, ...), letters (A, B, C, ...; uppercase or lowercase), or Roman numerals (i, ii, iii, iv, ...; uppercase or lowercase), formatted with a period suffix ("1.", "2."), parentheses ("(1)", "(2)"), or a right-parenthesis suffix ("1)", "2)"). For example:: 1. As with bullet list items, the left edge of paragraphs must align. 2. Each list item may contain multiple paragraphs, sublists, etc. This is the second paragraph of the second list item. a) Enumerated lists may be nested. b) Blank lines may be omitted between list items. Definition lists are written like this:: what Definition lists associate a term with a definition. how The term is a one-line phrase, and the definition is one or more paragraphs or body elements, indented relative to the term. Tables ------ Simple tables are easy and compact:: ===== ===== ======= A B A and B ===== ===== ======= False False False True False False False True False True True True ===== ===== ======= There must be at least two columns in a table (to differentiate from section titles). 
Column spans use underlines of hyphens ("Inputs" spans the first two
columns)::

    =====  =====  ======
       Inputs     Output
    ------------  ------
      A      B    A or B
    =====  =====  ======
    False  False  False
    True   False  True
    False  True   True
    True   True   True
    =====  =====  ======

Text in a first-column cell starts a new row.  No text in the first
column indicates a continuation line; the rest of the cells may
consist of multiple lines.  For example::

    =====  =========================
    col 1  col 2
    =====  =========================
    1      Second column of row 1.
    2      Second column of row 2.
           Second line of paragraph.
    3      - Second column of row 3.
           - Second item in bullet
             list (row 3, column 2).
    =====  =========================


Hyperlinks
----------

When referencing an external web page in the body of a PEP, you should
include the title of the page in the text, with either an inline
hyperlink reference to the URL or a footnote reference (see
`Footnotes`_ below).  Do not include the URL in the body text of the
PEP.

Hyperlink references use backquotes and a trailing underscore to mark
up the reference text; backquotes are optional if the reference text
is a single word.  For example::

    In this paragraph, we refer to the `Python web site`_.

An explicit target provides the URL.  Put targets in a References
section at the end of the PEP, or immediately after the reference.
Hyperlink targets begin with two periods and a space (the "explicit
markup start"), followed by a leading underscore, the reference text,
a colon, and the URL (absolute or relative)::

    .. _Python web site: http://www.python.org/

The reference text and the target text must match (although the match
is case-insensitive and ignores differences in whitespace).  Note that
the underscore trails the reference text but precedes the target text.
If you think of the underscore as a right-pointing arrow, it points
*away* from the reference and *toward* the target.

The same mechanism can be used for internal references.
Every unique section title implicitly defines an internal hyperlink
target.  We can make a link to the Abstract section like this::

    Here is a hyperlink reference to the `Abstract`_ section.  The
    backquotes are optional since the reference text is a single
    word; we can also just write: Abstract_.

Footnotes containing the URLs from external targets will be generated
automatically at the end of the References section of the PEP, along
with footnote references linking the reference text to the footnotes.

Text of the form "PEP x" or "RFC x" (where "x" is a number) will be
linked automatically to the appropriate URLs.


Footnotes
---------

Footnote references consist of a left square bracket, a number, a
right square bracket, and a trailing underscore::

    This sentence ends with a footnote reference [1]_.

Whitespace must precede the footnote reference.  Leave a space
between the footnote reference and the preceding word.

When referring to another PEP, include the PEP number in the body
text, such as "PEP 1".  The title may optionally appear.  Add a
footnote reference following the title.  For example::

    Refer to PEP 1 [2]_ for more information.

Add a footnote that includes the PEP's title and author.  It may
optionally include the explicit URL on a separate line, but only in
the References section.  Footnotes begin with ".. " (the explicit
markup start), followed by the footnote marker (no underscores),
followed by the footnote body.  For example::

    References
    ==========

    .. [2] PEP 1, "PEP Purpose and Guidelines", Warsaw, Hylton
       (http://www.python.org/peps/pep-0001.html)

If you decide to provide an explicit URL for a PEP, please use this
as the URL template::

    http://www.python.org/peps/pep-xxxx.html

PEP numbers in URLs must be padded with zeros from the left, so as to
be exactly 4 characters wide, however PEP numbers in the text are
never padded.
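The padding rule is easy to apply mechanically.  For example
(illustrative Python, not part of pep2html.py)::

```python
# The URL template above, applied mechanically: pad the PEP number to
# 4 digits in URLs, but never in body text.
def pep_url(n):
    return "http://www.python.org/peps/pep-%04d.html" % n

print(pep_url(9))    # -> http://www.python.org/peps/pep-0009.html
print("PEP %d" % 9)  # body text stays unpadded: PEP 9
```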
During the course of developing your PEP, you may have to add,
remove, and rearrange footnote references, possibly resulting in
mismatched references, obsolete footnotes, and confusion.
Auto-numbered footnotes allow more freedom.  Instead of a number, use
a label of the form "#word", where "word" is a mnemonic consisting of
alphanumerics plus internal hyphens, underscores, and periods (no
whitespace or other characters are allowed).  For example::

    Refer to PEP 1 [#PEP-1]_ for more information.

    References
    ==========

    .. [#PEP-1] PEP 1, "PEP Purpose and Guidelines", Warsaw, Hylton

       http://www.python.org/peps/pep-0001.html

Footnotes and footnote references will be numbered automatically, and
the numbers will always match.  Once a PEP is finalized, auto-numbered
labels should be replaced by numbers for simplicity.


Images
------

If your PEP contains a diagram, you may include it in the processed
output using the "image" directive::

    .. image:: diagram.png

Any browser-friendly graphics format is possible: .png, .jpeg, .gif,
.tiff, etc.

Since this image will not be visible to readers of the PEP in source
text form, you should consider including a description or ASCII art
alternative, using a comment (below).


Comments
--------

A comment block is an indented block of arbitrary text immediately
following an explicit markup start: two periods and whitespace.
Leave the ".." on a line by itself to ensure that the comment is not
misinterpreted as another explicit markup construct.  Comments are
not visible in the processed document.  For the benefit of those
reading your PEP in source form, please consider including
descriptions of or ASCII art alternatives to any images you include.
For example::

    .. image:: dataflow.png

    ..
       Data flows from the input module, through the "black box"
       module, and finally into (and through) the output module.

The Emacs stanza at the bottom of this document is inside a comment.
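The auto-numbering behavior described above can be illustrated with a
toy sketch (the real logic lives in Docutils): "[#label]_" references
are assigned numbers in order of first appearance, so references and
footnotes always agree::

```python
# Toy sketch of auto-numbered footnote resolution (not Docutils):
# each "[#label]_" reference gets the next number the first time its
# label appears, and the same number on every repeat.
import re

def autonumber(text):
    numbers = {}
    def repl(match):
        label = match.group(1)
        numbers.setdefault(label, len(numbers) + 1)
        return "[%d]_" % numbers[label]
    return re.sub(r"\[#([\w.-]+)\]_", repl, text)

print(autonumber("See [#PEP-1]_ and [#spec]_; again [#PEP-1]_."))
# -> See [1]_ and [2]_; again [1]_.
```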
Escaping Mechanism
------------------

reStructuredText uses backslashes ("``\``") to override the special
meaning given to markup characters and get the literal characters
themselves.  To get a literal backslash, use an escaped backslash
("``\\``").  There are two contexts in which backslashes have no
special meaning: `literal blocks`_ and inline literals (see `Inline
Markup`_ above).  In these contexts, no markup recognition is done,
and a single backslash represents a literal backslash, without having
to double up.

If you find that you need to use a backslash in your text, consider
using inline literals or a literal block instead.


Habits to Avoid
===============

Many programmers who are familiar with TeX often write quotation
marks like this::

    `single-quoted' or ``double-quoted''

Backquotes are significant in reStructuredText, so this practice
should be avoided.  For ordinary text, use ordinary 'single-quotes'
or "double-quotes".  For inline literal text (see `Inline Markup`_
above), use double-backquotes::

    ``literal text: in here, anything goes!``


Resources
=========

Many other constructs and variations are possible.  For more details
about the reStructuredText markup, in increasing order of
thoroughness, please see:

* `A ReStructuredText Primer`__, a gentle introduction.

  __ http://docutils.sourceforge.net/docs/rst/quickstart.html

* `Quick reStructuredText`__, a users' quick reference.

  __ http://docutils.sourceforge.net/docs/rst/quickref.html

* `reStructuredText Markup Specification`__, the final authority.

  __ http://docutils.sourceforge.net/spec/rst/reStructuredText.html

The processing of reStructuredText PEPs is done using Docutils_.  If
you have a question or require assistance with reStructuredText or
Docutils, please `post a message`_ to the `Docutils-Users mailing
list`_.  The `Docutils project web site`_ has more information.

.. _Docutils: http://docutils.sourceforge.net/
.. _post a message:
   mailto:docutils-users@lists.sourceforge.net?subject=PEPs
.. _Docutils-Users mailing list:
   http://lists.sourceforge.net/lists/listinfo/docutils-users
.. _Docutils project web site: http://docutils.sourceforge.net/


References
==========

.. [1] PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton
   (http://www.python.org/peps/pep-0001.html)

.. [2] PEP 9, Sample Plaintext PEP Template, Warsaw
   (http://www.python.org/peps/pep-0009.html)


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End:

From goodger@users.sourceforge.net  Fri Aug 30 05:58:34 2002
From: goodger@users.sourceforge.net (David Goodger)
Date: Fri, 30 Aug 2002 00:58:34 -0400
Subject: [Python-Dev] Re: ANN: New PEP Format: reStructuredText
In-Reply-To:
Message-ID:

Almost forgot.  The Docutils package must be installed in order to
use pep2html.py on the new-format PEPs, for python.org processing.
There are instructions for getting and installing the Docutils
package for this purpose here:

    http://www.python.org/peps/README.html

This is the processed form of python/nondist/peps/README.txt in CVS.

--
David Goodger    Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
  (includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/

From oren-py-d@hishome.net  Fri Aug 30 06:54:14 2002
From: oren-py-d@hishome.net (Oren Tirosh)
Date: Fri, 30 Aug 2002 01:54:14 -0400
Subject: [Python-Dev] Mersenne Twister
In-Reply-To:
References: <002001c24f8e$48d27e60$83ec7ad1@othello>
Message-ID: <20020830055414.GA87371@hishome.net>

On Thu, Aug 29, 2002 at 10:43:28PM -0400, Tim Peters wrote:
> [Raymond Hettinger]
> > ...
> > The C code I saw is covered by a BSD license -- I don't
> > know if that's an issue or not.
>
> That's fine, provided it doesn't have the dreaded "advertising
> clause".  I personally don't care whether it does -- it's the FSF
> that has bug up their butt about that one.
> I expect we'd have to reproduce their copyright notice

The Mersenne Twister distribution is now under the Artistic License.

http://www.math.keio.ac.jp/~matumoto/eartistic.html

    Oren

From tdelaney@avaya.com  Fri Aug 30 07:35:55 2002
From: tdelaney@avaya.com (Delaney, Timothy)
Date: Fri, 30 Aug 2002 16:35:55 +1000
Subject: [Python-Dev] SF Bug #602245: os.popen() negative error code IOError
Message-ID:

I just submitted a bug at SF entitled 'os.popen() negative error code
IOError'.  However, not knowing SF too well, I've messed up the
formatting of the test code, so here it is.

When a negative return code is received by the os.popen() family, an
IOError is raised when the last pipe from the process is closed.

The following code demonstrates the problem:

    import sys
    import os
    import traceback

    if __name__ == '__main__':

        if len(sys.argv) == 1:

            try:
                r = os.popen('%s %s %s' % (sys.executable, sys.argv[0], -1,))
                r.close()
            except IOError:
                traceback.print_exc()

            try:
                w, r = os.popen2('%s %s %s' % (sys.executable, sys.argv[0], -1,))
                w.close()
                r.close()
            except IOError:
                traceback.print_exc()

            try:
                w, r, e = os.popen3('%s %s %s' % (sys.executable, sys.argv[0], -1,))
                w.close()
                r.close()
                e.close()
            except IOError:
                traceback.print_exc()

        else:
            sys.exit(int(sys.argv[1]))

---------- Run ----------

Traceback (most recent call last):
  File "Q:\Viper\src\webvis\tests\test.py", line 11, in ?
    r.close()
IOError: (0, 'Error')
Traceback (most recent call last):
  File "Q:\Viper\src\webvis\tests\test.py", line 18, in ?
    r.close()
IOError: (0, 'Error')
Traceback (most recent call last):
  File "Q:\Viper\src\webvis\tests\test.py", line 26, in ?
    e.close()
IOError: (0, 'Error')

Tim Delaney

From Jack.Jansen@cwi.nl  Fri Aug 30 09:18:26 2002
From: Jack.Jansen@cwi.nl (Jack Jansen)
Date: Fri, 30 Aug 2002 10:18:26 +0200
Subject: [Python-Dev] tiny optimization in ceval mainloop
In-Reply-To: <15726.52313.734491.272985@gargle.gargle.HOWL>
Message-ID: <0ED9227E-BBF1-11D6-B9DE-0030655234CE@cwi.nl>

On Friday, August 30, 2002, at 03:37 , Skip Montanaro wrote:
>
>     Greg> Could you have just one ticker, instead of one per thread?
>
> That would make ticker really count down checkinterval ticks.  Also,
> of possible interest is this declaration and comment in longobject.c:
>
>     static int ticker;  /* XXX Could be shared with ceval? */
>
> Any time C code would want to read or update ticker, it would have
> the GIL, right?

Not if the idea that lead to this thread (clearing ticker if something
is put in things_to_do) is implemented, because we may be in an
interrupt routine at the time we fiddle things_to_do.

And I don't think we can be sure that even clearing is guaranteed to
work (if another thread is halfway a load-decrement-store sequence the
clear could be lost).
--
- Jack Jansen        http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution
  -- Emma Goldman -

From guido@python.org  Fri Aug 30 13:33:29 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 30 Aug 2002 08:33:29 -0400
Subject: [Python-Dev] tiny optimization in ceval mainloop
In-Reply-To: Your message of "Fri, 30 Aug 2002 10:18:26 +0200."
    <0ED9227E-BBF1-11D6-B9DE-0030655234CE@cwi.nl>
References: <0ED9227E-BBF1-11D6-B9DE-0030655234CE@cwi.nl>
Message-ID: <200208301233.g7UCXTB07136@pcp02138704pcs.reston01.va.comcast.net>

[Jack]
> And I don't think we can be sure that even clearing is guaranteed to
> work (if another thread is halfway a load-decrement-store sequence
> the clear could be lost).

I think that pretty much kills the idea.  Has anybody checked whether
it causes a measurable speedup?  If not, I propose not to bother.
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip@pobox.com  Fri Aug 30 13:38:52 2002
From: skip@pobox.com (Skip Montanaro)
Date: Fri, 30 Aug 2002 07:38:52 -0500
Subject: [Python-Dev] Re: SF Bug #602245: os.popen() negative error code IOError
In-Reply-To:
References:
Message-ID: <15727.26460.515083.862490@gargle.gargle.HOWL>

    Tim> I just submitted a bug at SF entitled 'os.popen() negative
    Tim> error code IOError'.  However, not knowing SF too well, I've
    Tim> messed up the formatting of the test code, so here it is.

It's actually formatted okay, if you save or view source you can cut
out your test pretty easily.  Still, anytime you want to include more
than two or three lines of code, you're going to be much better off
attaching the code (which I just did).

--
Skip Montanaro
skip@pobox.com
consulting: http://manatee.mojam.com/~skip/resume.html

From skip@pobox.com  Fri Aug 30 14:59:04 2002
From: skip@pobox.com (Skip Montanaro)
Date: Fri, 30 Aug 2002 08:59:04 -0500
Subject: [Python-Dev] tiny optimization in ceval mainloop
In-Reply-To: <0ED9227E-BBF1-11D6-B9DE-0030655234CE@cwi.nl>
References: <15726.52313.734491.272985@gargle.gargle.HOWL>
    <0ED9227E-BBF1-11D6-B9DE-0030655234CE@cwi.nl>
Message-ID: <15727.31272.80804.453415@gargle.gargle.HOWL>

    Skip> Any time C code would want to read or update ticker, it
    Skip> would have the GIL, right?

    Jack> Not if the idea that lead to this thread (clearing ticker if
    Jack> something is put in things_to_do) is implemented, because we
    Jack> may be in an interrupt routine at the time we fiddle
    Jack> things_to_do.

    Jack> And I don't think we can be sure that even clearing is
    Jack> guaranteed to work (if another thread is halfway a
    Jack> load-decrement-store sequence the clear could be lost).

Hmm... I guess you lost me.  The code that fiddles the ticker in
ceval.c clearly operates while the GIL is held.  I think the code in
sysmodule.c that updates the checkinterval works under that
assumption as well.
The other ticker in longobject.c I'm not so sure about.

The patch I submitted doesn't implement the ticker clear that Jeremy
originally suggested.  It just pulls the ticker and the checkinterval
out of the thread state and makes them two globals.  They are both
manipulated in otherwise the same way.

Skip

From guido@python.org  Fri Aug 30 15:13:52 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 30 Aug 2002 10:13:52 -0400
Subject: [Python-Dev] tiny optimization in ceval mainloop
In-Reply-To: Your message of "Fri, 30 Aug 2002 08:59:04 CDT."
    <15727.31272.80804.453415@gargle.gargle.HOWL>
References: <15726.52313.734491.272985@gargle.gargle.HOWL>
    <0ED9227E-BBF1-11D6-B9DE-0030655234CE@cwi.nl>
    <15727.31272.80804.453415@gargle.gargle.HOWL>
Message-ID: <200208301413.g7UEDqZ07890@pcp02138704pcs.reston01.va.comcast.net>

> Skip> Any time C code would want to read or update ticker, it would
> Skip> have the GIL, right?
>
> Jack> Not if the idea that lead to this thread (clearing ticker if
> Jack> something is put in things_to_do) is implemented, because we
> Jack> may be in an interrupt routine at the time we fiddle
> Jack> things_to_do.
>
> Jack> And I don't think we can be sure that even clearing is
> Jack> guaranteed to work (if another thread is halfway a
> Jack> load-decrement-store sequence the clear could be lost).
>
> Hmm... I guess you lost me.  The code that fiddles the ticker in
> ceval.c clearly operates while the GIL is held.  I think the code in
> sysmodule.c that updates the checkinterval works under that
> assumption as well.  The other ticker in longobject.c I'm not so
> sure about.
>
> The patch I submitted doesn't implement the ticker clear that Jeremy
> originally suggested.  It just pulls the ticker and the
> checkinterval out of the thread state and makes them two globals.
> They are both manipulated in otherwise the same way.
>
> Skip

Yeah, but the whole *point* would be to save an extra test and
(rarely-taken jump) by allowing Jeremy's suggestion to be
implemented.  Otherwise I don't see much advantage to the patch (or
do you see a speed-up?).

--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip@pobox.com  Fri Aug 30 15:29:06 2002
From: skip@pobox.com (Skip Montanaro)
Date: Fri, 30 Aug 2002 09:29:06 -0500
Subject: [Python-Dev] tiny optimization in ceval mainloop
In-Reply-To: <200208301413.g7UEDqZ07890@pcp02138704pcs.reston01.va.comcast.net>
References: <15726.52313.734491.272985@gargle.gargle.HOWL>
    <0ED9227E-BBF1-11D6-B9DE-0030655234CE@cwi.nl>
    <15727.31272.80804.453415@gargle.gargle.HOWL>
    <200208301413.g7UEDqZ07890@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <15727.33074.324120.988215@gargle.gargle.HOWL>

    Guido> Yeah, but the whole *point* would be to save an extra test
    Guido> and (rarely-taken jump) by allowing Jeremy's suggestion to
    Guido> be implemented.  Otherwise I don't see much advantage to
    Guido> the patch (or do you see a speed-up?).

Haven't tested it for speed.  You do save 8 bytes per thread state
instance though...

S

From guido@python.org  Fri Aug 30 15:29:52 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 30 Aug 2002 10:29:52 -0400
Subject: [Python-Dev] tiny optimization in ceval mainloop
In-Reply-To: Your message of "Fri, 30 Aug 2002 09:29:06 CDT."
    <15727.33074.324120.988215@gargle.gargle.HOWL>
References: <15726.52313.734491.272985@gargle.gargle.HOWL>
    <0ED9227E-BBF1-11D6-B9DE-0030655234CE@cwi.nl>
    <15727.31272.80804.453415@gargle.gargle.HOWL>
    <200208301413.g7UEDqZ07890@pcp02138704pcs.reston01.va.comcast.net>
    <15727.33074.324120.988215@gargle.gargle.HOWL>
Message-ID: <200208301429.g7UETqQ08033@pcp02138704pcs.reston01.va.comcast.net>

> Haven't tested it for speed.  You do save 8 bytes per thread state
> instance though...

I don't care about the thread state size.
If it could speed Python up by 5% I'd gladly add a kilobyte to each
thread state...

--Guido van Rossum (home page: http://www.python.org/~guido/)

From jeremy@alum.mit.edu  Fri Aug 30 15:35:23 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Fri, 30 Aug 2002 10:35:23 -0400
Subject: [Python-Dev] tiny optimization in ceval mainloop
In-Reply-To: <200208301429.g7UETqQ08033@pcp02138704pcs.reston01.va.comcast.net>
References: <15726.52313.734491.272985@gargle.gargle.HOWL>
    <0ED9227E-BBF1-11D6-B9DE-0030655234CE@cwi.nl>
    <15727.31272.80804.453415@gargle.gargle.HOWL>
    <200208301413.g7UEDqZ07890@pcp02138704pcs.reston01.va.comcast.net>
    <15727.33074.324120.988215@gargle.gargle.HOWL>
    <200208301429.g7UETqQ08033@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <15727.33451.698048.657655@slothrop.zope.com>

The difference I saw with only the ticker check in ceval was only
about 1% for pystone.  Python was always faster with the change, but
never by much.

Jeremy

From tim.one@comcast.net  Fri Aug 30 15:47:36 2002
From: tim.one@comcast.net (Tim Peters)
Date: Fri, 30 Aug 2002 10:47:36 -0400
Subject: [Python-Dev] Mersenne Twister
In-Reply-To: <20020830055414.GA87371@hishome.net>
Message-ID:

[Oren Tirosh]
> The Mersenne Twister distribution is now under the Artistic License.
>
> http://www.math.keio.ac.jp/~matumoto/eartistic.html

That's out of date.  When they updated the code earlier this year to
fix the initialization weakness, they also adopted a new license:

    MT19937 with initialization improved 2002/1/26

    The initialization scheme of older versions of MT has a (small)
    problem, that MSBs are not well reflected to the state vector.
    Here is the latest version of initialization scheme, which we
    consider the newest standard.  An initialization routine using an
    array of seeds is also available.  We adopted BSD-license which
    we think most flexible, so this code is freely usable.
That's much less problematic than the Artistic License (which
Stallman holds in contempt; his objection to BSD+advertising_clause
is so technical it's hard to give a damn).

From mwh@python.net  Fri Aug 30 16:06:06 2002
From: mwh@python.net (Michael Hudson)
Date: 30 Aug 2002 16:06:06 +0100
Subject: [Python-Dev] tiny optimization in ceval mainloop
In-Reply-To: Jeremy Hylton's message of "Fri, 30 Aug 2002 10:35:23 -0400"
References: <15726.52313.734491.272985@gargle.gargle.HOWL>
    <0ED9227E-BBF1-11D6-B9DE-0030655234CE@cwi.nl>
    <15727.31272.80804.453415@gargle.gargle.HOWL>
    <200208301413.g7UEDqZ07890@pcp02138704pcs.reston01.va.comcast.net>
    <15727.33074.324120.988215@gargle.gargle.HOWL>
    <200208301429.g7UETqQ08033@pcp02138704pcs.reston01.va.comcast.net>
    <15727.33451.698048.657655@slothrop.zope.com>
Message-ID: <2m3csw5qu9.fsf@starship.python.net>

Jeremy Hylton writes:

> The difference I saw with only the ticker check in ceval was only
> about 1% for pystone.  Python was always faster with the change, but
> never by much.

A bunch of 0.5% improvements add up.  If there's not much cost in
complexity, why not go for it?

Cheers,
M.

--
6. Symmetry is a complexity-reducing concept (co-routines include
   subroutines); seek it everywhere.
  -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html

From tim.one@comcast.net  Fri Aug 30 16:20:17 2002
From: tim.one@comcast.net (Tim Peters)
Date: Fri, 30 Aug 2002 11:20:17 -0400
Subject: [Python-Dev] tiny optimization in ceval mainloop
In-Reply-To: <0ED9227E-BBF1-11D6-B9DE-0030655234CE@cwi.nl>
Message-ID:

[Jack Jansen]
> Not if the idea that lead to this thread (clearing ticker if
> something is put in things_to_do) is implemented, because we may be
> in an interrupt routine at the time we fiddle things_to_do.
>
> And I don't think we can be sure that even clearing is guaranteed to
> work (if another thread is halfway a load-decrement-store sequence
> the clear could be lost).
So long as the ticker is declared volatile, the odds of setting
ticker to 0 in Py_AddPendingCall during a "bad time" for --ticker are
small, a window of a couple machine instructions.  Ticker will
eventually go to 0 regardless.  It's not like things_to_do isn't
ignored for "long" stretches of time now either:  Py_MakePendingCalls
returns immediately unless the thread with the GIL just happens to be
the main thread.  Even if it is the main thread, there's another race
there with some non-main thread happening to call Py_AddPendingCall
at the same time.

Opening another hole of a couple machine instructions shouldn't make
much difference, although Py_MakePendingCalls should also be changed
then to reset ticker to 0 in its "early exit because the coincidences
I'm relying on haven't happened yet" cases.

From guido@python.org  Fri Aug 30 16:22:01 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 30 Aug 2002 11:22:01 -0400
Subject: [Python-Dev] tiny optimization in ceval mainloop
In-Reply-To: Your message of "Fri, 30 Aug 2002 11:20:17 EDT."
References:
Message-ID: <200208301522.g7UFM1F09201@pcp02138704pcs.reston01.va.comcast.net>

> So long as the ticker is declared volatile, the odds of setting
> ticker to 0 in Py_AddPendingCall during a "bad time" for --ticker
> are small, a window of a couple machine instructions.  Ticker will
> eventually go to 0 regardless.  It's not like things_to_do isn't
> ignored for "long" stretches of time now either:
> Py_MakePendingCalls returns immediately unless the thread with the
> GIL just happens to be the main thread.  Even if it is the main
> thread, there's another race there with some non-main thread
> happening to call Py_AddPendingCall at the same time.

Good point.  (Though some apps set the check interval to 1000; well,
that would still be fast enough.)
> Opening another hole of a couple machine instructions shouldn't make
> much difference, although Py_MakePendingCalls should also be changed
> then to reset ticker to 0 in its "early exit because the
> coincidences I'm relying on haven't happened yet" cases.

OK, let's try it then.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one@comcast.net  Fri Aug 30 16:24:20 2002
From: tim.one@comcast.net (Tim Peters)
Date: Fri, 30 Aug 2002 11:24:20 -0400
Subject: [Python-Dev] tiny optimization in ceval mainloop
In-Reply-To: <2m3csw5qu9.fsf@starship.python.net>
Message-ID:

[Jeremy Hylton]
> The difference I saw with only the ticker check in ceval was only
> about 1% for pystone.  Python was always faster with the change, but
> never by much.

[Michael Hudson]
> A bunch of 0.5% improvements add up.  If there's not much cost in
> complexity, why not go for it?

There isn't, and we should .  I'd do it even if it slowed things by
1%:  reducing the test+branch count on the critical path will
*eventually* pay off.  The SET_LINENO removal worked in the other
direction, and that proved a timing mini-disaster under MSVC6.  Doing
even random things in the right direction may very well nudge MSVC6
back into the local minimum it got knocked out of.
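[The scheme being debated in this thread -- a single countdown ticker
on the eval loop's fast path, with a pending call zeroing it to force
an early check -- can be modeled in Python.  This is a toy sketch for
illustration only; the real code is C in ceval.c, with the
thread-safety caveats discussed above.]

```python
# Toy model of the ceval.c scheme under discussion: decrement one
# ticker per "instruction"; when it goes negative, reload it from the
# check interval and service any pending call.  Adding a pending call
# zeroes the ticker so the very next check falls through.
CHECK_INTERVAL = 10  # cf. sys.setcheckinterval()

def run(n_instructions, pending_at):
    ticker = 0
    pending = False
    serviced = 0
    for i in range(n_instructions):
        ticker -= 1
        if ticker < 0:           # the single test on the critical path
            ticker = CHECK_INTERVAL
            if pending:          # "things_to_do" in the real code
                pending = False
                serviced += 1
        if i == pending_at:      # simulate Py_AddPendingCall
            pending = True
            ticker = 0           # force the next check to fall through
    return serviced

print(run(100, 42))  # -> 1: the pending call is serviced on the next check
```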
From sholden@holdenweb.com  Fri Aug 30 16:35:32 2002
From: sholden@holdenweb.com (Steve Holden)
Date: Fri, 30 Aug 2002 11:35:32 -0400
Subject: [Python-Dev] tiny optimization in ceval mainloop
References: <15726.52313.734491.272985@gargle.gargle.HOWL>
    <0ED9227E-BBF1-11D6-B9DE-0030655234CE@cwi.nl>
    <15727.31272.80804.453415@gargle.gargle.HOWL>
    <200208301413.g7UEDqZ07890@pcp02138704pcs.reston01.va.comcast.net>
    <15727.33074.324120.988215@gargle.gargle.HOWL>
    <200208301429.g7UETqQ08033@pcp02138704pcs.reston01.va.comcast.net>
    <15727.33451.698048.657655@slothrop.zope.com>
    <2m3csw5qu9.fsf@starship.python.net>
Message-ID: <055301c2503a$e1cfea60$6300000a@holdenweb.com>

> Jeremy Hylton writes:
>
> > The difference I saw with only the ticker check in ceval was only
> > about 1% for pystone.  Python was always faster with the change,
> > but never by much.
>
> A bunch of 0.5% improvements add up.  If there's not much cost in
> complexity, why not go for it?
>

Yeah, right, we just need 200 of them and we're laughing.
Computation in infinitesimal time.
suddenly-dwim-mode-is-possib-ly y'rs
 - steve
-----------------------------------------------------------------------
Steve Holden                                  http://www.holdenweb.com/
Python Web Programming                       pydish.holdenweb.com/pwp/
Previous .sig file retired to                 www.homeforoldsigs.com
-----------------------------------------------------------------------

From skip@pobox.com  Fri Aug 30 16:41:49 2002
From: skip@pobox.com (Skip Montanaro)
Date: Fri, 30 Aug 2002 10:41:49 -0500
Subject: [Python-Dev] tiny optimization in ceval mainloop
In-Reply-To: <200208301522.g7UFM1F09201@pcp02138704pcs.reston01.va.comcast.net>
References: <200208301522.g7UFM1F09201@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <15727.37437.794845.391785@12-248-14-26.client.attbi.com>

    >> Opening another hole of a couple machine instructions shouldn't
    >> make much difference, although Py_MakePendingCalls should also
    >> be changed then to reset ticker to 0 in its "early exit because
    >> the coincidences I'm relying on haven't happened yet" cases.

    Guido> OK, let's try it then.

You mean I just wasted my time running pystones? ;-) Just the same,
here's the output, after and before.  For each setting, I ran
pystones twice manually, then the three reported times:

    with patch:
        Pystone(1.1) time for 50000 passes = 7.52
        This machine benchmarks at 6648.94 pystones/second
        Pystone(1.1) time for 50000 passes = 7.51
        This machine benchmarks at 6657.79 pystones/second
        Pystone(1.1) time for 50000 passes = 7.5
        This machine benchmarks at 6666.67 pystones/second

    without patch:
        Pystone(1.1) time for 50000 passes = 7.69
        This machine benchmarks at 6501.95 pystones/second
        Pystone(1.1) time for 50000 passes = 7.68
        This machine benchmarks at 6510.42 pystones/second
        Pystone(1.1) time for 50000 passes = 7.67
        This machine benchmarks at 6518.9 pystones/second

I was quite surprised at the difference.  Someone definitely should
check this.  The patch is at

    http://python.org/sf/602191

My guess is that the code is avoiding a lot of pointer dereferences.
Oh, wait a minute.  I muffed a bit.  I initialized the ticker and
checkinterval variables to 100.  Should have been 10.

    ... a short time passes while Skip thanks God he's not rebuilding
    VTK ...

With _Py_CheckInterval set to 10 it's still not too shabby:

    Pystone(1.1) time for 50000 passes = 7.57
    This machine benchmarks at 6605.02 pystones/second
    Pystone(1.1) time for 50000 passes = 7.56
    This machine benchmarks at 6613.76 pystones/second
    Pystone(1.1) time for 50000 passes = 7.55
    This machine benchmarks at 6622.52 pystones/second

This is still without Jeremy's suggested change.

apples-and-oranges-ly, y'rs,

Skip

From mwh@python.net  Fri Aug 30 17:13:14 2002
From: mwh@python.net (Michael Hudson)
Date: 30 Aug 2002 17:13:14 +0100
Subject: [Python-Dev] tiny optimization in ceval mainloop
In-Reply-To: "Steve Holden"'s message of "Fri, 30 Aug 2002 11:35:32 -0400"
References: <15726.52313.734491.272985@gargle.gargle.HOWL>
    <0ED9227E-BBF1-11D6-B9DE-0030655234CE@cwi.nl>
    <15727.31272.80804.453415@gargle.gargle.HOWL>
    <200208301413.g7UEDqZ07890@pcp02138704pcs.reston01.va.comcast.net>
    <15727.33074.324120.988215@gargle.gargle.HOWL>
    <200208301429.g7UETqQ08033@pcp02138704pcs.reston01.va.comcast.net>
    <15727.33451.698048.657655@slothrop.zope.com>
    <2m3csw5qu9.fsf@starship.python.net>
    <055301c2503a$e1cfea60$6300000a@holdenweb.com>
Message-ID: <2mfzwwiaud.fsf@starship.python.net>

"Steve Holden" writes:

> > A bunch of 0.5% improvements add up.  If there's not much cost in
> > complexity, why not go for it?
> >
>
> Yeah, right, we just need 200 of them and we're laughing.
> Computation in infinitesimal time.

Multiply up doesn't have the same ring to it, does it?

Cheers,
M.

--
I don't have any special knowledge of all this. In fact, I made all
the above up, in the hope that it corresponds to reality.
  -- Mark Carroll, ucam.chat

From tim.one@comcast.net  Fri Aug 30 17:12:24 2002
From: tim.one@comcast.net (Tim Peters)
Date: Fri, 30 Aug 2002 12:12:24 -0400
Subject: [Python-Dev] tiny optimization in ceval mainloop
In-Reply-To: <15727.37437.794845.391785@12-248-14-26.client.attbi.com>
Message-ID:

[Skip Montanaro]
> ...
> My guess is that the code is avoiding a lot of pointer dereferences.
> Oh, wait a minute.  I muffed a bit.  I initialized the ticker and
> checkinterval variables to 100.  Should have been 10.

Someone may wish to question the historical 10 too.  A few weeks ago
on c.l.py, a number of programs were posted showing that, on Linux,
the thread scheduling is such that the *offer* to switch threads
every 10 bytecodes was usually declined:  the thread that got the GIL
was overwhelmingly most often the thread that released it, so that
the whole dance was overwhelmingly most often pure overhead.  This
may be different under 2.3, where the pthreads GIL is implemented via
a semaphore rather than a condvar.  But in that case, actually
switching threads every 10 bytecodes is an awful lot of thread
switching (10 bytecodes don't take as long as they used to ).

I don't know how to pick a good "one size fits all" value, but
suspect 10 is "clearly too small".  In app after app, people who
discover sys.setcheckinterval() discover soon after that performance
improves if they increase it.

From guido@python.org  Fri Aug 30 17:16:07 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 30 Aug 2002 12:16:07 -0400
Subject: [Python-Dev] tiny optimization in ceval mainloop
In-Reply-To: Your message of "Fri, 30 Aug 2002 12:12:24 EDT."
References:
Message-ID: <200208301616.g7UGG7F09580@pcp02138704pcs.reston01.va.comcast.net>

> Someone may wish to question the historical 10 too.
> A few weeks ago on c.l.py, a number of programs were posted showing
> that, on Linux, the thread scheduling is such that the *offer* to
> switch threads every 10 bytecodes was usually declined:  the thread
> that got the GIL was overwhelmingly most often the thread that
> released it, so that the whole dance was overwhelmingly most often
> pure overhead.  This may be different under 2.3, where the pthreads
> GIL is implemented via a semaphore rather than a condvar.  But in
> that case, actually switching threads every 10 bytecodes is an awful
> lot of thread switching (10 bytecodes don't take as long as they
> used to ).
>
> I don't know how to pick a good "one size fits all" value, but
> suspect 10 is "clearly too small".  In app after app, people who
> discover sys.setcheckinterval() discover soon after that performance
> improves if they increase it.

Let's try 100 and see how that works.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From jeremy@alum.mit.edu  Fri Aug 30 16:50:52 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Fri, 30 Aug 2002 11:50:52 -0400
Subject: [Python-Dev] tiny optimization in ceval mainloop
In-Reply-To: <055301c2503a$e1cfea60$6300000a@holdenweb.com>
References: <15726.52313.734491.272985@gargle.gargle.HOWL>
    <0ED9227E-BBF1-11D6-B9DE-0030655234CE@cwi.nl>
    <15727.31272.80804.453415@gargle.gargle.HOWL>
    <200208301413.g7UEDqZ07890@pcp02138704pcs.reston01.va.comcast.net>
    <15727.33074.324120.988215@gargle.gargle.HOWL>
    <200208301429.g7UETqQ08033@pcp02138704pcs.reston01.va.comcast.net>
    <15727.33451.698048.657655@slothrop.zope.com>
    <2m3csw5qu9.fsf@starship.python.net>
    <055301c2503a$e1cfea60$6300000a@holdenweb.com>
Message-ID: <15727.37980.395003.384558@slothrop.zope.com>

>>>>> "SH" == Steve Holden writes:

  >> Jeremy Hylton writes:
  >>
  >> > The difference I saw with only the ticker check in ceval was
  >> > only about 1% for pystone.  Python was always faster with the
  >> > change, but never by much.
  >>
  >> A bunch of 0.5% improvements add up.
If there's not much cost in >> complexity, why not go for it? >> SH> Yeah, right, we just need 200 of them and we're SH> laughing. Computation in infinitesimal time. I think Xeno's laughing already. Jeremy From tim.one@comcast.net Fri Aug 30 17:30:55 2002 From: tim.one@comcast.net (Tim Peters) Date: Fri, 30 Aug 2002 12:30:55 -0400 Subject: [Python-Dev] The first trustworthy GBayes results In-Reply-To: <20020828194248.GA16407@cthulhu.gerg.ca> Message-ID: [Tim] > What's an acceptable false positive rate? [Greg Ward] > Speaking as one of the people who reviews suspected spam for python.org > and rescues false positives, I would say that the more relevant figure > is: how much suspected spam do I have to review every morning? < 10 > messages would be peachy; right now it's around 5-20 messages per day. I must be missing something. I would *hope* that you review *all* messages claimed to be spam, in which case the number of msgs to be reviewed would, in a perfectly accurate system, be equal to the number of spams received. OTOH, the false positive rate doesn't have anything to do with the number of spams received, it has to do with the number of non-spams received. > Currently there are probably 1-3 FPs per day, although on a bad day > there can be 5-10. (Eg. on 2002-08-21, six mailman-users posts from the > same guy were all caught, mainly because his ISP added X-AntiAbuse, and > his messages were multipart/alternative with unwrapped plain text. This > is a perfect example of SpamAssassin screwing up royally.) 1-3 FPs/day > I can live with, but the real burden is the manual review: I'd much > rather have 5 FPs in a pool of 10 suspects than 1 FP out of 100 > suspects. Maybe you don't want this kind of approach at all. The classifier doesn't have "gray areas" in practice: it tends to give probabilites near 1, or near 0, and there's very little in between -- a msg either has a preponderance of spam indicators, or a preponderance of non-spam indicators. 
You're simply not going to get a batch of "hmm, I'm not really sure about
these" out of it. You would in a conventional Bayesian classifier, but
Graham's ignores almost all of the words, judging on only the most extreme
words present; when only extremes are fed in, the final result also tends
to be extreme (the only cases where that doesn't obtain are those where
the most extreme words it finds aren't extreme at all; e.g., a msg
consisting entirely of "the", "and" and "it" would get rated as 0.5).

>> What do we get from SpamAssassin?

> Recall the stats I posted this morning; the bulk of spam is in Chinese
> or Korean, and I have things setup so SpamAssassin never even sees it.
> I think the only way to meaningfully answer this question is to stash
> *everything* mail.python.org receives for a day or 10, spam and
> otherwise, and run it all through SA.

It would be good to have such a corpus regardless.

From tim.one@comcast.net  Fri Aug 30 17:45:44 2002
From: tim.one@comcast.net (Tim Peters)
Date: Fri, 30 Aug 2002 12:45:44 -0400
Subject: [Python-Dev] The first trustworthy GBayes results
In-Reply-To: <15725.37311.970263.211518@12-248-11-90.client.attbi.com>
Message-ID: 

[Skip Montanaro]
> ...
> One thing I think would be worthwhile would be to run GBayes first, then
> only run stuff it thought was spam through SpamAssassin. Only
> messages that both systems categorized as spam would drop into the spam
> folder. This has a couple benefits over running one or the other in
> isolation:
>
> * The training set for GBayes probably doesn't need to be as big

Training GBayes is cheap, and the more you feed it the less need to do
information-destroying transformations (like folding case or ignoring
punctuation).

> * The two systems use substantially different approaches to
>   identifying spam,

Which could indeed be a killer-strong benefit.

> so I suspect your false positive rate would go way down.
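For concreteness, here is a sketch of the combining rule behind the "extremes only" behavior Tim describes above. It follows Paul Graham's published "A Plan for Spam" scheme; the function names and edge-case handling are my assumptions, not Tim's actual code:

```python
def combine(word_probs, max_discriminators=15):
    """Combine per-word spam probabilities, Graham-style."""
    # Judge on only the words farthest from the neutral 0.5.
    extremes = sorted(word_probs, key=lambda p: abs(p - 0.5), reverse=True)
    prod = inv = 1.0
    for p in extremes[:max_discriminators]:
        prod *= p
        inv *= 1.0 - p
    if prod + inv == 0.0:
        return 0.5  # degenerate case: certain-spam and certain-ham together
    return prod / (prod + inv)

combine([])               # 0.5 -- no evidence at all (e.g. an empty body)
combine([0.5, 0.5, 0.5])  # 0.5 -- only neutral words ("the", "and", "it")
combine([0.99, 0.99])     # close to 1 -- extremes dominate
```

Because only extreme inputs survive the cut, the product ratio is driven hard toward 0 or 1, which is why the scores cluster at the ends rather than in a gray area.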
I'm already having a real problem with this just looking at content: the false positive rate is already so low that I can't make statistically significant conclusions about things that may improve it (e.g., if I do something that removes just *one* false positive in a test run on 4000 hams, the false-positive rate falls by 12.5% -- I don't have enough false positives to make fine-grained judgments. And, indeed, every time I test a change to the algorithm, the most *significant* thing I find is that it turns up another class of blatant spam hiding in the ham corpus: my training data is still too dirty, and cleaning it up is labor-intensive). > False negatives would go up, but only testing can suggest by how > much. > > * Since SA is dog slow most of the time, SA users get a big speedup, > since a substantially smaller fraction of your messages get run > through it. > > This sort of chaining is pretty trivial to setup with procmail. > Dunno what the Windows set will do though. There are different audiences here. Greg is keen to have a better approach for python.org as a whole, while Barry is keen about that and about doing something more generic for Mailman. Windows isn't an issue for either of those. Everyone else can eat cake . From yahoo-mail.20.johnw@antichef.com Fri Aug 30 18:10:26 2002 From: yahoo-mail.20.johnw@antichef.com (John Williams) Date: Fri, 30 Aug 2002 10:10:26 -0700 (PDT) Subject: [Python-Dev] alternate reiter proposal Message-ID: <20020830171026.4738.qmail@web11306.mail.yahoo.com> Hi, this is my first post, so go easy on me! :-) I got this idea from the "cogen" discussion, seeing how the lack of a reliable re-iterability test makes it hard to write lazy multi-pass algorithms. Rather than (1) assuming iterators are re-iterable, (2) "forcing" iterators to be re-iterable by eagerly converting them to lists, or (3) trying to heuristically guess whether an iterator is re-iterable, why not combine the best of (1) and (2) while avoiding (3) entirely? 
This class will lazily convert an iterator to a list on the first pass
and then iterate over the saved list on all subsequent passes.

class reiter(object):
    def __init__(self, iterable):
        self.iterator = iter(iterable)
        self.cache = []
    def __iter__(self):
        if self.iterator is None:
            return iter(self.cache)
        else:
            return self
    def next(self):
        try:
            element = self.iterator.next()
            self.cache.append(element)
            return element
        except StopIteration:
            self.iterator = None
            raise

__________________________________________________
Do You Yahoo!?
Yahoo! Finance - Get real-time stock quotes
http://finance.yahoo.com

From ark@research.att.com  Fri Aug 30 18:29:50 2002
From: ark@research.att.com (Andrew Koenig)
Date: 30 Aug 2002 13:29:50 -0400
Subject: [Python-Dev] alternate reiter proposal
In-Reply-To: <20020830171026.4738.qmail@web11306.mail.yahoo.com>
References: <20020830171026.4738.qmail@web11306.mail.yahoo.com>
Message-ID: 

John> This class will lazily convert an iterator to a list on the first pass
John> and then iterate over the saved list on all subsequent passes.

John> class reiter(object):
John>     def __init__(self, iterable):
John>         self.iterator = iter(iterable)
John>         self.cache = []
John>     def __iter__(self):
John>         if self.iterator is None:
John>             return iter(self.cache)
John>         else:
John>             return self
John>     def next(self):
John>         try:
John>             element = self.iterator.next()
John>             self.cache.append(element)
John>             return element
John>         except StopIteration:
John>             self.iterator = None
John>             raise

Maybe I'm missing something here, but doesn't this fail if you
try to restart it before it has entirely consumed the input?
-- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark From tim.one@comcast.net Fri Aug 30 18:41:24 2002 From: tim.one@comcast.net (Tim Peters) Date: Fri, 30 Aug 2002 13:41:24 -0400 Subject: [Python-Dev] tiny optimization in ceval mainloop In-Reply-To: <200208301616.g7UGG7F09580@pcp02138704pcs.reston01.va.comcast.net> Message-ID: [Tim] >> I don't know how to pick a good "one size fits all" value, but >> suspect 10 is "clearly too small". In app after app, people who >> discover sys.setcheckinterval() discover soon after that performance >> improves if they increase it. [Guido] > Let's try 100 and see how that works. +1 here. Skip, you want to fold that into your patch? From skip@pobox.com Fri Aug 30 19:26:46 2002 From: skip@pobox.com (Skip Montanaro) Date: Fri, 30 Aug 2002 13:26:46 -0500 Subject: [Python-Dev] tiny optimization in ceval mainloop In-Reply-To: References: <200208301616.g7UGG7F09580@pcp02138704pcs.reston01.va.comcast.net> Message-ID: <15727.47334.411424.349071@12-248-14-26.client.attbi.com> Guido> Let's try 100 and see how that works. Tim> +1 here. Skip, you want to fold that into your patch? Done. S From yahoo-mail.20.johnw@antichef.com Fri Aug 30 20:07:26 2002 From: yahoo-mail.20.johnw@antichef.com (John Williams) Date: Fri, 30 Aug 2002 12:07:26 -0700 (PDT) Subject: [Python-Dev] alternate reiter proposal In-Reply-To: Message-ID: <20020830190726.81036.qmail@web11303.mail.yahoo.com> --- "Andrew Koenig - ark@research.att.com" wrote: > Maybe I'm missing something here, but doesn't this fail if you > try to restart it before it has entirely consumed the input? I guess that depends on what your expectations are--calling it "reiter" was probably not a good idea in that respect. I was really just trying to solve the problem of implementing the cartesian product function lazily and got a little ahead of myself proposing a solution to a special case as a general-purpose solution. 
I'd like to develop the basic idea into something with wider
applicability, since I'm fond of anything that makes it easier to
implement lazy algorithms. Somebody speak up if you think it's worthwhile,
otherwise I think I'll just let it drop since I'm fairly certain *I* won't
be needing it anytime soon.

__________________________________________________
Do You Yahoo!?
Yahoo! Finance - Get real-time stock quotes
http://finance.yahoo.com

From tim.one@comcast.net  Fri Aug 30 20:41:51 2002
From: tim.one@comcast.net (Tim Peters)
Date: Fri, 30 Aug 2002 15:41:51 -0400
Subject: [Python-Dev] Mining URLs for spam detection
In-Reply-To:
Message-ID:

I've gotten interesting results from this gimmick:

import re
url_re = re.compile(r"http://([^\s>'\"\x7f-\xff]+)", re.IGNORECASE)
urlfield_re = re.compile(r"[;?:@&=+,$.]")

def tokenize_url(string):
    for url in url_re.findall(string):
        for i, piece in enumerate(url.lower().split('/')):
            prefix = "url%d:" % i
            for chunk in urlfield_re.split(piece):
                yield prefix + chunk

... (and then do other tokenization) ...

So it splits a case-normalized http thingie via /, tags the first piece
"url0:", the second "url1:", and so on. Within each piece, it splits on
separators, like '=' and '.'. Two particular tokens generated this way
then made it into the list of 15 words that most often survived to the end
of the scoring step:

    url0:python  as a strong non-spam indicator
    url1:remove  as a strong spam indicator

The rest of the tokenization was unchanged, still doing MIME-ignorant
splitting on whitespace. Just the http gimmick was added, and that alone
cut the false negative rate in half. IOW, there's a *lot* of valuable info
in the http thingies! Not being a Web Guy, I'm not sure how to extract the
most info from it. If you've got suggestions for a better URL tagging
strategy, I'd love to hear them.

Cute: If I tokenize *only* the http thingies, ignoring all other parts of
the text, the false positive rate is about 1%.
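Running the tokenizer from the post above on a sample URL (the input string is my example, not one from Tim's corpora) shows the position-tagged tokens it generates:

```python
import re

# Reproduced from the post above.
url_re = re.compile(r"http://([^\s>'\"\x7f-\xff]+)", re.IGNORECASE)
urlfield_re = re.compile(r"[;?:@&=+,$.]")

def tokenize_url(string):
    for url in url_re.findall(string):
        for i, piece in enumerate(url.lower().split('/')):
            prefix = "url%d:" % i
            for chunk in urlfield_re.split(piece):
                yield prefix + chunk

tokens = list(tokenize_url("see http://www.Python.org/doc/current for docs"))
# ['url0:www', 'url0:python', 'url0:org', 'url1:doc', 'url2:current']
```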
This is because most legit msgs don't have any http thingies, so they get
classified correctly as ham (no tokens at all are generated for them).

This caught at least one spam in the ham corpus (a bogus "false positive"):

    Data/Ham/Set2/8695.txt
    prob = 0.999997392672
    prob('url0:240') = 0.2
    prob('url1:') = 0.612567
    prob('url0:250') = 0.99
    prob('url0:225') = 0.99
    prob('url0:207') = 0.99

    Sweet XXX!
    http://207.240.225.250/ II33bp-]

An example of a real false positive was due to /F including this URL:

    http://w1.132.telia.com/~u13208596/temp/py15-980706.zip

Oddly enough,

    prob('url0:132') = 0.99
    prob('url0:telia') = 0.99

so there was significant spam with "132" and "telia" in the first field of
an http thingie.

The false negative rate when tokenizing only http thingies zoomed to over
30%. Curiously, the best way for a spam to evade this check is *not* to
disguise itself with numeric IPs. Numbers end up looking suspicious. But,
e.g., this looks neutral:

    http://shocking-incest.com
    prob('url0:com') = 0.658328

and it never saw "shocking-incest" before.

From Jack.Jansen@oratrix.com  Fri Aug 30 21:21:27 2002
From: Jack.Jansen@oratrix.com (Jack Jansen)
Date: Fri, 30 Aug 2002 22:21:27 +0200
Subject: [Python-Dev] tiny optimization in ceval mainloop
In-Reply-To:
Message-ID: <0FDA1EC8-BC56-11D6-8FBD-003065517236@oratrix.com>

On vrijdag, augustus 30, 2002, at 06:12 , Tim Peters wrote:

> [Skip Montanaro]
>> ...
>> My guess is that the code is avoiding a lot of pointer dereferences.
>> Oh, wait a minute. I muffed a bit. I initialized the ticker and
>> checkinterval variables to 100. Should have been 10.
>
> Someone may wish to question the historical 10 too.
A > few weeks ago > on c.l.py, a number of programs were posted showing that, on Linux, the > thread scheduling is such the the *offer* to switch threads every 10 > bytecodes was usually declined: the thread that got the GIL was > overwhelmingly most often the thread that released it, so that > the whole > dance was overwhelmingly most often pure overhead. And it costs! Running pystone without another thread active I get 5500 pystones out of my machine. Running it with another thread active (in a sleep(1000)) I get 4200. After setcheckinterval(100) I'm back up to 5200. For completeness' sake: with no other thread active raising setcheckinterval() doesn't make a difference (it's in the noise, in my measurement it was actually 0.5% slower). We could get a serious speedup for multithreaded programs if we could raise the check interval. Some wild ideas: - Use an exponential (or linear?) backoff. If you attempt to switch and nothing happens you double the check interval, up to a maximum. If you do switch you reset to the minimum. - Combine the above with resetting (to zero? to minimum value if currently >= minimum?) the check interval on anything we know could influence thread schedule (releasing a lock, etc). -- - Jack Jansen http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From tim.one@comcast.net Fri Aug 30 21:35:09 2002 From: tim.one@comcast.net (Tim Peters) Date: Fri, 30 Aug 2002 16:35:09 -0400 Subject: [Python-Dev] tiny optimization in ceval mainloop In-Reply-To: <0FDA1EC8-BC56-11D6-8FBD-003065517236@oratrix.com> Message-ID: [Jack Jansen] > On vrijdag, augustus 30, 2002, at 06:12 , Tim Peters wrote: Jack, I'm never on vrijdag -- vrijdag is illegal in the US . > ... > And it costs! > > Running pystone without another thread active I get 5500 > pystones out of my machine. Running it with another thread > active (in a sleep(1000)) I get 4200. > After setcheckinterval(100) I'm back up to 5200. 
>
> For completeness' sake: with no other thread active raising
> setcheckinterval() doesn't make a difference (it's in the noise,
> in my measurement it was actually 0.5% slower).
>
> We could get a serious speedup for multithreaded programs if we
> could raise the check interval.

Guido already agreed to try boosting it to 100.

> Some wild ideas:
> - Use an exponential (or linear?) backoff. If you attempt to
>   switch and nothing happens you double the check interval, up to
>   a maximum. If you do switch you reset to the minimum.

On a pthreads system under 2.3, using semaphores, chances are good it will
always switch. But unless you're trying to fake soft realtime, it's a real
drag on performance to switch so often. We can't out-guess this, because
it depends on what the *app* wants. Most apps aren't trying to fake soft
realtime, so favoring less frequent switches is a good default.

> - Combine the above with resetting (to zero? to minimum value if
>   currently >= minimum?) the check interval on anything we know
>   could influence thread schedule (releasing a lock, etc).

You need a model for what it is you're trying to optimize here. I'm just
trying to cut useless overheads .

From Jack.Jansen@oratrix.com  Sat Aug 31 02:39:09 2002
From: Jack.Jansen@oratrix.com (Jack Jansen)
Date: Sat, 31 Aug 2002 03:39:09 +0200
Subject: [Python-Dev] tiny optimization in ceval mainloop
In-Reply-To:
Message-ID: <71FF26DA-BC82-11D6-8FBD-003065517236@oratrix.com>

On vrijdag, augustus 30, 2002, at 10:35 , Tim Peters wrote:
> [Jack Jansen]
>> On vrijdag, augustus 30, 2002, at 06:12 , Tim Peters wrote:
>
> Jack, I'm never on vrijdag -- vrijdag is illegal in the US .

Oh? Didn't realize that, thought they hadn't gotten any further than
outlawing rookdag and drinkdag yet.

>> Some wild ideas:
>> - Use an exponential (or linear?) backoff. If you attempt to
>>   switch and nothing happens you double the check interval, up to
>>   a maximum. If you do switch you reset to the minimum.
> > On a pthreads system under 2.3, using semaphores, chances are > good it will > always switch. But unless you're trying to fake soft realtime, > it's a real > drag on performance to switch so often We can't out-guess > this, because it > depends on what the *app* wants. Most apps aren't trying to fake soft > realtime, so favoring less frequent switches is a good default. Agreed. But the main application I was thinking of are along the lines of one thread doing real computational work and the others doing GUI stuff or serving web-requests or some such. I.e. while busy you care about response for other threads, but you don't want to spend too many cycles on it. I remember seeing bad artefacts of having a large value for the check interval, such as bad responsiveness to control-C, but it could well be that this was MacPython-OS9 specific. > You need a model for what it is you're trying [...] I though you said you didn't have vrijdag? -- - Jack Jansen http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman - From python@rcn.com Sat Aug 31 17:44:06 2002 From: python@rcn.com (Raymond Hettinger) Date: Sat, 31 Aug 2002 12:44:06 -0400 Subject: [Python-Dev] Proposed Mixins for Wide Interfaces Message-ID: <001101c2510d$9fce0920$5f66accf@othello> How about adding some mixins to simplify the implementation of some of the fatter interfaces? class CompareMixin: """ Given an __eq__ method in a subclass, adds a __ne__ method Given __eq__ and __lt__, adds !=, <=, >, >=. """ class MappingMixin: """ Given __setitem__, __getitem__, and keys, implements values, items, update, get, setdefault, len, iterkeys, iteritems, itervalues, has_key, and __contains__. If __delitem__ is also supplied, implements clear, pop, and popitem. Takes advantage of __iter__ if supplied (recommended). Takes advantage of __contains__ or has_key if supplied (recommended). """ The idea is to make it easier to implement these interfaces. 
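A minimal sketch of how the proposed CompareMixin could be filled in — my code against the contract Raymond states, not his implementation (the modern stdlib covers the same ground with functools.total_ordering):

```python
class CompareMixin:
    """Given __eq__ and __lt__ in a subclass, derive the other comparisons."""
    def __ne__(self, other):
        return not self.__eq__(other)
    def __le__(self, other):
        return self.__lt__(other) or self.__eq__(other)
    def __gt__(self, other):
        return not self.__le__(other)
    def __ge__(self, other):
        return not self.__lt__(other)

class Version(CompareMixin):
    """Toy subclass supplying only the two required methods."""
    def __init__(self, n):
        self.n = n
    def __eq__(self, other):
        return self.n == other.n
    def __lt__(self, other):
        return self.n < other.n

assert Version(1) < Version(2) <= Version(2) != Version(3)
```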
Also, if the interfaces get expanded, the clients get automatically
updated.

Raymond Hettinger

From David Abrahams"  Message-ID: <0cd001c2510e$3c933eb0$1c86db41@boostconsulting.com>

From: "Raymond Hettinger"
> How about adding some mixins to simplify the
> implementation of some of the fatter interfaces?
>
> class CompareMixin:
>     """
>     Given an __eq__ method in a subclass, adds a __ne__ method
>     Given __eq__ and __lt__, adds !=, <=, >, >=.
>     """
>
> class MappingMixin:
>     """
>     Given __setitem__, __getitem__, and keys,
>     implements values, items, update, get, setdefault, len,
>     iterkeys, iteritems, itervalues, has_key, and __contains__.
>
>     If __delitem__ is also supplied, implements clear, pop,
>     and popitem.
>
>     Takes advantage of __iter__ if supplied (recommended).
>
>     Takes advantage of __contains__ or has_key if supplied
>     (recommended).
>     """
>
> The idea is to make it easier to implement these interfaces.
> Also, if the interfaces get expanded, the clients get automatically
> updated.

I think these are a great idea, *in the context of* an understanding of
what we want interfaces to be, say, and do.

Are we there yet?

-----------------------------------------------------------
David Abrahams * Boost Consulting
dave@boost-consulting.com * http://www.boost-consulting.com

From tim.one@comcast.net  Sat Aug 31 07:45:31 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sat, 31 Aug 2002 02:45:31 -0400
Subject: [Python-Dev] The first trustworthy GBayes results
In-Reply-To:
Message-ID: 

This is a multi-part message in MIME format.

--Boundary_(ID_zzsknvVMvbB6Q6ljBH81DQ)
Content-type: text/plain; charset=iso-8859-1
Content-transfer-encoding: 7BIT

[Tim, predicting a false-positive rate]
> I expect we can end up below 0.1% here, and with a generous
> meaning for "not spam",

We're there now, and still ignoring the headers.
> but I think *some* of these examples show that the only way to get
> a 0% false-positive rate is to recode spamprob like so:
>
>     def spamprob(self, wordstream, evidence=False):
>         return 0.0

Likewise. I'll check in what I've got after this. Changes included:

+ Using the email pkg to decode (only) text parts of msgs, and, given
  multipart/alternative with both text/plain and text/html branches,
  ignoring the HTML part (else a newbie will never get a msg thru: all
  HTML decorations have monster-high spam probabilities).

+ Boosting MAX_DISCRIMINATORS, from 15 to 16.

+ Ignoring very short and very long "words" (this is Eurocentric).

+ Neither counting unique words once nor an unbounded number of times in
  the scoring. A word is counted at most twice now. This helps otherwise
  spamish msgs that have *some* highly relevant content, but doesn't,
  e.g., let spam through just because it says "Python" 80 times at the
  start. It helps the false negative rate more, although that may really
  be because UNKNOWN_SPAMPROB is too low (UNKNOWN_SPAMPROB is irrelevant
  to any of the false positives remaining, so I haven't run any tests
  varying that yet).

I'll attach a complete listing of all false positives across the 20,000
ham msgs I've been using. People using c.l.py as an HTML clinic are out of
luck. I'd personally call at least 5 of them spam, but I've been very
reluctant to throw msgs out of the "good" archive -- nobody would question
the ones I did throw out and replace.

The false negative rate is still relatively high. In part that comes from
getting the false positive rate so low (this is very much a tradeoff when
both get low!), and in part because the spam corpus has a surprising
number of msgs with absolutely nothing in the bodies. The latter generate
no tokens, so end up with "probability" 0.5.
The only thing I tried that cut the false negative rate in a major way was
the special parsing+tagging of URLs in the body (see earlier msg), and
that was a highly significant aid (it cut the false negative rate in
half). There's good reason to hope that adding headers into the scoring
would slash the false negative rate.

Full results across all 20 runs; floats are percentages:

Training on Data/Ham/Set1 & Data/Spam/Set1 ... 4000 hams & 2750 spams
    testing against Data/Ham/Set2 & Data/Spam/Set2 ... 4000 hams & 2750 spams
        false positive: 0.025
        false negative: 2.10909090909
    testing against Data/Ham/Set3 & Data/Spam/Set3 ... 4000 hams & 2750 spams
        false positive: 0.05
        false negative: 2.47272727273
    testing against Data/Ham/Set4 & Data/Spam/Set4 ... 4000 hams & 2750 spams
        false positive: 0.1
        false negative: 2.50909090909
    testing against Data/Ham/Set5 & Data/Spam/Set5 ... 3999 hams & 2750 spams
        false positive: 0.0500125031258
        false negative: 2.8

Training on Data/Ham/Set2 & Data/Spam/Set2 ... 4000 hams & 2750 spams
    testing against Data/Ham/Set1 & Data/Spam/Set1 ... 4000 hams & 2750 spams
        false positive: 0.05
        false negative: 2.8
    testing against Data/Ham/Set3 & Data/Spam/Set3 ... 4000 hams & 2750 spams
        false positive: 0.075
        false negative: 2.47272727273
    testing against Data/Ham/Set4 & Data/Spam/Set4 ... 4000 hams & 2750 spams
        false positive: 0.15
        false negative: 2.36363636364
    testing against Data/Ham/Set5 & Data/Spam/Set5 ... 3999 hams & 2750 spams
        false positive: 0.0500125031258
        false negative: 2.43636363636

Training on Data/Ham/Set3 & Data/Spam/Set3 ... 4000 hams & 2750 spams
    testing against Data/Ham/Set1 & Data/Spam/Set1 ... 4000 hams & 2750 spams
        false positive: 0.075
        false negative: 3.16363636364
    testing against Data/Ham/Set2 & Data/Spam/Set2 ... 4000 hams & 2750 spams
        false positive: 0.075
        false negative: 2.43636363636
    testing against Data/Ham/Set4 & Data/Spam/Set4 ... 4000 hams & 2750 spams
        false positive: 0.15
        false negative: 2.90909090909
    testing against Data/Ham/Set5 & Data/Spam/Set5 ... 3999 hams & 2750 spams
        false positive: 0.0750187546887
        false negative: 2.61818181818

Training on Data/Ham/Set4 & Data/Spam/Set4 ... 4000 hams & 2750 spams
    testing against Data/Ham/Set1 & Data/Spam/Set1 ... 4000 hams & 2750 spams
        false positive: 0.1
        false negative: 2.65454545455
    testing against Data/Ham/Set2 & Data/Spam/Set2 ... 4000 hams & 2750 spams
        false positive: 0.1
        false negative: 1.81818181818
    testing against Data/Ham/Set3 & Data/Spam/Set3 ... 4000 hams & 2750 spams
        false positive: 0.1
        false negative: 2.25454545455
    testing against Data/Ham/Set5 & Data/Spam/Set5 ... 3999 hams & 2750 spams
        false positive: 0.0750187546887
        false negative: 2.50909090909

Training on Data/Ham/Set5 & Data/Spam/Set5 ... 3999 hams & 2750 spams
    testing against Data/Ham/Set1 & Data/Spam/Set1 ... 4000 hams & 2750 spams
        false positive: 0.075
        false negative: 2.94545454545
    testing against Data/Ham/Set2 & Data/Spam/Set2 ... 4000 hams & 2750 spams
        false positive: 0.05
        false negative: 2.07272727273
    testing against Data/Ham/Set3 & Data/Spam/Set3 ... 4000 hams & 2750 spams
        false positive: 0.1
        false negative: 2.58181818182
    testing against Data/Ham/Set4 & Data/Spam/Set4 ... 4000 hams & 2750 spams
        false positive: 0.15
        false negative: 2.83636363636
--Boundary_(ID_zzsknvVMvbB6Q6ljBH81DQ) Content-type: application/x-zip-compressed; name=fp.zip Content-transfer-encoding: base64 Content-disposition: attachment; filename=fp.zip UEsDBBQAAAAIAO4UHy0xMjhb1XoAABZuAQAGAAAAZnAudHh07FtZd9vGkn7XOfoPJSdnIt8IEHaA jBdREmTREZdLUl7GxycHJJokImwXiyTm4b7N/75VjYUgKdnOXCeTmTESi2Qv1dXVXVVfVTf+9rev +ezvnTuZc3zpBMdjlinHqmrJYnaf7e/FSTSF5yCJrfWjFMWHP4y7E/uHp0W1obc0raoYTH9ls6ys klRVMxWrqjtlSZY4oVvVylXFxWBUETNbmll3mFzWg5imoqt1ebJa3LIdMgPfu/VYslPe7eMAvc6k O+hX5FpfUtcbjOrxLU2y1vxeX13ttH5rn+6UnQ36k87ZZKf8TRcFuFPqeuksT1MvCtOj3cldjxod 9vcukihow5PJYHA1hrMonLOEhTOWPoFnWRT56Qn/K86i4MX+Xp/dpYskyuO0DVgSi74TLsQpy5yj 9c8gcnPfURsl0ZQlUdgoiFniiwGy2SxbZcuNRqmTLVnSKMhm/v7eOOc7ow0Fx3aeRDGDH1qtBvN8 M7I2jPPwCBQVes4KZNx2IJttxWqrEgiSKUn7e4Nk4YTeb06GsmrDiC3w0/Gh4/ueg3RgHiXQDfFv wJtAn2V3UXLjhQs4HHW6/af7e1deyFAahrG/12Np6iyY0D1vwzPTcwx3+n2m3Hwvn4QoNzFxvFCM kgXJsT8ZCsMozZCScImfbYjybOrMbkTmzefMJ3nv770TSOAJc1yWtKHnzZIojeYZDPLMj6IbsO/j BMcETTQVUZV1XZSoU88L2ODKbsMwwaWYMRdOV43eZTW82eg2RGm3gRg9yPOQZQfz+b2Y5yJ9jXM/ dRLRxY5hykVE/B1s/54tvTheibde6vHfREzAeW2W1HI4CKNMQNkKgePhwr5L2LxgYGtrtRVV0htl 5f5qy4oua43yYpdhsaHhzobyeWDTIUG9pZvNKr732oaq4SbZ3oFtzSKDtUsQ9yOOJqtkmSZRGwoq gu+l2UnxnaZJSvbBiSM/Wngs5VsqyP3Mi32GtGIqi+aQLb0UnDCMctx3KNbsI/X7rNn95EMUkOEz GEAfLsCGEf7r428bSzv4rY8112VJj9dNyj47z7a28VYTNluGNLEVTaEw2cIg8ZB93HRXKKMc9YHm 5cJ4lWYsSDepoyZ2sW0ScvVCzWvqcKPdk8Hpa/tsMj6Cs0FvOOjbffreeUWfTzZZ7qPiro7gIiEF PoLXuJfBFGTpiFuA7ckNHd9BwbvIJA69QG3C7wWNXUkssyxuHx/f3d2JtVE8ZlwkX22xNh40+xf2 yO6f2TAcDV6NOr2evdPqn488nCX4ma1QzRiQocBl4XJOYboCn1uVFLwQ9x5aOo/5brsk/toJUBAD EWUS+7iaR3DKfB8XdBolThbhAqdHcD3uVM2ZEwpDJ/fhwlugxYazzhBescALvWohypZ24s2W8MoJ Aueo3C/NTbSxF45gfOdlv6HS4vYp+1eeH3psRQPxDs4s824ZjNG23TkJAztcoE1mCdrWJpeFs0cj jzbrxsmxrp8H2GrmwMRJb9C6d3BP9aPkzlmVXUogAKMonyLJIxhV+7QabD0/EvYkJ+E4PhdwwmbR Ap0LagK7R9uTpZV4FQmWOAm0jSj7qktBQdULeSBXPsQOdttcuYpEynyUHZLGgWj5vA0tQk+/SJyA TFXgZRkr2WtKi9xYuoxiND7FBgjQCyGZMLp1eAMvTXNWcjV0QuYjSLjuXXFdPmWrKHSfFJVnaA99 
brKcZAXsFr/UbOYpTiBgjPwcbpnKd3IIAU2YAqeDC/zLsplYUO2E4GRZxW0azbzGvMo9m+Yon2mh rYCL4fC+X0MRCZg9qH3wHzCyX3XHkxGHetCAfU3FdLLPGYvuHFZRDndeuoQsgmiaoVekogSiu5Ac w6pwC+Qk1jaxlECAGw+F7qS4kImH6o0k8pSG5WMJxThr6FbIhdvw/+KL7fyKbmhz0xQrB/jHS5i/ Qjt4i4YDNQ9pUx8nRlswKw0IshYV5j6qzH1W6zGN9ZhV+syDoh+MgABzU7SA/5cO6Ho0GNoc7mG7 Tv89DBDcj/b37DfkC7AT4O+ycWMBx/aoa4+P9vc4YgYEwYA4Gyj8gM7kwcVCdDiCEnijGaFmW4D4 vz3Lf36Nfbr9bEZg6rEsGYrVeiwGM7Q6CrLfDcm3llFBHZr1B6eD8/fbpW/s0aR71rnaLk+jgCGG CRdlhYwAW9brmKZ7bm/3cL1fo/Z2YYxqPGPbpeT0ca13Ap33g2vxocIvKnNRz3y06rsUOuuoTDUQ 3tXBakY6Uk1QbcnrQPKRADOIwmx5sBFvrXG2OHVWAeI90nwOswvc7aTLCnf7UcLEWTorsHOYxaIz TXnNLL4ROLxe5lNZnE7DGL0ky9Yge5Exj7fkvwLn/g49uJiuEpG5OS+cM+aKqZ8nMW+XeUEVDOLX k1u2QOgtIrPi7B4OJ14Al2R3kn/kCG+fPhYKFrC3EaWNMAxDJULrObm+4so8PjgAWo83Nry+Hk/g 7aB/cLC/N2JV5InBE0UEgRPiLsJNpBi60RKVFgJ78WGI/YK6x/5KIBSO3Aerk1nslOj7nWC7HjrZ NrzxAkGXzKa233oBtTreCt5SPwlbiYaei6Yl7ggEB7xG1yZ0Fmj72kDNj3GBRUNUY1+DQwwK8/t1 cCgrVUT6lrkYkRowZjHCAEkGRaGIVDfgVW/yYFyoSAaJQcTWok6zmaBTZMUO2lxC4NLSVFU1NnvB 4fawa19VRMQ4/tn55OkWA4+w3OzSDNu29/NufIURm2Zpnw6Wxizk0W6jXnBcxJIbrewkiZJU2CL0 QMN3wikCvgmGcezxMTFkLjac8AYhMc8FKKIkGnAoS7JiSiiYIdkll/ZnG6a5f0NriwNeMj8u92sW tZusJOwfiJ+a47xMC6V4vsROL0oCQ77IDxDY3N68LSpVOku8KbLwrNzD1E9ctzwuFeeYKHjhPDpu UHxxtF7238lxWo1csdJ124jwQ4STfgPHAdHhES4BhiGnU4G2gICfX4aEDUYaHIi7U74O0/+xSefh zrQ7yWyJePRxVmIPISkVNnk4fkGGv4dBGPdugCNC4UvgFJuyFRoc/vMXXZZOVs4yini+DbiL3N97 AeiRSMUQSgdTCtqypcPViwP/Ap7N/HwK1LZ0lkA+F/Elxm0LBIekVGR2Rd6mA1mCTFTOHGpfyGux GcycEBBfeggEF2RadEn6sR6eGtUUCY07UHg6oImmxxW54zzmHzxoILi7dDiYL6FCMY0p8zHypkB0 f8/xMxEjLYciTFHkcF4QkGiBRaBEKhzijXGS3fElIr6/XxOmQ89y8GfAKgROsrGDqkzV1GVVbhk1 iMgTX2qjiymdv27K+jrtPUc/W1VUZcPEw9CLVRnbui3Bfy+pmquWtjHCrCZk6KaxTnn3xlUeGofV W2uwx7NtNbyRNKPucpEwVlcYmmzU6AmjkqpCUvCpKjTVMg19m2HO1w6o4qXLKGsMX/d4xapMv6XL LaXuc7nRnNwWOfosTyjdQDDqGQbKibNCO3RSki4T1ecMQ3e0+S53EUHw42Omf4QKiS3RnNGaQIMK HPrOnSLMFVNsln5QcPlVRZQtWVTMj0/X5gVD8S1zAIdk3ufe/VNUxGwJ9rg3GYLngnKunZ7bnc7F 
ujdZzWcPc/niJ+hF4RGABa9RLbkTlsy2LrclAwQJVRMO7TG5763ZEDsQe7MbjLYRrtx6aHyQy+Z0 OF/r3DAxOH5z9lMjmOVDN0fW+Mg6jmxRDn1rTLmliaqOO0i0eJrpTpm7tQCDNFyPejmZDBsD8XGk xkCy0taltm4W0OgdJRYXHoWrCE26wzZ8aAz1sT7I6ON0YcT3xJPH98cnEUiFXQkyRDcVdHucP0GS SBCUV18DCJmS6mdoFREhCpNVjBQydp8dI1D3wp+gOFN4PvejO4qxN9DnVeetIlwo5qUpr+x31127 M6ARXJ3Nt6ZRC8XvJGQ+fMTpONA2l7RcoqTpKhxeT84aW/aie2VPuj37+YdO6/zcls+ltiSfSWbL lD/+paCY/NeHYhRO/H9DYn/OnL8EiG1x8gkcVvpTKJzX/t4vX+vZ30M/BuQsofLlQD4VCp9bmEf0 y1C6ta0sYUOzxT8axmjHLQ0j4V0YYxmSrmimVYOCBEFgVOc8JF1r1YClPNGpAEtLNiS1PmYmLV5n gxCCmJpZIx1ckBppaIol1aPFCeLZyt2rpmJa5hdUPU6v47p0QFqjEdOwtBp1EPqskj8SRqYttQYe piTtZoU+RWvsBbWU1mftUUzZ0t0z+LdRclORsRRrjc4uvbqxprZ04xMdfl8mqcwCZUue96/TRDyB sFkcO6F3fzD1oygQpsxBbREDL9vMGuXZDS/Ib1b8M2XO0rnDstlMZOEiEauKOGF1VgnNdoCOebZ0 bqLQddDeYMFJs/mjFwwKHW545idD2g1PKPTonvv2y5eVk57kqHSyCa9zv3R/elumFAs6aW33xP86 JJSYehlPtv+Mrjqf3az4kUZOOAPGBWpK16mc1pazHlLYdNV/J2qigj5flkwZob7U0kzUF6UlGYJM zls6eUxID18HeKz1F8CMif1ucjy86nT7PwFKO0lZ9vx6LHTGZ90uOdjarZP8Hx/ndyV2ZNX4X5TY 0b8ldr4ldr5aYgeNNrHydrkCr0zN0LUROkKFMALuuKBwBZXBEmGyjPLFEuszQKTuTNMoz1B6IW7G lOzRUUGLyDhQOOFS/ui0oJvBXZT7bpFQmaIiwCKKXPDm4CGocOjUbsk2hj7CZjMnTxlPxGB5Tier lN6JOGEKBQThxXDU7U94vsZFh4boYO7R6btXzumA51jIlHOM8nlQ9Pize92hfLYcxb87SolEoPLf j4xaPeRq67b7e4gEsChy6RICdG6ZeATf2dq6+QQ1xcUAk66P4OLxRa6AEe4tdo9DZyT9yrkIGgbs Vtlb04AjBxhnSJg4xW9R/J0kGdh7eAltS28JqqIKimVJh8Ph0yarfXTwuXME6DQR18Xp0sPdIoCk SgY6BXbh3LcVXeXdW2ZLh82HT5RDJ+hH0AZEb4Jl4mgawRrqXJdImgHbz4ilYt1d5G0ROAqKoctF Gu6PRrCyohuW/sD5pqVplmpKNXAbD3rV6ZyJoa2xPpW8tOuLmaZMT431rieDs8F5t/9qO1Ula5K8 XTa5xIbidunbzsj+z+3C8+vz3cJXdnWH09IteX2q2Lu+mnTPxkKv09/u0um/H/Qr3hEPKlKdFvy5 P3hbY1Rce6PBZnXI29J1w5A/N/7bTn/3cumlvXMmfFbzZ1myqRRZuo2k0AdFaomypouGJlrSRzhE Txk9L8KdZSTOUlQRRtDi0+k0njZiaZDFcGjfewGoiLngO7nRy3NBNt79ajNKyqiaLBDoqyrJzj3s icsMm6TBBZsSclRAstqq0takIsO2OyW5pYiyYSGcUHBCcRzTnVLZ1LZmtJ47AUJTeyhjuNXj0BJl WTSOi4/t3OFC1s7fvs8MHbHm70of7sxN1TazhxWtQxaiZ4hiJvCZpm5gLk7eeMnCCz1HtM+vn27k 
xD5Lt8D/5YlHz/kVIeWNB8926fKM1hqgFg1cQVYswZAF2ZLEaRRlsXhb9eFA9YtyebSXbcTr3dMe kA7DWsWBayp8uHPQOXTyLJpFLjkMgYMd9ANItLzDlkX8LtcsK/ZAnObMjWoghBDz4/5eNxTWh9H8 JNnR09zFqInNMZTy/ZOYuYtEN2UxDcU8yMWUYMhuUDEYU1ChqRic4n+aZCkY3EqihiF4FVN8RjZE ttuz/41o4S8E2q1voP0baP9qoB1hu/0euDsGxLTQ5RYCJgNAbwiIgQdA/rJpJgo3j20L5wvkaIFj COyNPpD3JLCxNjKFZXnJTzEb3rwCSH8CSJJMw9rN8+GjG6amyDVquPLmdQpNsXSpBkl097S+IqW2 rNY6u7U+FdzoMUStpRuFVaXRktbjTJZOeLN+LUVVTXV90DhhfpVekwxFknX9s2MVi72TYRtFjrub d+uiX0szuqq6A2sStsh99GnuTs3jTCUlIqi6aGidavDEeNI3rbN2CKTrjmRyXu5wN6EXIxo509ZD QEpu6YgiZBHhhFzhqDByGe420Q1+E8lVuU7i+rielM/7KoBKT87TnACVpd98MaDi94ZkCQboLYuD QxWRAeKDKhW3NbXUz2/cWaiJ5bVUf2cucFi32a5as/RBlkSchWgqW0jrU1KCw2ESCZbYEtXj+lsJ vErctabV7XR01TCkT0Ou3elbNH1ERT8icHto+hibII6Hww8yfy0HwbSoqPLHp8j8evAvkdJVlOVp wXdv0oFbTTREDOAOrRYdpJmCbAj0LsTThnyaMwVLwtCqYyNYkBBynBrSI7OR2wj5ytlswBdJMpkk z2Rd1t3vp1NF11ua9D0zMbqYO7OTaqqPz+FFfZz7xpvdeAx6iJvolbRb/vMXglHpyQO9OOZ61G/X aLBSNYzac0Q8/C2I0kHit+ufN67rbU9boRfIqml/Dl41znzX0q5w1hMvxb1GIb78pNExccJ0zhLB Dgsk2gZz6mUEjIaJFyVetmqDynHSmJBSo7RPp8p+BaH++JfGvl3W+wYP/4/mdIt85xF/VwOcAEh3 SCokMMqg0us9GPKhASrfiAghq6zKrLQqzi0Sd6Y+q95YIcsSVaZGBDhzwuJeHAEC/opGAY/oa9Py IQ/jtdHqoMlcpRlvT+/l4CPJqgyKpoOMKlBk7zaLdR3VtTiI5jWfNKRE+YLeU4woi914/5VnKou2 QIjxCOjVzgzozPuO/Yu9Z21OG1n2u6v8Hybc+NjewgKJt2/sWICwteG1EgSz2ccRIIxiQFxJmJBz cv7urfsvbnePBBIPP85mT22q4q0NIM2jp6enu6enu6cHX4FaQmfamw0noGdY3tFGmOy6c4qiooM3 Y7qMrwrgVEKJG3vuQmfAd0G2jSzAGil4TAH22Js7dyOm3MBAyzdkNgddDiYSvb8Q73rf9ngI2Und ZvrP6VMyb68UviCEKlBb2VpLRGvAiFirgJZewBxHqASaHAg10GRY3wAEgl61ZD2TaMIZQJOEuyl6 nZsDrIrBneR3MMDwGgy24S4BFsWDcmO9wW3OpglYHmOH6pAFGiaOw0QuGmcDmw4NHBNP4xIYJ5TA NT22XZgqoAmU5PacTwl0ieo7jdcPF3LIfQ5PCShkCLrpmDCJDigMLvp8wnB84IDoetAORtTxDt0Z NGchNeMagGliA2MC/IHRsQLh0fCbwvdjDDYjqsEXOEYLJnxmOB78HMJLBAIQQBuM/9R+SNrpvQm7 mlwhD/rtapuCUX2/fDJztuVu6enmcHjWs72t5010XN1OStCUr5WQSs8fWu7pVsH+HOT4fEfI/sKx QRnZetyulxVtq2W9JbcUfetxwMy3W98VG1NUdyQmaNfVllLeelzF/rbtxLua3TAoh6M2YhVYKo51 
z6rz6WAEmh5gGZB85Zljy4iqhM8o+bgbQSTUpJPOOGkhn02//imVFdIFMZXn4fpzdwFsAA2ylyGp JKYzSdgbwL4a9ggpKZXKCOkkaB2Zq545nt8ZwsT+aEx8MFa17P9ZGnnr4/14IQzd4ZVnAXsTUNZa fcscu2e9mSX0jbCGinEz74A5wzKRZzP3Lbo4BC4ISelluuDzkgiAQpfEDUdaEFPJF6mEmzUje4HB p4+A35SUez3pfoStazKVT6WyScKxIfBJ4zjmKnfFsUDlzgc25SRGyIjZ87S4P0JG5N640LaQIdSg GEGV2+NKp9ED/rSmj40gmhAQrACTK0oSxuVEGmUn22ClOFjrWS4peyJo9tbFKpsuKS0Exo+ENr3H 8zKAhjIx3acSLqDLjii4C4sHeuEX/JwYLvUBpL5CwKsNhAjNhg5r/jmZGvjiOs+n83gs8Q04gogC TCtFZlsUvDzev3H4DyjQyN2M6f+CvGFcjkBZ+ryyHFuYY3gFLOExcgnG5QEGYfw4BwgCuUFifu5i z8r0boyRxQbooCBgQQ8YrKCJM0QVSNxLbEENFAqK5yB9jS18FwZ3I+gXVQF3PpvZKO1BfbgH/vQW Gymb/LzIcd8yhoFx+ImL0PRQB4Jf52en5CDBT1R89ottxLARADlG6u+H/sjs3/PQEJP5ghb6JUWe MmKc2CM+AK49nFItANvxKFwat84Ti6aHinOZxbhQZI0Kk2uKppbkOKMjj5ZaqcTZe11gNbWkNfRG BdZjqaE1Gzy+GzTMFpOrQpyVlYpSL4P80rHZjqyjMbiFJcoCKwnwvqTUiooGmjZlmshjMeW2pGjN FnWMIpVdk2yGwrCuVB40HnaD+KkNkhSegoaOIS4ABY9+rjC1FbJQawCLhqE3GNLNVApNxvLFRqvV qGFx/FVRNb3FUPtgpCnQwxtFRls2DFKuVgExMXzIJTh7BxBCMbnZFGKMlRsU3KMrClSUW2+jkMp1 vaNoAKdKJeSgYWHvgLqKLrAIYJp8rcnNmzV0aJOnVmCEchmGpbNKo1ptdPRzTAHS1phaL7f1ltal QVc7cldn1UbjHY0IA8ex+bpy22LH68FUYbopLRM7O2NYSGbwQLluQDONCrAjmHS009Mw46wIjcOM tVtqVW11YaBlPA1QqlUVs43A1OnqdR02aEWl1KgpGHFeVvWmUtflYhUDz4GSoGyjTpPeVBrNqoJK dkMrA3vRGiVF1xuaTu3qTRrnjaJAhx2FQFDYNuSIBg16bOtqHaqzplASjnUkaqAPwJZaB4oDWRWL zlB5jQ4dQKc4eOowHpl2jJJXtZoC8IWwdowoxlIYJl/UYA4ULdqBTyCagqTEOpra4ockiERd1fYR TL2xn0b8Jm/k9wrF9gO0OoX+s6JSVRV4SkRfVsv14xZ1qcBk/Ts94XENdqXcEh9gHQCbHtQUXGR0 qFPsQleKXtLUIh9ZBBlInTKLrTEW279EKM6Zt4z44U3QyXOjXS3D4OIwh6AIAIW/V5VOHBpuNprt qqzxQ6c4jKUFM4nEI2tqtUsJETg8QLFtYFrUxapdFQ+j6oomV1ddhNqUSy31PRD3fvTg8tpBiDCm 8E9gPziasqa+h4Wgy1XOYikjA6bMKLcpZU9VfaewRlNBlgqw6V1QKWrARpFCb2StjIvvD1MLMmi1 rvpJOfCFxhqdOlpooRuAKrY9nP0zxmdLD+NsXW0/RFUFKwLZwhIHuXGNlEUHiUpUvtSa7RaAUlZL WFHWunHOZ45hIeEKpXERSRKCdeRzcrGB87yDOwB7IcqVu9CiDkwVGHJHbd3AMPx2K0xvAMvCUkj4 yM0ATiE8igiMbCeIINVyTCkTjuO4NCrIjiJLIMSy/a5b7Jqoo9Vp+FXUAOjVkIE43yvAjbFXQKHC 
T14xLgm6qWkCF5tb4HI5Ep535GxREsWZ19ulG+C9fCIDno7RsZz/Ih8Fya6V1Z95FcI5AF5pK1Ug 0xVdk/z06RXVhxbmJ8E0JJu0TVKmc6Nit9AOKAvEq4ADRzkoSHQ1mHGVJDsF6PLwVhAvAGZ0dIcH m8Pj9NGGJzKlP2nphEWlRXWi3fnUeQ3cVC69IyysiG0XPmGFIgUj8+nwjvy0DSCltTJqGzc+WmpP z80uiY4H4Tvmiu2dKpiMx+aKPT5VlNBl/1yx7anaw87lKgLThQmrBtgDKGrd8HhRs4isDGKpROYv YinEBWjtQ3VN8VNo3KhNJBdE4OaA9vG0tt4GOd1F1HSPNT/3RkuuovLkd6C2UK2h3/shRQSAVtsC yoDWyAcCx0pySueYII5cVJQ611Q10DMRWH2XauPLEKyCMp1kSX09d/6Mbs+bvy6fQBwXRRFVistD 0iOArnAH0EXQ+Lh3ALgPn1QGtxZqDRObNDQC5QPsbXduRJ60VL7oL2rWzCTEdLIgPpJvdWWBi8Vi WwZAbt731p4H61eu57zg8Ru1dr1l83uT8JzLrae0hf11u4WB9bCrhcF2C7uLetv+G2921R6Z1t3I uxBzz+3Nms7m214j0f62zvRNdKnY53bhzoxJ33e7SAtJacPnIieWKyXyuWiaz/a5IEuTlGXyzOEO mZirJcfDxHf5XNwtBJ5hDXbQnu89wP1Vc4KYTwlSuvDrumdyNkFnEVGQqZYe1Dp9jm/Jo4Osq3/a IBGE1CbE7GT345Vbb0oQCxseJDuH/pS3buqn6k8dCVrMJEMeANtjSJ9LWe4ck4uO4YQG4Tnm1Jtc je2+MR7ZrrcN2Y4hbkC2AqcuZbKp1EvACRkyHyvGLfQthBUNxffsjQ/3Bmi+p4Y+n1oe+9F2R+gO /BE/rywMQuNFSv1nOPaiofzGXqAZS/Wj6imZ6oYZGsFNpqWsCP+nJEEGys6kd4AVORvID/PDoZnN CX7dlCgJuVwunzML6asZt/EKd7Z9Nw6qvygHQOAGMnfPDLdvWeuyZcuF5i1uEbYol0o041Rt7nkJ UZCsTX/jl4IcIgK+WKKzgCe+NOs05dk4n3ZAcVLCuEox26wFc397VrHGdOp6Vlye43HquPYJuMI9 utd4bGi4/CDSNFw83dTNkF00dGS8MSUJfvbguQneYOKGDjIojhEY6BkWnLt42hFnI8tzL84yQpKh o4AFkFzgDw+4m3uh1n/XlObvoIg88Om5gHUu/rWSHCS/+7B892H5aj4sH8LslStcaKwHChrb3PRP 0YNGwDW5VscWI8zRqx5P2Fox9I8VDEB93xPIhYO/xHb8eshyLxkomPgBihEIwoE3uoiJvxzFLtnO PyzJuGrFPGCPF44xsGz2YIzn5kXs6GQ4H4+nxsQ8dVHpPTqx3IHlnLoxhg954Tp8u+TtgPbJVnod G4F+1TcvJOY6/YuY+XliDiwjsbCmA3uRGFpjczayPdvC+PaPszsCmgMsFvz2UBHcDfdf549DCoAS 0kElZv2x4boXsYk5nfe5NHFjl28MNnLMYRSnscujE/7tTcK4fJOA2nzk38TQg5GvyCyJdPY4Go5O QPricHGoz2r/W6KBFSYyz8GEa302H0fFt4sBSdrPc3b/8RYeQdfEHlhDyxychtYJ4YbwjHvcFfc7 PBAFTAPDFsbUd7My2dERBY+5fWOGLBeYdh+VZNe6m8aZQ/cMYGz3lCWOhMMDiTeABgRzgOoWOm95 DINt0cmQ4pmP3WPUzgx2DIvatk/hZw+2B/cIQMtZkhJ8jj8IqDB9HD2CHBBeX4Uh/2F2HGLGf3Ui RAiJEr4KA/6rrzk+2jXbPSJm80eZ7rcyy2sm+/S4n2ax3964Jekx7rH7D1t4Dmt1V0vA56w+X/W5 
amlkwm4pjtwNquITyg26tjb432ub23osyv971Kj64r8tK6yUThVS+6ywUn6dWh108cD2mcqnC9LK 79RdTj3j07av52DtbyrlJHGd3IDSaoqZzJZpkhT+rXasycx2tl1XjemyZw+Wq+cpMbtOr8W7KGyb SuGFeA4c3Pz0wlfJ8/7n3c9d8zNwxN3vpHRh3wtpz4vkvhqRFy9PpRV1stt0w3Pvl+g71zND36Dz T8aAGkB/PNORzkQh9BBbFeHR0DFN+GfD4+4FydbbWrWqFsnS0tQat92V26z+fz3QISxzynS5/V6+ VjA9po1W6CvX7LnG/AH2yb4r5p5k7KIkpdLZTF6QCrlkTthrPIh6NPo712gnG+a526lbEOV0sViS 02K6Iq8LA+VD8StRyglJtJFcRg1ht4i4RBotnLgobs8q5NgZMmt5lfxn+XykZG9fdT6kHtpf/tly xsXeh4L+xcgqb++KlU//equd9nr32d/a15XY+B+n7t9u/9W5zL765Uuv/0OhMfn51W/tZOxLzGmo nz58uLjPGDeLhKc57+718/5vXe2+0U0kL9uNk7R3/vDBrP/gvfrlYbR4X7zNvI7/1n9/NL26vVn8 V1pRPr7+YXF398v956PSj4PSMts0OrnmndJbtuqzZuzVm6b18R/N9xfq6eDHv41TP5iJbPK/v5y1 3tg/1Lvvvvz9XcHo1t1/nuS4A7Dpectr2x5QusH+cg+qEzP+OhGYcaVVelBuO0+fS36SVd/DOJXc 6ccaqZf369WU3V65mOkjJUh5AXMSrxxuOQ3ls9kc26B4JgJh5Tar7fTkXS+bF2YpS6eyeE/bN5Kl 7HvCg+/WwK9nDYzJMUy5SCG3LnD86WACytFn35sYA4xwEZ0/xewPDzaog7aYl4ySoF0ydZW0jGsU bDGy2chw8XYhh1GYEBbzhZRn35m086UzMxJXzJ33MQUa7pKWb7mZ0nMsvx7+z1UYZrtxEOVjq4fP bFcwpw+WY08/xBA1vyOLWMZ+vTj2EYVnmqAmCGJaxEPN83wynzzGmlpbu+DtCPABHBX6ejBPgnoU xEY6CeAqQQqNMPIm4+P4sW48mAP8Qe2cBuBhzNTKqRrT3N8zPP4Y2KbL6AIsgQ+KZ+Y3ydGazTCM CmsGJbHyHSb2dyhNHA2HHAwo85tzD/igiwRN/pBjFltyYT/JfC0SX4Uwc7zGzPGvoPwEY8RYXlBd BTr3LeTOU6KUhzERTBdsjRx7Zk5XiNkQMSHMnAoYa3JCntmo6K4y4qGjOunq/gEM+knwRZrgnayH Tl7mMMkz2DDAcAdAdgaRBWECLwdYqf9bas3hwRo2NJ2EfoEuwUhq7FFJ/twNgpgoFDJbCcvSmYyY KuRSqeQqDcb6attVxgiY7M1HXXvuP8pm86l17rLNawFyYl5c6fLDdY7/nJQV1wFvrjkOEnbkC9ns +iLeLl5H8SrIViEVUpn1JQNFKwA0mvyfrrAIWstlshu3IhhzypPnrus+peFyFVaGZ6w0MubsDWYH nBcKocsyQmrwim++7PA1opBigoRkXkqKuYwI3A/WBXBCx+7fm5SlmRiolMwIBACxYR8IX38YGObE nl71F5YwHbMTzNZ0JqN6QF8x9UJUSS511DiTJxiSMzAmOw+Sw2fwSSbP71Z32mKC96gGl/0qF7o+ L75o/+1SC8sxR/MeYOCVO3eGuBWCrxwnr3aIwcdDi3KwL31Ce8P/ns4JufkHqLXJ3urT+uEBEDbt oHBVEKPFdRPK1x1QsLCiv/+MjUFM5NKZZDSAlWiDry7NHPNQnZE129rjm5/M/nyd4CZ0azZuR9fp elZrlQIDyEVO3g4G3W1ewDjtaAYb/vzaBp4/3Zl4p45XhlrGvufbt7O1RqF7SgpPDcJvZrr14mY+ 
HTg70v20NYzz2HqMYfiWvd2Mxin337YmKO3V9n9HMu7+sj/GrIDuoocLDEt6yxmQ3FRwaTVh1Yil wN/sa0tjCvqeAx9X+M9wvJxOXx4cG2iDhXxGTEuwe4KdWVos7N/5R0wRfhZ8VkSiQOEdpk920nZw C3+64lm5wp8T1ppJCumMBLqf9NKw1mjNiHwoOrdeVkinCsnX12VUKSVYl1ebk3MZvfdNZDXD8RPJ JM8zecyf42eX3L2DljISCHBBxBRZezbDQVehTfYmFAynL5PGhGSRNoN74cJggSTBO0DWuxaewnHv vXA76mKVTfGm1zGr2iqo9SW79gKQXe6b2bSnvm/av2/av9qmXV6wj6b5GUN7bdoo4wkwvxbYNVkg 2xjqWZi+Y+IKPNcIFsNdN2UGN4DXY/oTxDEojksTFF9//wTsHoQDq6B0QHpFZ2C8eXph9vD+Xwzq RaGdGJh4UH14gNvhiDjhKk9Mhj2Yy94ZPcuZg9gx7unb1XCyGJ3BSka5s7YzTDgfxWTnIYPD80QM XQqH9gMKxsTokLCWcshjmn2ZTETFhcxuEeQXr1iOi3v2CQZTu/bY6sMunG5iRs+mvrfK3NI3MaDa Me8MZ+DS+Tosa79d8pryUwUZPUzA4pGTFVAyxkvj4T8m+glysPtXO4NWtIwmhpka3hwmkzthLWAW YdeNNQH3mPAdmzRmINn6Fh5uMd4fbqWHFl2R7Tfsq0MgzKaAQIeut8CE3y6ptb6adnhwUql1bk7p 6r3LgJy4e1fQwFpzwyb8IpSXZuqNl6Age2w+YwYOgm7rPjxwzAfLXMDWfWoiONiy71KGsMK2ywJi nyJCLEq0Y92hjoB5Z4I2AMiFgXl04ixQWhkl8JkZFrd78MsHYaA0fzCGIInPDHuH0bGJNbboQvIR aAh0XzV3cWfluRPEl/M0TXSX9dw5PBjMPbrYHrFuPFDTfFrwcJIqGCSpsEJbfy2l4hkxF8d8Diet BZRbMlJNma8qAl09mMxXMw8PaACXTKdrtnFZtkYAJ85Ge2rhCOm40mVlezyGBXrKhmOb7joLckcZ /T727vpTDKoejpubVrDlkonYG7OiMb0PzZXPEWAzai63Lg3w/FRzDg/oN4hJmLDUg94o9+Al9b/C qDMf4yWQCDo8tzDN0Mh2PDdg4vBkFuRSQlofGX5BpI070K7IxIMieyCAFkEkPXZtvAgSbW/cXmQN cR4ck8xAlAzgExAyrUEa/2pI/vWXqOzRqbDhp2yiqYQ3BD8fN/fHwaFQzTW4eON6BGR/bEsY9N3I IydmIPPDA9fD685df22jEAPMAbeYu8EswXpHO5XAitGVTuDjbA7sCRJ9H+TawiV896wB5ZvuWw/W mO7pAzjJmZI8stHsBq/j+MXhFIHgTVAR9BOCbUwap2HfBMlJmYMADXgWMSOaVH/yfU5mjA0fu8P/ Z+9Km9s2mvR3Vek/jJxkLVUomOBNvd7EikVbfFeyZEuJkz1qCwSGJCQQQHBIon/tfny9v2L76TkA ipSPbJLa1Nr5EBHAXD0z3U8f00NrNuf1cL9mbDPeNBHtXKFTTanaAVBQ2JZh7udlBQl8UCOnvaRm RJMf2aaQ6UolzuLsVDYX18paRhILvrZLMxhUu2SOtNJFX+UOU8mpLC9Rqwmy0IugLCzVwjd8RUnF GiPh8228QSBvJSpXJQLq4ozzhXFSDQTGQh7jqBfkBg7ALNHLiEYyI/FjJp75IJM74RxyOnmWAF/N adCGRuom19s5hIWhNs2iWrm1qaKeqvlRmTroc14pno/tibVpC5VK7KDbUfiOB2QrcsQx7yX6mLN6 0G7EQmJBw8shkhif2Z8qUHhG3I35RD5HoWoqi3mZ74tW9xuL48wIaGdRr3ep3j3Rb36jyE27ZVeT 
3MitfK8hdPEF7SpSdolsCfKTveU7bnGsTkfSLZK80DxbdwRChirHB2lJaEvlOfOY6upkE4thLH6V 6nOpkqNRB4mdgY7Mf6j4pFximKFfRiSGaQEuPAJvNASaWrzmTUQcicnA/a8qcjScOFeJ1uKa6RtX uYDwnrofhH66Tepv7k0Vq9ESh5bVJColO2kwDVSWvQTMfjTB7ErAcqdJwg0uPMOEBWI/KgNpRlcD FfQV4uti1drUuyPsVaqrfA01MhwhQQMgBFoe8walCYEoZuzC2QAD5J2LcsuE7YLSo/8BwEtDpIZ+ NsJhElHHiMzz10Ei543Tj2U8I8LLTKHM75AX6QmnNWy1O8IV/e6w77bbvG0/DetUH15IH+nzSDgW nKUmkncNMb5OlmFDnHizJFeD+aht7rP+3b9HuN9udTaf2Ox1ut2WNfOPOEMRNquxMXWG/b61t59X mpKxxnVaLbdn7ffnb85evjk8NYX77dpFfDq9oHnXxdUp1nBW3SIyGHaqS0GOrOa2Zv27PLaJ7gad Ybf1sefPiXGSNLSZvSsTXwk560XryerAVxZJ4G2wPY7uaOfHM7luMRzppbTpzOpDwzxNiOFvKjAu vGhZa+LzzH3EAupxRGzyI0niqdRckivJZ6u5u/CqKOPECWuFfBqR7+ERAYbMc8o4TEP+grb5zvVS PVJfICoYR6aUbfCUsDjxA9ht6I88JBVXPXkWhFLaUpsMhQkruY3KROMFXu2XX//7229rv668Gw9a cVrce+gYTV9mtTek9KW1n6pZWqj79RaUUqge1OwF6gEsecS45L6M64cF5c9FJgmA1nYNLm4y+ah+ Pochy9jsLsq4IdyagcvtHHSaB52eSf68auI6Go9GKxE0K/bC9uHhD4Nmd+Qc9tx258VRd53a6xbA 1NdT46x8/LsdM/xApmkYt9iwmrwjEeQhW7T4Nxn/h9h9G8bDwd/EeA+fHfq+TIv9E22jIQrHH7fu qfk8cFvNjturGfxoNR0Me8zVjfmltqAOOt2O223XH3377UG32ekN65VUS+2g1W43XXdTZfeW3kG7 0+736lVjBR50u51hZ1Px2oI8cN1Wq7PBatnqNPXj2uLERYmD1v0u1RfrwbDfaQ3+MubO1hdz5xdz 5+9m7hRr/xRHFvv1E8J3KZ5xXSuFWoT2xnHhRHWWTtT9+ZyB4AuCeZy9+Fyp6KQKbGqR/iFptN6S ooYdVnt46i1JKuy32g2hu/gTGGUeLkLkYrwgAIxrmAhYAjDYsoQEOTvN+ZvRyfh0jMxKQkM0JK7R 4x1fcNKds1cn41ejnZ0dwQDStm0qeTN6Ob64VKkKBSeXO/oRt9Go0pzma2QKI7e4rFMmJ8kSlFHN UEwyg60yrFdboMWmue2tt2dv/uXi+Oz8oiEsPBMcaI/bcy8kr24a+Dn0XfxfGWSUsZVvjdRjU4FP sI2RflHGqeYaBWe5ztneQpM2uluT1GxDE8oOHuG7Wz5dRLLOn8ch1ijvK8NN6wZsdW3lW6iFCwnF R9igpOecwBsFlfkb/SVICktxylYtNtwdIZqDlAudCZNT01wiedPZ+ejVjtg9kl6AU/rQ7Q5JgSPa uGphOHs46n7As7d5EZuJ4PCpr8yvGsUs6VlXhz2I+yvjLIkiDI9q399EsYaAORR5BLTF82T08kxM MmRmyBIvcFDucBYCDWq8K3YPT/cOkE+tYlmI06IN4IekKSmlSnsKOA5rXxgcj8zpSXQj0SVHdcno LsKiczzXO/KNOqWPz3PxSs4SUlZBX3ZOvNiQf/7gAQpq/skEhFvjf/dve+swTaJkBptJODXJ/qEo KZXXi2OCND73W3WRD+4l9Mj540NAOk9wzXrnvubYarYJhxB06FsN7pK2r9FUBkO3ObAK0U2YKC+I 
fkvv3OrS9HoIR3PY7Xcq5eeCVgGxj2jZsPUOmwOrNMGqYVSjZqc7sEosp6e3kSeDQXtYBZHVrnZy SSe22iG2k02+Puy1O50qYqUWAeL2m5XS+rr0ImJH9l2zW4WexUabWVPpHn7zUN8e1Fxp4YQbdNMX sgjz+YrWqAM1Lkl7W9JOkMHykXhaXMHEtnxW0mb8vBvRz7lhZRXiEAE23yC+lPhwGcGeXHfbt8Sr 5EYlo242D9p9wsomEuHsAzekH8kIrhlp9RycXFrRcwY30yjpfB3fZF+7Oo96bSwbVJw0Ja2kA0Hq xHn19R8XCkL6hdPq/JZQkKrkh2LreNBFOS1yDoYjRkIMNyDOVxKrmBF3MyFyD8fN+bfhXRVe5yd8 sfBNVbsh0mfk5SZlqzv8y4RR/J9VJiB4/r/pEn/OmD9FlbjXkw8FToipvBUk/0hcz5IG2+ktmxe3 cLsQdmLPMp898Dgdm/ACAZN5EcEzRMgZgkYo5i3OQz8X9OyYQKfQgga/2ZSPDwEyx0hCpIzpDaEQ j3Yd29tdAK45cmuecCixStUVJwJ3uWh35vbWJHmX7CnES0XhdYKfgQAY2LCSMoIlpPYAU8FU5N5S IT3jdbhlX1VMSK3BomiqYDqgO5GLioR+jSzK68m2f+VBqiINRKEcuF5ubsMlqEN60rWUKfuotaPt 1lPHFBh7UD2zRPdEXx+jgfC4XjX7HybQsFTn4dpV3jUiBt7hR8zA0PjmFJ7Q9+ZoV5PCaco7w41Q 9/hlFrBjk+cZBY4V3QWJ3gU7Mi5kdkMkEruXZxd71AZfS+Pp6I/tLYuWjEuPPuMGsPPQP6VSsMc0 5/yPyjxEdIa6SWOBH55DM+CfvA5jviBIRXpsb6nUOuzNUv1wuHKOdBGiQlziRHpz8Zyfnl68ssOw 6PtCHZqoIpH/DDDqdnoEAD/hRp2/7a/feIOT4GsP/SQqF+vOBag9OVHJX0dXT3HSaD1nY7CMvQXC maLlpzaiDqCv1fRArklM7yd+vHGkD9bw5MEXm7u3/r29dNvPygWxlafMkp6F8X/a7e4U1/mn40uk 2MvnzBXGokZZgpgQS7SNoXwJ7kj+vT1p2tF2EtwN3jrodmE953MKqyjz3PPDKfEiWsIcISpOwkXI +1jjTJydX8GZnn/tta++ngwLwpnzqe+kqgqAJGd+vRlshmnL7TvUe7ffu1dg06nirtNEuOznHw8H S0+9XxnCpV6UeCQH9/N46jqzQobVZ/bXquPntXPpHMOxXgsLR7oU53qZUEWl4/nOVcovZ5zioJxf Aw7SMHbw9z4BJhKF8498wa9b9EjHkcdFeo8oD+LLTzhr2+vyxb5/ETu2634xZH8xZP9uhuwx7RLC bBwTRajA8khvhXWWacABRXzSkqi3WKpzRjYUbXtLX6ooSHqi3jMdZIYCBG94ODQXDeLJajpQ/J9E JSsra6gGdgQ8gaIAO6RAUhYZaQMc0EsahRzPaRrRp0t1cEokp7B0BVxOSVAd2oloK2IFjJpMgz6N babC5SD9VDAbjYgLAKzBQHFF0smE8AXhlG3TRb0oAjdNjUk1ekS7CIX/MhiB73dL00N61FYaShXX ZisKcQHi45zj/lhiIaYLsUlRlNxycBIBQIUBwwXRhANA9anVvQMFqxTowF9fiUPBk8TDnOEMq75h SacHCoN/fgTa6aQ3SBtUbYnvnk6+0/7Y/OmTyXdPn6T19yZv1FcjBnt+FHIN9V+cPGqSbSz1ShY4 TUQkTqJH39V/5VUxk3an1l0maq2/9MpTSb4erXYEtYAY93o0MVUrZAIijTAXitqgLscmcaw5cDki vnKzjMA9OForxmLALtch1ZhOVIWQOcRoedmGyZUzM1UNEUjlCuYfZi9FOJMtC9/hBEO6f7UBrtLM 
DlBdaVB/uWGYn1OhvuLhc2q0E/XBNffELk1sXqhoan9VoVsP7wtPPL/4SSAhmwqM44s6x9BBaYMg PAtX3pJiXXiImPfikjlZJi2LY5VsktxoLsV3kqUEgQJ95ygxOcvWwIn0oXCqJEiU8odQSpnNjM/C 9kcr6oWEibNQcduxRLemqJXXrFmyzAsUGNW5S08vEBQWPM45UTm3wLo48RpCaDOqZhffRZJXYRiX iCLHEPbEVLKqht5Sh2KtqIP1sGkVtXN1Ab0J1JAJmuN6Pb6kLNb8raJKg6M/b7PKMfM4iYLHwmTF Morm9tbqRsSoFEtWnHBaol/E/R8HtKqv5YqswQe38yTSyFzNRkmqoldo+QNmTF8jMwNPqA5htgRX 9gvaqw6N3awEPZnw3Wnxhe+JeaKEml+z0Xb5DljSc0mMQoEdw4QR4VRtQTLmkLAmtYaNqIXfPIH7 6nlDcBR3sr8yFLuaeAruL6m0nESkP+iW9xD0DmsEN7e9tZBebFZDLQReRXSu0oSNLQ2O9i3URb6F jnHXU0O16XxlaiYCyeWsyQSRpGYuVquuUEBFO96+mu4023UpgclumIMgim7oNJGoQAx9iFBrwuJr AncJ9koUZKvRLu90dS/x+vw+LLWXhtQZ+gVJPZErvi65mX1wc6aOPabQBc2q6nF90TL3WOolpgK5 +XQHzSDS2yu4QnDyBmHj4nR58fpE2CQQ6qJDc1Igpo3mBWbP2NFpyDR+HEX6Y7MF2SgWSS9jSSPv ZOYjKp4ejx/Ta7UjzEkS3icG16qbmBXYZQjCDIuWXOkpGROoYz4q78eKLW0CQk6kjhqfaq6kcTO8 z8fn36v2eQmZTqy0e3y+Iw72NCbDwaCcaIu5wASlSQh/pCYm0RE7ZjfXFxJAivI+m5MiDzZhZOr2 VpBYuxt8rgg2Rr2MXqdlJPJyNlM4k285v+VFoapgTGcFz0l1RToHDdM3c6IyP8KHRBn+zibyM9YJ NaY/3G/a73eb65aqfrvvuoNW2zr38nI6DRG7oO0p9sW/38nhzFN4+f67NMn5bmz93O25rdq70BZw B51uxzpMkWPVvmj1q6hdx3H2TCBr7d6Vyh3Z6Q6qYFV8vfZxZL2nw57rtu3zKlxCvx60Om63Sj6S zUpWCGyFLcQFm9dxUmTrp+9/LcO1Z6TCrJ/2f6hTZVyv9DcdoZ9Enn89Ken5HRDeSoxt03VIUjvF fsJ3KTiBrF41117V/55lCUHabIfkqBdzmhj9504cZ6lTe7zxHP7f6fX+SVLS3vph9OZkPLp8JJ5e Rc5EZlRt8azIQi+QTrFwptmnG+NelwSXiS3miR++L97nInjPoS4p8DJJx/NfLo/PXuGvF28OXz0f GXPc5bxsCNfcQ8wBrW73wO1vDmh9yYcoia3iZlpi9ZckE+GrOOMxE5CKtdcVG1zbrivH8OCewW7Y ncRJ++vOoP21+yx3OUmQwxNc0XCz1Y5U5n2X/mt2W060jJ0g9KLSkn6a1U6+P1CvGA7cZrvdbQ9F p93tih6ueHGdPt/sXNGjUqDc9oE7OOgOYavce/DofdXAxgPya9Q2lX7b5EODP14+3/tj3Ny4jnvo Dp1e77dc5F0r+RkGv0Fv2PxrXPzM9r4v5r4v5r7fzdx3rk9c3dDu8DnyMNeWEGLNkrOYEZj7L82X G4LD6IycrcUvInVZGYGxCgtCULohSGOdkjYh3hv8QUgzxtE6NEUIdsGheQEfdLuS0CBLAnaykhBl EVJ/cQDWBwZVjXKUmzbyRTgqR5CtvMIBT2HwDOrc1JS3eJ+RXgPtIXisK2cfaUArQMuf3UVSyBIF oYHQPoW+9/Ls7OXJqKGsF8gz8Mvh8dlZAwAVQYVQDeR+nkrvmqs6Pm8IBkPoJG3lXLz+8fHoFXKM 
7I9PBP2lRJz4nv7pgRCxPWiG2T9wvq14D66IkVM/ph5SJZQGLXNY6Ig0eikIwgA8+wjjy2JSqq9g iC3vzOjL2AxeVZy9x8YgxTnGYUkchER3JyQpuYld5Okv9FRhOcz/wRGqCLMts2SPJkiwip2Ed+iY JlkMq4NHqIiWWgjS/jfBLaocdAYFYfLl5YN+KssJ7JXUL559wgEgZBiHC77UHsoqiqqciThWTBU8 JjXS507lSVQqXE9lgUXF3w9/OkTXnu+RYk5EQtgEEB/pgLye+UMZk0LM6+19hBlhKh49pvWZ4Wz0 e2pzFnuwHHxvZmUdB/G+cS4cQVI7pAnIlCqq6DC+wHXtQJB2At/zSVmaBlofi4lXYi0RHhYnv/Bt tDQs2LYWWqv1oUKrjWZ2xDQJc0wl1blrNuK/np2P1PpSg/ijs3e5w16n9VCC8N6wionEsdkKKbd7 w94nvHpe5fCjdtwq9PHVqDrk1iYkZJ4HGYL71jSHy7Nz+znhQqvHeCnNQKX71A7RPVR/eJNUDvta lCSxT5mvH9RLZZLWNAT7nBTHeLYh2nIhYT8I/XytR7ekeueJ3DS6IkQSv/VGFDHqVd1DsH3S3F60 j45+aPeO2s718tdF/GwZZ8jvEOWxyUKogP9bNn09n2ckIyKZPYvnXpIsy4k/U+dZLZR3Gs74hTg8 O+G7Z3E1/RvHaTQ+pgqsIvXDHHzeK5S/5GNtE45MKnG3hNjRoJ3zym9A4Cp7as8ZdB2326xnmrp1 W04cOxPfi4Mw4FCmYaff6nV67da9YmKXUxc2+yaJdO26R7d50GwfEBo/fSjb1P2yK0VWciOufNTn j9oq5uBjUZsbErCxU3waQdvEV7Pb1v3x7szLyRoRdjYQxjk/I652xC11HRhcC6R2ozcwE344enNA WPWv4Ux3cSSMM02wAPKihwH2nwA0+SoezmeDHeaT/KiZfz1BjE4Q1JBsPOw0xen5Me2HwrymJcSP LvAJrJhIiCBQa2tjrT6sZjjTQIq4jSsk3qhP3RM1OPHHgHfPtShwFRBplUtbbVtXW1VJ6hkhBloZ OplDXniZukZRyQH92RVSUIDqvg63Y+strLgzkqoC7hLTRke3gat38nkokRelRhM+BZR5hGvgkgpV ko0UqZt8FWeo2KS+LFJKzIi2avrIK6Ua6eJ6d6KNtlsjiybKcnLiBU6a8BCRiUr7FdSf1GmfEMVC eWdRAtTjArriHv1/xFmT4uS2cgfg00nGV01qJ/HqzDB9HuEc9H6ZPuLVFSfCyDMki8hh0OUm+h9a Mdw0WwIEp7fOi2WkMjfXPosSEi8eB2pyin44pROdMYcPLhGZ0NLg4ZY4MQosu0FCfdWZoWvvu3qa cunpxDtTbxFGSxOLqMcypP/b6q20rFdEZJDVKZ1KbMLejEKZTAnPMvp2m44YI4bR1qiye+jDMYFo A6GxBVjL2NoaKHQOC1VuOtXm+oXuqOs6DxJdtQdwI1QSC7ar2/QxNBs+qoUDGQGjcFrYenxbypCE +HKtW1DNtB8Fn0ferahl5FkZZ67WPNY/3sVYsaoxfAI3EH0zD1M0RC25bb7DC7jF1qMhRuURYG84 PFigjqpMYyBTJseWpzHwCuZ5Ls1CdTscq6oqQ6ofqtCqbUiWkhLPNPlATBeQYFgmpoauU7VjOzUF ZrPpVvB+5uWa6HDNtJrfKFcQ1vMMU0TyMucEUZJDM1TdtFUP4yV7ftaWTO5h865OlXozN9lrbqFQ ZQ1x+uTFE6r1e11rvzbmCZ93Ic2LCXNvsqj7Gn4q2jGVM3ZsbWjXdHrg3K/HuPVuEUuuuJPyGyYR cpBRwxqdKs9YgaxRmfIJepklxnCN0CwsTOx5oU6YwrdCvBF0rZaoajNOqsWW2wRFZvu0aGsqtsj0 
5uiaKNFOL2Yh99ii5Sa5t2w8wh0i+z8spfOIaxvdEhMhIfJr8q5IxTwOUiHzq+gmeCcWRX6zZI3u 7XJaSlKZ8+nVNA5xhV04n4jraxnRNrrJg+2tML9LkXqmJEWOQ0OyGxFlUbq9FdHiub2Wk1Kk9EmZ RmI+vb6iL3CHk0RKG4TcT1NxE00QFw81diLiPLwScnklpqRxl9yJ11EqppKaI4F1M5fhUtzlHjWa Rr8SWWceFC15hbxUMzkhPhoI6l8oppOpZH3vJ/kuEsvrUoTJrLii3e1n21vJ3YIGfH17S6zFvxOS lOX8f9h71ua2jSS/u8r/Yax1KvaeCAHgW+sooiXZ1saSfKIcJ7eVcuExIEEMHsIAJKFff909A5AU KUfOOXu1VXGVbQnAvHp6evrdMmNZ5gKnXEZA3YESZDkaPIGoljEDEjoVPGDVHNZW5TlAtRTsrhRx iICDHiKYRxjymT9FTY6PsnfKlnfTGD3qbzEoYOFLmFyCxyh3YM8yXgJQb5VdLYNGXCQsgeVWET0b BSmmJOMeQHcmArZcLJ8+kV4csXkG5EIsZ+kd8wMchy8AfkLMUfcQBDKJWAXzmj194iN2RbEUrMxD 2DRHxiyIpUzgFC6CkMY5jQImhQcvJpUD0GM8lBHB/oaXks0qwRI/WmQpbAmAB/OJTIqCRZXPYP5C CnjoxLCVcJ1NK7jnUVdSZDyCjqLy9umTAM3A3F96TLhBEcEWOTzP0IWN5QDOJIMXQHonDCaCGzAP 4EXFIpfNQ+aUT59E0Ce74zJ1WBEBKL0YiHkcwRuvRK5gAeylCF34gIVyAmcglwvYtAVIUUAbojCA vQZQVrC3WYrKFbS23lbuArN88RlB4WOGJSxgjydwVm9jwSK/nLEwgNGjdA7HqcJEZABK6D5y3adP 8jBkBazahe7jLGCiAgYK9sL3EwHj5hHgU8yncFx5lElUCbHw9g6wv0rhlbeMBV+yqQdnIpuo48Zj QAiZ4Q6kgQgEyyMRAspkmbKp+lEq4DTiZsoggGvM5zOAwVLMmBDCC1gQ3rHKk3CgOCaD8QGLE+gy rKZsFoSwEl8ARWAcpgMYGGI2JpYvYB4unBeZogYIDsM88AQgAwayZDHNbFQAXshAuvyW5UEyz+88 tgyLpVgABvhFiBEJjo8IHAAHDQAv/SSGOfCqzJgfIX8bJtEdYC9gpctklM2Al43kBPC9gsdCxMCK uiLJMN0XHFEJp6YCbtSdwKpAeJgD7MPQW8DNheQjimJAlYBHy4jdwtWe+oIg9EH4WcIBT6VAa38J 0I9Cx0d8gQ1L/bAI4IhE83TJkrsyWbJZFixByAlcR4SAjZOFh5CDM52p9KknQEUyT2Bkjl/CThdM BjnuaSEwsP3pE7cE1r0AVE8mQQyXBg8D6cA6AR+SEiN6wnTOJoH3TJGypIJZBRGL/RgwHHZB+IBS cGtp1dSfq5yyDwZ9a4dyatCxu2bXthstEzDez2rlSXdoN6Xo9s6/j/e08sTuDoarNFM3787HO62+ G4mhup1GFfNRgli4rWZyKR3fj1vPH+rGQYOovyvj/ceTn84v326brFFviSrorRbAVxXbWia+xOw3 lOBgq69352/fPe6hW062ux69PVv78OtDEHboMmSZ3LqekYfy1rj1DM+hD9sGppafiNTVRR+SwN56 hIEC5r2njT5jw+69oTHrjIYDu9+33phDI5Mymi/nyfFclMtlOFs4oTb5Ko3Za5LEfmI/pXCX5uwV JhtMML8mT/q93XU+3l5fffzAbkY/nV39fHbNzi9VJo2z8XhTfeaIwvBzgLKRpfcTUG1lpQJZSBiY g/CBVFOpNESYlEv6ZjNqpLGotrRJtqVNsq0Lx0M3QDltnfmUYQJjS2zDNtFJ5JfWVR5OwsQRrV1Z 
4LtGx7AG6CDSvpedv7BhK4NeP3Es7prHXppnhlQRNCoGZd0luD2yh/1O+41xMhoNLTjpx3HVOuUz R2VnfsVzS9hFJ3Zu46LnFu1uKYpONEmdfjRpe73SC487zvJ+r0PbL2a9511/+tw6dlNXWd5zjzAT epVpGXWKQS/tRInTu+3F00jKfhGJjh2Zc79f7Ow1++fC7hmDTtu2ny97RnswGA57Zu9YBazkvodk ykB3wfstC78nOoXnAox5FqZi0M0se9az09KU3XDQmYtd4/3z45LG6wzbNN7QbA/6Zv8R4/WGHowX 9YD652Yatvu9BBjJzIwtV/ZLL4k6u8Zz8nc4HlJPNd6w17eH7UeMd/HhrWF13EF7MHScIOj3e4Pe cDDseVwF26OeSB+rpo1XFr2eYeFRfG4uu4bV65jDXvdYnX4PZdAwaeoVNM2Gpt1pm6OTTpyCKAYi lJPxJXxzjGrd4dDo9owh7PCwPU0G1nPRiTHgP8kzy/BrrNrsrduzB2/6X+6t7XS9Qdc2P+vkAUG6 DHXVgF09Dqwz2/r9HrltuQ/3qJXfmOpLqZF77J9lomsadA6xQuJQKZAfKNJgD7Gggo1pKB4q0lDF GfqBAglz1tTnO+kqGw7huFtmd7jZua7WYK5mtwKGaeI020N2dvqlag3m+srWmty3J7xGrcklBzLt fNEJZIOwHtr9rmV9KRXbzrxuDb09hC+6MLEt5bdl9trWgG1T38NOH47N8D/G6aT7l9PJX04n38zp BD3cv5cq/Ik0PrIos9BHTwo3x9MO90cq+bLEaH94kHAv2mcxaZGC0kM9LazFzVGX7XMHGgClEuhA 5qVeJEvMGSqfMeDqUXlLMeqoF0ZlZBowxaI/Y3QSUNeTfF+gDKw6JgfoSRPl5dfOxiAj8RAn8Ax1 o9/nKrmYAycgwcge6BfY1WJKswAkiJnrTOQ+KcRQ34/9UfowapXIhcoREPNnqJim5NV+ihNRDvho CAkwB/jN1RXTPD97PTqlwbWPNjRCn2vKUQ/soouJt0grFwH/hnnRSOSgBkjU0eSRoLJVrQ8hUJdy BOEe48mUlztmbgCGjuyxMI2CPOqVSQOLH2BVTJdKO6qtK1KlpY0SzgHml7DWCXoVUo9psvaSYJE5 lc5ojbUx3bKoIy/gFsmp/AHsZPCMfZqGuJ5KgZo6K8h7ewzAYBe/spOrk59wBVEDXUqRn5LEec5G F+z1NRbaALb86t174LHfX12fUtkNlGTYxQgkgGvMW/dxfHZ5dkPNLioWUEWNPK0w60Huq13SiIOj ZBx9TkKtVCa1ukomHcaY3jxtUk6rzfZVCg0VnkfOprWcttlvzkhM0+Iyozx5mCmvzoSBb2vDHHbI tcOkcrapU51p2bhu/of+6OYMwDQmUF2CUFLD8h4k7wORfavRmz+YQ52yvIfkvlRXGNC4qYC80VyR l/oV2im0JM10KJObFgUqiAKC/gr0WqHtkmMRx7SHZF9qpGR9TNHtKVU50qHH/dVWw1DytgRRltcB TXiUKC1ggnHpiJ0oLivEwWZECXKuEdaJWV4mFNahCBSSDXXcUc2o7EVow9A1aGRTkOUcrxkVUEoq dJcsm5gSpka6EM76qQrvCJzJJIXVUhkSTIeCgTw0BVR1YP5FzM54Ovp0iRQHdhZ/BYkefyQkoG9H 79+zd6Pz93/gjFHl09VFVP+h45fHqGbnMVtywQK0BkQhE5HMWFnFLACKhn6G3qxi7gKDu2TCJmG6 ZM5tUrKKFYs8hGYxXA3hrTtl2W0cwdoj35uLKGPVLANue75ATX4iBCqHgUrHRY7nFsGAtj2xYEUZ ObfwtoLT7om7JGOLeeDmrFw+fTLx0iDLbr2MwR3jl0XEChnwLPNgkzKZSlT4ShBtiziAH6JA+AJI 
NPeEl7KZALkuu0szIHdZEhA03kYwGRmjxtn1xW3qF6zkPJ+wZBqwlC+Rh5BcOh6LREY6y18coI+s kiApxljFJhIilhJANnMCGNlnQE1l5DPX5YWbT30WcDd22NTNg5wF/hKGgrsmuVsAqMIou6OI1tmU 5TJinDMRTOGgLBYCSEuIpqIoBJDHGLHlRiFcOqxyYGIsTgBSngC4w9yCFOh3FrD8NuQl2gV4XmWC hXnglrcLxn0gqmy2kFHqZ3AKlrkMnDIMWBElQVYxMQkSdwEAu1vE3IctCypYGsxl/vSJCKRg03Lh OlHmwM5lLIPTwWLXn4Sw455YwmGFgxrIfMpcEeTSh2M1h0X7wDnM9PE6c+HqiP0JPOJ3i2nIInwP IgMgVZXOGBdx5LtsEYkFDwRbxmIeubdYBEjdJ79wzpFgyOVdzGIRh7EIM5dlqvRvgmGfnhSFC+Dz Ag8ANuOBB4et8CYeNvNygK9Ep8kJdIqVfQAfOdCHPHHYpHDgZ9dlMWB7WcLCbwMJGxInBDnmBrc+ 47C38D3W9WAwh1kJGONJN2bhgoeMxxEg3nzhMp57PllOygyQO8Oz4LuhQyGzzK9cDvQMWAsP8JsH vAr9OYAgq0SIduAU685yOcvKPCK9P5KZ+A64K5ky75Y7yPAsAFJ3LKiki5xOJt1wBgd1CROArV2U Mx+4EOD7MBQfbsrCB0SWcKuitbuYAeVjS1hvVuVJlAPaAmrJKEsEi/CIV9A4EYjgTjQXbOHChsNB liWPCzYHLiyewfYI4QJ9jWDL4cMoBZQKsPjr0ye3IJtiIRIghIAeiwlzJ66cZCz0cSawwmzpVVhV DPg9YEgwOLIS6ESGQXiVxPhDoLtRlHk+QGMCqyzgADAvv2Nzmdwh8YGpAIIK2BUPnWxnUwT2rQtg EjKHq4RK1nAPXV6Bl0NbAxeLFJCWIsHZLAGSgUvIuco2deWIvEhgK4HTQrMJIg0SFZh0ebcAHI3h 5PASro4gC3FzgN8JgO9IkGLMnLxkwKxFd9BhxOSEwSEE8lmCTLaAg1d68CJI2BzoYBDCIYUVe3iS b4FQ4l7HuYQH4bIELjILgGnMgPzBGXPQQufk2S3cYHGGE1jyjKULdyoCwM8J5oaIQ4DdlMXzOXA3 LBVhKqdAYmLJXegzUTGmI1aBEO24OHQCOzCJE7RaAawAuLLKYiJsiFZyJmH8IAy5S0fu1wRoNABU Ar108rS4QywHesSmRenM2ew2jCYLFi+UQeNNkEeRezeF95FXZVPoFiaF1b84X7hAaZaIrXEqYl55 eBCCNETiEqFN+xaQqahgWgD1KeBpBJsJpzuPBdwADpzaHHAyBeATcz+BCwnIzAwuJu4LSlIJVCV1 kdstb4HL5fE0WvIYaEuVoE6cR8DVirvMgyOBa72bA++QC9gql5foLxFNgC3wJ0s4VrwCOgVoBgfi NvMD5HimyCRWSeYD4QgmJdwPaLGPgHYlqJJkIAVBt7CZIsFEE2gyu0UjVUAU5XY6jwC4d75TMuHG Mg3ZBJCYlblUcL4IeDRjqYPBjEB+hYMGXHQsmXmuDOD6vAPaFUUCQO3AxQXnBIiPQA99IBHZLfA6 JXAnQAjmEi1jgOIYjAn0MIbjksBB4EHEJi7cohx2RXg+uvaH/h3zAM5L/y4qmR+lU7g2ykACN+OG AZDBJA9i2Ajsz/EjuHxD5P35Amg1XLfKvvvfS7jZXHYX8kg4cMXFc6CtkxlMFy4gF2YseLTIESIR T1wRwf4kbhTfwhMQ5pIA92aSISsHnKHwAc9CDwhuDHu6yGI/B0ofVABumQcoOHk8hHsaNtYzaqPW n2/YGmK+2d1e14Nht7HDjKbb9UY+jD9sPVM1zCl6feudSv22bc4JCwr93nr+6uBm9Pr92a6cXjfX 
O58+8PkDz3eXaX7o61c3p0evRo+dy1c8vNftNzFn4ePWNJ1gLGUEVKWijKilV/+6aoccOnD3Pn2A 7D++sYxJmk4EV33tiN2c5M7Uif1jX3qZMnjB6X/xlp6y05KqL6XJy0cHbVJF5PGHfYbhz4Uz3Wcf bt7XIQi88FTGP6WtbVN5d9LT2vah1Tk0MT1vfztGU6uN1PBrKzpo1NoY2bthlPO9XtD1hkPAB3Ng tm3L6hie2XZMp3OcKZ3xWkdH92xOm60ts211jb7pctPn5u7mjbbMsmw+9K22bmqZMLAVuLbdsZzf aXpvVGtg9ox2p+MEjvXApJumQa/vubbbV01tC3OGd0yrzXnv95pujmpbAH9ca8fjvf5XjQpAsvtG v+/BjAe7W+70um8bVg8WalB49u8XeDkfX7UGg+6wZX2xxMtAl3jRRojt6TCMRTFts9NmdmfQbTML Zm+qqMnOCjPNrsLM1bIfiJZV2NlSppDVMDvtFDv6V7aXr0o+Z/V75n9O8rneX3aBv+wC38wuUDjz UB57IBKVvorIeXGDj9g1/P5yqwwzHanDryVYug6ztkGQXq3ejgA9GahmY57pxLlYVFMp/rQbfcxh 0irZMHaT86LMKQ8NbSzIZjraULNTpPReU08zzU4pi0TdC7XFjE45x3eUw+TdzQXcsXgqYZrTdHGI H/5+CYj1P9hCM0z4499wnc/1/ACMz+usvTrgV31+faT+Pz3SnxoSVujEfL9ua5CSHB8Be3XafP5q xN5dn735YU/jUP01x9/32NHG768ORker1gd61L9hhi7SZtLDZua/v9T1P9TT35C0quyBoVbuAgtz DyKvVN3OFUjuQeTFy0P23dF9uLz67odNyLx4+d3RPls9buCDL74Io1Ubggp+v3e04+FucL36DuD1 Xf3sD0MLxPTpvtJLYwJr9I9fqPw/qEQOKADnUKUkOqJQBz/1JKnlAZOnFHCBwMa0U3VK7zilSiqM bHB1UfE69VWh6oFiaA+gvc54V9tSODv5eVwnj8LplPA/zakeQ4WeYA8/c5F6mMs8w0BZr2AvME85 FQ6FU62ZVdrhfzpz5yWVC6KzBeSASlrDQiiSXcV/Y7VdRbNmTuTkhWMoUYnI1lyPdRDC5bvU9Vqe PnkH66HIJlT76yRC0E8zMzz+YdDMpc7MGGICQGWJI3RXFwHVGE4o8xqZdqZOJpU3PJmhtjblkiIn aEiy/sE3AaboVjQGl4TMOaB1Pfzmeotpjv7GKhoCU2X6yfdAjtCg6FB6K08FboUBbirZ/GiDVKo4 iTZEQyEFTAQoPuHer9pwimSRomWwBeVEwziXkGKugJSGaHjRJWqU4ZMIZFhQWMBaP1TTmvhEjIzG TONwG1F5LEI9ilx4Xve0CEH22miuMvMlE7kyrBIhhT/P4wpvrH+1D63ub/vwO5BsA8n7i+fO37v7 LFo4+cT64blLmP8Jd4Zs1Y4Kb/JUJfh7vfiYz5COksrcRlZC3GFtwXLyHNrrZGB1N3BHCUEdN739 hin7Fg7Zpos0xVSIIJHpUAj9ldHGJGxU9YmSbG3srlK/JdXCqeg0CU45ANFwrCJg6FA6FN6GicCK Mgj2a4swCr+AUlWd+ExOKTs7wD9mtA5KDiBV1nZ9VrEulqQYwrkjFDmF7wNHIVGc+p9ruRI6VRn0 DDbCiLM6/1mTHlFWEn7Wpa0Ui0S373onixwte7ncx7K/sP0+ATPNgQnFjIxUanmrQ1UTfRW9hGnN GHdkSMFYmCyMAEmGagpmalBwoYtuVYSnMPEC07DJogaqLnqs84yjqN/QTMKccypoUx+r/doXQOrK 2mpLihWdKCRpwZXdfWNjnz5Be6AOmiKLJlLRSRn6OqGsii+UDcUIC30SjD9fY9U5AIHP2i4OPLA6 
drvdb5uNLzZdz0lT5GnQ6Q8aJc/YS4vGh7rX7w97TTN1mPU7CytoNr7aH5xSNI26ds9q3jjA1gEG 1Gqc9qC78vCmMuD1m27bHq6UZXJ/ZwvcsEZN1u51u/2mWBRc/HHTptvrN+kEKHJ7t4rrcJeWzj50 ZF3TaQiQWRXA2lWM4NXBm6Oth+Mi53w7jdnPcOT5ZhXgax0FedgkxXOwjo+RCPZi/Zd/WUPMPmVY Q8uwfnu5khrciq0YfPZiYAwNyzmg/3QOv7Pxxc0HFvrs7WhkD2yzv2qMh3xdxFmX6v6hnP8sk2o+ WcPhkJm9w+7wsK1rPrEXZxR2f28JcPk6gpIxutXGelaTWU0AZnUxGplDu9vbMZ5l43hWWyUaYy8u znaMp6aMmSSV1KKXAy2Me68AhugTbVjdrmF3zd9ewgxXU9k91WaCne42tNTXR/9YdfLQEqwvL8FJ wghmjv8ZksqMf3HuHXsnBqxgMDAGRv8A/+1qJKhx4HI0atvdrvnFOcN9Ojy0O3V+tw09oNlu26bl BiD0tZ87vsU7Q24+D+yha9mefbxz9kdNYrs3OfdzWOv7MvGne+xVoH4/3voeVSd7SFUoQDEsCvg4 g1+P/XASemkq9HcnHn6n7inUkiCTg/fz3s6duqeVbA/sYff1CRys9pl91rbe3O99UwurRtHEU7mc secAoYuNImn3IImJJAYNJM8vzh5bgHnN61Ur6vZCmWpN3d6jqjE/PkUcvnhMejhUvtpYHPlrs8M1 DXcp0Goth9olpehf7fyujVf6COImL7EGz6SsJPtMzOxn5MOx8pIAFmEfb2DJgf8A3hYgNskdlbOe rjliNsZOTK6C2FeFORgolJlhNdbay6dO7I6ii0oEgnPU6oE6MJhYLk4qD0aeezA7LdJ4ieuReh3+ Ei+RV2s/fg5KIT5TE/zXi6QBF9CPY3xwfvpDb4hBVAob9sYcy6+7ZVEAlcajfp54hpJschQK3Vyn pRjDUfAAW9WVo1qTrNQqMxUzf3J2VTN/SZjwFlzGxbSVAgeFNylyxBL9vFTTBumJdfIKlb8bk/sq H0+6ybXbI4IcU4urlirFci1UejpAHx3DkE+C5QhycUwA8Smdg7HXLBbzU+sMTzhYqbJnE4jQ0UBH TU/Seow9mrRqfApwqtibdBFx8gRlmg9ZufGNXJ776Jb5lqpYHRiGcaDa3mC3xPTypUMcLXHISpT5 hGy7ut+JfSROWU1ay5soqNDRqUtsUaI2zFkPpNEH7lIn8yYGUWX+xTzYif9jvfK1umXy3oYfoAf7 QdPlieqRRpdV4qFPDCXv2Cd1GOZka1IaI6fy77Gddg8srLvZ22E8NXtWt9Nt8sKSB2/NUcId26s5 vT1yDd6reU1zsOI14beGc7WGq8Klo8vTFf9nrXp6MMDwoY5Orz5dvr8anTYv4U/Dw5W7gwx3huh9 vMRizTdnp1tvWmbPNLftxhdOk3iqi6XlmymNy6T5ejgYWit+/aEoxoeCFYFap8EGC6pLPgGaXwDt eyUFUO7q2JumnoECnNZ3b1yFpAi5eXs1Vk6cmAcjLwXGB6uLEKerqjjlqoqT1Tu0+4dWh9HCt+Ps pkgKQ0PCNYT21LQOm/tmAXcb977ZDzy/G/SMtusAoNvd42IC36uFrofmfXCSA6zyY9jGEFOvJuES jhJebY+9xZvLe3V3P8rKdqKuBzIh7SGkscwrTPIYCw56aabDrpYtrY9vBcKZrMcS3kx5Q7WVTgST epBSE7MPTjhAH++xDebOljPRR9VbzxDJ8dQJ88qIHW+KWY0VdHYYHOPIa/W6LXvQardblm1EnpHn ylK3aXWG80bYd6njDFmLfUyEKtnFTtNFghpKCU+fg/jSPbjAC2mVGZhCHkclaukwNT60aeFWHbKh 
1bf7WOX3q+ttrcqtchEZmIMzWb0sY0fKpsBqKy2LzQKrO0qu7qrMamCaeVFMVx1vP8hyYB7g1zST 91wX9Boo2o5eEezuA/LxBbd2n7XDbm+wHhe26+hRLFav/2AI1tDeYVDttnud7r1+N2OxuhaI9/85 NtfBXzbXv2yu38zmepXsuCntw07vsKNvyn11y8JZ9Q9rG9DvxG9JLCXLHSTQ+8BpAtgL5L/vR3Ad sW8Uw3XEvkUUF3SMXW3GcX1lGBdbRXFhX18I5NoZfMVWsVeKycNO/nAUFlsPwsKevjoOCxvhXwqo wlCQtx9H16fno0uM6hiPzt9fXbOLq6vLZ48J/TDwzwgGpzrEF2muopZGAAU014zLfM6rVZDIJ7jG tISy0EunJColoBj0/KP6sIVXOqOriOFdBFf3muhy/5bSLfa7pvlf7HWYgOhLrUj+0vtMeDY09zEL pxKrFZeo2o6U6QbTteF3OPjbcCIPTgGg97gG/Lxhv1nN0f87hB/roDMEfuQRtW3/JZMw+21LCHjg cYZ85HZSEnp8/+leuJJ51p1Gd3hZPuDqeVD4uzwyX1+d/rrj+a6Pw0mS5nzbp/Wr3EJPH++zeq+H RuU4wpxIF2glFiIELtqB349jBxn8Ot3AV7hiYrp2qjE3QaXCqngthRQyVAtsZJW1u7VjmqlD5zsm +y/T3nbJvEQbGnpBIGFvuF5MDrDBoQ/STr8zNW036uq8D/jvpKz0UjYzfaR21jU6nc7z9n/3DBAa bZUsAotB2AaAzTfCwje4g0RXFz85WrscB6kt0si0eubtzsGaD69dFway2n2TRur1hjrtxuNHak9n gZk4w2PlZfvAQO0yoBWZz92FbQy67cFjhtkhtGStXr9Po/jOoslw/O1ranRNo2O124bdWdXUSAV/ hNZ0s+UfSEKciaTFn0n475nPl88o8+c6XKldtvXE/nJOYcscdtr/MRU7XF443f/fVMJ750Ao3oeZ jCqgPpdX4w+ji2NdMDnR+VO23OuePkHoH/4BdNcF4BlT1wgyEEdsRD5UWvuq67fBo3dHBmOfViXv 1n3lgG3TqtJU8h/vd/kujmGgIi9h0giqSVrUrkbQnQE3tVIRk/69VvqSD5+qLUi+SMjyFcDiEQ+Z SDSiI8SR6jN8DiMEKebjd1ErTRX/qK49g9tNOQ/WZRdW0dnkWYRVCRXXqDSwAR7bQnHEIRZ2o9Se dTE/VboMw8JDlsEK0EuJWFj0lsD+ZKo0KCmNXciNycLVR5Olleo7j95gNV3ihp0Je6UMDHp+6FVU vzmiiWLq3EqxzYCxC06OSMCVJ8QK45B0IUNjxRBS4H2o8qFSCjlS9mIlB+6QCYzAoSq0NcupvXAc xJEQZ0O+E0eU1dOpC15iTc5l9qOCUw0iP8RAZlaqpIGa20bLiJzEGId5gGwf/A9MnEgTVZsSS2ER Mw2MLV866LmiQ7ZT7eqk+gaChfx1gzwMODeVyBn2hbIJY7lAQNQ3gGYc63NUBDxCUVSW0Vph2lTY VQfnY9j4P2ga6MpJBhKBp6bSPluKmaJDTur3FzqlqvaIoUFhBPqM4F/GWYHb4iQp3AAhldLQ/i4v kVGunj7JectXce5Ig1WyyylWrZUlVw5FTBl9fD1/khoAL5UfEU6yqWVMwl7jLAkn9Wx0qu9BOsl4 dkdYJg/oLDBU7/QTZYrSD5SnIv6iGmvubaOX0yOss0gnhlwcV4/QK0Y/ajrS7Vf+juhg0xgxGtLh YXvcTjIvhfJHBsuXpaQ843RElJTIm5WRz12u/Z7I6U755GrtMvdRPMC/Qqb7TRdPnzhqHTrJsC5L maS0CsCgF5gBW/tJIny+ly+1Hw4yhX++KGAfWJ3B4BGSgLpktiv3qeeft54D0ijBcesNRno6uwwU 
From tim.one@comcast.net  Sat Aug 31 22:43:39 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sat, 31 Aug 2002 17:43:39 -0400
Subject: [Python-Dev] The first trustworthy GBayes results
In-Reply-To: 
Message-ID: 

[Tim, to Paul Graham]
> ...
> I also noted earlier that FREE (all caps) is now one of the 15 words that
> most often makes it into the scorer's best-15 list, and cutting
> the legs off a clue like that is unattractive on the face of it.  So I'm
> loathe to fold case unless experiment proves that's an improvement, and it
> just doesn't look likely to do so.

Those experiments have been run now.  Folding case gave a slight but
significant improvement in the false negative rate.  It had no effect on
the false positive rate, but did change the *set* of messages flagged as
false positives:  conference announcements are no longer flagged (for their
VISIT OUR WEBSITE FOR MORE INFORMATION! kinds of repeated SCREAMING), but
some highly off-topic messages now are (e.g., talking about money is now
indistinguishable from screaming about MONEY).

So, overall, I'm leaving case-folding in.  It does (of course) reduce the
database size, and reduce the amount of training data needed.  I have no
idea what this does for corpora in languages other than English (for that
matter, I don't even know what "fold case" *means* in other languages).
Experiment also showed that boosting the "unknown word" probability from 0.2 to 0.5 was a pure win: it had no significant effect on the false positive rate, but cut the false negative rate by a third. The only change I've seen that had a bigger effect on reducing false negatives was adding special parsing and tagging for embedded URLs.
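For readers following along, here is a minimal sketch of the kind of Graham-style scoring under discussion: per-word spam probabilities estimated from counts, tokens lowercased ("case folded") before lookup, unseen words assigned a fixed "unknown word" probability (0.5 rather than 0.2, per the experiment above), and only the most extreme probabilities combined, in the spirit of the best-15 list.  All names here are invented for illustration; none of them come from the actual GBayes code.

```python
UNKNOWN_WORD_PROB = 0.5  # boosted from 0.2, per the experiment above

def word_probs(spam_counts, ham_counts, nspam, nham):
    """Map each word seen in training to an estimated P(spam | word)."""
    probs = {}
    for word in set(spam_counts) | set(ham_counts):
        s = spam_counts.get(word, 0) / max(nspam, 1)  # rate in spam
        h = ham_counts.get(word, 0) / max(nham, 1)    # rate in ham
        if s + h:
            probs[word] = s / (s + h)
    return probs

def score(message, probs, nbest=15):
    """Combine the nbest most extreme word probabilities into one score."""
    tokens = [t.lower() for t in message.split()]  # case folding
    ps = [probs.get(t, UNKNOWN_WORD_PROB) for t in tokens]
    # Keep the words whose probabilities are furthest from neutral 0.5,
    # analogous to the scorer's best-15 list.
    ps.sort(key=lambda p: abs(p - 0.5), reverse=True)
    ps = ps[:nbest] or [UNKNOWN_WORD_PROB]
    # Naive-Bayes-style combination of the selected probabilities.
    prod = 1.0
    inv = 1.0
    for p in ps:
        prod *= p
        inv *= 1.0 - p
    return prod / (prod + inv)
```

With this sketch, a message full of trained spam words scores near 1.0, a message of trained ham words near 0.0, and a message of entirely unseen words sits at exactly the unknown-word probability, which is why moving that constant from 0.2 to 0.5 shifts borderline messages so much.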